Web scraping in Node.js using Puppeteer


Puppeteer is a Node.js library that provides a high-level API for controlling a headless Chrome or Chromium browser, enabling automated tasks such as web scraping and testing. Here’s an example of how to perform web scraping in Node.js using Puppeteer:

const puppeteer = require('puppeteer');

async function scrape() {
  // Launch a headless browser
  const browser = await puppeteer.launch();
  
  // Create a new page in the browser
  const page = await browser.newPage();
  
  // Navigate to the URL to be scraped
  await page.goto('https://www.example.com');
  
  // Extract data from the page
  const title = await page.title();
  const body = await page.$eval('body', el => el.textContent);

  // Print the extracted data
  console.log('Title:', title);
  console.log('Body:', body);

  // Close the browser
  await browser.close();
}

// Run the scraper and surface any errors
scrape().catch(console.error);

In this example, we first import the puppeteer library. We then define an async function called scrape that performs the web scraping.

Inside the scrape function, we launch a new headless browser using puppeteer.launch(), create a new page in the browser using browser.newPage(), and navigate to the URL to be scraped using page.goto().
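
Both calls accept options objects. As a minimal sketch using standard Puppeteer options, you can launch a visible browser window for debugging and wait for network activity to settle before treating navigation as complete:

// Launch a visible (non-headless) browser, useful while developing
const browser = await puppeteer.launch({ headless: false });

const page = await browser.newPage();

// Resolve goto() only once the network has been mostly idle
await page.goto('https://www.example.com', { waitUntil: 'networkidle2' });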

We then extract the data we’re interested in using page.title() and page.$eval(). page.title() returns the page’s title, and page.$eval() selects the first element matching a CSS selector, runs a function on it inside the browser, and returns the result. In this case, we select the body element and return its text content.
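
page.$eval() operates on the first matching element only; its counterpart page.$$eval() runs the callback over all matches. For example, here is a sketch that collects the href of every link on the page:

// Gather the href attribute of every anchor element
const links = await page.$$eval('a', anchors => anchors.map(a => a.href));
console.log('Links:', links);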

Finally, we print the extracted data to the console and close the browser using browser.close(). Note that Puppeteer methods generally return Promises, so each call must be awaited to resolve its result.
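
In a real scraper, it’s worth guaranteeing the browser closes even when navigation or extraction throws. A common pattern is try/finally; here is a minimal sketch of the same function restructured that way:

async function scrape() {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://www.example.com');
    return await page.title();
  } finally {
    // Release the browser even if an earlier step failed
    await browser.close();
  }
}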

This is a simple example, but Puppeteer offers many more capabilities for web scraping, such as clicking on elements, filling out forms, taking screenshots, and more.
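
For instance, here is a sketch that fills out a search form, waits for the navigation it triggers, and saves a screenshot (the selectors are hypothetical placeholders, not part of the example site):

// Type a query into a search box (selector is a placeholder)
await page.type('#search-input', 'puppeteer');

// Click the submit button and wait for the resulting navigation;
// running both in Promise.all avoids missing a fast navigation
await Promise.all([
  page.waitForNavigation(),
  page.click('#search-button'),
]);

// Save a full-page screenshot to disk
await page.screenshot({ path: 'result.png', fullPage: true });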