How To Scrape Amazon with Playwright and Bright Data

Daniel Kelly
Updated: January 2nd 2025
This entry is part 2 of 2 in the series Web Scraping for Developers That Just Works

Web scraping Amazon can be challenging due to its sophisticated anti-bot measures. This guide will show you how to build a reliable Amazon scraper using Playwright, a modern automation library, and Bright Data, the internet’s most trusted web data platform.

Prefer learning with video material? Check out our in-depth course: Web Scraping for Developers That Just Works. In it we do everything below but dive deeper into other topics like selector best practices and utilizing AI when writing scrapers. Best of all, it’s FREE!

Prerequisites

  • Node.js installed on your system (LTS)
  • Basic understanding of JavaScript and async/await
  • A Bright Data scraping browser account (optional but recommended to avoid getting blocked)

Set Up Your Environment for Web Scraping

First, install Playwright:

npm install playwright

If you're using Bright Data's scraping browser (recommended to avoid blocks), you'll also need your authentication credentials.
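The credentials are combined into a Chrome DevTools Protocol (CDP) endpoint string, as in the full script at the end of this post (the `AUTH` value below is a placeholder for your own username:password string):

```javascript
// Placeholder credentials — substitute your own Bright Data auth string.
const AUTH = 'YOUR_BRIGHT_DATA_AUTH_STRING';

// Bright Data's scraping browser is reached over CDP at this
// websocket endpoint.
const SBR_CDP = `wss://${AUTH}@brd.superproxy.io:9222`;
```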

The Architecture of the Amazon Scraper

Our scraper will consist of three main components:

  1. A main function that orchestrates the scraping process
  2. A product extraction function that pulls data from search results
  3. A pagination handler that navigates through multiple pages
async function main() {
    // ...
    // visit the site and wait for the data to load

    // pagination handler to extract the data for all result pages
    await paginateResults(page, async () => {

        // product extraction function to get the data
        await getBooks(page)
        // ...
    })
}

Set Up the Automated Browser

To prevent getting blocked by Amazon, you can connect via the Bright Data proxy:

const browser = await pw.chromium.connectOverCDP(SBR_CDP);

Or you can use a local browser (though this is more likely to get blocked):

const browser = await pw.chromium.launch({ headless: false });

Using Bright Data's solution provides several advantages:

  • Automatic proxy rotation
  • Pre-configured browser fingerprints
  • Better success rates
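One way to keep both connection options available is to toggle between them with an environment variable. This is a sketch of our own, not something the post's final script does; the `browserTarget` helper and `BRIGHT_DATA_AUTH` variable name are assumptions:

```javascript
// Hypothetical helper: decide how to connect based on an env var.
// It returns a plain description object rather than launching anything,
// so the decision logic can be exercised without a browser.
function browserTarget(env) {
    if (env.BRIGHT_DATA_AUTH) {
        // Connect to Bright Data's scraping browser over CDP.
        return {
            mode: 'cdp',
            endpoint: `wss://${env.BRIGHT_DATA_AUTH}@brd.superproxy.io:9222`,
        };
    }
    // Fall back to a local headful Chromium (more likely to be blocked).
    return { mode: 'local', options: { headless: false } };
}
```

In `main` you would then call `pw.chromium.connectOverCDP(target.endpoint)` or `pw.chromium.launch(target.options)` depending on `target.mode`.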

Use Playwright in the Main Function to Visit the Initial Page

async function main() {
 try {
    const page = await browser.newPage();
    const booksSearch = '"live on mars" books'
    await page.goto(`https://amazon.com/s?k=${encodeURIComponent(booksSearch)}`,{ timeout: 2 * 60 * 1000 });
    await page.waitForSelector('[data-component-type="s-search-result"]')

    // scraping will happen here ...

 } finally {
    await browser.close();
 }
}

if (require.main === module) {
  main().catch(err => {
        console.error(err.stack || err);
        process.exit(1);
    });
 }

Notice how in the code block above, we do as little work as possible to get to the data we want. Instead of using the scraper to visit the homepage and then interact with the search input to find books about living on Mars, we go directly to the search results page, relying on the k query parameter, which is highly unlikely to change.
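Building that URL is just string work — `encodeURIComponent` makes the quoted search phrase safe to embed in the query string:

```javascript
// The search phrase, including literal quotes for an exact-phrase match.
const booksSearch = '"live on mars" books';

// Percent-encode it so the quotes and spaces survive inside the query string.
const searchUrl = `https://amazon.com/s?k=${encodeURIComponent(booksSearch)}`;
// searchUrl is now 'https://amazon.com/s?k=%22live%20on%20mars%22%20books'
```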

Extract Product Data from Search Results with the Web Scraper

Next, we’ll define a getBooks function and use Playwright to target the HTML elements with the data that we want to extract. Selectors are determined simply by inspecting the page with the browser dev tools.

async function getBooks(page){
    const books = await page.$$('[data-component-type="s-search-result"]')
    const results = [];
    for (let i = 0; i < (books.length); i++) {
        const titleElement = await books[i].$('h2 a span');
        const title = titleElement ? await titleElement.innerText() : '';

        const priceWholeElement = await books[i].$('span.a-price-whole');
        const priceWhole = !priceWholeElement ? '' : (await priceWholeElement.innerText()).replace('.', '');

        const priceFractionElement = await books[i].$('span.a-price-fraction')
        const priceFraction = !priceFractionElement ? '' : await (priceFractionElement).innerText();

        const book = {
            title,
            price: Number(`${priceWhole.trim()}.${priceFraction.trim()}`) || null
        }

        results.push(book);
    }
    return results;
}

This function demonstrates several important scraping techniques:

  1. Use reliable selectors that are unlikely to change and that semantically describe what the element is. This selector is more stable than using classes or IDs that might change:
[data-component-type="s-search-result"]
  2. Handle missing data gracefully by providing fallback data of the same type as the desired data:
const priceWhole = !priceWholeElement ? '' : (await priceWholeElement.innerText());
  3. Clean data after scraping so that it’s consistent and of a reasonable data type:
Number(`${priceWhole.trim()}.${priceFraction.trim()}`) || null
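That cleaning step can be pulled into a small helper to see it in isolation. This is a sketch; `parsePrice` is a name of our own, not part of the scraper:

```javascript
// Hypothetical helper mirroring the price-cleaning logic above.
// `whole` and `fraction` are the raw innerText values ('' when missing).
function parsePrice(whole, fraction) {
    // Drop the trailing dot Amazon renders in .a-price-whole, then
    // recombine; Number('.') is NaN, so missing parts fall back to null.
    const cleanedWhole = whole.replace('.', '');
    return Number(`${cleanedWhole.trim()}.${fraction.trim()}`) || null;
}
```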

Using the getBooks function to scrape the data for the first page is then straightforward.

// all the same above ...
await page.waitForSelector('[data-component-type="s-search-result"]')
await getBooks(page)

Pagination Management When Scraping the Web

Finally, in order to get the data for all the result pages, we must handle pagination. This is done with a paginateResults function that looks like this:

async function paginateResults(page, processPage) {
    let currentPage = 1;
    let hasNextPage = true;

    while (hasNextPage) {
        console.log(`\nScraping page ${currentPage}...`);

        // Wait for results to load
        await page.waitForSelector(`[aria-label="Current page, page ${currentPage}"]`);

        // Execute the callback function for this page
        await processPage();

        // Check for next page button
        const nextButton = await page.$('a.s-pagination-next');
        if (!nextButton) {
            console.log('\nReached the last page.');
            hasNextPage = false;
        } else {
            await nextButton.click();
            currentPage++;
        }
    }

    return currentPage;
}

There are several key takeaways from this function.

  1. Handling paginated data with a scraper emulates a real user journey
  2. Using a processPage callback function makes the paginateResults function more flexible and reusable.
  3. Just like on page load, it’s important to wait for particular selectors to ensure the proper data has loaded before scraping
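To see the callback pattern in isolation, here is a sketch that drives paginateResults with a hand-rolled stub standing in for a Playwright page. The stub's shape is our own assumption — just enough to satisfy the calls the function makes — and logging is omitted for brevity:

```javascript
// paginateResults as defined above, minus the console.log calls.
async function paginateResults(page, processPage) {
    let currentPage = 1;
    let hasNextPage = true;
    while (hasNextPage) {
        await page.waitForSelector(`[aria-label="Current page, page ${currentPage}"]`);
        await processPage();
        const nextButton = await page.$('a.s-pagination-next');
        if (!nextButton) {
            hasNextPage = false;
        } else {
            await nextButton.click();
            currentPage++;
        }
    }
    return currentPage;
}

// A minimal stub that pretends to be a result set of `totalPages` pages.
function makeFakePage(totalPages) {
    let current = 1;
    return {
        async waitForSelector() { /* resolve immediately */ },
        async $(selector) {
            // Return a clickable "next" handle until the last page.
            if (current >= totalPages) return null;
            return { async click() { current++; } };
        },
    };
}

async function demo() {
    const visited = [];
    const fakePage = makeFakePage(3);
    const last = await paginateResults(fakePage, async () => {
        visited.push('page scraped');
    });
    return { last, scraped: visited.length };
}
```

The callback runs exactly once per page, so whatever extraction logic you pass in — getBooks in our case — is applied to every page of results.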

To put it to use, we alter the main function slightly:

// all the same above ...
await page.waitForSelector('[data-component-type="s-search-result"]')
// provide an array to store the data from ALL the pages in
let resultsForAllPages = []; 
await paginateResults(page, async ()=> {
    // move the call to getBooks inside the callback function
    // so it will work per page
    const resultsPerPage = await getBooks(page)

    // push all the parsed data into the resultsForAllPages array
    resultsForAllPages = [...resultsForAllPages, ...resultsPerPage]
})

Conclusion

This implementation provides a robust foundation for scraping Amazon product data. By following these practices and using tools like Playwright and Bright Data's scraping browser, you can build reliable and scalable scraping solutions while minimizing the risk of blocks and errors.

If you’d like a more step-by-step walkthrough of this project, plus some more in-depth guidance and best practices on how to scrape the web, you can dive deeper with our course Web Scraping for Developers That Just Works.

Finally, here is the full scraper code in one go:

const pw = require('playwright');
const AUTH = "YOUR_BRIGHT_DATA_AUTH_STRING";
const SBR_CDP = `wss://${AUTH}@brd.superproxy.io:9222`;
async function main() {
    console.log('Connecting to Scraping Browser...');
    const browser = await pw.chromium.connectOverCDP(SBR_CDP);
    // const browser = await pw.chromium.launch({ headless: false })
    try {
        console.log('Connected! Navigating...');
        const page = await browser.newPage();
        const booksSearch = '"live on mars" books'
        await page.goto(`https://amazon.com/s?k=${encodeURIComponent(booksSearch)}`,{ timeout: 2 * 60 * 1000 });
        await page.waitForSelector('[data-component-type="s-search-result"]')

        let resultsForAllPages = [];
        await paginateResults(page, async ()=> {
            const resultsPerPage = await getBooks(page)
            resultsForAllPages = [...resultsForAllPages, ...resultsPerPage]
        })

        console.log(resultsForAllPages)
        await page.screenshot({ path: './page.png', fullPage: true });
    } finally {
        await browser.close();
    }
}
if (require.main === module) {
    main().catch(err => {
    console.error(err.stack || err);
    process.exit(1);
    });
}

async function getBooks(page){
    const books = await page.$$('[data-component-type="s-search-result"]')
    const results = [];
    for (let i = 0; i < (books.length); i++) {
        const titleElement = await books[i].$('h2 a span');
        const title = titleElement ? await titleElement.innerText() : '';

        const priceWholeElement = await books[i].$('span.a-price-whole');
        const priceWhole = !priceWholeElement ? '' : (await priceWholeElement.innerText()).replace('.', '');

        const priceFractionElement = await books[i].$('span.a-price-fraction')
        const priceFraction = !priceFractionElement ? '' : await (priceFractionElement).innerText();

        const book = {
            title,
            price: Number(`${priceWhole.trim()}.${priceFraction.trim()}`) || null
        }

        results.push(book);
    }
    return results;
}

async function paginateResults(page, processPage) {
    let currentPage = 1;
    let hasNextPage = true;

    while (hasNextPage) {
        console.log(`\nScraping page ${currentPage}...`);

        // Wait for results to load
        await page.waitForSelector(`[aria-label="Current page, page ${currentPage}"]`);

        // Execute the callback function for this page
        await processPage();

        // Check for next page button
        const nextButton = await page.$('a.s-pagination-next');
        if (!nextButton) {
            console.log('\nReached the last page.');
            hasNextPage = false;
        } else {
            await nextButton.click();
            currentPage++;
        }
    }

    return currentPage;
}

Daniel Kelly
Daniel is the lead instructor at Vue School and enjoys helping other developers reach their full potential. He has 10+ years of developer experience using technologies including Vue.js, Nuxt.js, and Laravel.
