Web Scraping for Developers That Just Works


Daniel Kelly
Updated: December 31st 2024
This entry is part 1 of 2 in the series Web Scraping for Developers That Just Works

We're excited to announce the launch of our latest course, "Web Scraping that Just Works". This course helps JavaScript developers bridge the gap between basic web scraping concepts and real-world implementation. Oh, and did we forget to mention? It’s 100% FREE, courtesy of our partner Bright Data.

Why This Course?

In today's data-driven world, the ability to effectively collect and process web data has become an invaluable skill. Whether you're training AI models, conducting market research, or building data-rich applications, web scraping is often the key to accessing the information you need.

What Makes This Course Different

Unlike traditional web scraping courses that focus solely on basic concepts, this curriculum combines theory with practical implementation. You'll work with industry-standard tools like Playwright while leveraging Bright Data's powerful infrastructure to scrape the online retail giant Amazon for real-world e-commerce data.

What You'll Learn

How to Build Production-Ready Scrapers That Don’t Get Blocked

Learn how to work with Bright Data to write scrapers that don’t get blocked. With zero extra effort, build scrapers that leverage Bright Data’s multiple proxy networks (including residential IPs) and manage the full unblocking of every page including custom headers, fingerprinting, CAPTCHA solving, and more.

With typical blocking measures out of the way, you can focus on writing robust scraping logic with Playwright.

How to Use Playwright to Target and Select Web Scraping Data

Playwright is a JavaScript library that’s traditionally used for end-to-end testing. The same features that make it great for testing also make it great for web scraping! During the course, learn the basics of:

  • programmatically navigating to a page
  • waiting for elements to exist
  • selecting HTML elements
  • reading the text within HTML elements
  • triggering events like typing in an input and clicking buttons to simulate user interaction

If you can write JavaScript, you can scrape the web! To prove it, take a look at this simple example from the course, which scrapes Amazon search results for “books about mars”:

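// Assumes `page` is a Playwright Page that has already been created.
// Allow up to two minutes for the navigation to complete.
await page.goto('https://amazon.com', { timeout: 2 * 60 * 1000 });

// Type the query into the search box and submit it.
await page.fill('#twotabsearchtextbox', 'books about mars');
await page.click('#nav-search-submit-button');

// Wait until at least one search result has rendered.
await page.waitForSelector('[data-component-type="s-search-result"]');

const books = await page.$$('[data-component-type="s-search-result"]');

// Print the title of each result, skipping results without a title element.
for (let i = 0; i < books.length; i++) {
  const titleElement = await books[i].$('h2 a span');
  if (titleElement) {
    const title = await titleElement.innerText();
    console.log(`${i + 1}. ${title}`);
  }
}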

Web Scraping Best Practices

Writing scrapers with Playwright isn’t that difficult, but knowing the best practices that help your scrapers stand the test of time is a different story. Learn best-practice techniques for writing your selectors, visiting the right pages, handling pagination, and more.

Could you find a way to improve the code snippet in the previous section of this article? If not, then this section is definitely for you!

HINT: Amazon stores search queries in the URL!
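
One way to act on that hint, sketched under assumptions: skip the home page and the search form, and navigate straight to a results URL. The `/s` path and the `k` and `page` query parameters below reflect how Amazon search URLs commonly look, but they are assumptions rather than a documented API, and `buildSearchUrl` is a hypothetical helper, not code from the course.

```javascript
// Hypothetical helper: builds an Amazon search-results URL for a query.
// ASSUMPTION: search lives at /s, with the query in `k` and the page number in `page`.
function buildSearchUrl(query, pageNumber = 1) {
  const url = new URL('/s', 'https://www.amazon.com');
  // searchParams takes care of encoding spaces and special characters.
  url.searchParams.set('k', query);
  if (pageNumber > 1) {
    url.searchParams.set('page', String(pageNumber));
  }
  return url.toString();
}

console.log(buildSearchUrl('books about mars'));
console.log(buildSearchUrl('books about mars', 2));
```

With a URL like this, the `page.fill` and `page.click` steps could collapse into a single `page.goto(buildSearchUrl('books about mars'))`, and pagination could become a loop over page numbers instead of a series of clicks.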

Data Extraction, Cleaning, and Storage

Picking out the data you need is only part of the process. You’ll also need to structure that data in a parsable format and clean it so that it’s free of invalid values. During the course, you’ll see practical examples of exactly this, including cleaning and formatting product price data from Amazon:

price: Number(`${priceWhole.trim()}.${priceFraction.trim()}`) || null
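
The one-liner above works when both text fragments are already clean. A slightly more defensive sketch follows, assuming the scraped fragments may carry thousands separators, a trailing decimal point, or stray whitespace. The `parsePrice` name and the sample inputs are illustrative and not taken from the course.

```javascript
// Hypothetical helper: combines the whole and fractional parts of a scraped
// price into a number, or returns null when no digits are found.
function parsePrice(priceWhole, priceFraction) {
  // Keep digits only, so "1,299." becomes "1299" and " 99 " becomes "99".
  const whole = String(priceWhole ?? '').replace(/\D/g, '');
  const fraction = String(priceFraction ?? '').replace(/\D/g, '');
  if (whole === '') {
    return null; // nothing usable was scraped
  }
  const value = Number(`${whole}.${fraction || '0'}`);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice(' 1,299. ', ' 99 '));
console.log(parsePrice('N/A', ''));
```

Unlike the `|| null` shortcut, this version only returns `null` when no whole-number digits were scraped, so a legitimate price of 0 is kept as 0.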

Finally, once the data is organized, you’ll need somewhere to persist it. This could be a SQL database like SQLite or Postgres, or a NoSQL database like MongoDB. Bright Data has several solutions available to help you with this step.
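
Before reaching for a database, a plain file can be enough to get started. The sketch below is a simple stand-in for the database options mentioned above, not a replacement for them: it appends each scraped record as one line of JSON (the JSON Lines convention) using only Node's built-in modules. The `saveRecords` and `loadRecords` names and the sample records are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Append each record to the file as one JSON object per line.
function saveRecords(filePath, records) {
  const lines = records.map((record) => JSON.stringify(record) + '\n').join('');
  fs.appendFileSync(filePath, lines, 'utf8');
}

// Read the file back into an array of objects, ignoring blank lines.
function loadRecords(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Usage: write two made-up records to a temporary file and read them back.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scrape-'));
const outputFile = path.join(outputDir, 'books.jsonl');
saveRecords(outputFile, [
  { title: 'Example Book A', price: 12.99 },
  { title: 'Example Book B', price: null },
]);
console.log(loadRecords(outputFile));
```

Because every run appends, the same file can collect results across multiple scraping sessions; moving to SQLite, Postgres, or MongoDB later mostly means swapping out these two functions.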

How to Utilize AI for Web Scraping

Web scraping is foundational in training AI models. Where do you think they get most of that training data from?

Once you’ve completed the course, you’ll be able to write scrapers to collect data for training your own AI models! Not only that, but I’ll also show you how to use existing AI tools to help you write your scrapers more quickly and efficiently.

Screenshot of a Claude AI chat used to help write a web scraper for Amazon.com

How to Take Advantage of Existing Scraped Data Sets

Scraping can start off simple, but the more data you want to scrape, the more complicated it can get. Bright Data has a product called the Web Scraper API that’s already done a lot of the heavy lifting for you on 100+ popular sites. Learn how to search their ready-to-rock REST APIs for the parsed, cleaned, and organized data you’re looking for without having to write your own scraper from scratch.

Screenshot of the Web Scraper API listings page from Bright Data

Get Started Scraping the Web!

Whether you're a developer looking to expand your toolkit or a data professional seeking efficient data collection methods, this course will equip you with the skills needed to succeed in web scraping.


Daniel Kelly
Daniel is the lead instructor at Vue School and enjoys helping other developers reach their full potential. He has 10+ years of developer experience using technologies including Vue.js, Nuxt.js, and Laravel.
