How to Scrape Data from a Website: JavaScript vs Low-Code

Web scraping with JavaScript and n8n low-code approach

Intermediate to Advanced · 16 min read

By Mihai Farcas
2024
Web Scraping · JavaScript · Puppeteer · Automation · Data Extraction

Overview

Web scraping is the automated process of extracting structured data from websites. This tutorial compares two approaches: custom JavaScript with Puppeteer for maximum control, and n8n's low-code workflows for rapid development. Learn when to use each method and how to implement both.

What is Web Scraping?

Web scraping automates data collection from websites, converting unstructured web content into structured data you can analyze, store, or integrate with other systems.

  • Monitor price changes on e-commerce sites
  • Collect news articles and blog posts
  • Gather product reviews and ratings
  • Track job postings and market trends
  • Extract contact information from directories
  • Aggregate data from multiple sources

Legal and Ethical Considerations

  • Always check the website's terms of service
  • Respect the robots.txt file
  • Implement rate limiting to avoid overwhelming servers
  • Don't scrape personal or copyrighted data without permission
  • Use public APIs when available
  • Add user-agent headers to identify your bot
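As a minimal sketch of honoring robots.txt, the function below checks whether a path is disallowed for a given user agent. It handles only the common `User-agent` and `Disallow` directives; a production bot should use a full parser such as the `robots-parser` npm package.

```javascript
// Minimal robots.txt check: returns true when a path is not disallowed
// for the given user agent. Only User-agent and Disallow are handled.
function isPathAllowed(robotsTxt, userAgent, path) {
  let applies = false;       // does the current group apply to our bot?
  const disallowed = [];
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    if (!line) continue;
    const [directive, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(directive.trim())) {
      applies = value === '*' ||
        userAgent.toLowerCase().includes(value.toLowerCase());
    } else if (/^disallow$/i.test(directive.trim()) && applies && value) {
      disallowed.push(value);
    }
  }
  return !disallowed.some(prefix => path.startsWith(prefix));
}

const robots = 'User-agent: *\nDisallow: /admin/\nDisallow: /cart';
console.log(isPathAllowed(robots, 'my-scraper', '/products'));    // true
console.log(isPathAllowed(robots, 'my-scraper', '/admin/users')); // false
```

Run this check against `https://<site>/robots.txt` before scraping any path on that site.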

Method 1: JavaScript with Puppeteer

Full control approach using Node.js and Puppeteer for browser automation. Best for complex scraping tasks requiring JavaScript execution.

Prerequisites for JavaScript Method

  • Node.js installed (v18 or higher for current Puppeteer releases)
  • Basic JavaScript knowledge
  • Understanding of HTML and CSS selectors
  • Text editor or IDE

Step 1: Project Setup

Initialize your Node.js project and install Puppeteer.

bash
# Create project directory
mkdir amazon-price-tracker
cd amazon-price-tracker

# Initialize npm project
npm init -y

# Install Puppeteer
npm install puppeteer

Step 2: Create Scraping Script

Build the core scraping function with Puppeteer.

javascript
const puppeteer = require('puppeteer');

const checkAmazonPrice = async () => {
  // Launch browser
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
  });
  
  const page = await browser.newPage();
  
  // Set user agent to avoid detection
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  );
  
  // Navigate to product page
  const productURL = 'https://www.amazon.com/dp/PRODUCT_ID';
  await page.goto(productURL, { waitUntil: 'networkidle2' });
  
  // Take screenshot for debugging
  await page.screenshot({ path: 'screenshot.png' });
  
  // Extract price (.a-price-whole holds the dollar portion only;
  // pair it with .a-price-fraction if you also need cents)
  const element = await page.waitForSelector('.a-price-whole');
  const priceText = await element.evaluate(el => el.innerText);
  const price = parseFloat(priceText.replace(/[^0-9.]/g, ''));
  
  console.log(`Current price: $${price}`);
  
  await browser.close();
  return price;
};

// Run the scraper
checkAmazonPrice().catch(console.error);

Step 3: Add Scheduling

Schedule the scraper to run automatically.

javascript
// Check immediately
checkAmazonPrice();

// Check every hour
setInterval(checkAmazonPrice, 1000 * 60 * 60);

// Or use node-cron for more control (npm install node-cron)
const cron = require('node-cron');

// Run daily at 9 AM
cron.schedule('0 9 * * *', () => {
  console.log('Running scheduled price check...');
  checkAmazonPrice();
});

JavaScript Method: Pros and Cons

  • ✅ Full control over scraping logic
  • ✅ Can execute JavaScript on pages
  • ✅ Handle complex interactions (clicks, forms, etc.)
  • ✅ Access to full Puppeteer API
  • ❌ Requires programming knowledge
  • ❌ More code to write and maintain
  • ❌ Need to manage infrastructure

Method 2: n8n Low-Code Scraping

Visual workflow approach using n8n nodes. Best for standard scraping tasks and rapid development.

Prerequisites for n8n Method

  • n8n instance (cloud or self-hosted)
  • Basic understanding of HTML structure
  • Target website URL
  • Email account (optional, for notifications)

n8n Workflow: Book Price Scraper

Build a complete scraping workflow to extract book data from a website.

Step 1: HTTP Request Node

Fetch the HTML content from the target website.

json
{
  "method": "GET",
  "url": "https://books.toscrape.com/",
  "responseFormat": "string",
  "options": {
    "timeout": 10000
  }
}

Step 2: HTML Extract Node

Extract specific data using CSS selectors.

json
{
  "extractionValues": {
    "title": {
      "cssSelector": "article.product_pod h3 a",
      "attribute": "title",
      "returnArray": true
    },
    "price": {
      "cssSelector": "article.product_pod .price_color",
      "attribute": "text",
      "returnArray": true
    },
    "rating": {
      "cssSelector": "article.product_pod .star-rating",
      "attribute": "class",
      "returnArray": true
    }
  }
}

Step 3: Process and Transform

Clean and structure the extracted data.

javascript
// Process extracted data
const books = [];
const titles = $json.title;
const prices = $json.price;
const ratings = $json.rating;

for (let i = 0; i < titles.length; i++) {
  books.push({
    title: titles[i],
    price: parseFloat(prices[i].replace('£', '')),
    rating: ratings[i].split(' ')[1], // class is e.g. "star-rating Three"
    scrapedAt: new Date().toISOString()
  });
}

return books.map(book => ({ json: book }));

Step 4: Save Results

Export data to various formats or systems.

  • Convert to CSV and email as attachment
  • Save to Google Sheets for analysis
  • Store in Airtable or database
  • Send to Slack for team notifications
  • Trigger alerts for price changes
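As one example of the export step, the book objects produced in Step 3 can be flattened into a CSV string before emailing or uploading. The sample data below is illustrative, mirroring the shape of the Step 3 output.

```javascript
// Convert an array of flat objects (like the Step 3 book records)
// into a CSV string, quoting values and escaping embedded quotes.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = value => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [
    headers.join(','),
    ...rows.map(row => headers.map(h => escape(row[h])).join(','))
  ];
  return lines.join('\n');
}

const sampleBooks = [
  { title: 'A Light in the Attic', price: 51.77, rating: 'Three' },
  { title: 'Tipping the Velvet', price: 53.74, rating: 'One' }
];
console.log(toCsv(sampleBooks));
```

In n8n, the same result can also be produced with the built-in Spreadsheet File / Convert to File node instead of custom code.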

Advanced n8n Scraping

Combine scraping with AI for intelligent data processing.

  • Use AI to summarize scraped articles
  • Classify products into categories with ChatGPT
  • Extract entities (names, dates, prices) with NLP
  • Generate insights from collected data
  • Translate scraped content automatically

Best Practices

  • Always include delays between requests (1-3 seconds)
  • Rotate user agents to appear more natural
  • Handle errors gracefully with try-catch
  • Cache results to avoid duplicate scraping
  • Monitor website structure changes
  • Log scraping activities for debugging
  • Use proxy servers for large-scale scraping
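The delay and error-handling practices above can be sketched as a small wrapper around any scraping function. The retry count and wait time here are illustrative defaults, not recommendations from a specific library.

```javascript
// Pause helper: resolves after the given number of milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run a scraping task with try/catch-based retries and a delay
// between attempts, per the best practices above.
async function withRetry(task, { retries = 3, waitMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      console.error(`Attempt ${attempt} failed: ${error.message}`);
      if (attempt < retries) await delay(waitMs); // back off before retrying
    }
  }
  throw lastError; // all attempts exhausted
}

// Usage with the earlier scraper:
// withRetry(() => checkAmazonPrice()).then(console.log);
```

The same `delay` helper can also be awaited between successive page requests to implement the 1-3 second spacing suggested above.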

Common Challenges and Solutions

  • Dynamic content → Use Puppeteer or browser automation
  • CAPTCHAs → Use CAPTCHA solving services or reduce request rate
  • IP blocking → Implement proxy rotation
  • Rate limiting → Add delays and respect limits
  • Changing HTML structure → Use robust selectors, monitor for changes
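For the changing-HTML-structure problem, one common tactic is to try an ordered list of candidate selectors and use the first that matches. The sketch below shows the idea with a plain lookup standing in for a real DOM query (such as `page.$(selector)` in Puppeteer); the selector names are hypothetical.

```javascript
// Try an ordered list of candidate selectors and return the first hit.
// `query` is any function mapping a selector to a value or null.
function firstMatch(query, selectors) {
  for (const selector of selectors) {
    const result = query(selector);
    if (result != null) return result;
  }
  return null; // none of the candidates matched
}

// Simulated page where the old selector no longer matches.
const fakePage = { '.price--new': '£12.99' };
const price = firstMatch(
  sel => fakePage[sel] ?? null,
  ['.price_color', '.price--new', '[data-price]'] // old selector first, fallbacks after
);
console.log(price); // '£12.99'
```

Logging which candidate matched on each run gives early warning that the site's markup has changed.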

Conclusion

Choose JavaScript with Puppeteer when you need maximum control, must handle complex interactions, or scrape JavaScript-heavy sites. Choose n8n for rapid development, easier maintenance, and when you want to integrate scraping with other workflows. Many teams use both: Puppeteer for complex scraping and n8n for orchestration and data processing.

Next Steps

  • Build a price monitoring system with alerts
  • Create a news aggregator from multiple sources
  • Scrape job postings and send to applicant tracking system
  • Monitor competitor websites for changes
  • Combine scraping with sentiment analysis
  • Build a product review aggregation dashboard
  • Implement distributed scraping with queues