How to Scrape Data from a Website: JavaScript vs Low-Code

Web scraping with JavaScript and n8n low-code approach

Intermediate to Advanced · 16 min read

By Mihai Farcas
2024
Web Scraping · JavaScript · Puppeteer · Automation · Data Extraction

Overview

Web scraping is the automated process of extracting structured data from websites. This tutorial compares two approaches: custom JavaScript with Puppeteer for maximum control, and n8n's low-code workflows for rapid development. Learn when to use each method and how to implement both.

What is Web Scraping?

Web scraping automates data collection from websites, converting unstructured web content into structured data you can analyze, store, or integrate with other systems.

  • Monitor price changes on e-commerce sites
  • Collect news articles and blog posts
  • Gather product reviews and ratings
  • Track job postings and market trends
  • Extract contact information from directories
  • Aggregate data from multiple sources

Legal and Ethical Considerations

  • Always check the website's terms of service
  • Respect the robots.txt file
  • Implement rate limiting to avoid overwhelming servers
  • Don't scrape personal or copyrighted data without permission
  • Use public APIs when available
  • Add user-agent headers to identify your bot
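As a minimal sketch of honoring robots.txt, the function below checks whether a path is disallowed for a given user agent. It handles only the common `User-agent` and `Disallow` directives; a production bot should use a full parser such as the `robots-parser` npm package.

```javascript
// Minimal robots.txt check: returns true when a path is not disallowed
// for the given user agent. Only User-agent and Disallow are handled.
function isPathAllowed(robotsTxt, userAgent, path) {
  let applies = false;       // does the current group apply to our bot?
  const disallowed = [];
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    if (!line) continue;
    const [directive, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(directive.trim())) {
      applies = value === '*' ||
        userAgent.toLowerCase().includes(value.toLowerCase());
    } else if (/^disallow$/i.test(directive.trim()) && applies && value) {
      disallowed.push(value);
    }
  }
  return !disallowed.some(prefix => path.startsWith(prefix));
}

const robots = 'User-agent: *\nDisallow: /admin/\nDisallow: /cart';
console.log(isPathAllowed(robots, 'my-scraper', '/products'));    // true
console.log(isPathAllowed(robots, 'my-scraper', '/admin/users')); // false
```

Run this check against `https://<site>/robots.txt` before scraping any path on that site.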

Method 1: JavaScript with Puppeteer

Full control approach using Node.js and Puppeteer for browser automation. Best for complex scraping tasks requiring JavaScript execution.

Prerequisites for JavaScript Method

  • Node.js installed (v18 or higher for current Puppeteer releases)
  • Basic JavaScript knowledge
  • Understanding of HTML and CSS selectors
  • Text editor or IDE

Step 1: Project Setup

Initialize your Node.js project and install Puppeteer.

bash
# Create project directory
mkdir amazon-price-tracker
cd amazon-price-tracker

# Initialize npm project
npm init -y

# Install Puppeteer
npm install puppeteer

Step 2: Create Scraping Script

Build the core scraping function with Puppeteer.

javascript
const puppeteer = require('puppeteer');

const checkAmazonPrice = async () => {
  // Launch browser
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
  });
  
  const page = await browser.newPage();
  
  // Set user agent to avoid detection
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  );
  
  // Navigate to product page
  const productURL = 'https://www.amazon.com/dp/PRODUCT_ID';
  await page.goto(productURL, { waitUntil: 'networkidle2' });
  
  // Take screenshot for debugging
  await page.screenshot({ path: 'screenshot.png' });
  
  // Extract price (.a-price-whole holds the dollar portion only;
  // pair it with .a-price-fraction if you also need cents)
  const element = await page.waitForSelector('.a-price-whole');
  const priceText = await element.evaluate(el => el.innerText);
  const price = parseFloat(priceText.replace(/[^0-9.]/g, ''));
  
  console.log(`Current price: $${price}`);
  
  await browser.close();
  return price;
};

// Run the scraper
checkAmazonPrice().catch(console.error);

Step 3: Add Scheduling

Schedule the scraper to run automatically.

javascript
// Check immediately
checkAmazonPrice();

// Check every hour
setInterval(checkAmazonPrice, 1000 * 60 * 60);

// Or use node-cron for more control (npm install node-cron)
const cron = require('node-cron');

// Run daily at 9 AM
cron.schedule('0 9 * * *', () => {
  console.log('Running scheduled price check...');
  checkAmazonPrice();
});

JavaScript Method: Pros and Cons

  • ✅ Full control over scraping logic
  • ✅ Can execute JavaScript on pages
  • ✅ Handle complex interactions (clicks, forms, etc.)
  • ✅ Access to full Puppeteer API
  • ❌ Requires programming knowledge
  • ❌ More code to write and maintain
  • ❌ Need to manage infrastructure

Method 2: n8n Low-Code Scraping

Visual workflow approach using n8n nodes. Best for standard scraping tasks and rapid development.

Prerequisites for n8n Method

  • n8n instance (cloud or self-hosted)
  • Basic understanding of HTML structure
  • Target website URL
  • Email account (optional, for notifications)

n8n Workflow: Book Price Scraper

Build a complete scraping workflow to extract book data from a website.

Step 1: HTTP Request Node

Fetch the HTML content from the target website.

json
{
  "method": "GET",
  "url": "https://books.toscrape.com/",
  "responseFormat": "string",
  "options": {
    "timeout": 10000
  }
}

Step 2: HTML Extract Node

Extract specific data using CSS selectors.

json
{
  "extractionValues": {
    "title": {
      "cssSelector": "article.product_pod h3 a",
      "attribute": "title",
      "returnArray": true
    },
    "price": {
      "cssSelector": "article.product_pod .price_color",
      "attribute": "text",
      "returnArray": true
    },
    "rating": {
      "cssSelector": "article.product_pod .star-rating",
      "attribute": "class",
      "returnArray": true
    }
  }
}

Step 3: Process and Transform

Clean and structure the extracted data.

javascript
// Process extracted data
const books = [];
const titles = $json.title;
const prices = $json.price;
const ratings = $json.rating;

for (let i = 0; i < titles.length; i++) {
  books.push({
    title: titles[i],
    price: parseFloat(prices[i].replace('£', '')),
    rating: ratings[i].split(' ')[1], // class is e.g. "star-rating Three"
    scrapedAt: new Date().toISOString()
  });
}

return books.map(book => ({ json: book }));

Step 4: Save Results

Export data to various formats or systems.

  • Convert to CSV and email as attachment
  • Save to Google Sheets for analysis
  • Store in Airtable or database
  • Send to Slack for team notifications
  • Trigger alerts for price changes
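As one example of the export step, the book objects produced in Step 3 can be flattened into a CSV string before emailing or uploading. The sample data below is illustrative, mirroring the shape of the Step 3 output.

```javascript
// Convert an array of flat objects (like the Step 3 book records)
// into a CSV string, quoting values and escaping embedded quotes.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = value => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [
    headers.join(','),
    ...rows.map(row => headers.map(h => escape(row[h])).join(','))
  ];
  return lines.join('\n');
}

const sampleBooks = [
  { title: 'A Light in the Attic', price: 51.77, rating: 'Three' },
  { title: 'Tipping the Velvet', price: 53.74, rating: 'One' }
];
console.log(toCsv(sampleBooks));
```

In n8n, the same result can also be produced with the built-in Spreadsheet File / Convert to File node instead of custom code.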

Advanced n8n Scraping

Combine scraping with AI for intelligent data processing.

  • Use AI to summarize scraped articles
  • Classify products into categories with ChatGPT
  • Extract entities (names, dates, prices) with NLP
  • Generate insights from collected data
  • Translate scraped content automatically

Best Practices

  • Always include delays between requests (1-3 seconds)
  • Rotate user agents to appear more natural
  • Handle errors gracefully with try-catch
  • Cache results to avoid duplicate scraping
  • Monitor website structure changes
  • Log scraping activities for debugging
  • Use proxy servers for large-scale scraping
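The delay and error-handling practices above can be sketched as a small wrapper around any scraping function. The retry count and wait time here are illustrative defaults, not recommendations from a specific library.

```javascript
// Pause helper: resolves after the given number of milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run a scraping task with try/catch-based retries and a delay
// between attempts, per the best practices above.
async function withRetry(task, { retries = 3, waitMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      console.error(`Attempt ${attempt} failed: ${error.message}`);
      if (attempt < retries) await delay(waitMs); // back off before retrying
    }
  }
  throw lastError; // all attempts exhausted
}

// Usage with the earlier scraper:
// withRetry(() => checkAmazonPrice()).then(console.log);
```

The same `delay` helper can also be awaited between successive page requests to implement the 1-3 second spacing suggested above.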

Common Challenges and Solutions

  • Dynamic content → Use Puppeteer or browser automation
  • CAPTCHAs → Use CAPTCHA solving services or reduce request rate
  • IP blocking → Implement proxy rotation
  • Rate limiting → Add delays and respect limits
  • Changing HTML structure → Use robust selectors, monitor for changes
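For the changing-HTML-structure problem, one common tactic is to try an ordered list of candidate selectors and use the first that matches. The sketch below shows the idea with a plain lookup standing in for a real DOM query (such as `page.$(selector)` in Puppeteer); the selector names are hypothetical.

```javascript
// Try an ordered list of candidate selectors and return the first hit.
// `query` is any function mapping a selector to a value or null.
function firstMatch(query, selectors) {
  for (const selector of selectors) {
    const result = query(selector);
    if (result != null) return result;
  }
  return null; // none of the candidates matched
}

// Simulated page where the old selector no longer matches.
const fakePage = { '.price--new': '£12.99' };
const price = firstMatch(
  sel => fakePage[sel] ?? null,
  ['.price_color', '.price--new', '[data-price]'] // old selector first, fallbacks after
);
console.log(price); // '£12.99'
```

Logging which candidate matched on each run gives early warning that the site's markup has changed.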

Conclusion

Choose JavaScript with Puppeteer when you need maximum control, must handle complex interactions, or scrape JavaScript-heavy sites. Choose n8n for rapid development, easier maintenance, and when you want to integrate scraping with other workflows. Many teams use both: Puppeteer for complex scraping and n8n for orchestration and data processing.

Next Steps

  • Build a price monitoring system with alerts
  • Create a news aggregator from multiple sources
  • Scrape job postings and send to applicant tracking system
  • Monitor competitor websites for changes
  • Combine scraping with sentiment analysis
  • Build a product review aggregation dashboard
  • Implement distributed scraping with queues