Web scraping with JavaScript and n8n low-code approach
Web scraping is the automated process of extracting structured data from websites. This tutorial compares two approaches: custom JavaScript with Puppeteer for maximum control, and n8n's low-code workflows for rapid development. Learn when to use each method and how to implement both.
Web scraping automates data collection from websites, converting unstructured web content into structured data you can analyze, store, or integrate with other systems.
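As a minimal illustration of what "converting unstructured content into structured data" means (the HTML fragment and field names here are hypothetical), a scraper turns raw markup into a typed record:

```javascript
// Hypothetical HTML fragment standing in for a fetched page.
const html =
  '<div class="product"><h2>Widget</h2><span class="price">$19.99</span></div>';

// Pull out the fields we care about. Real scrapers use an HTML parser
// or CSS selectors rather than regexes; this is only to show the idea.
const title = html.match(/<h2>(.*?)<\/h2>/)[1];
const price = parseFloat(html.match(/class="price">\$([\d.]+)</)[1]);

const record = { title, price };
console.log(record); // { title: 'Widget', price: 19.99 }
```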
Full control approach using Node.js and Puppeteer for browser automation. Best for complex scraping tasks requiring JavaScript execution.
Initialize your Node.js project and install Puppeteer.
# Create project directory
mkdir amazon-price-tracker
cd amazon-price-tracker
# Initialize npm project
npm init -y
# Install Puppeteer
npm install puppeteer

Build the core scraping function with Puppeteer.
const puppeteer = require('puppeteer');

const checkAmazonPrice = async () => {
  // Launch browser
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
  });

  try {
    const page = await browser.newPage();

    // Set user agent to avoid detection
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    );

    // Navigate to product page
    const productURL = 'https://www.amazon.com/dp/PRODUCT_ID';
    await page.goto(productURL, { waitUntil: 'networkidle2' });

    // Take screenshot for debugging
    await page.screenshot({ path: 'screenshot.png' });

    // Extract price
    const element = await page.waitForSelector('.a-price-whole');
    const priceText = await element.evaluate(el => el.innerText);
    const price = parseFloat(priceText.replace(/[^0-9.]/g, ''));

    console.log(`Current price: $${price}`);
    return price;
  } finally {
    // Always close the browser, even if scraping fails
    await browser.close();
  }
};

// Run the scraper
checkAmazonPrice().catch(console.error);

Schedule the scraper to run automatically.
// Check immediately
checkAmazonPrice();

// Check every hour
setInterval(checkAmazonPrice, 1000 * 60 * 60);

// Or use node-cron for more control
const cron = require('node-cron');

// Run daily at 9 AM
cron.schedule('0 9 * * *', () => {
  console.log('Running scheduled price check...');
  checkAmazonPrice();
});

Visual workflow approach using n8n nodes. Best for standard scraping tasks and rapid development.
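For orientation: under the hood, an n8n workflow is importable JSON that wires nodes together. A minimal skeleton for a scraping workflow might look like the following (node type identifiers and layout positions are illustrative and can vary across n8n versions):

```json
{
  "name": "Book scraper",
  "nodes": [
    { "name": "HTTP Request", "type": "n8n-nodes-base.httpRequest", "position": [250, 300] },
    { "name": "HTML Extract", "type": "n8n-nodes-base.htmlExtract", "position": [450, 300] },
    { "name": "Code", "type": "n8n-nodes-base.code", "position": [650, 300] }
  ],
  "connections": {
    "HTTP Request": { "main": [[{ "node": "HTML Extract", "type": "main", "index": 0 }]] },
    "HTML Extract": { "main": [[{ "node": "Code", "type": "main", "index": 0 }]] }
  }
}
```

Each node's parameters (shown in the steps below) are edited visually; the JSON only matters when exporting or sharing workflows.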
Build a complete scraping workflow to extract book data from a website.
Fetch the HTML content from the target website.
{
  "method": "GET",
  "url": "https://books.toscrape.com/",
  "responseFormat": "string",
  "options": {
    "timeout": 10000
  }
}

Extract specific data using CSS selectors.
{
  "extractionValues": {
    "title": {
      "cssSelector": "article.product_pod h3 a",
      "attribute": "title",
      "returnArray": true
    },
    "price": {
      "cssSelector": "article.product_pod .price_color",
      "attribute": "text",
      "returnArray": true
    },
    "rating": {
      "cssSelector": "article.product_pod .star-rating",
      "attribute": "class",
      "returnArray": true
    }
  }
}

Clean and structure the extracted data.
// Process extracted data
const books = [];
const titles = $json.title;
const prices = $json.price;
const ratings = $json.rating;

for (let i = 0; i < titles.length; i++) {
  books.push({
    title: titles[i],
    price: parseFloat(prices[i].replace('£', '')),
    rating: ratings[i].split(' ')[1], // class is "star-rating Three" -> "Three"
    scrapedAt: new Date().toISOString()
  });
}

return books.map(book => ({ json: book }));

Export data to various formats or systems.
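As one example of an export step (a sketch for a Code node; the field names match the book objects above, while the CSV layout itself is an assumption), you could serialize the records to CSV before handing them to a file or spreadsheet node:

```javascript
// Convert an array of book records into a CSV string.
// Quotes inside titles are escaped by doubling, per RFC 4180.
const toCsv = (books) => {
  const header = 'title,price,rating,scrapedAt';
  const rows = books.map(b =>
    [`"${b.title.replace(/"/g, '""')}"`, b.price, b.rating, b.scrapedAt].join(',')
  );
  return [header, ...rows].join('\n');
};

const csv = toCsv([
  { title: 'A Light in the Attic', price: 51.77, rating: 'Three', scrapedAt: '2024-01-01T00:00:00.000Z' }
]);
console.log(csv);
```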
Combine scraping with AI for intelligent data processing.
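One simple pattern (sketched here with a hypothetical helper; the field names match the book records above, and the instruction wording is an assumption) is to have a Code node build a prompt from the scraped data and pass it to an AI node downstream:

```javascript
// Build a prompt summarizing scraped records for a downstream AI node.
// Adapt the instruction text to your actual analysis task.
const buildSummaryPrompt = (books) =>
  'Summarize pricing trends for these books:\n' +
  books.map(b => `- ${b.title}: £${b.price} (${b.rating} stars)`).join('\n');

const prompt = buildSummaryPrompt([
  { title: 'A Light in the Attic', price: 51.77, rating: 'Three' },
  { title: 'Tipping the Velvet', price: 53.74, rating: 'One' }
]);
console.log(prompt);
```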
Choose JavaScript with Puppeteer when you need maximum control, have to handle complex interactions, or need to scrape JavaScript-heavy sites. Choose n8n for rapid development, easier maintenance, and integrating scraping with other workflows. Many teams use both: Puppeteer for complex scraping and n8n for orchestration and data processing.