Deploy a Puppeteer App on Leapcell
You can deploy a Puppeteer service on Leapcell to generate screenshots and PDFs, crawl single-page apps, or automate testing of your frontend code.
You’ll need a GitHub account to proceed. If you don’t have one, you can create one on the GitHub website.
1. Fork the puppeteer-crawler Repo on GitHub
Repo: puppeteer-crawler
Here’s a simple Express app that uses Puppeteer to take a screenshot of a webpage and extract its title and links:
const express = require('express');
const app = express();
// Import Puppeteer for browser automation
const puppeteer = require('puppeteer');
const bodyParser = require('body-parser');
const base64 = require('base64-js');
// Set EJS as the template engine
app.set('view engine', 'ejs');
// Set the directory for views
app.set('views', __dirname + '/views');
// Use body-parser to parse form data
app.use(bodyParser.urlencoded({ extended: true }));
// Handle GET requests and render the initial page
app.get('/', (req, res) => {
  res.render('success', {
    url: 'https://news.ycombinator.com',
    screenshot_base64: '',
    links: [],
    page_title: null,
  });
});
// Handle POST requests to take a screenshot
app.post('/', async (req, res) => {
  // Get the URL from the form, default to Hacker News
  let url = req.body.url || 'https://news.ycombinator.com';
  // Add 'https://' if the URL doesn't start with 'http'
  if (!url.startsWith('http')) {
    url = 'https://' + url;
  }
  let browser;
  try {
    // Launch a headless Chrome browser with specific arguments
    browser = await puppeteer.launch({
      headless: true, // Run the browser in headless mode
      args: [
        '--single-process',
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-gpu',
        '--no-zygote',
        '--disable-dev-shm-usage',
      ],
      // Use the stable version of Chrome.
      // Point to a specific Chrome executable, because the default path may not work.
      // The executable has to be downloaded and placed in the project directory.
      executablePath: './google-chrome-stable',
    });
    // Create a new browser page
    const page = await browser.newPage();
    // Navigate to the specified URL and wait until the network is nearly idle
    // (no more than 2 connections for at least 500 ms).
    // timeout: 0 disables the navigation timeout.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 0 });
    // Take a screenshot of the page
    const screenshot = await page.screenshot();
    // Get the page title
    const page_title = await page.title();
    // Extract all <a> tags' links and text content
    const links_and_texts = await page.evaluate(() => {
      const anchors = document.querySelectorAll('a');
      return Array.from(anchors).map((anchor) => {
        const text = anchor.textContent.replace(/<[^>]*>/g, '').trim();
        return {
          href: anchor.href,
          text: text,
        };
      });
    });
    // Convert the screenshot to a base64 string
    const screenshot_base64 = base64.fromByteArray(screenshot);
    // Render the success page with relevant data
    res.render('success', {
      url,
      page_title,
      screenshot_base64,
      links: links_and_texts,
    });
  } catch (e) {
    // Render the error page with the error message.
    // The browser is closed once, in the finally block below.
    res.render('error', { error_message: e.message });
  } finally {
    // Ensure the browser is closed after all operations
    if (browser) {
      await browser.close();
    }
  }
});
// Set the port, use environment variable PORT or default to 8080
const port = process.env.PORT || 8080;
// Start the server
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
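The POST handler above prepends https:// to any input that does not start with http, so a value such as httpfoo.com is passed through without a scheme and malformed input only fails later inside page.goto. A minimal sketch of a stricter check follows. The normalizeUrl helper is hypothetical and not part of the puppeteer-crawler repo; it relies only on the URL class built into Node.js.

```javascript
// Hypothetical helper (an assumption for illustration, not from the repo).
// Turns user input into an absolute http(s) URL, or returns null if the
// input cannot be parsed.
function normalizeUrl(input, fallback = 'https://news.ycombinator.com') {
  // Fall back to the default page when the field is empty
  const raw = String(input || '').trim() || fallback;
  // Prepend https:// only when no http:// or https:// scheme is present
  const candidate = /^https?:\/\//i.test(raw) ? raw : `https://${raw}`;
  try {
    // new URL() throws a TypeError on malformed input
    return new URL(candidate).href;
  } catch (err) {
    return null;
  }
}
```

In the handler, a null result could be answered with the error page before a browser is launched at all, which avoids starting Chrome for input that can never load.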
Prerequisites
Before running the application, you need to prepare the Puppeteer environment. To do so, execute the following script:
sh prepare_puppeteer_env.sh
This will:
- Install Puppeteer and its dependencies (without downloading Chromium, as we will use Google Chrome).
- Install Google Chrome in your environment.
- Set up the necessary dependencies for running Puppeteer.
Project Structure
.
├── LICENSE                   # License file for the project
├── package.json              # Contains metadata and dependencies for the Node.js project
├── prepare_puppeteer_env.sh  # Script for setting up the Puppeteer environment
└── src
    ├── app.js                # Main application entry point using Express and Puppeteer
    └── views
        ├── error.ejs         # Error page template displayed when something goes wrong
        ├── partials
        │   └── header.ejs    # Header template shared across pages
        └── success.ejs       # Success page template, showing the scraped links
Running the Application
Once you've prepared the environment, you can start the web service with the following command:
npm start
The service will be available at http://localhost:8080, where you can enter the URL of the page you want to scrape. It returns a screenshot, the page title, and a list of all links on that page.
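The form on the page submits the target URL as urlencoded data in a POST request to /, so the service can also be called from a script. The sketch below assumes Node.js 18 or later for the built-in fetch. The requestScrape name and its fetchImpl parameter are hypothetical and exist only for this illustration.

```javascript
// Hypothetical client helper (an assumption for illustration, not from the repo).
// fetchImpl defaults to the fetch built into Node.js 18+ and can be swapped
// for a stub when testing.
async function requestScrape(baseUrl, targetUrl, fetchImpl = globalThis.fetch) {
  // The Express handler parses urlencoded form data, so send the same format
  const body = new URLSearchParams({ url: targetUrl });
  const response = await fetchImpl(new URL('/', baseUrl), {
    method: 'POST',
    body,
  });
  // The service answers with a rendered HTML page
  return { status: response.status, html: await response.text() };
}

// Usage (assumes the service is already running locally):
// requestScrape('http://localhost:8080', 'example.com')
//   .then(({ status, html }) => console.log(status, html.length));
```

Because the base URL is a parameter, the same call should work against a deployed instance by passing its domain instead of localhost.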
Explanation of prepare_puppeteer_env.sh
This script is responsible for setting up the environment necessary for Puppeteer to run. Here's a breakdown of what each line does:
#!/bin/sh
# Install puppeteer and its dependencies
# Skip Chromium download as we'll use Google Chrome later
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer

# Install Google Chrome. The steps are chained with &&, so a failure stops the chain.
# Comments are listed here because a comment between continued lines would break the command.
#   1. Update the package list
#   2. Install wget and gnupg for key management
#   3. Add Google's signing key
#   4. Add the Google Chrome repository to apt sources
#   5. Update the package list again after adding the new repository
#   6. Install Google Chrome Stable and necessary fonts and libraries
#      (--no-install-recommends skips recommended but non-essential packages)
#   7. Remove the apt cache to save disk space
apt-get update \
  && apt-get install -y wget gnupg \
  && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
     --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

# Copy google-chrome-stable to the current directory
# Find the path of google-chrome-stable
chrome_path=$(which google-chrome-stable)
# Check if google-chrome-stable is found
if [ -n "$chrome_path" ]; then
  # Move the Chrome executable to the current directory
  mv "$chrome_path" .
  echo "google-chrome-stable moved to current directory."
else
  # Print an error message if Chrome is not found
  echo "google-chrome-stable not found"
  # Exit the script with an error code
  exit 1
fi
- PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer: This installs Puppeteer without downloading Chromium, as Google Chrome will be used instead.
- The subsequent commands update the system package list, install the necessary tools (like wget and gnupg), and add Google's signing key and repository for installing Google Chrome.
- apt-get install -y google-chrome-stable: This installs Google Chrome along with necessary fonts and libraries to ensure Puppeteer runs properly with the browser.
- The script then finds and moves the installed google-chrome-stable executable to the current directory for Puppeteer to use.
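Since the app launches Chrome from ./google-chrome-stable, a missing executable only surfaces as a launch error on the first request. One way to fail earlier with a clearer message is to check the path at startup. The sketch below is an illustration under assumptions: resolveChromePath is a hypothetical helper, and the CHROME_PATH environment variable in the usage comment is an assumed optional override that the script above does not set.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (an assumption for illustration, not from the repo).
// Returns the first candidate path that exists on disk as an absolute path,
// or throws if none of them exists.
function resolveChromePath(candidates) {
  const found = candidates
    .filter(Boolean) // skip unset entries such as a missing env variable
    .map((candidate) => path.resolve(candidate))
    .find((candidate) => fs.existsSync(candidate));
  if (!found) {
    throw new Error(
      'Chrome executable not found; did prepare_puppeteer_env.sh run?'
    );
  }
  return found;
}

// Usage (CHROME_PATH is an assumed override, not set by the script):
// const executablePath = resolveChromePath([
//   process.env.CHROME_PATH,
//   './google-chrome-stable',
// ]);
// const browser = await puppeteer.launch({ headless: true, executablePath });
```

Relative candidates are resolved against the process working directory, so the result depends on where the start command is run from.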
2. Create a Service in the Leapcell Dashboard and Connect Your Repo
Go to the Leapcell Dashboard and click the New Service button.
On the "New Service" page, select the repository you just forked.
To access your repositories, you’ll need to connect Leapcell to your GitHub account.
Follow these instructions to connect to GitHub.
Once connected, your repositories will appear in the list.
3. Fill in the Following Values During Service Creation
Since Puppeteer needs a browser to drive (Google Chrome in this example), the build step has to install it together with its dependencies. It’s recommended to run the installation command separately.
Here’s the final installation command:
sh prepare_puppeteer_env.sh && npm install
For this example, we use an Express app to control Puppeteer operations. The start command is npm run start.
| Field | Value |
| --- | --- |
| Runtime | Node.js (Any version) |
| Build Command | sh prepare_puppeteer_env.sh && npm install |
| Start Command | npm run start |
| Port | 8080 |
Enter these values in the corresponding fields.
4. Access Your App
Once deployed, you’ll see a URL like foo-bar.leapcell.dev on the Deployment page. Visit the domain to test your application.
Continuous Deployments
Every push to the linked branch automatically triggers a build and deploy. Failed builds are safely canceled, and the current version remains live until the next successful deployment.
Learn more about Continuous Deployments.