Deploy a Puppeteer App on Leapcell
You can deploy a Puppeteer service on Leapcell to generate screenshots and PDFs, crawl single-page apps, or automate testing of your frontend code.
You’ll need a GitHub account to proceed. If you don’t have one, you can create one on the GitHub website.
1. Fork the puppeteer-crawler Repo on GitHub
Repo: puppeteer-crawler
Here’s a simple Puppeteer script to generate a screenshot of a webpage:
const express = require('express');
const app = express();
// Import Puppeteer for browser automation
const puppeteer = require('puppeteer');
const bodyParser = require('body-parser');
const base64 = require('base64-js');

// Set EJS as the template engine
app.set('view engine', 'ejs');
// Set the directory for views
app.set('views', __dirname + '/views');
// Use body-parser to parse form data
app.use(bodyParser.urlencoded({ extended: true }));

// Handle GET requests and render the initial page
app.get('/', (req, res) => {
  res.render('success', {
    url: 'https://news.ycombinator.com',
    screenshot_base64: '',
    links: [],
    page_title: null,
  });
});

// Handle POST requests to take a screenshot
app.post('/', async (req, res) => {
  // Get the URL from the form, default to Hacker News
  let url = req.body.url || 'https://news.ycombinator.com';
  // Add 'https://' unless the URL already has an http:// or https:// scheme
  // (a plain startsWith('http') check would skip hosts such as httpbin.org)
  if (!/^https?:\/\//i.test(url)) {
    url = 'https://' + url;
  }

  let browser;
  try {
    // Launch a headless Chrome browser with specific arguments
    browser = await puppeteer.launch({
      headless: true, // Run the browser in headless mode
      args: [
        '--single-process',
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-gpu',
        '--no-zygote',
        '--disable-dev-shm-usage',
      ],
      // Use an explicit path to the Chrome executable, because the default path may not work.
      // prepare_puppeteer_env.sh installs the browser and moves the executable into the project directory.
      executablePath: './google-chrome-stable',
    });

    // Create a new browser page
    const page = await browser.newPage();
    // Navigate to the specified URL and wait until the network is idle
    // (timeout: 0 disables the navigation timeout)
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 0 });
    // Take a screenshot of the page
    const screenshot = await page.screenshot();
    // Get the page title
    const page_title = await page.title();

    // Extract all <a> tags' links and text content
    const links_and_texts = await page.evaluate(() => {
      const anchors = document.querySelectorAll('a');
      return Array.from(anchors).map((anchor) => {
        const text = anchor.textContent.replace(/<[^>]*>/g, '').trim();
        return {
          href: anchor.href,
          text: text,
        };
      });
    });

    // Convert the screenshot to a base64 string
    const screenshot_base64 = base64.fromByteArray(screenshot);

    // Render the success page with relevant data
    res.render('success', {
      url,
      page_title,
      screenshot_base64,
      links: links_and_texts,
    });
  } catch (e) {
    // Render the error page with the error message
    res.render('error', { error_message: e.message });
  } finally {
    // Always close the browser, whether the request succeeded or failed
    if (browser) {
      await browser.close();
    }
  }
});

// Set the port, use environment variable PORT or default to 8080
const port = process.env.PORT || 8080;
// Start the server
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
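The handler above prepends https:// when the submitted value has no scheme. A slightly stricter variant could also validate the result with Node's built-in URL class before handing it to Puppeteer. The sketch below is an assumption for illustration, not code from the repository; normalizeUrl and DEFAULT_URL are hypothetical names.

```javascript
// Hypothetical helper (not part of the repository): normalize user input into
// an absolute http(s) URL, or return null when the input cannot be parsed.
const DEFAULT_URL = 'https://news.ycombinator.com';

function normalizeUrl(input, fallback = DEFAULT_URL) {
  // Fall back to the default when the field is empty or whitespace only.
  let candidate = (input || '').trim() || fallback;
  // Prepend https:// unless an explicit http:// or https:// scheme is present.
  if (!/^https?:\/\//i.test(candidate)) {
    candidate = 'https://' + candidate;
  }
  try {
    // The URL constructor throws on malformed input such as a missing host.
    return new URL(candidate).href;
  } catch {
    return null;
  }
}

console.log(normalizeUrl('httpbin.org')); // https://httpbin.org/
console.log(normalizeUrl('http://'));     // null
```

In the POST handler, a null result could be rendered through the error view instead of launching the browser.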
Prerequisites
Before running the application, you need to prepare the Puppeteer environment. To do so, execute the following script:
sh prepare_puppeteer_env.sh
This will:
- Install Puppeteer and its dependencies (without downloading the bundled Chromium, since a system browser is used instead).
- Install Google Chrome in your environment (Chromium on arm64).
- Set up the necessary dependencies for running Puppeteer.
Project Structure
.
├── LICENSE                      # License file for the project
├── package.json                 # Contains metadata and dependencies for the Node.js project
├── prepare_puppeteer_env.sh     # Script for setting up the Puppeteer environment
└── src
    ├── app.js                   # Main application entry point using Express and Puppeteer
    └── views
        ├── error.ejs            # Error page template displayed when something goes wrong
        ├── partials
        │   └── header.ejs       # Header template shared across pages
        └── success.ejs          # Success page template, showing the scraped links
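The success view receives the screenshot as a base64 string (screenshot_base64). The template source is not shown here, so the following is only a sketch of one common way to display such a value: wrapping it in a data URI for an image tag. It uses Node's built-in Buffer rather than base64-js, and toDataUri is a hypothetical helper name.

```javascript
// Hypothetical helper: turn raw screenshot bytes into a data URI for <img src>.
// page.screenshot() produces PNG by default, hence the image/png media type.
function toDataUri(bytes, mediaType = 'image/png') {
  const encoded = Buffer.from(bytes).toString('base64');
  return `data:${mediaType};base64,${encoded}`;
}

// Stand-in bytes for demonstration only; a real screenshot is much larger.
const sample = Uint8Array.from([104, 105]);
console.log(toDataUri(sample)); // data:image/png;base64,aGk=
```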
Running the Application
Once you've prepared the environment, you can start the web service with the following command:
npm start
The service will be available at http://localhost:8080, where you can enter the URL of the page you want to scrape. It returns the page title, a screenshot, and a list of all links on that page.
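The form can also be submitted from a script. The sketch below builds the same URL-encoded POST that the Express handler parses with body-parser. It assumes the service is running locally on port 8080 and that the global fetch of Node.js 18 or later is available; buildScreenshotRequest and requestScreenshot are hypothetical names.

```javascript
// Hypothetical helper: describe the form POST that the "/" route expects.
function buildScreenshotRequest(targetUrl, baseUrl = 'http://localhost:8080') {
  // body-parser's urlencoded middleware reads this as req.body.url.
  const body = new URLSearchParams({ url: targetUrl }).toString();
  return {
    endpoint: new URL('/', baseUrl).href,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body,
    },
  };
}

// Sends the request and resolves with the rendered HTML page.
// Not invoked here, because it needs the service to be running.
async function requestScreenshot(targetUrl) {
  const { endpoint, options } = buildScreenshotRequest(targetUrl);
  const response = await fetch(endpoint, options);
  return response.text();
}

console.log(buildScreenshotRequest('https://example.com').options.body);
// url=https%3A%2F%2Fexample.com
```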
Explanation of prepare_puppeteer_env.sh
This script is responsible for setting up the environment necessary for Puppeteer to run. Here's a breakdown of what each line does:
#!/bin/bash
# Exit immediately if a command exits with a non-zero status
set -e

# --- 1. Common Setup ---
# Install Puppeteer without downloading its bundled Chromium
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer

# Update apt list and install common fonts and libraries required by both browsers
echo "INFO: Installing common fonts and libraries..."
apt-get update
apt-get install -y \
  fonts-ipafont-gothic \
  fonts-wqy-zenhei \
  fonts-thai-tlwg \
  fonts-kacst \
  fonts-freefont-ttf \
  libxss1 \
  --no-install-recommends

# --- 2. Install Browser Based on Architecture ---
ARCH=$(dpkg --print-architecture)
echo "INFO: Detected architecture: $ARCH"

if [ "$ARCH" = "amd64" ]; then
  # For amd64 (x86_64) architecture, install Google Chrome
  echo "INFO: Installing Google Chrome for amd64..."
  apt-get install -y wget gnupg
  wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
  echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google.list
  apt-get update
  apt-get install -y google-chrome-stable --no-install-recommends
  BROWSER_EXEC="google-chrome-stable"
elif [ "$ARCH" = "arm64" ]; then
  # For arm64 architecture, install Chromium
  # Google Chrome is not available for arm64, so we install the open-source version, Chromium
  echo "INFO: Installing Chromium for arm64..."
  apt-get install -y chromium --no-install-recommends
  BROWSER_EXEC="chromium"
else
  echo "ERROR: Unsupported architecture: $ARCH" >&2
  exit 1
fi

# --- 3. Cleanup and Verification ---
# Clean up apt cache to reduce image size
echo "INFO: Cleaning up apt cache..."
rm -rf /var/lib/apt/lists/*

# Find the path of the installed browser executable
chrome_path=$(which "$BROWSER_EXEC")

# Verify if the browser was installed successfully and move the executable
if [ -n "$chrome_path" ]; then
  echo "INFO: Browser executable found at: $chrome_path"
  # --- START: MODIFICATION ---
  # On arm64, rename 'chromium' to 'google-chrome-stable' for compatibility with the JS code.
  # On amd64, this just moves 'google-chrome-stable' to the current directory.
  mv "$chrome_path" ./google-chrome-stable
  echo "INFO: Moved executable to ./google-chrome-stable"
  # --- END: MODIFICATION ---
else
  echo "ERROR: Browser executable '$BROWSER_EXEC' not found in PATH." >&2
  exit 1
fi

echo "✅ Setup complete. The browser executable is now available at ./google-chrome-stable"
- PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer: This installs Puppeteer without downloading Chromium, as Google Chrome will be used instead.
- The subsequent commands update the system package list, install the necessary tools (like wget and gnupg), and add Google's signing key and repository for installing Google Chrome.
- apt-get install -y google-chrome-stable: This installs Google Chrome along with necessary fonts and libraries to ensure Puppeteer runs properly with the browser.
- The script then finds and moves the installed google-chrome-stable executable to the current directory for Puppeteer to use.
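Because the script leaves the browser at ./google-chrome-stable, the relative executablePath in the app only resolves when the process starts from the directory where the script ran. The sketch below shows one way to resolve the path explicitly and check it before launching. CHROME_PATH is a hypothetical environment variable introduced for illustration, and both helper names are assumptions rather than code from the repository.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// CHROME_PATH is a hypothetical override; otherwise fall back to the file
// that prepare_puppeteer_env.sh moves into the working directory.
function resolveChromePath(env = process.env, cwd = process.cwd()) {
  return env.CHROME_PATH || path.resolve(cwd, 'google-chrome-stable');
}

// Returns true when the file exists and is executable by the current user.
function isExecutable(filePath) {
  try {
    fs.accessSync(filePath, fs.constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

const chromePath = resolveChromePath();
if (!isExecutable(chromePath)) {
  console.warn(`Browser not found at ${chromePath}; run prepare_puppeteer_env.sh first.`);
}
```

The resolved value could then be passed as executablePath to puppeteer.launch in place of the hard-coded relative path.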
2. Create a Service in the Leapcell Dashboard and Connect Your Repo
Go to the Leapcell Dashboard and click the New Service button.
On the "New Service" page, select the repository you just forked.
To access your repositories, you’ll need to connect Leapcell to your GitHub account.
Follow these instructions to connect to GitHub.
Once connected, your repositories will appear in the list.
3. Fill in the Following Values During Service Creation
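Puppeteer drives a headless Chrome or Chromium browser, so the build step has to install the browser and its system dependencies in addition to the Node.js packages. It’s recommended to run the environment script as its own command, separate from npm install.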
Here’s the final installation command:
sh prepare_puppeteer_env.sh && npm install
For this example, we use an Express app to control Puppeteer operations. The start command is npm run start.
Field | Value |
---|---|
Runtime | Node.js (Any version) |
Build Command | sh prepare_puppeteer_env.sh && npm install |
Start Command | npm run start |
Port | 8080 |
Enter these values in the corresponding fields.
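The Port value must match the port the app listens on, which the script reads from the PORT environment variable with a default of 8080. As a hedged sketch, the helper below adds basic validation to that lookup so a malformed value falls back to the default; resolvePort is a hypothetical name and is not part of the repository.

```javascript
// Hypothetical helper: read the listening port from the environment,
// falling back to 8080 when PORT is missing or not a valid TCP port.
function resolvePort(env = process.env, fallback = 8080) {
  const parsed = Number.parseInt(env.PORT, 10);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return valid ? parsed : fallback;
}

console.log(resolvePort({ PORT: '3000' })); // 3000
console.log(resolvePort({}));               // 8080
```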
4. Access Your App
Once deployed, you’ll see a URL like foo-bar.leapcell.dev on the Deployment page. Visit the domain to test your application.
Continuous Deployments
Every push to the linked branch automatically triggers a build and deploy. Failed builds are safely canceled, and the current version remains live until the next successful deployment.
Learn more about Continuous Deployments.