Creating PDFs with Node.js - Part 1
Generating PDFs is a common business need. Many PDF libraries, both proprietary and open-source, are available for a variety of programming languages and use cases.
In one of my recent projects I needed to create a PDF generation service using Node.js. My requirements were:
- PDF generation should be quick (less than a minute)
- Use only open-source libraries
- Lay out the PDF contents using HTML and CSS (as opposed to a library-specific API)
These requirements led me to Puppeteer. Puppeteer is a Node.js library that provides an API for controlling Chromium, the open-source browser that Chrome is built on. The feature most relevant to my PDF generator project was the ability to create a PDF from a web page.
Here is the PDF creation example from the repo’s readme:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'hn.pdf', format: 'A4'});
  await browser.close();
})();
The steps are essentially: create a new browser instance, tell that instance to load a URL, then save the rendered page as a PDF. Easy!
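The same three steps can be wrapped in a reusable function. This is only a sketch: `urlToPDF` is a hypothetical helper name, and it assumes the caller has already launched a browser with `puppeteer.launch()` and will close it afterwards. It leaves out the `path` option so the PDF comes back as data instead of being written to disk.

```javascript
// Sketch only: `urlToPDF` is a hypothetical helper, not part of Puppeteer.
// `browser` is assumed to be the object returned by `puppeteer.launch()`.
async function urlToPDF(browser, url, pdfOptions = { format: 'A4' }) {
  const page = await browser.newPage();
  try {
    // 'networkidle2' treats navigation as finished once there have been
    // no more than 2 network connections for at least 500 ms.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Without a `path` option, page.pdf() only returns the PDF data.
    return await page.pdf(pdfOptions);
  } finally {
    // Close the page even if navigation or rendering failed.
    await page.close();
  }
}

module.exports = { urlToPDF };

// Possible usage (inside an async function):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const pdf = await urlToPDF(browser, 'https://news.ycombinator.com');
//   await browser.close();
```

Passing the browser in, instead of launching it inside the function, lets one Chromium instance serve several pages.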
Here’s how I used Puppeteer in my project:
renderer.js:
const puppeteer = require('puppeteer');
/**
 * Turn HTML into PDF as Buffer
 * @param {string} markup - HTML string to render to PDF
 * @returns {Promise<Buffer>} - PDF Buffer
 */
module.exports.generatePDFFromHTML = async (markup) => {
  // Launch outside the try block so close() is only called on a started browser.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    console.log('Loading HTML into page.');
    await page.setContent(markup);
    console.log('Rendering PDF.');
    return await page.pdf();
  } catch (e) {
    throw new Error('Error during PDF rendering: ' + e.message);
  } finally {
    // Close the browser whether rendering succeeded or failed.
    await browser.close();
  }
};
Instead of getting the HTML into Chromium via a URL, I wanted to provide the HTML as a string. With Puppeteer you can set a page’s HTML using page.setContent(). After the page’s HTML is set, it gets saved to a PDF with page.pdf().
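Because the renderer takes its input as a string, the markup does not have to come from a static file. It can be assembled in code from application data. The following is a minimal sketch of that idea. `escapeHTML` and `buildReportMarkup` are hypothetical helpers invented for illustration, and the title-and-list layout is just an assumed example.

```javascript
// Sketch only: `escapeHTML` and `buildReportMarkup` are hypothetical helpers.

// Escape the characters that are significant in HTML so that data values
// cannot break the surrounding markup.
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build a complete HTML document, as a string, from a title and a list of rows.
function buildReportMarkup(title, rows) {
  const items = rows.map((row) => `<li>${escapeHTML(row)}</li>`).join('\n');
  return `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>${escapeHTML(title)}</title>
<style>body { font-family: sans-serif; } h1 { color: #333; }</style>
</head>
<body>
<h1>${escapeHTML(title)}</h1>
<ul>
${items}
</ul>
</body>
</html>`;
}

module.exports = { escapeHTML, buildReportMarkup };
```

The resulting string could then be passed to page.setContent() in the same way as markup read from disk.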
index.js:
const renderer = require('./renderer');
const fs = require('fs-extra');
const path = require('path');
(async () => {
  const destinationFolder = 'output';
  try {
    const html = await fs.readFile('markup.html', 'utf8');
    const PDFBuffer = await renderer.generatePDFFromHTML(html);
    console.log('Got buffer of PDF data');
    // fs-extra's ensureDir() creates the folder if it does not already exist.
    await fs.ensureDir(destinationFolder);
    await fs.writeFile(path.join(destinationFolder, 'moduletest.pdf'), PDFBuffer);
    console.log('Saved PDF');
  } catch (e) {
    console.error(e.message);
    console.error(e.stack);
  }
})();
The index script reads HTML from a file (markup.html) and passes it as a string to the renderer’s generatePDFFromHTML() method, defined above. That method returns the PDF as a Buffer, which is then written to a file.
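Before writing the returned data to disk, it can be worth a quick sanity check that it really is a PDF. A PDF file starts with the ASCII header `%PDF-`, so a small check on the first bytes is enough for a smoke test. This is a sketch, not part of the original project, and `looksLikePDF` is a hypothetical helper name.

```javascript
// Sketch only: `looksLikePDF` is a hypothetical helper.
// A PDF file starts with the ASCII header "%PDF-" followed by a version number.
const PDF_HEADER = Buffer.from('%PDF-', 'ascii');

function looksLikePDF(data) {
  // Buffer is a subclass of Uint8Array, so both types are accepted here.
  if (!(data instanceof Uint8Array) || data.length < PDF_HEADER.length) {
    return false;
  }
  // Compare the leading bytes of the data against the expected header.
  return PDF_HEADER.every((byte, i) => data[i] === byte);
}

module.exports = { looksLikePDF };
```

In the index script, such a check would sit between the call to the renderer and the call to fs.writeFile(). It only inspects the header, so it does not prove the rest of the file is valid.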