Consolidation: How to Combine HTML Content in NodeJS

By: Morpheus Data


Sometimes when using Node.js you may need to combine specific content from two or more HTML files. In some cases this is as simple as reading the contents of a file (such as a template that contains only the HTML you need), but at other times you need to grab only certain pieces of content from a full HTML page and include them in another.

In such a case, you can either parse the HTML yourself or use a Node module that simplifies the parsing and lets you grab the content you need easily.

Using fs.readFile

In Node.js, a file can be read with the file system module (fs) by using its readFile function. To do so, you require the fs module and pass readFile the arguments it needs: the first is the path to the file, the second (optional) is the encoding to use, and the third is a callback function that runs when the read is complete. Node calls that callback with an error (or null) as its first argument and the file's contents as its second. The code below shows an example of using fs.readFile:

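var fs = require('fs');
// Read the file as UTF-8 text; the callback receives (err, content).
fs.readFile('../assets/html/products.html', 'utf8', function (err, content) {
  if (err) {
    return console.error(err);
  }
  console.log(content);
});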

Given that there is a file at the provided location, the text of that file will be logged to the console (in this case, the file’s HTML markup and text).
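If you prefer promises to callbacks, recent versions of Node.js expose the same read as fs.promises.readFile, which returns a Promise and works with async/await. The sketch below wraps it in a hypothetical helper named readHtml and writes a small sample file to the operating system's temp directory first, so it runs on its own; it is an illustration, not part of the original walkthrough.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read an HTML file as a UTF-8 string.
// fs.promises.readFile returns a Promise instead of taking a callback.
async function readHtml(filePath) {
  return fs.promises.readFile(filePath, 'utf8');
}

// Write a small sample file to the temp directory so the sketch is self-contained.
const samplePath = path.join(os.tmpdir(), 'products-sample.html');
fs.writeFileSync(samplePath, '<ul><li>Pizza</li></ul>', 'utf8');

readHtml(samplePath)
  .then((content) => console.log(content)) // logs the file's markup
  .catch((err) => console.error('Could not read file:', err));
```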

While this can be used to combine HTML files easily by concatenating their content (which works for template files and other fragments that have no html or head elements), it is likely to be difficult if you need to parse out only specific pieces of the content. This is where a module for parsing HTML can be quite helpful.
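For the simple fragment case, concatenation can be as small as the sketch below. It reads the files in order (using the synchronous fs.readFileSync for brevity) and joins them inside one wrapper document. The helper name concatFragments and the sample file names are hypothetical, and it assumes the fragments carry no html or head elements of their own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read HTML fragment files and join them, in order,
// inside a single page wrapper. Assumes fragments have no <html>/<head> of their own.
function concatFragments(filePaths, title) {
  const fragments = filePaths.map((p) => fs.readFileSync(p, 'utf8'));
  return '<html><head><title>' + title + '</title></head><body>' +
    fragments.join('') + '</body></html>';
}

// Two sample fragments written to the temp directory for demonstration.
const dir = os.tmpdir();
const header = path.join(dir, 'header-fragment.html');
const footer = path.join(dir, 'footer-fragment.html');
fs.writeFileSync(header, '<header>Site</header>', 'utf8');
fs.writeFileSync(footer, '<footer>Bye</footer>', 'utf8');

console.log(concatFragments([header, footer], 'Combined'));
```

This only works because neither fragment repeats the page skeleton; with full pages you would end up with nested html and body elements, which is the problem the parsing approach solves.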

Using a Parsing Package

Through npm (Node Package Manager), you can find a number of HTML parsers, such as cheerio and htmlparser2. These packages let you get to the parts of the HTML you need so you can combine pieces of content the way you would like.

For this article, cheerio will be used. It parses the HTML and gives you a DOM that you can query or alter using jQuery syntax. With that familiar syntax, pulling out pieces of content can be done quickly and effectively.

Get HTML with cheerio

To use cheerio, install it into your project with npm:

>npm install cheerio

Once installed, you should be able to require the package in your Node files:

var cheerio = require('cheerio');

To load HTML from files into cheerio, you will need some HTML files with content in them. As a simple example, two small HTML files will be used, and the content needed from each will be combined and written out to a new document.

First, a file that will be named pizzapost.html:

<html>
<head>
<title>My Favorite Food</title>
</head>
<body>
<h1>My Favorite Food</h1>
<div class="post-content">
<p>I just wanted to tell everyone reading right now: my favorite food is pizza! That's right, a big supreme pizza is the best.</p>
</div>
</body>
</html>

Second, a file that will be named morefoods.html:

<html>
<head>
<title>More favorite foods</title>
</head>
<body>
<h1>More of my favorite foods!</h1>
<div class="post-content">
<p>Actually, there are some other foods I like quite well also: hamburgers, sandwiches, and chocolate of any kind!</p>
</div>
</body>
</html>

Suppose you want to combine the content of these food posts into a single file that displays all of the favorite foods in one place. Unfortunately, the post content is nested inside an element with the CSS class post-content. Combining both files as-is would duplicate the html, head, body, and h1 elements in the new document. What you want is to take only the post content from each file and place it in a new HTML document that has the title, heading, and other markup you need.

This can be done by loading the files into cheerio. In your Node script, do the following:

'use strict';
var fs = require('fs');
var cheerio = require('cheerio');
var $pizzapost = {};
var $morepost = {};
var outputHTML = '<html><head><title>All Favorite Foods</title></head><body><h1>All Foods</h1>';

// Read the first post; Node callbacks receive (err, data).
fs.readFile('../assets/html/pizzapost.html', 'utf8', function (err, content) {
  if (err) {
    return console.log(err);
  }
  $pizzapost = cheerio.load(content);
  // Read the second post only after the first has loaded.
  fs.readFile('../assets/html/morefoods.html', 'utf8', function (moreErr, morecontent) {
    if (moreErr) {
      return console.log(moreErr);
    }
    $morepost = cheerio.load(morecontent);
    // Append the inner HTML of each .post-content block, then close the document.
    outputHTML += $pizzapost('.post-content').html();
    outputHTML += $morepost('.post-content').html();
    outputHTML += '</body></html>';
    fs.writeFile('../assets/html/allfoods.html', outputHTML, function (writeErr) {
      if (writeErr) {
        return console.log(writeErr);
      }
      console.log('The new HTML file has been created!');
    });
  });
});

As you can see, the needed packages are loaded and the start of the output HTML is defined. Next, the script reads the first file asynchronously and loads its markup into cheerio, which returns a jQuery-like function stored in $pizzapost. It then reads the second file and does the same, storing the result in $morepost. Once both are loaded, you can query each document with the same kinds of selectors you would use in jQuery.

In this case, you only want the content inside elements that have the post-content class, so you can use '.post-content' as the selector and get the HTML code with the html() function. The HTML from both content blocks is appended to the output HTML, followed by the closing body and html tags.

The last thing to do is to write the combined HTML to its own file, which is done using fs.writeFile. When this is complete, it logs a success message to the console and you have a new HTML file with the combined content!