
Parsing HTML with the Simple HTML DOM Library

The first thing you’ll need to do is download a copy of the Simple HTML DOM library, freely available from SourceForge.

There are several files in the download, but the only one you need is simple_html_dom.php; the rest are examples and documentation. Include it in your PHP code so that its classes and functions become available:

include("./inc/simple_html_dom.php");

Loading HTML

Create your initial DOM object by loading HTML with file_get_html(), which accepts either a URL or a path on your local file system.

$request_url = './html_files_to_be_edited/news.html';
$html = file_get_html($request_url);
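
Loading can fail, for example when the path is wrong or the URL is unreachable, so it is worth checking the result before using it. The sketch below assumes that file_get_html() returns false when it cannot read the source; load_html_or_null() is a hypothetical helper, and the inline markup is made up for illustration. It also shows str_get_html(), which parses markup that is already in a string.

```php
<?php
require_once './inc/simple_html_dom.php';

// Hypothetical helper: returns the DOM object, or null if the source could not be read.
function load_html_or_null($source)
{
    // @ silences the warning PHP raises when the file or URL cannot be opened
    $dom = @file_get_html($source);
    return $dom ? $dom : null;
}

$html = load_html_or_null('./html_files_to_be_edited/news.html');
if ($html === null) {
    echo "Could not load the file\n";
}

// Markup that is already in memory can be parsed with str_get_html()
$snippet = str_get_html('<div id="news">Old text</div>');
echo $snippet->find('#news', 0)->innertext, "\n";
```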

Accessing Information

Once you have your DOM object, you can start to work with it by using find() and creating collections. A collection is a group of objects found via a selector – the syntax is quite similar to jQuery.

// Replace the contents of the first element with id="news"
$html->find('#news', 0)->innertext = 'My new text!';

find() returns a collection (an array) of all matching elements unless you pass a zero-based index as the second parameter, in which case it returns only that match: 0 selects the first matching element.
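
To make the difference between the two call styles concrete, here is a minimal sketch that runs find() with and without an index. The markup is invented for the example, and the null check assumes the library returns null when the requested index does not exist.

```php
<?php
require_once './inc/simple_html_dom.php';

$html = str_get_html(
    '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
);

// No index: an array containing every matching element
$links = $html->find('a');
foreach ($links as $link) {
    echo $link->href, ' => ', $link->plaintext, "\n";
}

// Index 0: the first match, returned as a single element
$first = $html->find('a', 0);
echo $first->plaintext, "\n";

// An index past the last match gives null, so guard before dereferencing
$missing = $html->find('a', 5);
if ($missing === null) {
    echo "No sixth link\n";
}
```

Checking for null before touching ->innertext or ->href avoids a fatal error when the selector matches nothing.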

Saving HTML

Saving takes a single method call:

$html->save($request_url);
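
As a rough sketch of how save() behaves: called without an argument it returns the markup as a string, and called with a path it also writes that markup to disk. The temporary output path below is a hypothetical choice for the example. Note that writing back only makes sense for a local path; if the document was loaded from an http:// URL, save it to a local file instead.

```php
<?php
require_once './inc/simple_html_dom.php';

$html = str_get_html('<div id="news">Old text</div>');
$html->find('#news', 0)->innertext = 'My new text!';

// Without an argument, save() just returns the current markup
$markup = $html->save();
echo $markup, "\n";

// With a path, the same markup is also written to that file
$target = sys_get_temp_dir() . '/news_copy.html'; // hypothetical output path
$html->save($target);
```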

Preserving white space in your saved HTML file

If you try the code above, the resulting HTML will have lost its line breaks. To avoid this, pass false for the $stripRN argument of file_get_html() (the default is true); it is the ninth argument in the call below:

$html = file_get_html($request_url, NULL, NULL, NULL, NULL, NULL, NULL, NULL, false);
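
The effect of $stripRN can be checked on a small string with str_get_html(), which takes the same flag further forward in its argument list. This is a sketch under an assumption: the parameter order ($str, $lowercase, $forceTagsClosed, $target_charset, $stripRN) is the one recalled from the 1.5 release, so confirm it against the copy of simple_html_dom.php you downloaded. Passing the documented defaults explicitly, rather than NULL, leaves the other options unchanged.

```php
<?php
require_once './inc/simple_html_dom.php';

$source = "<div id=\"news\">\n<p>Line one\nLine two</p>\n</div>";

// Defaults: $stripRN is true, so line breaks are replaced before parsing
$stripped = str_get_html($source)->save();

// Assumed order: $str, $lowercase, $forceTagsClosed, $target_charset, $stripRN
$kept = str_get_html($source, true, true, DEFAULT_TARGET_CHARSET, false)->save();

// Report whether each result still contains a line break
var_dump(strpos($stripped, "\n") !== false);
var_dump(strpos($kept, "\n") !== false);
```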