html2text
--------------------

*Introduction*

The html2text PHP script renders HTML as text. It was originally designed to construct text-based email content from HTML pages, but it is useful wherever text representations of HTML are required. It works especially well with makeMIME.

This is similar to how text browsers like Lynx render Web pages, but this script is designed for more restrictive environments, where documents cannot respond to forms or hyperlinks, and features such as text underlines, bold fonts and colours are not available.

Version 2 is a complete rewrite using a totally different approach. The output is compatible with version 1, but with significantly improved functionality. Instead of relying on rudimentary pattern matching, it now uses a proper HTML parser to load and interpret the HTML, then a text rendering engine to determine the layout. The main function may still be called in exactly the same way, so in most cases the new version can simply be dropped in as a replacement for the old one. Though it is significantly more complex and has a much larger code size, it outperforms version 1 in all respects.

*Benefits of version 2+*

· Twice as fast as version 1, as it does not have to run regular expressions against long strings.
· Easily configurable and extendable with support for new elements.
· Stylable with content before and after the element, element margins and preformatting.
· Proper margin collapsing model avoids unwanted gaps between elements, and allows preformatted elements to have multiple blank lines.
· Proper collapsing of whitespace avoids odd indents and gaps inside elements.
· Better support for invalid HTML (based on PHP's DOMDocument::loadHTML), which also copes with HTML fragments.
· No longer suffers from various replacement loops and lockups.
· Support for base href.
· Support for numbered lists.
· Basic support for indenting nested lists.
· Basic HTML 5 support.
· Default styles for more HTML 4 elements, including several important phrasing elements.
· Better default styling for headings.
· Optional basic XHTML parsing mode (using an XML parser).
· Optional loading directly from a file instead of a string.
· Optional processing of pre-prepared DOMDocument objects.
· Better support for Unicode characters.

*Limitations of version 2+*

· Support depends on PHP 5's DOMDocument. It is normally installed by default, but some installations may need this feature to be specifically enabled. For example, Fedora/CentOS distributions need the "php-xml" package installed: yum install php-xml
· PHP 4 does not support HTML parsing, and even with XML parsing, its DOM support uses non-standard method calls and property names. As a result, it cannot be made to use version 2 of this script. PHP 4 installations will need to use the now-unsupported version 1.
· Support for encodings is at the mercy of PHP and libxml. They have many limitations, discussed below.
· Support for HTML parsing is at the mercy of libxml. It cannot currently cope with certain constructs, such as "[...]

    if( $element->hasAttribute('author') ) {
        $author = html2text_cleanspace($element->getAttribute('author'));
    }
    if( $element->hasAttribute('url') ) {
        $url = html2text_resolve($element->getAttribute('url'),$element);
    }
    return '['.$author.($url?(' '.$url):'').(($author||$url)?"\r\n":'');
    }
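
The DOMDocument dependency and the attribute checks in the fragment above can be illustrated with a minimal, standalone sketch. It is not taken from html2text itself: the helper name describe_quote, the choice of the blockquote element and its cite attribute, and the output format are all assumptions made for illustration. It only shows how PHP's DOMDocument::loadHTML accepts an HTML fragment, and how hasAttribute and getAttribute can be used to read optional attributes safely.

```php
<?php
// Hypothetical helper, not part of html2text: returns the text of the
// first <blockquote> in an HTML fragment, followed by its cite URL if any.
function describe_quote($html) {
    $doc = new DOMDocument();

    // Collect libxml warnings about invalid markup instead of emitting them,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) returns null when the fragment contains no <blockquote>.
    $quote = $doc->getElementsByTagName('blockquote')->item(0);
    if( $quote === null ) {
        return '';
    }

    $text = trim($quote->textContent);
    // Initialise to an empty string so a missing attribute is handled cleanly.
    $cite = $quote->hasAttribute('cite') ? $quote->getAttribute('cite') : '';

    return $text.($cite ? ' ['.$cite.']' : '');
}

// prints "Quoted text [http://example.com/]"
echo describe_quote('<blockquote cite="http://example.com/">Quoted text</blockquote>'), "\n";
```

Giving $cite an empty-string default means the final concatenation never touches an undefined variable when the attribute is absent.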