Little useful frameworks - JSoup

Tuesday, January 10th, 2012

On several occasions, I have needed to parse HTML pages and extract data from specific tags.

For instance, I had to build a wiki migration, and to transform pages and import them in bulk into a CMS.

JSoup, a Java library, makes these operations easier.

JSoup parses HTML according to the HTML5 standard, from a URL, a String or a file. It finds elements with CSS selectors or DOM traversal, and it makes the results easy to manipulate: for example, you can replace content or wrap elements in HTML tags.
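
To give an idea of what this looks like in practice, here is a minimal sketch, assuming jsoup is on the classpath. The HTML snippet, the class name JsoupSketch and the method transform are invented for this illustration and do not come from a real migration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSketch {

    // Hypothetical wiki-style markup, used only for this illustration.
    static final String HTML =
        "<html><body>"
        + "<div id=\"content\">"
        + "<h1>Old wiki page</h1>"
        + "<p class=\"note\">Draft</p>"
        + "<a href=\"/wiki/Home\">Home</a>"
        + "</div>"
        + "</body></html>";

    // Parses the given HTML string, replaces some content and wraps the links.
    static Document transform(String html) {
        Document doc = Jsoup.parse(html);

        // CSS selector: find the first paragraph with class "note"
        // and replace its text.
        Element note = doc.select("p.note").first();
        if (note != null) {
            note.text("Reviewed");
        }

        // Wrap every link that has an href attribute in a <span class="link">.
        doc.select("a[href]").wrap("<span class=\"link\"></span>");

        return doc;
    }

    public static void main(String[] args) {
        Document doc = transform(HTML);

        // DOM traversal: start from an element id and walk its children.
        Element content = doc.getElementById("content");
        for (Element child : content.children()) {
            System.out.println(child.tagName() + ": " + child.text());
        }

        // Print the transformed body, ready to be imported elsewhere.
        System.out.println(doc.body().html());
    }
}
```

The same select, text and wrap calls work on a Document loaded with Jsoup.connect(url).get() or Jsoup.parse(file, charsetName), so the input source can change without touching the transformation code.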