Index of /ports_www/p5-HTML-ExtractContent
Name Last modified Size
Makefile 11-Feb-2010 08:28 750
distinfo 28-May-2011 14:15 155
pkg-descr 06-Mar-2009 01:55 377
pkg-plist 06-Mar-2009 01:55 243
HTML::ExtractContent is a module for extracting content from HTML with
scoring heuristics.
It guesses which block of HTML looks like content by scoring each block
on the number of punctuation marks it contains and the length of its
non-tag text.
It also guesses whether the content ends in the current block or continues
into the next block.
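
The heuristic described above can be illustrated with a small Python sketch.
This is not the module's actual implementation or API: the function names
(score_block, extract_content), the block-splitting rule, the scoring formula,
and the threshold and continue_factor values are all assumptions chosen only
to show the idea of scoring blocks by punctuation and text length and then
guessing whether content continues into the next block.

```python
import re

# All names, weights, and thresholds below are hypothetical illustrations,
# not values taken from HTML::ExtractContent.
PUNCTUATION = re.compile(r"[.,!?;:]")
TAG = re.compile(r"<[^>]+>")
BLOCK_SPLIT = re.compile(r"</?(?:div|td|table|section|article)[^>]*>", re.IGNORECASE)


def strip_tags(block_html):
    """Return the non-tag text of a block with whitespace normalized."""
    return " ".join(TAG.sub(" ", block_html).split())


def score_block(block_html):
    """Score a block by text length, boosted by its punctuation count."""
    text = strip_tags(block_html)
    punctuation_count = len(PUNCTUATION.findall(text))
    return len(text) * (1 + punctuation_count)


def extract_content(html, threshold=100, continue_factor=0.5):
    """Return the text of the highest-scoring run of consecutive blocks."""
    blocks = [b for b in BLOCK_SPLIT.split(html) if b.strip()]
    best, best_score = [], 0
    current, current_score = [], 0
    for block in blocks:
        score = score_block(block)
        # Continuation guess: once a run has started, a lower score is
        # enough for the next block to be treated as part of the content.
        limit = threshold * continue_factor if current else threshold
        if score >= limit:
            current.append(block)
            current_score += score
        else:
            if current_score > best_score:
                best, best_score = current, current_score
            current, current_score = [], 0
    if current_score > best_score:
        best = current
    return " ".join(strip_tags(b) for b in best)
```

In this sketch a short trailing paragraph is kept when it directly follows a
high-scoring block, while short navigation or footer blocks are dropped.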
WWW: http://search.cpan.org/dist/HTML-ExtractContent/