
de.l3s.boilerpipe : boilerpipe

Nov 03, 2010
33 usages

Boilerpipe: Boilerplate Removal and Fulltext Extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. It ships with ready-made strategies for common tasks (for example, news article extraction) and can be extended for individual problem settings. Extraction is fast (on the order of milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.

Boilerpipe is a Java library written by Christian Kohlschütter and released under the Apache License 2.0. Its algorithms build on and extend concepts from the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, New York City, NY, USA.
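
To make the idea of "shallow text features" more concrete, here is a minimal, self-contained sketch. It assumes just two features per text block (word count and link density) and two made-up thresholds. The class name `ShallowFeatureSketch`, the `Block` type, and the threshold values are hypothetical and chosen for illustration only; this is not boilerpipe's API and not the rule set the library actually uses.

```java
import java.util.ArrayList;
import java.util.List;

public class ShallowFeatureSketch {

    /** A text block plus the two shallow features this sketch looks at. */
    static final class Block {
        final String text;
        final int numWords;          // whitespace-separated tokens in the block
        final int numWordsInAnchors; // how many of those words sit inside links

        Block(String text, int numWordsInAnchors) {
            this.text = text;
            String trimmed = text.trim();
            this.numWords = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            this.numWordsInAnchors = numWordsInAnchors;
        }

        /** Fraction of the block's words that are link text. */
        double linkDensity() {
            return numWords == 0 ? 0.0 : (double) numWordsInAnchors / numWords;
        }
    }

    // Hypothetical thresholds, picked only for this example.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.33;

    /** Long blocks with few links are treated as content, the rest as boilerplate. */
    static boolean looksLikeContent(Block b) {
        return b.numWords >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY;
    }

    /** Concatenates the blocks classified as content, one per line. */
    static String extract(List<Block> blocks) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            if (looksLikeContent(b)) {
                sb.append(b.text).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home News Sports Contact", 4)); // navigation bar
        blocks.add(new Block("The city council voted on Tuesday to approve a new "
                + "budget for road repairs and public transit.", 0)); // article text
        blocks.add(new Block("Copyright 2010", 0)); // footer
        System.out.print(extract(blocks));
    }
}
```

The sketch classifies each block using only that block's own text, which matches the description above: no global or site-level information is needed. The real library combines more features and per-task strategies.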

All Versions

Download de.l3s.boilerpipe : boilerpipe JAR file - All Versions:

Version Vulnerabilities Size Updated
1.1.x
1.0.x

View Java Class Source Code in JAR file

  1. Download JD-GUI, which opens JAR files and shows the decompiled Java source of the .class files inside.
  2. Choose "File → Open File..." from the menu, or drag and drop boilerpipe-1.1.0.jar into the JD-GUI window.
    Once the JAR file is open, all Java classes it contains are displayed.
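
If only the class names are needed rather than decompiled source, a listing like the one on this page can also be produced without JD-GUI. The following is a small sketch using the standard `java.util.jar` API. The class name `JarClassLister` and the default path are assumptions for illustration; the file name boilerpipe-1.1.0.jar is taken from the step above and is expected to be in the working directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarClassLister {

    /** Returns the fully qualified names of all .class entries in the given JAR. */
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    // "de/l3s/boilerpipe/Foo.class" becomes "de.l3s.boilerpipe.Foo"
                    String stripped = name.substring(0, name.length() - ".class".length());
                    names.add(stripped.replace('/', '.'));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Path is an assumption; pass a different JAR as the first argument.
        String path = args.length > 0 ? args[0] : "boilerpipe-1.1.0.jar";
        for (String className : listClasses(path)) {
            System.out.println(className);
        }
    }
}
```

This prints names only. Viewing the actual source still requires a decompiler such as JD-GUI or a separate sources JAR.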

de.l3s.boilerpipe.filters.simple

├─ de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.InvertedFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.LabelToContentFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.MinWordsFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter.class - [JAR]

de.l3s.boilerpipe.sax

├─ de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.class - [JAR]

├─ de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.class - [JAR]

├─ de.l3s.boilerpipe.sax.BoilerpipeSAXInput.class - [JAR]

├─ de.l3s.boilerpipe.sax.CommonTagActions.class - [JAR]

├─ de.l3s.boilerpipe.sax.DefaultTagActionMap.class - [JAR]

├─ de.l3s.boilerpipe.sax.HTMLDocument.class - [JAR]

├─ de.l3s.boilerpipe.sax.HTMLFetcher.class - [JAR]

├─ de.l3s.boilerpipe.sax.HTMLHighlighter.class - [JAR]

├─ de.l3s.boilerpipe.sax.InputSourceable.class - [JAR]

├─ de.l3s.boilerpipe.sax.TagAction.class - [JAR]

├─ de.l3s.boilerpipe.sax.TagActionMap.class - [JAR]

de.l3s.boilerpipe.util

├─ de.l3s.boilerpipe.util.UnicodeTokenizer.class - [JAR]

de.l3s.boilerpipe.extractors

├─ de.l3s.boilerpipe.extractors.ArticleExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.ArticleSentencesExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.CanolaExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.CommonExtractors.class - [JAR]

├─ de.l3s.boilerpipe.extractors.DefaultExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.ExtractorBase.class - [JAR]

├─ de.l3s.boilerpipe.extractors.KeepEverythingExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.LargestContentExtractor.class - [JAR]

├─ de.l3s.boilerpipe.extractors.NumWordsRulesExtractor.class - [JAR]

de.l3s.boilerpipe.labels

├─ de.l3s.boilerpipe.labels.ConditionalLabelAction.class - [JAR]

├─ de.l3s.boilerpipe.labels.DefaultLabels.class - [JAR]

├─ de.l3s.boilerpipe.labels.LabelAction.class - [JAR]

de.l3s.boilerpipe.document

├─ de.l3s.boilerpipe.document.TextBlock.class - [JAR]

├─ de.l3s.boilerpipe.document.TextDocument.class - [JAR]

├─ de.l3s.boilerpipe.document.TextDocumentStatistics.class - [JAR]

de.l3s.boilerpipe

├─ de.l3s.boilerpipe.BoilerpipeExtractor.class - [JAR]

├─ de.l3s.boilerpipe.BoilerpipeFilter.class - [JAR]

├─ de.l3s.boilerpipe.BoilerpipeInput.class - [JAR]

├─ de.l3s.boilerpipe.BoilerpipeProcessingException.class - [JAR]

de.l3s.boilerpipe.estimators

├─ de.l3s.boilerpipe.estimators.SimpleEstimator.class - [JAR]

de.l3s.boilerpipe.conditions

├─ de.l3s.boilerpipe.conditions.TextBlockCondition.class - [JAR]

de.l3s.boilerpipe.filters.heuristics

├─ de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion.class - [JAR]

├─ de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier.class - [JAR]

├─ de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor.class - [JAR]

de.l3s.boilerpipe.filters.english

├─ de.l3s.boilerpipe.filters.english.DensityRulesClassifier.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.HeuristicFilterBase.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier.class - [JAR]

├─ de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder.class - [JAR]

org.cyberneko.html

├─ org.cyberneko.html.HTMLElements.class - [JAR]

├─ org.cyberneko.html.HTMLTagBalancer.class - [JAR]
