jar

de.cit-ec.scie : pdf-extractor

Maven & Gradle

Dec 10, 2014

SCIE PDF Text Extractor · This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and was built to improve text extraction results for scientific papers. The output can easily be converted to plain text (toString) or to an XML format (toXML).

Latest Version

Download de.cit-ec.scie : pdf-extractor JAR file - Latest Versions:

All Versions

Download de.cit-ec.scie : pdf-extractor JAR file - All Versions:

Version Vulnerabilities Size Updated
2.0.x
2.0

View Java Class Source Code in JAR file

  1. Download JD-GUI to open the JAR file and explore its Java classes (.class, .java).
  2. Click the menu "File → Open File..." or drag and drop the pdf-extractor-2.0.1.jar file into the JD-GUI window.
    Once the JAR file is open, all Java classes it contains are displayed.

de.citec.scie.pdf

├─ de.citec.scie.pdf.DocumentBlockCleaner.class - [JAR]

├─ de.citec.scie.pdf.Histogramm.class - [JAR]

├─ de.citec.scie.pdf.PDFStructuredTextExtractor.class - [JAR]

├─ de.citec.scie.pdf.ParagraphEstimator.class - [JAR]

├─ de.citec.scie.pdf.PreTextBlock.class - [JAR]

├─ de.citec.scie.pdf.PreTextLine.class - [JAR]

├─ de.citec.scie.pdf.StringSimilarity.class - [JAR]

├─ de.citec.scie.pdf.TextBlockRankEstimator.class - [JAR]

├─ de.citec.scie.pdf.VerticalAlignmentEstimator.class - [JAR]

├─ de.citec.scie.pdf.WhiteSpaceEstimator.class - [JAR]

de.citec.scie.pdf.structure

├─ de.citec.scie.pdf.structure.AbstractLineSegment.class - [JAR]

├─ de.citec.scie.pdf.structure.Document.class - [JAR]

├─ de.citec.scie.pdf.structure.LineSegment.class - [JAR]

├─ de.citec.scie.pdf.structure.Page.class - [JAR]

├─ de.citec.scie.pdf.structure.Paragraph.class - [JAR]

├─ de.citec.scie.pdf.structure.Text.class - [JAR]

├─ de.citec.scie.pdf.structure.TextBlock.class - [JAR]
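
A class listing like the one above can also be produced without a GUI tool. The following is a minimal sketch that uses only the standard java.util.jar API; the class name JarClassLister and the default path pdf-extractor-2.0.1.jar are assumptions chosen for illustration and are not part of the library itself.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of pdf-extractor: lists the classes packaged in a JAR.
public class JarClassLister {

    /** Returns the fully qualified names of all .class entries in the JAR, sorted alphabetically. */
    static List<String> listClasses(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    // Keep only compiled classes; skip directories, manifests and resources.
                    .filter(name -> name.endsWith(".class"))
                    // Turn "de/citec/scie/pdf/Foo.class" into "de.citec.scie.pdf.Foo".
                    .map(name -> name.substring(0, name.length() - ".class".length())
                                     .replace('/', '.'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the JAR sits in the working directory under this name.
        String jarPath = args.length > 0 ? args[0] : "pdf-extractor-2.0.1.jar";
        for (String className : listClasses(jarPath)) {
            System.out.println(className);
        }
    }
}
```

Run it with the path to the downloaded JAR as the first argument. Note that this only lists class names; viewing decompiled source still requires a tool such as JD-GUI.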
