PDFBox - retrieve text and document information

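import java.io.File;
import java.io.FileInputStream;
// PDFBox 1.x package names; the older pdfbox.org releases use org.pdfbox.* instead
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.util.PDFTextStripper;

FileInputStream stream = new FileInputStream(new File(filepath));
PDDocument document = PDDocument.load(stream);
try {
    if (document.isEncrypted()) {
        try {
            // try the empty user password; this only works if no user password is set
            document.decrypt("");
        } catch (Exception e) {
            // decryption failed, so the text extraction below will probably fail too
            System.err.println(e.getMessage());
        }
    }

    // get document information (Title, Subject, Keywords, etc.)
    PDDocumentInformation info = document.getDocumentInformation();

    // get string containing all text without formatting etc.
    PDFTextStripper stripper = new PDFTextStripper();
    String textContent = stripper.getText(document);
    // ... use info and textContent here ...
} finally {
    // release the resources held by the document and the underlying stream
    document.close();
    stream.close();
}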
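
Once you have the metadata values and the raw text, a little post-processing usually helps. The following is only a sketch: the class PdfSummary and its two methods are hypothetical helpers (not part of PDFBox), and it assumes that metadata getters such as getTitle() return null when a field is not set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: tidies up values read from a PDF.
public class PdfSummary {

    // Collects the metadata fields that are actually present,
    // skipping values that are null or contain only whitespace.
    public static Map<String, String> presentFields(String title, String author,
                                                    String subject, String keywords) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        put(fields, "Title", title);
        put(fields, "Author", author);
        put(fields, "Subject", subject);
        put(fields, "Keywords", keywords);
        return fields;
    }

    private static void put(Map<String, String> fields, String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            fields.put(name, value.trim());
        }
    }

    // Collapses every run of whitespace (including line breaks) into a single
    // space, which is handy for indexing or previews but drops the line structure.
    public static String normalize(String text) {
        return text == null ? "" : text.replaceAll("\\s+", " ").trim();
    }
}
```

With the variables from the snippet above, this would be called as PdfSummary.presentFields(info.getTitle(), info.getAuthor(), info.getSubject(), info.getKeywords()) and PdfSummary.normalize(textContent).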

Reference: http://www.pdfbox.org/javadoc/index.html
