PDFBox - retrieve text and document information
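// PDFBox classes used: PDDocument, PDDocumentInformation, PDFTextStripper (package names depend on the PDFBox version)
FileInputStream stream = new FileInputStream(new File(filepath));
PDDocument document = PDDocument.load(stream);
try {
    if (document.isEncrypted()) {
        try {
            // an empty string tries the blank user password
            document.decrypt("");
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
    // get document information (Title, Subject, Keywords, etc.)
    PDDocumentInformation info = document.getDocumentInformation();
    // get string containing all text without formatting etc.
    PDFTextStripper stripper = new PDFTextStripper();
    String textContent = stripper.getText(document);
} finally {
    // release the document and the underlying file handle
    document.close();
    stream.close();
}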
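
The metadata getters on PDDocumentInformation (getTitle(), getAuthor(), getSubject(), getKeywords()) may return null when a field is not set in the PDF. As a minimal sketch, the helper below formats such values for display. PdfMetadataSummary and its methods are hypothetical names used for illustration and are not part of PDFBox; the class uses only the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of PDFBox): formats metadata strings for display.
final class PdfMetadataSummary {

    private PdfMetadataSummary() {}

    // Returns "label: value", substituting a placeholder when the value is null or blank.
    static String describe(String label, String value) {
        boolean missing = (value == null) || value.trim().isEmpty();
        return label + ": " + (missing ? "(not set)" : value.trim());
    }

    // Joins all fields, one per line, in the map's iteration order.
    static String summarize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(describe(e.getKey(), e.getValue()));
        }
        return sb.toString();
    }

    // Usage with the snippet above (assumes "info" is the PDDocumentInformation object):
    //   Map<String, String> fields = new LinkedHashMap<String, String>();
    //   fields.put("Title", info.getTitle());
    //   fields.put("Author", info.getAuthor());
    //   fields.put("Subject", info.getSubject());
    //   fields.put("Keywords", info.getKeywords());
    //   System.out.println(PdfMetadataSummary.summarize(fields));
}
```

A LinkedHashMap keeps the fields in the order they were added, so the summary lines appear in a predictable order.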
Reference: http://www.pdfbox.org/javadoc/index.html







