JAVA PDFBOX API
Posted By : Kiran Sharma | 30-Sep-2018
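In this blog, I am going to give you a brief introduction to the PDFBox API. We often need to read data from a PDF document, and sometimes we need to write data in PDF format from our own code. In Java, Apache provides the PDFBox library for this (Maven artifact org.apache.pdfbox:pdfbox). Its classes live in subpackages of org.apache.pdfbox, such as org.apache.pdfbox.pdmodel, org.apache.pdfbox.text and org.apache.pdfbox.multipdf. A Java wildcard import such as import org.apache.pdfbox.*; does not include subpackages, so import the specific classes you need, as the examples below do.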
PDFBox is an open-source Java library from the Apache Software Foundation. It lets us write Java programs that create, modify and manipulate PDF documents. In addition, PDFBox provides command-line utilities for performing various operations on PDF files.
Features of the PDFBox API:
1. Extracting Unicode text from PDF files.
2. Splitting a single PDF into many files, or merging many PDF files into one.
3. Filling in PDF forms, or extracting data from PDF forms.
4. Validating PDF files against the PDF/A-1b standard (provided by the separate Preflight module).
5. Saving PDF pages as JPEG or PNG images.
6. Creating PDFs from scratch, with embedded fonts and images.
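As a taste of feature 6, here is a minimal sketch that builds a one-page PDF with a single line of text. It assumes PDFBox 2.x on the classpath (in 3.x the font constant is created with new PDType1Font(Standard14Fonts.FontName.HELVETICA) instead); the class name CreatePdfDemo and the text are illustrative only.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; assumes the PDFBox 2.x API
public class CreatePdfDemo {
    // Writes a one-page PDF containing one line of text and returns the output file
    public static File createHelloPdf(File out) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            // The content stream must be closed before the document is saved
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(PDType1Font.HELVETICA, 12);
                // Position the text 72 points (1 inch) from the left, 700 points from the bottom
                cs.newLineAtOffset(72, 700);
                cs.showText("Hello from PDFBox");
                cs.endText();
            }
            doc.save(out);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File out = createHelloPdf(new File("hello.pdf"));
        System.out.println("Created " + out.getAbsolutePath());
    }
}
```

Helvetica is one of the standard 14 fonts, so it is referenced rather than embedded; to embed a TrueType font you would load it with PDType0Font.load(doc, fontFile) and pass that to setFont.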
Now let us look at some of these features in Java code.
Splitting the Pages in a PDF Document
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
        // PDDocument.load() is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(file) instead
        try (PDDocument doc = PDDocument.load(file)) {
            Splitter splitter = new Splitter();
            // Splitting the document: by default each page becomes a separate PDDocument
            List<PDDocument> pages = splitter.split(doc);
            // Saving each page as an individual document, then releasing it
            int i = 1;
            for (PDDocument pd : pages) {
                pd.save("/home/kiran/Desktop/HeadFirstDesignPatterns" + i++ + ".pdf");
                pd.close();
            }
            System.out.println("Multiple PDFs created");
        }
    }
}
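The reverse operation, merging (feature 2), can be sketched with PDFBox's PDFMergerUtility class. This assumes PDFBox 2.x, where mergeDocuments takes a MemoryUsageSetting (3.x changed that parameter); the class name MergePdfDemo and the merged.pdf output path are hypothetical.

```java
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; assumes the PDFBox 2.x API
public class MergePdfDemo {
    // Appends the pages of each input file, in list order, into a single output file
    public static void merge(List<File> inputs, File out) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (File f : inputs) {
            merger.addSource(f);
        }
        merger.setDestinationFileName(out.getAbsolutePath());
        // Keep merge buffers in main memory; adequate for small files
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: rejoin the first two files produced by the splitting example
        List<File> parts = Arrays.asList(
                new File("/home/kiran/Desktop/HeadFirstDesignPatterns1.pdf"),
                new File("/home/kiran/Desktop/HeadFirstDesignPatterns2.pdf"));
        merge(parts, new File("/home/kiran/Desktop/merged.pdf"));
        System.out.println("PDFs merged");
    }
}
```

For very large inputs, MemoryUsageSetting.setupTempFileOnly() can be passed instead so that buffers go to temporary files rather than the heap.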
Reading Text from an Existing PDF Document
We can read the text of a document using the getText() method of the PDFTextStripper class.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
        // try-with-resources closes the document even if text extraction fails
        try (PDDocument doc = PDDocument.load(file)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            // Retrieving the text of all pages of the PDF document
            String txt = pdfStripper.getText(doc);
            System.out.println(txt);
        }
    }
}
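For a long book you may not want every page at once. PDFTextStripper also has setStartPage and setEndPage, which limit extraction to a 1-based, inclusive page range. A minimal sketch, assuming PDFBox 2.x; the class and method names (PageRangeTextDemo, extractPages) are hypothetical.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; assumes the PDFBox 2.x API
public class PageRangeTextDemo {
    // Returns the text of pages startPage..endPage (1-based, inclusive)
    public static String extractPages(File file, int startPage, int endPage) throws IOException {
        try (PDDocument doc = PDDocument.load(file)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(startPage);
            stripper.setEndPage(endPage);
            return stripper.getText(doc);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
        // Print only the first three pages instead of the whole book
        System.out.println(extractPages(file, 1, 3));
    }
}
```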
Setting the PDF Document Properties
PDFBox provides a class named PDDocumentInformation, which offers getter and setter methods for the document's metadata, such as the author, title, subject, keywords and creation and modification dates.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            // Creating a blank page and adding it to the document
            PDPage page = new PDPage();
            doc.addPage(page);

            PDDocumentInformation pdi = doc.getDocumentInformation();
            pdi.setAuthor("Kiran Sharma");
            pdi.setTitle("Learn Java Programming");
            pdi.setCreator("Kiran Sharma");
            pdi.setSubject("Demo Document");

            // Setting the creation date; Calendar months are zero-based, so 9 would mean October
            Calendar created = new GregorianCalendar(2018, Calendar.SEPTEMBER, 29);
            pdi.setCreationDate(created);

            // Setting the modification date with a separate Calendar object
            Calendar modified = new GregorianCalendar(2018, Calendar.SEPTEMBER, 30);
            pdi.setModificationDate(modified);

            // Setting keywords for the document
            pdi.setKeywords("java, java programming");

            doc.save("/home/kiran/Desktop/kiran.pdf");
            System.out.println("Properties added successfully");
        }
    }
}
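The getter side of PDDocumentInformation works the same way. This minimal sketch, assuming PDFBox 2.x, loads a PDF and reports some of its properties; the class and method names (ReadPropertiesDemo, summary) are hypothetical, and main reads the kiran.pdf file written above.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import java.io.File;
import java.io.IOException;
import java.util.Calendar;

// Hypothetical demo class; assumes the PDFBox 2.x API
public class ReadPropertiesDemo {
    // Builds a small report from the PDF's document information dictionary
    public static String summary(File file) throws IOException {
        try (PDDocument doc = PDDocument.load(file)) {
            PDDocumentInformation pdi = doc.getDocumentInformation();
            StringBuilder sb = new StringBuilder();
            // Each getter returns null when that property is not set, which appends "null"
            sb.append("Title: ").append(pdi.getTitle()).append('\n');
            sb.append("Author: ").append(pdi.getAuthor()).append('\n');
            sb.append("Subject: ").append(pdi.getSubject()).append('\n');
            sb.append("Keywords: ").append(pdi.getKeywords()).append('\n');
            Calendar created = pdi.getCreationDate();
            sb.append("Created: ").append(created == null ? null : created.getTime()).append('\n');
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads back the file written by the previous example
        System.out.print(summary(new File("/home/kiran/Desktop/kiran.pdf")));
    }
}
```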
About Author
Kiran Sharma
Kiran has good knowledge of Java, including Servlets, JSPs, and the Spring and Hibernate frameworks. She is very sincere about her work. Her hobby is listening to music.