JAVA PDFBOX API

Posted By : Kiran Sharma | 30-Sep-2018

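In this blog, I am going to give you a brief introduction to the PDFBox API. We often need to read data from a PDF document, and sometimes we need to write data in PDF format from our own code. In Java, we can do this with Apache PDFBox, a library from the Apache Software Foundation. Its classes live in sub-packages of org.apache.pdfbox, such as org.apache.pdfbox.pdmodel and org.apache.pdfbox.text, so we import the specific classes we need (for example, import org.apache.pdfbox.pdmodel.PDDocument;). A wildcard import of org.apache.pdfbox.* on its own does not bring in classes from those sub-packages.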

PDFBox is an open-source Java library. It lets us write Java programs that create, edit, and manipulate PDF documents. In addition, PDFBox provides a command-line utility for performing various operations on PDF files.
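To make the package layout concrete, here is a minimal sketch, assuming PDFBox 2.x is on the classpath (the same version line as the PDDocument.load calls used later in this post). The class name HelloPdfBox and the createAndReload helper are hypothetical names chosen for illustration. It creates a document with blank pages, saves it, and loads it back to count the pages.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.File;
import java.io.IOException;

public class HelloPdfBox {

    // Creates a document with the given number of blank pages, saves it,
    // then loads it again and returns the page count read back from disk.
    static int createAndReload(File target, int pageCount) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage()); // the no-argument PDPage uses US Letter size
            }
            doc.save(target);
        }
        // PDDocument.load(File) is the PDFBox 2.x entry point (3.x uses Loader.loadPDF)
        try (PDDocument reloaded = PDDocument.load(target)) {
            return reloaded.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("hello-pdfbox", ".pdf");
        System.out.println("Pages read back: " + createAndReload(out, 2));
    }
}
```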

Features of the PDFBox API:

1. Extracting Unicode text from PDF files.

2. Splitting a single PDF into multiple files, or merging multiple PDF files into one.

3. Filling in PDF forms and extracting data from them.

4. Validating PDF files against the PDF/A-1b standard.

5. Saving PDF pages as JPEG or PNG images.

6. Creating PDFs from scratch, with embedded fonts and images.

Now let us look at some of these features in Java code.
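Saving pages as images (feature 5) does not get its own section below, so here is a minimal sketch of it, assuming PDFBox 2.x. PDFRenderer.renderImageWithDPI and javax.imageio.ImageIO are real APIs; the PdfToImages class, the renderToPng helper, the page-N.png file names and the 150 DPI value are illustrative choices, not anything prescribed by PDFBox.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class PdfToImages {

    // Renders every page of the PDF to a PNG file in outDir and returns how many were written.
    static int renderToPng(File pdf, File outDir, float dpi) throws IOException {
        try (PDDocument doc = PDDocument.load(pdf)) { // PDFBox 2.x loading API
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int page = 0; page < doc.getNumberOfPages(); page++) {
                // Page indices are zero-based here; ImageType.RGB produces an image without alpha
                BufferedImage image = renderer.renderImageWithDPI(page, dpi, ImageType.RGB);
                File png = new File(outDir, "page-" + (page + 1) + ".png");
                ImageIO.write(image, "png", png);
            }
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        File pdf = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
        int written = renderToPng(pdf, new File("/home/kiran/Desktop"), 150f);
        System.out.println(written + " page images written");
    }
}
```

For JPEG output, pass "jpg" to ImageIO.write instead; keeping ImageType.RGB avoids trouble there, since the JDK's JPEG writer does not reliably handle images with an alpha channel.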

Splitting the Pages in a PDF Document

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfDemo {
	public static void main(String[] args) throws IOException {

		File file = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
		// PDDocument.load(File) is the PDFBox 2.x API; try-with-resources closes the source document
		try (PDDocument doc = PDDocument.load(file)) {
			Splitter splitter = new Splitter();

			// Splitting the document: by default each page becomes its own PDDocument
			List<PDDocument> pages = splitter.split(doc);

			// Saving each page as an individual document, then closing it
			int i = 1;
			for (PDDocument pd : pages) {
				pd.save("/home/kiran/Desktop/HeadFirstDesignPatterns" + i++ + ".pdf");
				pd.close();
			}
			System.out.println("Multiple PDFs created");
		}
	}
}
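The example above covers only the splitting half of feature 2. A minimal sketch of the reverse operation, assuming PDFBox 2.x, uses PDFMergerUtility.appendDocument to copy the pages of each input into a new document. PdfMergeDemo, merge and the -merged.pdf output name are hypothetical, and main assumes the split produced at least three files.

```java
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PdfMergeDemo {

    // Appends the pages of every input file, in order, into one new document saved at output.
    // Returns the page count of the merged document.
    static int merge(List<File> inputs, File output) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        List<PDDocument> sources = new ArrayList<>();
        try (PDDocument merged = new PDDocument()) {
            for (File input : inputs) {
                PDDocument source = PDDocument.load(input); // PDFBox 2.x loading API
                sources.add(source);
                merger.appendDocument(merged, source);
            }
            // The sources are still open here, while the merged document is saved
            merged.save(output);
            return merged.getNumberOfPages();
        } finally {
            // Runs after the merged document has been closed by try-with-resources
            for (PDDocument source : sources) {
                source.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<File> parts = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            parts.add(new File("/home/kiran/Desktop/HeadFirstDesignPatterns" + i + ".pdf"));
        }
        int pages = merge(parts, new File("/home/kiran/Desktop/HeadFirstDesignPatterns-merged.pdf"));
        System.out.println("Merged document has " + pages + " pages");
    }
}
```

The source documents are closed only after the merged document has been saved, because PDFBox generally expects the sources of appendDocument to stay open until the destination is written.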

Reading Text from an Existing PDF Document

We can read the text of a PDF document with the getText() method of the PDFTextStripper class.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;

public class PdfDemo {
	public static void main(String[] args) throws IOException {

		File file = new File("/home/kiran/Desktop/HeadFirstDesignPatterns.pdf");
		PDDocument doc = PDDocument.load(file);
		PDFTextStripper pdfStripper = new PDFTextStripper();
		// Retrieving text from PDF document
		String txt = pdfStripper.getText(doc);
		System.out.println(txt);
		doc.close();
	}
}
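By default, getText() extracts the whole document. PDFTextStripper also has setStartPage and setEndPage (1-based, inclusive) to restrict extraction to a page range. The sketch below, assuming PDFBox 2.x, shows that; PageRangeText and its helpers are hypothetical names, and sampleDocument only exists to build a small three-page test document using the standard Helvetica font (which is referenced, not embedded).

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.IOException;

public class PageRangeText {

    // Extracts text only from pages first..last (1-based, inclusive).
    static String extractPages(PDDocument doc, int first, int last) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(first);
        stripper.setEndPage(last);
        return stripper.getText(doc);
    }

    // Builds an in-memory sample document whose page n contains the text "Page n".
    static PDDocument sampleDocument(int pageCount) throws IOException {
        PDDocument doc = new PDDocument();
        for (int n = 1; n <= pageCount; n++) {
            PDPage page = new PDPage();
            doc.addPage(page);
            try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
                content.beginText();
                // PDType1Font.HELVETICA is a PDFBox 2.x constant for a standard 14 font
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(72, 700);
                content.showText("Page " + n);
                content.endText();
            }
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = sampleDocument(3)) {
            System.out.print(extractPages(doc, 2, 3));
        }
    }
}
```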

Setting the PDF Document Properties

PDFBox provides a class named PDDocumentInformation, which offers getter and setter methods for document properties such as the author, title, subject, keywords, and creation and modification dates.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class PdfDemo {
	public static void main(String[] args) throws IOException {

		PDDocument doc = new PDDocument();

		// Creating blank page
		PDPage page = new PDPage();

		// Adding blank page to the document
		doc.addPage(page);

		PDDocumentInformation pdi = doc.getDocumentInformation();

		pdi.setAuthor("Kiran Sharma");

		pdi.setTitle("Learn Java Programming");

		pdi.setCreator("Kiran Sharma");

		pdi.setSubject("Demo Document");

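		// Setting the created date of the document
		// (Calendar months are zero-based, so the named constant is used for September)
		Calendar created = new GregorianCalendar(2018, Calendar.SEPTEMBER, 29);
		pdi.setCreationDate(created);
		// Setting the modified date of the document
		Calendar modified = new GregorianCalendar(2018, Calendar.SEPTEMBER, 30);
		pdi.setModificationDate(modified);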

		// Setting keywords for the document
		pdi.setKeywords("java, java programming");

		doc.save("/home/kiran/Desktop/kiran.pdf");

		System.out.println("Properties added successfully ");

		doc.close();

	}
}
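The same PDDocumentInformation class also has the matching getters, so the properties can be read back from the saved file. Here is a minimal sketch, assuming PDFBox 2.x; ReadPdfProperties and describe are hypothetical names, and the path is the one the example above writes to.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

import java.io.File;
import java.io.IOException;
import java.util.Calendar;

public class ReadPdfProperties {

    // Loads the PDF and returns its document-information properties as readable text.
    static String describe(File pdf) throws IOException {
        try (PDDocument doc = PDDocument.load(pdf)) { // PDFBox 2.x loading API
            PDDocumentInformation pdi = doc.getDocumentInformation();
            Calendar created = pdi.getCreationDate(); // null when the property is absent
            String nl = System.lineSeparator();
            return "Title: " + pdi.getTitle() + nl
                    + "Author: " + pdi.getAuthor() + nl
                    + "Subject: " + pdi.getSubject() + nl
                    + "Keywords: " + pdi.getKeywords() + nl
                    + "Created: " + (created == null ? "unknown" : created.getTime());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(new File("/home/kiran/Desktop/kiran.pdf")));
    }
}
```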

About Author

Kiran Sharma

Kiran has good knowledge of Java, including Servlets, JSPs, and the Spring and Hibernate frameworks. She is very dedicated to her work. Her hobby is listening to music.