How to Convert a File to PDF in Java

Posted By : Balgovind Prajapati | 20-Jun-2019

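This post gives a brief introduction to the PDFBox API. We often need to read data from a PDF document, and sometimes we need to write data out in PDF format from our own code. For Java, Apache provides the PDFBox library for this. Its classes live under the org.apache.pdfbox package, for example org.apache.pdfbox.pdmodel.PDDocument. Note that a wildcard import such as import org.apache.pdfbox.*; does not pull in classes from sub-packages, so each class has to be imported from its own package.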

 

PDFBox is an open-source library. It lets us write Java programs that create and manipulate PDF documents in an application. In addition, PDFBox ships with a command-line utility for performing various operations on PDF files.

 

Features of the PDFBox API:

   1. Extracts Unicode text from PDF files.

   2. Splits a single PDF into many files, or merges many PDF files into one.

   3. Fills in PDF forms and extracts data from them.

   4. Validates PDF files against the PDF/A-1b standard.

   5. Saves PDF pages as JPEG or PNG images.

   6. Creates PDFs from scratch with embedded fonts and images.

 

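Now let us look at some of these features using Java code. The examples use the PDFBox 2.x API, where a document is opened with PDDocument.load; in PDFBox 3.x, documents are loaded with Loader.loadPDF instead.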

 

  Splitting the Pages in a PDF Document

 
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfDemo {
    public static void main(String[] args) throws IOException {

        File file = new File("/home/balgovind/Desktop/new.pdf");

        // try-with-resources closes the source document even if splitting fails
        try (PDDocument doc = PDDocument.load(file)) {
            Splitter splitter = new Splitter();

            // Splitting the document; by default each page becomes its own PDDocument
            List<PDDocument> pages = splitter.split(doc);

            // Saving each page as an individual document
            int i = 1;
            for (PDDocument pd : pages) {
                pd.save("/home/balgovind/Desktop/new" + i++ + ".pdf");
                pd.close();
            }
            System.out.println("Multiple PDFs created");
        }
    }
}
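The splitting example hard-codes the output path for each page. As a minimal sketch, the same naming scheme (new1.pdf, new2.pdf, ...) could be derived from the source file instead. The class SplitPageNames and its method pageFile are hypothetical names introduced only for this illustration; they are not part of PDFBox and use only the JDK.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): builds the output path for one split page.
final class SplitPageNames {

    // Inserts the page number before the file extension and keeps the
    // result in the same directory as the source file.
    static Path pageFile(Path source, int pageNumber) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // Assumption: fall back to ".pdf" when the source name has no extension
        String ext = (dot > 0) ? name.substring(dot) : ".pdf";
        return source.resolveSibling(base + pageNumber + ext);
    }

    public static void main(String[] args) {
        Path source = Paths.get("/home/balgovind/Desktop/new.pdf");
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageFile(source, page));
        }
    }
}
```

In the splitting loop, the string returned by pageFile(...).toString() could then be passed to pd.save(...) in place of the concatenated path.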
 

   Reading Text from an Existing PDF Document

We can read the text of a document by using the getText() method of the PDFTextStripper class.

 

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;

public class PdfDemo {
    public static void main(String[] args) throws IOException {

        File file = new File("/home/balgovind/Desktop/new.pdf");

        // try-with-resources closes the document even if text extraction fails
        try (PDDocument doc = PDDocument.load(file)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            // Retrieving the text of the whole document as a single String
            String txt = pdfStripper.getText(doc);
            System.out.println(txt);
        }
    }
}
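Since getText() returns the text of the whole document as one String, a common next step is to break it into lines. The following is a minimal sketch using only the JDK; the class ExtractedTextLines and the method nonBlankLines are hypothetical names, and the sample string in main merely stands in for real getText() output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox): post-processes text returned by getText().
final class ExtractedTextLines {

    // Splits the text on any line break (\n, \r\n, ...) and keeps only
    // the lines that still contain something after trimming whitespace.
    static List<String> nonBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by pdfStripper.getText(doc)
        String txt = "Title\r\n\r\n  Body text \nEnd";
        for (String line : nonBlankLines(txt)) {
            System.out.println(line);
        }
    }
}
```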
 

    Setting PDF Document Properties

The API provides a class named PDDocumentInformation, which offers setter and getter methods for document properties such as the author, title, subject, keywords, and creation date.

 
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class PdfDemo {
    public static void main(String[] args) throws IOException {

        PDDocument doc = new PDDocument();

        // Creating blank page
        PDPage page = new PDPage();

        // Adding blank page to the document
        doc.addPage(page);

        // Setting the descriptive properties of the document
        PDDocumentInformation pdi = doc.getDocumentInformation();
        pdi.setAuthor("BalGovind");
        pdi.setTitle("Java Programming");
        pdi.setCreator("BalGovind");
        pdi.setSubject("Demo Document");

        // Setting the created date of the document
        // (Calendar months are zero-based, so 9 means October)
        Calendar date = new GregorianCalendar();
        date.set(2018, 9, 29);
        pdi.setCreationDate(date);

        // Setting the modified date of the document
        date.set(2018, 9, 30);
        pdi.setModificationDate(date);

        // Setting keywords for the document
        pdi.setKeywords("java, java programming");

        doc.save("/home/kiran/Desktop/kiran.pdf");

        System.out.println("Properties added successfully");

        doc.close();
    }
}
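One detail of the dates in this example is easy to misread: java.util.Calendar numbers months from zero, so date.set(2018, 9, 29) means 29 October 2018, not 29 September. The sketch below uses only the JDK to show the difference; the class CalendarMonthDemo and its format method are hypothetical names introduced for this illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class: shows how Calendar interprets the month argument.
final class CalendarMonthDemo {

    // Formats a Calendar as yyyy-MM-dd in the calendar's own time zone.
    static String format(Calendar date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setTimeZone(date.getTimeZone());
        return fmt.format(date.getTime());
    }

    public static void main(String[] args) {
        // Same call as in the properties example: month index 9 is October
        Calendar zeroBased = new GregorianCalendar();
        zeroBased.set(2018, 9, 29);
        System.out.println(format(zeroBased));

        // Using the named constant makes the intended month explicit
        Calendar september = new GregorianCalendar(2018, Calendar.SEPTEMBER, 29);
        System.out.println(format(september));
    }
}
```

Using constants such as Calendar.OCTOBER instead of bare numbers avoids this off-by-one confusion when setting the creation and modification dates.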
     
 
Thanks

About Author

Balgovind Prajapati

BalGovind is a Java developer. He is skilled in Java, Spring, Hibernate, J2EE, and MySQL. He is a goal-oriented and focused person.
