How to Convert a file in PDF in Java
Posted By : Balgovind Prajapati | 20-Jun-2019
This post gives a brief introduction to the PDFBox API. We often need to read data from a PDF document, or write data out in PDF format, from our own code. In Java we can do this with PDFBox, a library provided by Apache. Its classes live in subpackages of org.apache.pdfbox, such as org.apache.pdfbox.pdmodel and org.apache.pdfbox.text. A wildcard import such as import org.apache.pdfbox.*; does not cover subpackages, so each class has to be imported from its own package.
PDFBox is open source. It lets us write Java programs that create, modify and otherwise manipulate PDF documents. In addition, PDFBox includes a command-line utility for performing various operations on PDF files.
Features of the PDFBox API:
1. Extracts Unicode text from PDF files.
2. Splits a single PDF into many files, or merges many PDF files into one.
3. Fills in PDF forms and extracts data from them.
4. Validates PDF files against the PDF/A-1b standard.
5. Saves PDF pages as JPEG or PNG images.
6. Creates PDFs from scratch, with embedded fonts and images.
Now let us look at how to use PDFBox from Java code.
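As a quick illustration of feature 5 (saving a PDF page as an image), here is a minimal sketch, assuming PDFBox 2.x is on the classpath. The class name PdfToPngSketch and the helper renderPageAsPng are made up for this example, and it renders a blank in-memory page rather than a real file.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PdfToPngSketch {

    // Hypothetical helper: renders one page (0-based index) and returns it as PNG bytes.
    static byte[] renderPageAsPng(PDDocument doc, int pageIndex, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(doc);
        BufferedImage image = renderer.renderImageWithDPI(pageIndex, dpi, ImageType.RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A blank one-page document stands in for a real PDF here.
        try (PDDocument doc = new PDDocument()) {
            doc.addPage(new PDPage());
            byte[] png = renderPageAsPng(doc, 0, 72f);
            System.out.println("PNG size in bytes: " + png.length);
        }
    }
}
```

To render a real file, load it with PDDocument.load(file) instead of creating a blank document, and write the returned bytes to disk.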
Splitting the Pages in a PDF Document
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/balgovind/Desktop/new.pdf");
        // try-with-resources closes the source document even if splitting fails
        try (PDDocument doc = PDDocument.load(file)) {
            Splitter splitter = new Splitter();
            // Splitting the PDF document into one document per page
            List<PDDocument> pages = splitter.split(doc);
            // Saving each page as an individual document
            int i = 1;
            for (PDDocument pd : pages) {
                pd.save("/home/balgovind/Desktop/new" + i++ + ".pdf");
                pd.close();
            }
            System.out.println("Multiple PDFs created");
        }
    }
}
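The reverse operation, merging several PDFs into one, can be sketched with PDFMergerUtility. This is a minimal sketch assuming PDFBox 2.x. The class PdfMergeSketch and its helpers blankPdf, merge and pageCount are hypothetical names for this example, and it works on in-memory documents so that it does not depend on any file path.

```java
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PdfMergeSketch {

    // Hypothetical helper: builds an in-memory PDF with the given number of blank pages.
    static byte[] blankPdf(int pageCount) throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage());
            }
            doc.save(out);
            return out.toByteArray();
        }
    }

    // Merges the given PDFs, in order, into a single PDF.
    static byte[] merge(byte[]... sources) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (byte[] source : sources) {
            merger.addSource(new ByteArrayInputStream(source));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        merger.setDestinationStream(out);
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        return out.toByteArray();
    }

    // Counts the pages of an in-memory PDF.
    static int pageCount(byte[] pdf) throws IOException {
        try (PDDocument doc = PDDocument.load(pdf)) {
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] merged = merge(blankPdf(1), blankPdf(2));
        System.out.println("Merged page count: " + pageCount(merged));
    }
}
```

For files on disk, PDFMergerUtility also accepts File sources through addSource(File) and a target path through setDestinationFileName(String).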
Reading text from an existing pdf document
We can read the text of a document with the getText() method of the PDFTextStripper class.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/balgovind/Desktop/new.pdf");
        // try-with-resources closes the document when we are done
        try (PDDocument doc = PDDocument.load(file)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            // Retrieving text from the PDF document
            String txt = pdfStripper.getText(doc);
            System.out.println(txt);
        }
    }
}
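getText() returns the text of the whole document by default. To read a single page, PDFTextStripper lets us restrict the range with setStartPage() and setEndPage(), both of which take 1-based page numbers. The following is a minimal sketch assuming PDFBox 2.x. The class PdfPageTextSketch and its helpers are hypothetical, and the two-page document is built in memory only so the example is self-contained.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.IOException;

public class PdfPageTextSketch {

    // Hypothetical helper: appends a page containing one line of text.
    static void addPageWithText(PDDocument doc, String text) throws IOException {
        PDPage page = new PDPage();
        doc.addPage(page);
        try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
            cs.beginText();
            cs.setFont(PDType1Font.HELVETICA, 12);
            cs.newLineAtOffset(72, 700);
            cs.showText(text);
            cs.endText();
        }
    }

    // Extracts the text of a single page; pageNumber is 1-based.
    static String textOfPage(PDDocument doc, int pageNumber) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(pageNumber);
        stripper.setEndPage(pageNumber);
        return stripper.getText(doc);
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            addPageWithText(doc, "first page");
            addPageWithText(doc, "second page");
            // Only the text of page 2 is extracted here.
            System.out.println(textOfPage(doc, 2));
        }
    }
}
```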
Setting the PDF document properties
PDFBox provides a class named PDDocumentInformation, which exposes getter and setter methods for document metadata such as the author, title, subject, keywords, and creation and modification dates.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class PdfDemo {
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            // Creating a blank page and adding it to the document
            PDPage page = new PDPage();
            doc.addPage(page);

            PDDocumentInformation pdi = doc.getDocumentInformation();
            pdi.setAuthor("BalGovind");
            pdi.setTitle("Java Programming");
            pdi.setCreator("BalGovind");
            pdi.setSubject("Demo Document");

            // Setting the created date of the document.
            // Calendar months are zero-based, so 9 means October.
            Calendar date = new GregorianCalendar();
            date.set(2018, 9, 29);
            pdi.setCreationDate(date);

            // Setting the modified date of the document
            date.set(2018, 9, 30);
            pdi.setModificationDate(date);

            // Setting keywords for the document
            pdi.setKeywords("java, java programming");

            doc.save("/home/kiran/Desktop/kiran.pdf");
            System.out.println("Properties added successfully");
        }
    }
}
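One detail in the date handling above is easy to miss: java.util.Calendar counts months from zero, so date.set(2018, 9, 29) refers to 29 October 2018, not 29 September. The sketch below uses only the JDK and shows that the literal 9 and the constant Calendar.OCTOBER mean the same month. The class CalendarMonthSketch and the helper dateOf are hypothetical names for this example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class CalendarMonthSketch {

    // Hypothetical helper: builds a date at midnight; the month argument is zero-based.
    static Calendar dateOf(int year, int zeroBasedMonth, int day) {
        return new GregorianCalendar(year, zeroBasedMonth, day);
    }

    public static void main(String[] args) {
        Calendar viaLiteral = dateOf(2018, 9, 29);
        Calendar viaConstant = dateOf(2018, Calendar.OCTOBER, 29);

        // Both calendars describe the same day in October.
        System.out.println(viaLiteral.get(Calendar.MONTH) == Calendar.OCTOBER); // prints "true"
        System.out.println(viaLiteral.equals(viaConstant));                     // prints "true"
    }
}
```

Using the named constants (Calendar.SEPTEMBER, Calendar.OCTOBER and so on) makes the intended month explicit when passing a Calendar to setCreationDate() or setModificationDate().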
About Author
Balgovind Prajapati
BalGovind is a Java developer with skills in Java, Spring, Hibernate, J2EE and MySQL. He is a goal-oriented and focused person.