iText 7

Chapter 7: Creating PDF/UA and PDF/A documents

In chapters 1 to 4, we've created PDF documents using iText 7. In chapters 5 and 6, we've manipulated and reused existing PDF documents. All the PDFs we dealt with in those chapters complied with ISO 32000, the core standard for PDF. ISO 32000 isn't the only ISO standard for PDF, though: many sub-standards were created for specific purposes. In this chapter, we'll highlight two:

  • ISO 14289 is better known as PDF/UA. UA stands for Universal Accessibility. PDFs that comply with the PDF/UA standard can be consumed by anyone, including people who are blind or visually impaired.

  • ISO 19005 is better known as PDF/A. A stands for Archiving. The goal of this standard is the long-term preservation of digital documents.

In this chapter, we'll learn more about PDF/A and PDF/UA by creating a series of PDF/A and PDF/UA files.

Creating accessible PDF documents

Before we start with a PDF/UA example, let's take a closer look at the problem we want to solve. In chapter 1, we created a document that included images. In the sentence "Quick brown fox jumps over the lazy dog", we replaced the words "fox" and "dog" with images representing a fox and a dog. When this file is read out loud, a machine doesn't know that the first image represents a fox and that the second image represents a dog, hence the file will be read as "Quick brown jumps over the lazy."

In an ordinary PDF, content is painted to a canvas. We might use high-level objects such as List and Table, but once the PDF is created, there is no structure left. A list is a sequence of lines and a text snippet in a list item doesn't know that it's part of a list. A table is just a bunch of lines and text added at absolute positions on a page. A text snippet in a table doesn't know it belongs to a cell in a specific column and a specific row.

Unless we make the PDF a Tagged PDF, the document doesn't contain any semantic structure, and without semantic structure, the PDF isn't accessible. To be accessible, the document needs to distinguish which part of a page is actual content, and which part is an artifact that isn't part of the actual content (e.g. a header or a page number). A line of text needs to know if it's a title, if it's part of a paragraph, and so on. We can add all of this information to the page by creating a structure tree and by defining content as marked content. This sounds complex, but if you use iText 7's high-level objects, it's sufficient to call the SetTagged() method. By defining a PdfDocument as a tagged document, the structure we introduce by using objects such as List, Table, and Paragraph will be reflected in the Tagged PDF.

This is only one requirement to make a PDF accessible. The QuickBrownFox_PDFUA example will help us understand the other requirements.

PdfDocument pdf = new PdfDocument(new PdfWriter(dest, new WriterProperties().AddUAXmpMetadata()));
Document document = new Document(pdf);
//Setting some required parameters
pdf.SetTagged();
pdf.GetCatalog().SetLang(new PdfString("en-US"));
pdf.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfDocumentInfo info = pdf.GetDocumentInfo();
info.SetTitle("iText7 PDF/UA example");
//Fonts need to be embedded
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("The quick brown "));
iText.Layout.Element.Image foxImage = new iText.Layout.Element.Image(ImageDataFactory.Create(FOX));
//PDF/UA: Set alt text
foxImage.GetAccessibilityProperties().SetAlternateDescription("Fox");
p.Add(foxImage);
p.Add(" jumps over the lazy ");
iText.Layout.Element.Image dogImage = new iText.Layout.Element.Image(ImageDataFactory.Create(DOG));
//PDF/UA: Set alt text
dogImage.GetAccessibilityProperties().SetAlternateDescription("Dog");
p.Add(dogImage);
document.Add(p);
document.Close();

We create a PdfDocument and a Document, but this time we tell the PdfWriter to automatically add XMP metadata using the AddUAXmpMetadata() method of WriterProperties. PDF/UA requires that the same metadata is also stored in the PDF as XML. This XML must not be compressed: processors that don't "understand" PDF must be able to detect this XMP metadata and process it. The XMP stream is created automatically based on the entries in the Info dictionary, a PDF object that includes data such as the title of the document. In addition to this requirement, we make sure that we comply with PDF/UA by introducing some extra features:

  1. We tell the PdfDocument that we're going to create Tagged PDF (line 4),

  2. We add a language specifier. In our case, the document knows that the main language used in this document is American English (line 5).

  3. We change the viewer preferences so that the title of the document is always displayed in the top bar of the PDF viewer (line 6). Obviously, this implies that we add a title to the metadata of the document (lines 7-8).

  4. All fonts need to be embedded (line 10). There are some other requirements relating to fonts, but discussing them in detail would lead us too far right now.

  5. All the content needs to be tagged. When an image is encountered, we need to provide a description of that image using alt text (line 16 and line 21).

We have now created a PDF/UA document. When we look at the resulting page in Figure 7.1, we don't see much difference, but if we open the Tags panel, we see that the document has a specific structure.

Figure 7.1: a PDF/UA document and its structure

We see that the <Document> consists of a <P>aragraph that is composed of four parts: two <Span>s and two <Figure>s. We'll create a more complex PDF/UA document later in this chapter, but let's first take a look at what makes PDF/A special.

Creating PDFs for long-term preservation, part 1

Part 1 of ISO 19005 was released in 2005. It was defined as a subset of version 1.4 of Adobe's PDF specification (which, at that time, wasn't an ISO standard yet). ISO 19005-1 introduced a series of obligations and restrictions:

  • The document needs to be self-contained: all fonts need to be embedded; external movie, sound or other binary files are not allowed.

  • The document needs to contain metadata in the eXtensible Metadata Platform (XMP) format: ISO 16684 (XMP) describes how to embed XML metadata into a binary file, so that software that doesn't know how to interpret the binary data format can still extract the file's metadata.

  • Functionality that isn't future-proof isn't allowed: the PDF can't contain any JavaScript and may not be encrypted.

ISO 19005-1:2005 (PDF/A-1) defined two conformance levels:

  • Level B ("basic"): ensures that the visual appearance of a document will be preserved for the long term.

  • Level A ("accessible"): ensures that the visual appearance of a document will be preserved for the long term, but also introduces structural and semantic properties. The PDF needs to be a Tagged PDF.

The QuickBrownFox_PDFA_1b example shows how we can create a "Quick brown fox" PDF that complies with PDF/A-1b.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest), PdfAConformanceLevel.PDF_A_1B, new PdfOutputIntent
    ("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", new FileStream(INTENT, FileMode.Open, FileAccess.Read
    )));
Document document = new Document(pdf);
//Fonts need to be embedded
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("The quick brown "));
iText.Layout.Element.Image foxImage = new iText.Layout.Element.Image(ImageDataFactory.Create(FOX));
p.Add(foxImage);
p.Add(" jumps over the lazy ");
iText.Layout.Element.Image dogImage = new iText.Layout.Element.Image(ImageDataFactory.Create(DOG));
p.Add(dogImage);
document.Add(p);
document.Close();

The first thing that jumps out is that we are no longer using a PdfDocument instance. Instead, we create a PdfADocument instance. The PdfADocument constructor needs a PdfWriter as its first parameter, but also a conformance level (in this case PdfAConformanceLevel.PDF_A_1B) and a PdfOutputIntent. This output intent tells the document how to interpret the colors that will be used in the document. In line 7, we make sure that the font we're using is embedded.

Figure 7.2: a PDF/A-1 level B document

Looking at the PDF shown in Figure 7.2, we see a blue ribbon with the text "This file claims compliance with the PDF/A standard and has been opened read-only to prevent modification." Allow me to explain two things about this sentence:

  1. This doesn't mean that the PDF is, in effect, compliant with the PDF/A standard. It only claims it is. To be sure, you need to open the Standards panel in Adobe Acrobat. When you click on the "Verify Conformance" link, Acrobat will verify if the document is what it claims to be. In this case, we read "Status: verification succeeded"; we have successfully created a document complying with PDF/A-1B.

  2. The document has been opened read-only, not because you are not allowed to modify it (PDF/A is not a way to protect a PDF against modification), but because any modification might turn the PDF into a PDF that no longer complies with the PDF/A standard. That's why Adobe Acrobat presents it as read-only. It's not trivial to update a PDF/A without breaking its PDF/A status.

Let's adapt our example, and create a PDF/A-1 level A document with the QuickBrownFox_PDFA_1a example.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest), PdfAConformanceLevel.PDF_A_1A, new PdfOutputIntent
    ("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", new FileStream(INTENT, FileMode.Open, FileAccess.Read
    )));
Document document = new Document(pdf);
//Setting some required parameters
pdf.SetTagged();
//Fonts need to be embedded
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("The quick brown "));
iText.Layout.Element.Image foxImage = new iText.Layout.Element.Image(ImageDataFactory.Create(FOX));
//Set alt text
foxImage.GetAccessibilityProperties().SetAlternateDescription("Fox");
p.Add(foxImage);
p.Add(" jumps over the lazy ");
iText.Layout.Element.Image dogImage = new iText.Layout.Element.Image(ImageDataFactory.Create(DOG));
//Set alt text
dogImage.GetAccessibilityProperties().SetAlternateDescription("Dog");
p.Add(dogImage);
document.Add(p);
document.Close();

We've changed PdfAConformanceLevel.PDF_A_1B into PdfAConformanceLevel.PDF_A_1A (line 2), we've made the PdfADocument a Tagged PDF (line 7), and we've added alt text for the images. The result, shown in Figure 7.3, is somewhat confusing.

Figure 7.3: a PDF/A-1 level A document

When we look at the Standards panel, we see that the document claims to conform to PDF/A-1A and to PDF/UA-1. We don't have a "Verify Conformance" link, so we have to use Preflight. Preflight informs us that there were "No problems found" when executing the "Verify compliance with PDF/A-1a" profile. We can't verify the PDF/UA compliance, because PDF/UA involves some requirements that can't be verified by a machine. For instance: a machine wouldn't notice if we switched the description of the image of the fox with the description of the image of the dog. That would make the document inaccessible, as it would convey false information to people who depend on screen readers. In any case, we know that our document doesn't comply with the PDF/UA standard, because we omitted a number of essential elements (such as the language).

From the start, it was determined that approved parts of ISO 19005 could never become invalid. New, subsequent parts would only define new, useful features. That's what happened when part 2 and part 3 were created.

Creating PDFs for long-term preservation, part 2 and 3

ISO 19005-2:2011 (PDF/A-2) was introduced to have a PDF/A standard that was based on the ISO standard (ISO 32000-1) instead of on Adobe's PDF specification. PDF/A-2 also adds a handful of features that were introduced in PDF 1.5, 1.6 and 1.7:

  • Useful additions include: support for JPEG2000, Collections, object-level XMP, and optional content.

  • Useful improvements include: better support for transparency, comment types and annotations, and digital signatures.

PDF/A-2 also defines an extra level besides Level A and Level B:

  • Level U ("Unicode"): ensures that the visual appearance of a document will be preserved for the long term, and that all text is stored in Unicode.

ISO 19005-3:2012 (PDF/A-3) was an almost identical copy of PDF/A-2. The only difference is that, in PDF/A-3, attachments don't need to be PDF/A. You can attach any file to a PDF/A-3, for instance: an XLS file containing the calculations whose results are used in the document, the original Word document that was used to create the PDF document, and so on. The document itself needs to conform to all the obligations and restrictions of the PDF/A specification, but these obligations and restrictions do not apply to its attachments.

In the UnitedStates_PDFA_3a example, we'll create a document that complies with PDF/UA as well as with PDF/A-3A. We choose PDF/A-3, because we're going to attach the CSV file that was used as the source for creating the PDF.

PdfADocument pdf = new PdfADocument(new PdfWriter(dest), PdfAConformanceLevel.PDF_A_3A, new PdfOutputIntent
    ("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", new FileStream(INTENT, FileMode.Open, FileAccess.Read
    )));
Document document = new Document(pdf, PageSize.A4.Rotate());
document.SetMargins(20, 20, 20, 20);
//Setting some required parameters
pdf.SetTagged();
pdf.GetCatalog().SetLang(new PdfString("en-US"));
pdf.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfDocumentInfo info = pdf.GetDocumentInfo();
info.SetTitle("iText7 PDF/A-3 example");
//Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.Put(PdfName.ModDate, new PdfDate().GetPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.CreateEmbeddedFileSpec(pdf, File.ReadAllBytes(DATA),
    "united_states.csv", "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.Put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.AddFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.Add(fileSpec.GetPdfObject().GetIndirectReference());
pdf.GetCatalog().Put(new PdfName("AF"), array);
//Embed fonts
PdfFont font = PdfFontFactory.CreateFont(FONT, true);
PdfFont bold = PdfFontFactory.CreateFont(BOLD_FONT, true);
// Create content
Table table = new Table(new float[] { 4, 1, 3, 4, 3, 3, 3, 3, 1 });
table.SetWidth(UnitValue.CreatePercentValue(100));
StreamReader sr = File.OpenText(DATA);
String line = sr.ReadLine();
Process(table, line, bold, true);
while ((line = sr.ReadLine()) != null) {
    Process(table, line, font, false);
}
sr.Close();
document.Add(table);
//Close document
document.Close();

Let's examine the different parts of this example.

  • Line 1-5: We create a PdfADocument (PdfAConformanceLevel.PDF_A_3A) and a Document.

  • Line 7: Making the PDF a Tagged PDF is a requirement for PDF/UA as well as for PDF/A-3A.

  • Line 8-11: Setting the language, the document title and the viewer preference to display the title is a requirement for PDF/UA.

  • Line 13-22: We add a file attachment using specific parameters that are required for PDF/A-3A.

  • Line 24-25: We embed the fonts, which is a requirement for PDF/UA as well as for PDF/A.

  • Line 27-36: We've seen this code before in the UnitedStates example in chapter 1 (including the Process() method).

  • Line 38: We close the document.

Figure 7.4 demonstrates how using the Table class with Cell objects added as header cells, and Cell objects added as normal cells, resulted in a structure tree that makes the PDF document accessible.

Figure 7.4: a PDF/A-3 level A document

When we open the Attachments panel as shown in Figure 7.5, we see our original united_states.csv file that we can easily extract from the PDF.

Figure 7.5: a PDF/A-3 level A document and its attachment

The examples in this chapter taught us that PDF/UA and PDF/A documents involve extra requirements compared to ordinary PDFs. "Can we use iText to convert an existing PDF to a PDF/UA or PDF/A document?" is a question that is frequently posted on mailing lists and user forums. I hope this chapter makes clear why iText can't do this automatically.

  • If you have a document with a picture of a fox and a dog, iText can't add the missing alt text for those images, because iText can't see the fox or the dog. iText only sees pixels; it can't interpret the image.

  • If you are using a font that isn't embedded, iText doesn't know what that font looks like. If you don't provide the corresponding font program, iText can never embed that font.

These are only two of many examples that explain why converting an ordinary PDF to PDF/A or PDF/UA isn't trivial. It's very easy to change the PDF so that it shows a blue bar saying that the document complies with PDF/A, but that doesn't mean that claim is true.

We also need to pay attention when we merge existing PDF/A documents.

Merging PDF/A documents

When merging PDF/A documents, it's very important that every single document you add to PdfMerger is already a PDF/A document. You can't mix PDF/A documents and ordinary PDF documents into one single PDF and hope the result will be a PDF/A document. The same is true for mixing a PDF/A level A document with a PDF/A level B document: one has a structure tree, the other doesn't, so you can't expect the resulting PDF to be a PDF/A level A document.

Figure 7.6 shows how we merged the two PDF/A level A documents we created in the previous sections.

Figure 7.6: merging 2 PDF/A level A documents

When we look at the structure of the tags, we see that the <P>aragraph is now followed by a <Table>. The MergePDFADocuments example shows how it's done.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest), PdfAConformanceLevel.PDF_A_1A, new PdfOutputIntent
    ("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", new FileStream(INTENT, FileMode.Open, FileAccess.Read
    )));
//Setting some required parameters
pdf.SetTagged();
pdf.GetCatalog().SetLang(new PdfString("en-US"));
pdf.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfDocumentInfo info = pdf.GetDocumentInfo();
info.SetTitle("iText7 PDF/A-1a example");
//Create PdfMerger instance
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.Merge(firstSourcePdf, 1, firstSourcePdf.GetNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.Merge(secondSourcePdf, 1, secondSourcePdf.GetNumberOfPages());
//Close the documents
firstSourcePdf.Close();
secondSourcePdf.Close();
pdf.Close();

This example is assembled using parts of two examples we've already seen before:

  • Lines 1 to 10 are almost identical to the first part of the UnitedStates_PDFA_3a example we've used in the previous section, except that we now use PdfAConformanceLevel.PDF_A_1A and that we don't need a Document object.

  • Lines 12 to 22 are identical to the last part of the 88th_Oscar_Combine example of the previous chapter. Note that we use a PdfDocument instance instead of a PdfADocument; the PdfADocument will check if the source documents comply.

There's a lot more to be said about PDF/UA and PDF/A, and even about other sub-standards. For instance: there's a German standard for invoicing called ZUGFeRD that is built on top of PDF/A-3, but let's save that for another tutorial.

Summary

In this chapter, we've discovered that there's more to PDF than meets the eye. We've learned how to introduce structure into our documents so that they are accessible for the blind and the visually impaired. We've also made sure that our PDFs were self-contained, for instance by embedding fonts, so that our documents can be archived for the long term.

We'll need several other tutorials to cover the functionality introduced in this tutorial in more depth, but these seven chapters should already give you a good impression of what you can do with iText 7.

Chapter 6: Reusing existing PDF documents

In this chapter, we'll do some more document manipulation, but there will be a subtle difference in approach. In the examples of the previous chapter, we created one PdfDocument instance that linked a PdfReader to a PdfWriter. We manipulated a single document.

In this chapter, we'll always create at least two PdfDocument instances: one or more for the source document(s), and one for the destination document.

Scaling, tiling, and N-upping

Let's start with some examples that scale and tile a document.

Scaling PDF pages

Suppose that we have a PDF file with a single page, measuring 16.54 by 11.69 in. See Figure 6.1.

Figure 6.1: Golden Gate Bridge, original size 16.54 x 11.69 in

Now we want to create a PDF file with three pages. On page 1, the original page is scaled down to 11.69 x 8.26 in as shown in Figure 6.2. On page 2, the original page size is preserved. On page 3, the original page is scaled up to 23.39 x 16.53 in as shown in Figure 6.3.

Figure 6.2: Golden Gate Bridge, scaled down to 11.69 x 8.26 in

Figure 6.3: Golden Gate Bridge, scaled up to 23.39 x 16.53 in

The TheGoldenGateBridge_Scale_Shrink example shows how it's done.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfDocument origPdf = new PdfDocument(new PdfReader(src));
//Original page size
PdfPage origPage = origPdf.GetPage(1);
Rectangle orig = origPage.GetPageSizeWithRotation();
//Add A4 page
PdfPage page = pdf.AddNewPage(PageSize.A4.Rotate());
//Shrink original page content using transformation matrix
PdfCanvas canvas = new PdfCanvas(page);
AffineTransform transformationMatrix = AffineTransform.GetScaleInstance(page.GetPageSize().GetWidth() / orig
    .GetWidth(), page.GetPageSize().GetHeight() / orig.GetHeight());
canvas.ConcatMatrix(transformationMatrix);
PdfFormXObject pageCopy = origPage.CopyAsFormXObject(pdf);
canvas.AddXObject(pageCopy, 0, 0);
//Add page with original size
pdf.AddPage(origPage.CopyTo(pdf));
//Add A2 page
page = pdf.AddNewPage(PageSize.A2.Rotate());
//Scale original page content using transformation matrix
canvas = new PdfCanvas(page);
transformationMatrix = AffineTransform.GetScaleInstance(page.GetPageSize().GetWidth() / orig.GetWidth(), page
    .GetPageSize().GetHeight() / orig.GetHeight());
canvas.ConcatMatrix(transformationMatrix);
canvas.AddXObject(pageCopy, 0, 0);
pdf.Close();
origPdf.Close();

In this code snippet, we create a PdfDocument instance that will create a new PDF document (line 2), and a PdfDocument instance that will read an existing PDF document (line 3). We get a PdfPage instance for the first page of the existing PDF (line 5), and we get its dimensions (line 6). We then add three pages to the new PDF document:

  1. We add an A4 page using landscape orientation (line 8) and we create a PdfCanvas object for that page. Instead of calculating the a, b, c, d, e, and f values of a transformation matrix that scales the coordinate system, we use an AffineTransform instance created with the GetScaleInstance() method (line 11-12). We apply that transformation (line 13), we create a Form XObject containing the original page (line 14), and we add that XObject to the new page (line 15).

  2. Adding the original page in its original dimensions is much easier. We just create a new page by copying origPage to the new PdfDocument instance, and we add it to pdf using the AddPage() method (line 17).

  3. Scaling up and shrinking is done in the exact same way. This time, we add a new A2 page using landscape orientation (line 19) and we use the exact same code we had before to scale the coordinate system (line 22-24). We reuse the pageCopy object and add it to the canvas (line 25).
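For reference, the six values a to f mentioned in step 1 are the entries of a PDF transformation matrix as defined in ISO 32000. The block below shows how such a matrix maps a point (x, y) to (x', y'), and how a scale instance is the special case where only a and d are non-zero. The symbols s_x, s_y, W and H are introduced here for illustration only.

```latex
\begin{bmatrix} x' & y' & 1 \end{bmatrix}
= \begin{bmatrix} x & y & 1 \end{bmatrix}
  \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}
\quad\Longrightarrow\quad
x' = a\,x + c\,y + e, \qquad y' = b\,x + d\,y + f

% Scale instance, as used in the example:
a = s_x = \frac{W_{\text{page}}}{W_{\text{orig}}}, \qquad
d = s_y = \frac{H_{\text{page}}}{H_{\text{orig}}}, \qquad
b = c = e = f = 0
```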

We close pdf to finalize the new document (line 26) and we close origPdf to release the resources of the original document (line 27).
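To make the arithmetic behind the scale factors concrete, here is a small sketch in plain C# that doesn't depend on iText. The PageScaling class and its methods are hypothetical helpers introduced for illustration; they compute the two values the example passes to AffineTransform.GetScaleInstance(). The numbers in the note below assume that the original page is A3 in landscape orientation (1191 x 842 pt, which corresponds to 16.54 x 11.69 in at 72 pt per inch).

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of iText.
// It computes the two values that the scaling example passes to
// AffineTransform.GetScaleInstance(sx, sy).
public static class PageScaling
{
    // PDF user space units: 72 points per inch.
    public const float PointsPerInch = 72f;

    // Returns independent x and y factors that map the original page
    // onto the target page (x and y are scaled separately, as in the example).
    public static (float Sx, float Sy) ScaleFactors(
        float origWidth, float origHeight, float targetWidth, float targetHeight)
    {
        if (origWidth <= 0f || origHeight <= 0f)
        {
            throw new ArgumentException("The original page dimensions must be positive.");
        }
        return (targetWidth / origWidth, targetHeight / origHeight);
    }

    // Converts a length in points to inches.
    public static float ToInches(float points) => points / PointsPerInch;
}
```

Under that assumption, ScaleFactors(1191, 842, 842, 595) yields roughly 0.707 on both axes (A3 to A4 landscape), and ScaleFactors(1191, 842, 1684, 1191) roughly 1.414 (A3 to A2 landscape). Because the example scales x and y independently, a target page with a different aspect ratio would stretch the content; with ISO A-series pages, the two factors are nearly identical.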

We can use the same functionality to tile a PDF page.

Tiling PDF pages

Tiling a PDF page means that you distribute the content of one page over different pages. For instance: if you have a PDF with a single page of size A3, you can create a PDF with four pages of a different size (or even the same size), each showing one quarter of the original A3 page. This is what we've done in Figure 6.4.

Figure 6.4: Golden Gate Bridge, tiled pages

Let's take a look at the TheGoldenGateBridge_Tiles example.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfDocument sourcePdf = new PdfDocument(new PdfReader(src));
//Original page
PdfPage origPage = sourcePdf.GetPage(1);
PdfFormXObject pageCopy = origPage.CopyAsFormXObject(pdf);
//Original page size
Rectangle orig = origPage.GetPageSize();
//Tile size
Rectangle tileSize = PageSize.A4.Rotate();
// Transformation matrix
AffineTransform transformationMatrix = AffineTransform.GetScaleInstance(tileSize.GetWidth() / orig.GetWidth
    () * 2f, tileSize.GetHeight() / orig.GetHeight() * 2f);
//The first tile
PdfPage page = pdf.AddNewPage(PageSize.A4.Rotate());
PdfCanvas canvas = new PdfCanvas(page);
canvas.ConcatMatrix(transformationMatrix);
canvas.AddXObject(pageCopy, 0, -orig.GetHeight() / 2f);
//The second tile
page = pdf.AddNewPage(PageSize.A4.Rotate());
canvas = new PdfCanvas(page);
canvas.ConcatMatrix(transformationMatrix);
canvas.AddXObject(pageCopy, -orig.GetWidth() / 2f, -orig.GetHeight() / 2f);
//The third tile
page = pdf.AddNewPage(PageSize.A4.Rotate());
canvas = new PdfCanvas(page);
canvas.ConcatMatrix(transformationMatrix);
canvas.AddXObject(pageCopy, 0, 0);
//The fourth tile
page = pdf.AddNewPage(PageSize.A4.Rotate());
canvas = new PdfCanvas(page);
canvas.ConcatMatrix(transformationMatrix);
canvas.AddXObject(pageCopy, -orig.GetWidth() / 2f, 0);
pdf.Close();
sourcePdf.Close();

We've seen lines 1-8 before; we already used them in the previous example. In line 10, we define a tile size, and in lines 12-13, we create a transformationMatrix that scales the coordinate system depending on the original size and the tile size. Then we add the tiles, one by one: lines 15-18, lines 20-23, lines 25-28, and lines 30-33 are identical, except for one detail: the offset used in the AddXObject() method.
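The four hard-coded offsets follow a simple pattern. The following sketch is a hypothetical helper in plain C# (it isn't part of iText) that generalizes the pattern to a grid of cols x rows tiles. It computes the scale factors and the offsets you would pass to AddXObject(), expressed in the unscaled units of the original page, because the scale matrix is concatenated before the XObject is added. For a 2 x 2 grid, it reproduces the offsets of the example in the same order: top-left, top-right, bottom-left, bottom-right.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of iText.
public static class Tiling
{
    // Each tile shows 1/cols of the original width and 1/rows of the
    // original height, so the content is enlarged by cols and rows.
    public static (float Sx, float Sy) TileScale(
        float origWidth, float origHeight, float tileWidth, float tileHeight, int cols, int rows)
    {
        return (tileWidth / origWidth * cols, tileHeight / origHeight * rows);
    }

    // Offsets for AddXObject(pageCopy, x, y) in unscaled original-page units,
    // in reading order: left to right, top row first.
    public static List<(float X, float Y)> TileOffsets(
        float origWidth, float origHeight, int cols, int rows)
    {
        if (cols < 1 || rows < 1)
        {
            throw new ArgumentException("The grid needs at least one column and one row.");
        }
        var offsets = new List<(float X, float Y)>();
        for (int row = 0; row < rows; row++)          // row 0 = top of the original page
        {
            for (int col = 0; col < cols; col++)      // col 0 = left edge
            {
                // Shift left past the columns that earlier tiles already show.
                float x = -col * origWidth / cols;
                // Shift down so that the wanted row ends up at the bottom of the tile.
                float y = -(rows - 1 - row) * origHeight / rows;
                offsets.Add((x, y));
            }
        }
        return offsets;
    }
}
```

With a width of w and a height of h, TileOffsets(w, h, 2, 2) returns (0, -h/2), (-w/2, -h/2), (0, 0), and (-w/2, 0), which matches the four AddXObject() calls in the example.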

Let's use the PDF with the Golden Gate Bridge for one more example. Let's do the opposite of tiling: let's N-up a PDF.

N-upping a PDF

Figure 6.5 shows what we mean by N-upping. In the next example, we're going to put N pages on one single page.

Figure 6.5: Golden Gate Bridge, four pages on one

In the TheGoldenGateBridge_N_up example, N is equal to 4. We will put 4 pages on one single page.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfDocument sourcePdf = new PdfDocument(new PdfReader(SRC));
//Original page
PdfPage origPage = sourcePdf.GetPage(1);
//Original page size
Rectangle orig = origPage.GetPageSize();
PdfFormXObject pageCopy = origPage.CopyAsFormXObject(pdf);
//N-up page
PageSize nUpPageSize = PageSize.A4.Rotate();
PdfPage page = pdf.AddNewPage(nUpPageSize);
PdfCanvas canvas = new PdfCanvas(page);
//Scale page
AffineTransform transformationMatrix = AffineTransform.GetScaleInstance(nUpPageSize.GetWidth() / orig.GetWidth
    () / 2f, nUpPageSize.GetHeight() / orig.GetHeight() / 2f);
canvas.ConcatMatrix(transformationMatrix);
//Add pages to N-up page
canvas.AddXObject(pageCopy, 0, orig.GetHeight());
canvas.AddXObject(pageCopy, orig.GetWidth(), orig.GetHeight());
canvas.AddXObject(pageCopy, 0, 0);
canvas.AddXObject(pageCopy, orig.GetWidth(), 0);
pdf.Close();
sourcePdf.Close();
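The N-up example applies the same idea in reverse: it shrinks the page by a factor of 2 in each direction and places four copies side by side. The following hypothetical helper (plain C#, not part of iText) computes the scale factors and the AddXObject() positions for an arbitrary cols x rows grid; as before, the positions are in unscaled units of the original page. For a 2 x 2 grid, it returns the four positions used in the example, in the same order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of iText.
public static class NUp
{
    // Shrinks each copy so that cols x rows copies fit on the N-up page.
    public static (float Sx, float Sy) NUpScale(
        float origWidth, float origHeight, float nUpWidth, float nUpHeight, int cols, int rows)
        => (nUpWidth / origWidth / cols, nUpHeight / origHeight / rows);

    // Positions for AddXObject(pageCopy, x, y) in unscaled original-page
    // units, top row first, left to right within a row.
    public static List<(float X, float Y)> Positions(
        float origWidth, float origHeight, int cols, int rows)
    {
        var positions = new List<(float X, float Y)>();
        for (int row = 0; row < rows; row++)
        {
            for (int col = 0; col < cols; col++)
            {
                positions.Add((col * origWidth, (rows - 1 - row) * origHeight));
            }
        }
        return positions;
    }
}
```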

So far, we've only reused a single page from a single PDF in this chapter. In the next series of examples, we'll assemble different PDF files into one.

Assembling documents

Let's go from San Francisco to Los Angeles, and take a look at Figure 6.6 where we'll find three documents about the Oscars.

Figure 6.6: The Oscars, source documents

The documents are:

In the next couple of examples, we'll merge these documents.

Merging documents with PdfMerger

Figure 6.7 shows a PDF that was created by merging the first 32-page document with the second 15-page document, resulting in a 47-page document.

Figure 6.7: Merging two documents

The code of the 88th_Oscar_Combine example is almost self-explanatory.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.Merge(firstSourcePdf, 1, firstSourcePdf.GetNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.Merge(secondSourcePdf, 1, secondSourcePdf.GetNumberOfPages());
firstSourcePdf.Close();
secondSourcePdf.Close();
pdf.Close();

We create a PdfDocument to create a new PDF (line 2). The PdfMerger class is new: it makes it easier for us to reuse pages from existing documents (line 3). Just like before, we create a PdfDocument for each source file (line 5, line 8); we then add all the pages using the merger instance and its Merge() method (line 6, line 9). Once we're done adding pages, we close all three documents (lines 10-12).

We don't need to add all the pages if we don't want to. We can easily add only a limited selection of pages. See for instance the 88th_Oscar_CombineXofY example.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.Merge(firstSourcePdf, new List<int> { 1, 5, 7, 1 });
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.Merge(secondSourcePdf, new List<int> { 1, 15 });
firstSourcePdf.Close();
secondSourcePdf.Close();
pdf.Close();

Now the resulting document only has six pages. Pages 1, 5, 7, 1 from the first document (the first page is repeated), and pages 1 and 15 from the second document. PdfMerger is a convenience class that makes merging documents a no-brainer. In some cases however, you'll want to add pages one by one.

Adding pages to a PdfDocument

Figure 6.8 shows the result of the merging of specific pages based on a Table of Contents (TOC) that we'll create on the fly. This TOC contains link annotations that allow you to jump to a specific page if you click an entry of the TOC.

Figure 6.8: Merging documents based on a TOC

The 88th_Oscar_Combine_AddTOC example is more complex than the two previous examples. Let's examine it step by step.

Suppose that we have a SortedDictionary of all the categories the movie "The Revenant" was nominated for, where the key is the nomination and the value is the page number of the document where the nomination is mentioned.

public static readonly IDictionary<String, int> TheRevenantNominations = new SortedDictionary<String, int>();
static C06E06_88th_Oscar_Combine_AddTOC() {
    TheRevenantNominations["Performance by an actor in a leading role"] = 4;
    TheRevenantNominations["Performance by an actor in a supporting role"] = 4;
    TheRevenantNominations["Achievement in cinematography"] = 4;
    TheRevenantNominations["Achievement in costume design"] = 5;
    TheRevenantNominations["Achievement in directing"] = 5;
    TheRevenantNominations["Achievement in film editing"] = 6;
    TheRevenantNominations["Achievement in makeup and hairstyling"] = 7;
    TheRevenantNominations["Best motion picture of the year"] = 8;
    TheRevenantNominations["Achievement in production design"] = 8;
    TheRevenantNominations["Achievement in sound editing"] = 9;
    TheRevenantNominations["Achievement in sound mixing"] = 9;
    TheRevenantNominations["Achievement in visual effects"] = 10;
}

The first lines of the code that creates the PDF are pretty simple.

PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdfDoc);
document.Add(new Paragraph(new Text("The Revenant nominations list"))
    .SetTextAlignment(TextAlignment.CENTER));

But we need to take a really close look once we start to loop over the entries in the SortedDictionary.

PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
foreach (KeyValuePair<String, int> entry in TheRevenantNominations) {
    //Copy page
    PdfPage page = firstSourcePdf.GetPage(entry.Value).CopyTo(pdfDoc);
    pdfDoc.AddPage(page);
    //Overwrite page number
    Text text = new Text(String.Format("Page {0}", pdfDoc.GetNumberOfPages() - 1));
    text.SetBackgroundColor(Color.WHITE);
    document.Add(new Paragraph(text).SetFixedPosition(pdfDoc.GetNumberOfPages(), 549, 742, 100));
    //Add destination
    String destinationKey = "p" + (pdfDoc.GetNumberOfPages() - 1);
    PdfArray destinationArray = new PdfArray();
    destinationArray.Add(page.GetPdfObject());
    destinationArray.Add(PdfName.XYZ);
    destinationArray.Add(new PdfNumber(0));
    destinationArray.Add(new PdfNumber(page.GetMediaBox().GetHeight()));
    destinationArray.Add(new PdfNumber(1));
    pdfDoc.AddNamedDestination(destinationKey, destinationArray);
    //Add TOC line with bookmark
    Paragraph p = new Paragraph();
    p.AddTabStops(new TabStop(540, TabAlignment.RIGHT, new DottedLine()));
    p.Add(entry.Key);
    p.Add(new Tab());
    p.Add((pdfDoc.GetNumberOfPages() - 1).ToString());
    p.SetProperty(Property.ACTION, PdfAction.CreateGoTo(destinationKey));
    document.Add(p);
}
firstSourcePdf.Close();

Here we go:

  • Line 1: we create a PdfDocument with the source file containing all the info about all the nominations.

  • Line 2: we loop over an alphabetic list of the nominations for "The Revenant".

  • Line 4-5: we get the page that corresponds with the nomination, and we add a copy to the PdfDocument.

  • Line 7: we create an iText Text element containing the page number. We subtract 1 from that page number, because the first page in our document is the unnumbered page containing the TOC.

  • Line 8: we set the background color to Color.WHITE. This causes an opaque white rectangle with the same size as the Text to be drawn. We do this to cover the original page number.

  • Line 9: we add this text at a fixed position on the current page in the PdfDocument. The fixed position is X = 549, Y = 742, and the width of the text is 100 user units.

  • Line 11: we create a key we'll use to name the destination.

  • Line 12-17: we create a PdfArray containing information about the destination. We refer to the page we've just added (line 13), we define the destination using an X,Y coordinate and a zoom factor (line 14), and we add the values of X (line 15), Y (line 16), and the zoom factor (line 17).

  • Line 18: we add the named destination to the PdfDocument.

  • Line 20: we create an empty Paragraph.

  • Line 21: we add a tab stop at position X = 540; we define that the tab needs to be right aligned, and that the space leading up to the tab stop needs to be filled with a DottedLine.

  • Line 22: we add the nomination to the Paragraph.

  • Line 23: we introduce a Tab.

  • Line 24: we add the page number minus 1 (because the page with the TOC is page 0).

  • Line 25: we add an action that will be triggered when someone clicks on the Paragraph.

  • Line 26: we add the Paragraph to the document.

  • Line 28: we close the source document.
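The page-number bookkeeping in this loop is easy to get off by one. Here is a plain C# model of it, assuming (as in the example) that the TOC occupies exactly one page before the first imported page. TocPlan and Build() are hypothetical names, and no iText calls are made.

```csharp
using System.Collections.Generic;

// Hypothetical model of the page bookkeeping in the TOC loop: page 1 of the
// output holds the TOC, so every imported page is labeled with its page number minus 1.
public static class TocPlan
{
    public static List<(string Title, int DisplayedPage, string DestinationKey)> Build(
        IDictionary<string, int> nominations)
    {
        var entries = new List<(string Title, int DisplayedPage, string DestinationKey)>();
        int numberOfPages = 1; // the TOC page already exists before the loop starts
        foreach (KeyValuePair<string, int> entry in nominations)
        {
            numberOfPages++;                   // mirrors pdfDoc.AddPage(page)
            int displayed = numberOfPages - 1; // mirrors pdfDoc.GetNumberOfPages() - 1
            entries.Add((entry.Key, displayed, "p" + displayed));
        }
        return entries;
    }
}
```

Because the dictionary is a SortedDictionary, the entries come out in alphabetical order regardless of the order in which they were inserted, which is why the loop visits the nominations alphabetically.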

We've been introducing a lot of new functionality that really requires a more in-depth tutorial, but we're looking at this example for one main reason: to show that there's a significant difference between the PdfDocument object, to which a new page is added with every pass through the loop, and the Document object, to which we keep adding Paragraph objects on the first page.

Let's go through some of these steps one more time to add the checklist.

//Add the last page
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
PdfPage page_1 = secondSourcePdf.GetPage(1).CopyTo(pdfDoc);
pdfDoc.AddPage(page_1);
//Add destination
PdfArray destinationArray_1 = new PdfArray();
destinationArray_1.Add(page_1.GetPdfObject());
destinationArray_1.Add(PdfName.XYZ);
destinationArray_1.Add(new PdfNumber(0));
destinationArray_1.Add(new PdfNumber(page_1.GetMediaBox().GetHeight()));
destinationArray_1.Add(new PdfNumber(1));
pdfDoc.AddNamedDestination("checklist", destinationArray_1);
//Add TOC line with bookmark
Paragraph p_1 = new Paragraph();
p_1.AddTabStops(new TabStop(540, TabAlignment.RIGHT, new DottedLine()));
p_1.Add("Oscars\u00ae 2016 Movie Checklist");
p_1.Add(new Tab());
p_1.Add((pdfDoc.GetNumberOfPages() - 1).ToString());
p_1.SetProperty(Property.ACTION, PdfAction.CreateGoTo("checklist"));
document.Add(p_1);
secondSourcePdf.Close();
// close the document
document.Close();

This code snippet adds the checklist with the overview of all the nominations. An extra line saying "Oscars® 2016 Movie Checklist" is added to the TOC.
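Both the loop and the checklist snippet build the same kind of PdfArray: a page reference, the name /XYZ, and the left, top, and zoom values. The sketch below renders such an array as PDF syntax so you can see what ends up in the file. It assumes the page is indirect object n with generation number 0, and DestinationSyntax.Xyz() is a hypothetical helper, not an iText method.

```csharp
using System.Globalization;

// Hypothetical helper: renders an explicit /XYZ destination the way it appears in
// PDF syntax: [pageRef /XYZ left top zoom]. Not part of iText.
public static class DestinationSyntax
{
    public static string Xyz(int pageObjectNumber, float left, float top, float zoom)
    {
        // Page dictionaries are referenced indirectly; generation 0 is assumed here.
        string pageRef = pageObjectNumber + " 0 R";
        return string.Format(CultureInfo.InvariantCulture,
            "[{0} /XYZ {1} {2} {3}]", pageRef, left, top, zoom);
    }
}
```

With left set to 0, top set to the media box height (842 for an A4 page whose media box starts at the origin), and zoom set to 1, clicking the TOC entry shows the top-left corner of the target page at 100%.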

This example introduces a couple of new concepts for educational purposes. It shouldn't be used in a real-world application, because it contains a major flaw: we assume that the TOC will consist of only one page. If we added more lines to the document object, you would see a strange phenomenon: the text that doesn't fit on the first page would be added to the second page. That second page wouldn't be a new page; it would be the first page that we imported in the loop. In other words, the content of the first imported page would be overwritten. This problem can be fixed, but the fix is outside the scope of this short introductory tutorial.

We'll finish this chapter with some examples in which we merge forms.

Merging forms

Merging forms is special. In HTML, it's possible to have more than one form in a single HTML file. That's not the case for PDF. In a PDF file, there can be only one form. If you want to merge two forms and you want to preserve the forms, you need to use a special method and a special IPdfPageExtraCopier implementation.

Figure 6.9 shows the combination of two different forms, subscribe.pdf and state.pdf.

Figure 6.9: merging two different forms

The Combine_Forms example is different from what we had before.

PdfDocument destPdfDocument = new PdfDocument(new PdfWriter(dest));
PdfDocument[] sources = new PdfDocument[] {
    new PdfDocument(new PdfReader(SRC1)),
    new PdfDocument(new PdfReader(SRC2))
};
PdfPageFormCopier formCopier = new PdfPageFormCopier();
foreach (PdfDocument sourcePdfDocument in sources) {
    sourcePdfDocument.CopyPagesTo(1, sourcePdfDocument.GetNumberOfPages(), destPdfDocument, formCopier);
    sourcePdfDocument.Close();
}
destPdfDocument.Close();

In this code snippet, we use the CopyPagesTo() method. The first two parameters define the from/to range for the pages of the source document. The third parameter defines the destination document. The fourth parameter indicates that we are copying forms and that the two different forms in the two different documents should be merged into a single form. PdfPageFormCopier is an implementation of the IPdfPageExtraCopier interface that makes sure that the two different forms are merged into one single form.

Merging two forms isn't always trivial, because the name of each field needs to be unique. Suppose that we merged the same form twice: we would then have two widget annotations for each field. A field with a specific name, for instance "name", can be visualized using different widget annotations, but it can only have one value. If you had a widget annotation for the field "name" on page one and a widget annotation for the same field on page two, changing the value shown in the widget annotation on one page would automatically change the value shown in the widget annotation on the other page.

In the next example, we are going to fill out and merge the same form, state.pdf, as many times as there are entries in the CSV file united_states.csv; see Figure 6.10.

Figure 6.10: Merging identical forms

If we kept the names of the fields the way they are in the original form, changing the value of the state "ALABAMA" into "CALIFORNIA" would also change the name "ALASKA" on the second page, and the name of all the other states on the other pages. We made sure that this doesn't happen by renaming all the fields before merging the forms.
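To see why renaming helps, here is a toy model of a merged form that keeps exactly one value per field name, no matter how many pages display that name. It is not how iText stores fields internally; MergedFormModel and the "name_" + index scheme are hypothetical, chosen to mirror the renaming described here.

```csharp
using System.Collections.Generic;

// Toy model: a merged form keeps exactly one value per field name,
// no matter how many widget annotations (pages) display that name.
public static class MergedFormModel
{
    // Returns the value each page would display after filling every page in turn.
    public static string[] Fill(string[] states, bool renameFields)
    {
        var values = new Dictionary<string, string>();
        var fieldOnPage = new string[states.Length];
        for (int i = 0; i < states.Length; i++)
        {
            // Hypothetical naming scheme mirroring "name" -> "name_1", "name_2", ...
            fieldOnPage[i] = renameFields ? "name_" + (i + 1) : "name";
            values[fieldOnPage[i]] = states[i]; // a shared name overwrites the earlier value
        }
        var shown = new string[states.Length];
        for (int i = 0; i < states.Length; i++)
        {
            shown[i] = values[fieldOnPage[i]];
        }
        return shown;
    }
}
```

Without renaming, filling "ALABAMA" and then "ALASKA" leaves both pages showing "ALASKA"; with renaming, each page keeps its own value.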

Let's take a look at the FillOutAndMergeForms example.

PdfDocument pdfDocument = new PdfDocument(new PdfWriter(dest));
PdfPageFormCopier formCopier = new PdfPageFormCopier();
StreamReader sr = File.OpenText(DATA);
String line;
bool headerLine = true;
int i = 0;
while ((line = sr.ReadLine()) != null) {
    if (headerLine) {
        headerLine = false;
        continue;
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PdfDocument sourcePdfDocument = new PdfDocument(new PdfReader(SRC), new PdfWriter(baos));
    //Rename fields
    i++;
    PdfAcroForm form = PdfAcroForm.GetAcroForm(sourcePdfDocument, true);
    form.RenameField("name", "name_" + i);
    //Removed repeated lines ...
    form.RenameField("dst", "dst_" + i);
    //Fill out fields
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    IDictionary<String, PdfFormField> fields = form.GetFormFields();
    PdfFormField toSet;
    fields.TryGetValue("name_" + i, out toSet);
    toSet.SetValue(tokenizer.NextToken());
    //Removed repeated lines
    fields.TryGetValue("dst_" + i, out toSet);
    toSet.SetValue(tokenizer.NextToken());
    sourcePdfDocument.Close();
    sourcePdfDocument = new PdfDocument(new PdfReader(new MemoryStream(baos.ToArray())));
    //Copy pages
    sourcePdfDocument.CopyPagesTo(1, sourcePdfDocument.GetNumberOfPages(), pdfDocument, formCopier);
    sourcePdfDocument.Close();
}
sr.Close();
pdfDocument.Close();

Let's start by looking at the code inside the while loop. We're looping over the different states of the USA stored in a CSV file (line 7). We skip the first line that contains the information for the column headers (line 8-10). The next couple of lines are interesting. So far, we've always been writing PDF files to disk. In this example, we are creating PDF files in memory using a ByteArrayOutputStream (line 12-13).
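The CSV handling in this loop (skip the header, split on ';') can be expressed with plain .NET types. This is a sketch, not the example's code: CsvRows is a hypothetical name, and it uses String.Split instead of StringTokenizer. One design point worth knowing: String.Split keeps empty fields, whereas a tokenizer with Java StringTokenizer semantics collapses consecutive delimiters, which would shift later values into the wrong fields if a cell were blank.

```csharp
using System.Collections.Generic;
using System.IO;

// Sketch of the loop's CSV handling with plain .NET types: skip the header
// line, then split every data line on ';'. CsvRows is a hypothetical name.
public static class CsvRows
{
    public static IEnumerable<string[]> Read(TextReader reader)
    {
        bool headerLine = true;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (headerLine)
            {
                headerLine = false; // the first line only holds column names
                continue;
            }
            // String.Split keeps empty fields, so a blank cell still occupies its column position.
            yield return line.Split(';');
        }
    }
}
```

The column layout of united_states.csv isn't shown in this chapter, so any mapping from array index to field name is up to you.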

As mentioned before, we start by renaming all the fields. We get the PdfAcroForm instance (line 16) and we use the RenameField() method to rename fields such as "name" to "name_1", "name_2", and so on. Note that we've skipped some lines for brevity in the code snippet. Once we've renamed all the fields, we set their values (line 21-28).

When we close the sourcePdfDocument (line 29), we have a complete PDF file in memory. We create a new sourcePdfDocument using a MemoryStream that wraps the bytes of that file (line 30). We can now copy the pages of that new sourcePdfDocument to our destination pdfDocument (line 32).
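The close-then-reread step matters: the bytes are only complete once the document has been closed. The same pattern can be shown with plain .NET streams, with no iText involved. This is an analogy, and InMemoryRoundTrip is a hypothetical class.

```csharp
using System.IO;
using System.Text;

// Analogy with plain .NET streams (no iText): data written through a writer is only
// guaranteed to be in the MemoryStream after the writer is flushed or disposed,
// which is why the example closes sourcePdfDocument before calling ToArray().
public static class InMemoryRoundTrip
{
    public static byte[] Write(string text)
    {
        var buffer = new MemoryStream();
        using (var writer = new StreamWriter(buffer, new UTF8Encoding(false)))
        {
            writer.Write(text);
        } // Dispose flushes the writer and closes the MemoryStream
        return buffer.ToArray(); // ToArray still works on a closed MemoryStream
    }

    public static string Read(byte[] bytes)
    {
        // Wrap the captured bytes in a new stream, like new MemoryStream(baos.ToArray())
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```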

This is a rather artificial example, but it's a good example to explain some of the usual pitfalls when merging forms:

  • Without the PdfPageFormCopier, the forms won't be correctly merged.

  • One field can only have one value, no matter how many times that field is visualized using a widget annotation.

A more common use case is to fill out and flatten the same form multiple times in memory, simultaneously merging all the resulting documents into one PDF.

Merging flattened forms

Figure 6.11 shows two PDF documents that were the result of the same procedure: we filled out a form in memory as many times as there are states in the USA. We flattened these filled out forms, and we merged them into one single document.

Figure 6.11: Filling, flattening and merging forms

From the outside, these documents look identical, but if we look at their file sizes in Figure 6.12, we see a huge difference.

Figure 6.12: difference in file size depending on how documents are merged

What is causing this difference in file size? We need to take a look at the FillOutFlattenAndMergeForms example to find out.

PdfDocument destPdfDocument = new PdfDocument(new PdfWriter(dest1));
//Smart mode
PdfDocument destPdfDocumentSmartMode = new PdfDocument(new PdfWriter(dest2).SetSmartMode(true));
StreamReader sr = File.OpenText(DATA);
String line;
bool headerLine = true;
int i = 0;
while ((line = sr.ReadLine()) != null) {
    if (headerLine) {
        headerLine = false;
        continue;
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PdfDocument sourcePdfDocument = new PdfDocument(new PdfReader(SRC), new PdfWriter(baos));
    //Read fields
    PdfAcroForm form = PdfAcroForm.GetAcroForm(sourcePdfDocument, true);
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    IDictionary<String, PdfFormField> fields = form.GetFormFields();
    //Fill out fields
    PdfFormField toSet;
    fields.TryGetValue("name", out toSet);
    toSet.SetValue(tokenizer.NextToken());
    //Removed repeated lines...
    fields.TryGetValue("dst", out toSet);
    toSet.SetValue(tokenizer.NextToken());
    //Flatten fields
    form.FlattenFields();
    sourcePdfDocument.Close();
    sourcePdfDocument = new PdfDocument(new PdfReader(new MemoryStream(baos.ToArray())));
    //Copy pages
    sourcePdfDocument.CopyPagesTo(1, sourcePdfDocument.GetNumberOfPages(), destPdfDocument, null);
    sourcePdfDocument.CopyPagesTo(1, sourcePdfDocument.GetNumberOfPages(), destPdfDocumentSmartMode, null);
    sourcePdfDocument.Close();
}
sr.Close();
destPdfDocument.Close();
destPdfDocumentSmartMode.Close();

In this code snippet, we create two documents simultaneously:

  • The destPdfDocument instance (line 1) is created the same way we've been creating PdfDocument instances all along.

  • The destPdfDocumentSmartMode instance (line 3) is also created that way, but we've turned on the smart mode.

We loop over the lines of the CSV file like we did before (line 8), but since we're going to flatten the forms, we no longer have to rename the fields. The fields will be lost due to the flattening process anyway. We create a new PDF document in memory (line 13-14) and we fill out the fields (line 16-25). We flatten the fields (line 27) and close the document created in memory (line 28). We use the file created in memory to create a new source file (line 29). We add all the pages of this source file to the two PdfDocument instances, one working in normal mode, the other in smart mode (line 31-32). We no longer need to use a PdfPageFormCopier instance, because the forms have been flattened; they are no longer forms.

What is the difference between these normal and smart mode?

  • When we copy the pages of the filled-out forms to the PdfDocument working in normal mode, the PdfDocument processes each document as if it's totally unrelated to the other documents that are being added. The resulting document is bloated, because the documents are related: they all share the same template. That template is added to the PDF document as many times as there are states in the USA, and the result is a file of about 12 MBytes.

  • When we copy the pages of the filled out forms to the PdfDocument working in smart mode, the PdfDocument will take the time to compare the resources of each document. If two separate documents share the same resources (e.g. a template), then that resource is copied to the new file only once. In this case, the result can be limited to 365 KBytes.

Both the 12 MBytes and the 365 KBytes files look exactly the same when opened in a PDF viewer or when printed, but it goes without saying that the 365 KBytes file is to be preferred over the 12 MBytes file.
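Conceptually, smart mode trades some processing time for smaller output by recognizing resources it has already written. The following is a toy illustration of that idea, keyed on a SHA-256 hash of each resource's bytes. It is not iText's actual implementation, and DedupStore is a hypothetical class.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Toy illustration of resource deduplication: identical byte sequences are
// stored once and every later copy reuses the first copy's id.
public sealed class DedupStore
{
    private readonly Dictionary<string, int> idByHash = new Dictionary<string, int>();
    private readonly List<byte[]> stored = new List<byte[]>();

    public int Add(byte[] resource)
    {
        string key;
        using (var sha = SHA256.Create())
        {
            key = Convert.ToBase64String(sha.ComputeHash(resource));
        }
        if (idByHash.TryGetValue(key, out int existing))
        {
            return existing; // identical bytes already written: reuse the reference
        }
        stored.Add(resource);
        int id = stored.Count; // 1-based id, loosely mimicking an object number
        idByHash[key] = id;
        return id;
    }

    public int StoredCount => stored.Count;

    public long StoredBytes
    {
        get
        {
            long total = 0;
            foreach (byte[] r in stored) total += r.Length;
            return total;
        }
    }
}
```

Adding the same template once per state would store it only once in this model, which is the effect behind the drop from 12 MBytes to 365 KBytes.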

Summary

In this chapter, we've been scaling, tiling, and N-upping one file with a different file as result. We've also assembled files in many different ways. We discovered that there are quite a few pitfalls when merging interactive forms. Much more remains to be said about reusing content from existing PDF documents.

In the next chapter, we'll discuss PDF documents that comply with special PDF standards such as PDF/UA and PDF/A. We'll discover that merging PDF/A documents also requires some special attention.

Chapter 5: Manipulating an existing PDF document

In the examples for chapters 1 to 3, we've always created a new PDF document from scratch with iText. In the last couple of examples of chapter 4, we worked with an existing PDF document. We took an existing interactive PDF form and filled it out, either resulting in a pre-filled form, or resulting in a flattened document that was no longer interactive. In this chapter, we'll continue working with existing PDFs. We'll load an existing file using PdfReader and we'll use the reader object to create a new PdfDocument.

Adding annotations and content

In the previous chapter, we took an existing PDF form, job_application.pdf, and we filled out the fields. In this chapter, we'll take it a step further. We'll start by adding a text annotation, some text, and a new check box. This is shown in Figure 5.1.

Figure 5.1: an updated form

We'll repeat the code we've seen in the previous chapter in the AddAnnotationsAndContent example.

PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
// add content
pdfDoc.Close();

Where it says // add content, we'll add the annotation, the extra text, and the extra check box.

Just like in chapter 4, we add the annotation to a page obtained from the PdfDocument instance:

//Add text annotation
PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(400, 795, 0, 0))
    .SetTitle(new PdfString("iText"))
    .SetContents("Please, fill out the form.")
    .SetOpen(true);
pdfDoc.GetFirstPage().AddAnnotation(ann);

If we want to add content to a content stream, we need to create a PdfCanvas object. We can do this using a PdfPage object as a parameter for the PdfCanvas constructor:

PdfCanvas canvas = new PdfCanvas(pdfDoc.GetFirstPage());
canvas.BeginText()
    .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 12)
    .MoveText(265, 597)
    .ShowText("I agree to the terms and conditions.")
    .EndText();

The code to add the text is similar to what we did in chapter 2. Whether you're creating a document from scratch or adding content to an existing document has no impact on the instructions we use. The same goes for adding fields to a PdfAcroForm instance:

//Add form field
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
PdfButtonFormField checkField = PdfFormField.CreateCheckBox(pdfDoc, new Rectangle(245, 594, 15, 15), "agreement", "Off", PdfFormField.TYPE_CHECK);
checkField.SetRequired(true);
form.AddField(checkField);

Now that we've added an extra field, we might want to change the reset action:

//Update reset button
form.GetField("reset").SetAction(PdfAction.CreateResetForm(new String[] { "name", "language", "experience1", "experience2", "experience3", "shift", "info", "agreement" }, 0));
pdfDoc.Close();

Let's see if we can also change some of the visual aspects of the form fields.

Changing the properties of form fields

In the FillAndModifyForm example, we return to the FillForm example from chapter 4, but instead of merely filling out the form, we also change the properties of the fields:

PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("name", out toSet);
toSet.SetValue("James Bond").SetBackgroundColor(Color.ORANGE);
fields.TryGetValue("experience1", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience2", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience3", out toSet);
toSet.SetValue("Yes");
IList<PdfObject> options = new List<PdfObject>();
options.Add(new PdfString("Any"));
options.Add(new PdfString("8.30 am - 12.30 pm"));
options.Add(new PdfString("12.30 pm - 4.30 pm"));
options.Add(new PdfString("4.30 pm - 8.30 pm"));
options.Add(new PdfString("8.30 pm - 12.30 am"));
options.Add(new PdfString("12.30 am - 4.30 am"));
options.Add(new PdfString("4.30 am - 8.30 am"));
PdfArray arr = new PdfArray(options);
fields.TryGetValue("shift", out toSet);
toSet.SetOptions(arr);
toSet.SetValue("Any");
PdfFont courier = PdfFontFactory.CreateFont(FontConstants.COURIER);
fields.TryGetValue("info", out toSet);
toSet.SetValue("I was 38 years old when I became an MI6 agent.", courier, 7f);
pdfDoc.Close();

Please take a closer look at the following lines:

  • line 4-5: we set the value of the "name" field to "James Bond", but we also change the background color to Color.ORANGE.

  • line 12-23: we create a List containing more options than the form originally contained (line 12-19). We convert this List to a PdfArray (line 20), we use this array to update the options of the "shift" field (line 22), and we select the "Any" option (line 23).

  • line 24-26: we create a new PdfFont and we use this font and a new font size as extra parameters when we set the value of the "info" field.

Let's take a look at Figure 5.2 to see if our changes were applied.

Figure 5.2: updated form with highlighted fields

We see that the "shift" field now has more options, but we don't see the background color of the "name" field. It's also not clear if the font of the "info" field has changed. What's wrong? Nothing is wrong: the fields are currently highlighted, and the blue highlighting covers the background color. Let's click "Highlight Existing Fields" and see what happens.

Figure 5.3: updated form, no highlighting

Now Figure 5.3 looks exactly the way we expected. We wouldn't have had this problem if we had added form.FlattenFields(); right before closing the PdfDocument, but in that case, we would no longer have a form either. We'll make some more form examples in the next chapter, but for now, let's see what we can do with existing documents that don't contain a form.

Adding a header, footer, and watermark

Do you remember the report of the UFO sightings in the 20th century we created in chapter 3? We'll use a similar report for the next couple of examples: ufo.pdf, see Figure 5.4.

Figure 5.4: UFO sightings report

As you can see, it's not as fancy as the report we made in chapter 3. What if we'd like to add a header, a watermark, and a footer saying "page X of Y" to this existing report? Figure 5.5 shows what such a report would look like.

Figure 5.5: UFO sightings report with header, footer, and watermark

In Figure 5.5, we zoom in on an advantage that we didn't have when we added the page numbers in chapter 3. In chapter 3, we didn't know the total number of pages at the moment we were adding the footer, hence we only added the current page number. Now that we have an existing document, we can add "1 of 4", "2 of 4", and so on.

When creating a document from scratch, it's possible to create a placeholder for the total number of pages. Once all the pages are created, we can then add the total number of pages to that placeholder, but that's outside the scope of this introductory tutorial.

The AddContent example shows how we can add content to every page in an existing document.

//Initialize PDF document
PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
Document document = new Document(pdfDoc);
Rectangle pageSize;
PdfCanvas canvas;
int n = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= n; i++) {
    PdfPage page = pdfDoc.GetPage(i);
    pageSize = page.GetPageSize();
    canvas = new PdfCanvas(page);
    // add new content
}
pdfDoc.Close();

We use the pdfDoc object to create a Document instance. We'll use that document object to add some content. We also use the pdfDoc object to find the number of pages in the original PDF. We loop over all the pages, and we get the PdfPage object of each page. Let's take a look at the // add new content part we omitted.

//Draw header text
canvas.BeginText()
    .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 7)
    .MoveText(pageSize.GetWidth() / 2 - 24, pageSize.GetHeight() - 10)
    .ShowText("I want to believe")
    .EndText();
//Draw footer line
canvas.SetStrokeColor(Color.BLACK)
    .SetLineWidth(.2f)
    .MoveTo(pageSize.GetWidth() / 2 - 30, 20)
    .LineTo(pageSize.GetWidth() / 2 + 30, 20)
    .Stroke();
//Draw page number
canvas.BeginText()
    .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 7)
    .MoveText(pageSize.GetWidth() / 2 - 7, 10)
    .ShowText(i.ToString())
    .ShowText(" of ")
    .ShowText(n.ToString())
    .EndText();
//Draw watermark
Paragraph p = new Paragraph("CONFIDENTIAL").SetFontSize(60);
canvas.SaveState();
PdfExtGState gs1 = new PdfExtGState().SetFillOpacity(0.2f);
canvas.SetExtGState(gs1);
document.ShowTextAligned(p, pageSize.GetWidth() / 2, pageSize.GetHeight() / 2, pdfDoc.GetPageNumber(page), TextAlignment.CENTER, VerticalAlignment.MIDDLE, (float)(Math.PI / 4)); // 45 degrees, expressed in radians
canvas.RestoreState();

We are adding four parts of content:

  1. A header (line 2-6): we use low-level text functionality to add "I want to believe" at the top of the page.

  2. A footer line (line 8-12): we use low-level graphics functionality to draw a line at the bottom of the page.

  3. A footer with the page number (line 14-20): we use low-level text functionality to add the page number, followed by " of ", followed by the total number of pages at the bottom of the page.

  4. A watermark (line 22-27): we create a Paragraph with the text we want to add as a watermark. Then we change the opacity of the canvas. Finally, we add the Paragraph to the document, centered in the middle of the page and with an angle of 45 degrees, using the ShowTextAligned() method.

We're doing something special when we add the watermark. We're changing the graphics state of the canvas object obtained from the page. Then we add text to the corresponding page in the document. Internally, iText will detect that we're already using the PdfCanvas instance of that page and the showTextAligned() method will write to that same canvas. This way, we can use a mix of low-level and convenience methods.

In the final example of this chapter, we'll change the page size and orientation of the pages of our UFO sightings report.

Changing the page size and orientation

If we take a look at Figure 5.6, we see our original report from Figure 5.4, but the pages are bigger and the second page has been turned upside down.

Figure 5.6: changed page size and orientation

The ChangePage example shows how this was done.

//Initialize PDF document
PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
float margin = 72;
for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++) {
    PdfPage page = pdfDoc.GetPage(i);
    // change page size
    Rectangle mediaBox = page.GetMediaBox();
    Rectangle newMediaBox = new Rectangle(mediaBox.GetLeft() - margin, mediaBox.GetBottom() - margin, mediaBox.GetWidth() + margin * 2, mediaBox.GetHeight() + margin * 2);
    page.SetMediaBox(newMediaBox);
    // add border
    PdfCanvas over = new PdfCanvas(page);
    over.SetStrokeColor(Color.GRAY);
    over.Rectangle(mediaBox.GetLeft(), mediaBox.GetBottom(), mediaBox.GetWidth(), mediaBox.GetHeight());
    over.Stroke();
    // change rotation of the even pages
    if (i % 2 == 0) {
        page.SetRotation(180);
    }
}
pdfDoc.Close();

No need for a Document instance here, we work with the PdfDocument instance only. We loop over all the pages (line 4) and get the PdfPage instance of each page (line 5).

  • A page can have different page boundaries, one of which isn't optional: the /MediaBox. We get the value of this page boundary as a Rectangle (line 7) and we create a new Rectangle that is an inch (72 user units) larger on each side (line 8). We use the SetMediaBox() method to change the page size (line 9).

  • We create a PdfCanvas object for the page (line 11), and we stroke a gray border using the dimensions of the original mediaBox (line 12-14).

  • For every even page (line 16), we set the page rotation to 180 degrees (line 17).

Manipulating an existing PDF document requires some knowledge about PDF. For instance: you need to know the concept of the /MediaBox. We have tried to keep the examples simple, but that also means that we've cut some corners. For instance: in our last example, we didn't bother to check if a /CropBox was defined. If the original PDF had a /CropBox, enlarging the /MediaBox wouldn't have had any visual effect. We'll need a more in-depth tutorial to cover topics like these.
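The page-box arithmetic, and the /CropBox caveat, can be checked without touching a PDF. The sketch below uses a hypothetical Box struct rather than iText's Rectangle, and it assumes the ISO 32000 rule that the crop box defaults to the media box and is clipped to it. It shows that enlarging the /MediaBox of an A4 page (595 x 842) by 72 user units leaves the visible area unchanged when a /CropBox of the original size is present.

```csharp
using System;

// Minimal rectangle type for this sketch: lower-left corner plus width and height,
// matching the (x, y, width, height) form used by iText's Rectangle constructor.
public struct Box
{
    public float X, Y, Width, Height;

    public Box(float x, float y, float width, float height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    // Same arithmetic as the ChangePage example: grow the box by margin on every side.
    public Box Enlarge(float margin) =>
        new Box(X - margin, Y - margin, Width + 2 * margin, Height + 2 * margin);

    public Box Intersect(Box other)
    {
        float left = Math.Max(X, other.X);
        float bottom = Math.Max(Y, other.Y);
        float right = Math.Min(X + Width, other.X + other.Width);
        float top = Math.Min(Y + Height, other.Y + other.Height);
        return new Box(left, bottom, Math.Max(0, right - left), Math.Max(0, top - bottom));
    }
}

public static class PageBoxes
{
    // Visible area: the crop box clipped to the media box; without a crop box,
    // the media box itself is shown.
    public static Box VisibleArea(Box mediaBox, Box? cropBox) =>
        cropBox.HasValue ? cropBox.Value.Intersect(mediaBox) : mediaBox;
}
```

Without a crop box, the enlarged 739 x 986 media box is what the viewer shows; with a 595 x 842 crop box, the page still looks exactly as it did before the change.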

Summary

In the previous chapter, we learned about interactive PDF forms. In this chapter, we continued working with these forms. We added an annotation, some text, and an extra field to an existing form. We also changed some properties while filling out a form.

We then moved on to PDFs without any interactivity. First, we added a header, a footer, and a watermark. Then, we played with the size and the orientation of the pages of an existing document.

In the next chapter, we'll scale and tile existing documents, and we'll discover how to assemble multiple documents into a single PDF.

Chapter 4: Making a PDF interactive

In the previous chapters, we’ve created PDF documents by adding content to a page. It didn’t matter if we were adding high-level objects (e.g. a Paragraph) or low-level instructions (e.g. LineTo(), MoveTo(), Stroke()); iText converted everything to PDF syntax that was written to one or more content streams. In this chapter, we’ll add content of a different nature. We’ll add interactive features, known as annotations. Annotations aren’t part of the content stream. They are usually added on top of the existing content. There are many different types of annotations, many of which allow user interaction.

Adding annotations

We’ll start with a series of simple examples. Figure 4.1 shows a PDF with a Paragraph of text. On top of the text, we’ve added a green text annotation.

Figure 4.1: a text annotation

Most of the code of the TextAnnotation example is identical to the Hello World example. The only difference is that we create and add an annotation:

//Create text annotation
PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(20, 800, 0, 0))
    .SetColor(Color.GREEN)
    .SetTitle(new PdfString("iText")).SetContents("With iText, you can truly take your documentation needs to the next level.")
    .SetOpen(true);
pdf.GetFirstPage().AddAnnotation(ann);

We define the location of the text annotation using a 'Rectangle'. We set the color, title (a 'PdfString'), contents (a 'String'), and the open status of the annotation. We ask the 'PdfDocument' for its first page and add the annotation.
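The 'Rectangle' is specified as x, y, width, and height. The /Rect entry of an annotation dictionary is different: it stores two opposite corners (lower-left x, lower-left y, upper-right x, upper-right y). The dependency-free sketch below shows that conversion. The 'RectSketch' class and its 'ToPdfRect()' method are hypothetical helpers for illustration and are not part of iText.

```csharp
// Hypothetical helper (not part of iText): converts the (x, y, width, height)
// form used by a Rectangle into the [llx lly urx ury] array form that a PDF
// annotation stores in its /Rect entry.
public static class RectSketch
{
    public static float[] ToPdfRect(float x, float y, float width, float height)
    {
        // Lower-left corner first, then the upper-right corner.
        return new float[] { x, y, x + width, y + height };
    }
}
```

For the text annotation above, 'ToPdfRect(20, 800, 0, 0)' yields { 20, 800, 20, 800 }. Both corners coincide, so the annotation is anchored at a single point.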

In Figure 4.2, we created an annotation that is invisible, but that shows a URL if you hover over its location. You can open that URL by clicking the annotation. This is a link annotation.

Figure 4.2: a link annotation

As the annotation is part of a sentence, it wouldn't be convenient if we had to calculate the position of the word "here". Fortunately, we can wrap the link annotation in a Link object and iText will calculate the Rectangle automatically. The LinkAnnotation example shows how it's done.

//Create link annotation
PdfLinkAnnotation annotation = ((PdfLinkAnnotation)new PdfLinkAnnotation(new Rectangle(0, 0))
    .SetAction(PdfAction.CreateURI("http://itextpdf.com/")));
Link link = new Link("here", annotation);
Paragraph p = new Paragraph("The example of link annotation. Click ").Add(link.SetUnderline()).Add(" to learn more...");
document.Add(p);

First, we create a URI action that opens the iText web site, and we use this action for the link annotation. We then create a 'Link' object. This is a basic building block that accepts a link annotation as a parameter. This link annotation won’t be added to the content stream, because annotations aren’t part of the content stream. Instead, it will be added to the corresponding page at the corresponding coordinates. Making text clickable doesn’t change the appearance of that text in the content stream. In our example, we underlined the word “here” so that we know where to click.

Every type of annotation requires its own type of parameters. Figure 4.3 shows a page with a line annotation.

Figure 4.3: a line annotation

The LineAnnotation example shows what is needed to create this appearance.

PdfPage page = pdf.AddNewPage();
PdfArray lineEndings = new PdfArray();
lineEndings.Add(new PdfName("Diamond"));
lineEndings.Add(new PdfName("Diamond"));
//Create line annotation with inside caption
PdfAnnotation annotation = new PdfLineAnnotation(new Rectangle(0, 0), new float[] { 20, 790, page.GetPageSize().GetWidth() - 20, 790 })
    .SetLineEndingStyles(lineEndings)
    .SetContentsAsCaption(true)
    .SetTitle(new PdfString("iText"))
    .SetContents("The example of line annotation")
    .SetColor(Color.BLUE);
page.AddAnnotation(annotation);
//Close document
pdf.Close();

In this example, we add the annotation to a newly created page. There’s no 'Document' instance involved in this example.

ISO-32000-2 defines 28 different annotation types, two of which are deprecated in PDF 2.0. With iText, you can add all of these annotation types to a PDF document, but in the context of this tutorial, we’ll only look at one more example before we move on to interactive forms. See Figure 4.4.

Figure 4.4: a markup annotation

Looking at the TextMarkupAnnotation example, we see that we really need a separate tutorial to understand what all the nuts and bolts used in this code snippet are about.

//Create text markup annotation
PdfAnnotation ann = PdfTextMarkupAnnotation.CreateHighLight(new Rectangle(105, 790, 64, 10), new float[] {169, 790, 105, 790, 169, 800, 105, 800 })
    .SetColor(Color.YELLOW)
    .SetTitle(new PdfString("Hello!"))
    .SetContents(new PdfString("I'm a popup."))
    .SetTitle(new PdfString("iText"))
    .SetOpen(true)
    .SetRectangle(new PdfArray(new float[] { 100, 600, 200, 100 }));
pdf.GetFirstPage().AddAnnotation(ann);
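The eight numbers passed to CreateHighLight() are quad points: four (x, y) pairs marking the corners of the highlighted area. The 'Rectangle' passed with them, (105, 790, 64, 10), is the bounding box of those corners. The sketch below derives that bounding box by taking the minimum and maximum coordinates. 'QuadPointSketch' is a hypothetical helper for illustration and is not part of iText.

```csharp
using System;

// Hypothetical helper (not part of iText): computes the bounding box
// (x, y, width, height) of quad points given as x1, y1, x2, y2, ...
public static class QuadPointSketch
{
    public static (float X, float Y, float Width, float Height) BoundingBox(float[] quadPoints)
    {
        if (quadPoints.Length == 0 || quadPoints.Length % 8 != 0)
            throw new ArgumentException("Quad points come in groups of eight numbers.");

        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;
        // Even indices hold x coordinates, odd indices hold y coordinates.
        for (int i = 0; i < quadPoints.Length; i += 2)
        {
            minX = Math.Min(minX, quadPoints[i]);
            maxX = Math.Max(maxX, quadPoints[i]);
            minY = Math.Min(minY, quadPoints[i + 1]);
            maxY = Math.Max(maxY, quadPoints[i + 1]);
        }
        return (minX, minY, maxX - minX, maxY - minY);
    }
}
```

Applied to { 169, 790, 105, 790, 169, 800, 105, 800 }, it returns (105, 790, 64, 10), which matches the 'Rectangle' in the example.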

In the next section, we’ll create an interactive form consisting of different form fields. Each form field in that form will correspond with a widget annotation, but those annotations will be created implicitly.

Creating an interactive form

In the next example, we’re going to create an interactive form based on AcroForm technology. This technology was introduced in PDF 1.2 (1996) and allows you to populate a PDF document with form fields such as text fields, choices (combo box or list field), buttons (push buttons, check boxes and radio buttons), and signature fields.

It’s tempting to compare a PDF form with a form in HTML, but that would be wrong. When text doesn’t fit into the available text area of an HTML form, that field can be resized. The content of a list field can be updated on the fly based on a query to the server. In short, an HTML form can be very dynamic.

That isn’t true for interactive forms based on AcroForm technology. Such a form can best be compared with a paper form where every field has its fixed place and its fixed size. The idea of using PDF forms for collecting user data in a web browser has been abandoned over the years. HTML forms are much more user friendly for online data collection.

That doesn’t mean that AcroForm technology has become useless. Interactive PDF forms are very common in two specific use cases:

  • When the form is the equivalent of digital paper. In some cases, there are strict formal requirements with respect to a form. It is important that the digital document is an exact replica of the corresponding form. Every form that is filled out needs to comply with the exact same formal requirements. If this is the case, then it’s better to use PDF forms than HTML forms.

  • When the form isn’t used for data collection, but as a template. For example: you have a form that represents a voucher or an entry ticket for an event. On this form, you have different fields for the name of the person who bought the ticket, the date and the time of the event, the row and the seat number, and so on. When people buy a ticket, you don’t need to regenerate the complete voucher, you can take the form and simply fill it out with the appropriate data.

In both use cases, the form will be created manually, for instance using Adobe software, LibreOffice, or any other tool with a graphical user interface.

You could also create such a form programmatically, but there are very few use cases that would justify using a software library to create a form or a template, instead of using a tool with a GUI. Nevertheless, we’re going to give it a try.

Figure 4.5: an interactive form

In Figure 4.5, we see text fields, radio buttons, check boxes, a combo box, a multi-line text field, and a push button. We see these fields because they are represented by a widget annotation. This widget annotation is created implicitly when we create a field. In the JobApplication example, we create a 'PdfAcroForm' object, using the 'PdfDocument' instance obtained from the 'Document' object. The second parameter is a 'Boolean' indicating if a new form needs to be created if there is no existing form. As we've just created the 'Document', there is no form present yet, so that parameter should be true:

PdfAcroForm form = PdfAcroForm.GetAcroForm(doc.GetPdfDocument(), true);

Now we can start adding fields. We’ll use a Rectangle to define the dimensions of each widget annotation and its position on the page.

Text field

We’ll start with the text field that will be used for the full name.

//Create text field
PdfTextFormField nameField = PdfTextFormField.CreateText(doc.GetPdfDocument(), new Rectangle(99, 753, 425, 15), "name", "");
form.AddField(nameField);

The 'CreateText()' method needs a 'PdfDocument' instance, a 'Rectangle', the name of the field, and a default value (in this case, the default value is an empty 'String'). Note that the label of the field and the widget annotation are two different things. We’ve added “Full name:” using a 'Paragraph'. That 'Paragraph' is part of the content stream. The field itself doesn’t belong in the content stream. It’s represented using a widget annotation.

Radio buttons

We create a radio field for choosing a language. Note that there is one radio group named 'language' with five unnamed button fields, one for each language that can be chosen:

//Create radio buttons
PdfButtonFormField group = PdfFormField.CreateRadioGroup(doc.GetPdfDocument(), "language", "");
PdfFormField.CreateRadioButton(doc.GetPdfDocument(), new Rectangle(130, 728, 15, 15), group, "English");
PdfFormField.CreateRadioButton(doc.GetPdfDocument(), new Rectangle(200, 728, 15, 15), group, "French");
PdfFormField.CreateRadioButton(doc.GetPdfDocument(), new Rectangle(260, 728, 15, 15), group, "German");
PdfFormField.CreateRadioButton(doc.GetPdfDocument(), new Rectangle(330, 728, 15, 15), group, "Russian");
PdfFormField.CreateRadioButton(doc.GetPdfDocument(), new Rectangle(400, 728, 15, 15), group, "Spanish");
form.AddField(group);

Only one language can be selected at a time. If multiple options could apply, we should have used check boxes.

Check boxes

In the next snippet, we’ll introduce three check boxes, named 'experience1', 'experience2', and 'experience3':

//Create checkboxes
for (int i = 0; i < 3; i++) {
    PdfButtonFormField checkField = PdfFormField.CreateCheckBox(doc.GetPdfDocument(), new Rectangle(119 + i * 69, 701, 15, 15), String.Concat("experience", (i + 1).ToString()), "Off", PdfFormField.TYPE_CHECK);
    form.AddField(checkField);
}

As you can see, we use the 'CreateCheckBox()' method with the following parameters: the 'PdfDocument' object, a 'Rectangle', the name of the field, the current value of the field, and the appearance of the check mark.

A check box has two possible values: the value of the off state must be '"Off"'; the value of the on state is usually '"Yes"' (it’s the value iText uses by default), but some freedom is allowed here.
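The check box loop derives both the name and the horizontal position of each widget from the loop index. The dependency-free sketch below replays only that arithmetic, so you can see which names and x coordinates the three widgets receive. 'CheckBoxLayoutSketch' is a hypothetical helper and does not call iText.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of iText): mirrors the naming and positioning
// arithmetic of the check box loop in the JobApplication example.
public static class CheckBoxLayoutSketch
{
    public static List<(string Name, float X)> Layout(int count)
    {
        var result = new List<(string Name, float X)>();
        for (int i = 0; i < count; i++)
        {
            // Names start at "experience1" because the loop uses i + 1.
            string name = String.Concat("experience", (i + 1).ToString());
            // Each widget sits 69 user units to the right of the previous one.
            float x = 119 + i * 69;
            result.Add((name, x));
        }
        return result;
    }
}
```

'Layout(3)' yields ("experience1", 119), ("experience2", 188), and ("experience3", 257). These are the names used later by the reset button and by the code that fills out the form.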

It’s also possible to have people select one or more options from a list or a combo box. In PDF terminology, we call this a choice field.

Choice field

Choice fields can be configured in a way that people can select only one of the options, or several options. In our example, we create a combo box.

//Create combobox
String[] options = new String[] { "Any", "6.30 am - 2.30 pm", "1.30 pm - 9.30 pm" };
PdfChoiceFormField choiceField = PdfFormField.CreateComboBox(doc.GetPdfDocument(), new Rectangle(163, 676, 115, 15), "shift", "Any", options);
form.AddField(choiceField);

Our choice field is named '"shift"' and it offers three 'options' of which '"Any"' is selected by default.

Multi-line field

We also see a multi-line field in the form. As opposed to the regular text field, where you can only add text in a single line, text in this field will be wrapped if it doesn’t fit on a single line.

//Create multiline text field
PdfTextFormField infoField = PdfTextFormField.CreateMultilineText(doc.GetPdfDocument(), new Rectangle(158, 625, 366, 40), "info", "");
form.AddField(infoField);

We’ll conclude our form with a push button.

Push button

In a real-world example, we’d use a submit button that allows people to submit the data they’ve entered in the form to a server. Such PDF forms have become rare since HTML evolved to HTML 5 and related technologies, which introduced much more user-friendly functionality for filling out forms. We conclude the example by adding a reset button that will reset a selection of fields to their initial value when the button is clicked.

//Create push button field
PdfButtonFormField button = PdfFormField.CreatePushButton(doc.GetPdfDocument(), new Rectangle(479, 594, 45, 15), "reset", "RESET");
button.SetAction(PdfAction.CreateResetForm(new String[] { "name", "language", "experience1", "experience2", "experience3", "shift", "info" },0));
form.AddField(button);
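CreateResetForm() refers to fields by name, so each entry in its list should match the name of a field that was actually created; a typo would not point to any field. The sketch below checks a reset list against the field names. 'ResetListSketch' is a hypothetical, dependency-free helper and does not call iText.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of iText): reports entries of a reset list
// that do not correspond to any created field name.
public static class ResetListSketch
{
    public static List<string> UnknownNames(IEnumerable<string> fieldNames, IEnumerable<string> resetNames)
    {
        var known = new HashSet<string>(fieldNames);
        return resetNames.Where(name => !known.Contains(name)).ToList();
    }
}
```

For the fields created in this example, the reset list { "name", "language", "experience1", "experience2", "experience3", "shift", "info" } yields no unknown names.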

If you want to create a PDF form using iText, you now have a fair idea of how it’s done. In many cases, it’s a much better idea to create a form manually, using a tool with a graphical user interface. You are then going to use iText to fill out this form automatically, for instance using data from a database.

Filling out a form

When we created our form, we could have defined default values, so that the form was filled out as shown in Figure 4.6.

Figure 4.6: a filled-out interactive form

We can still add these values after we've created the form. The CreateAndFill example shows us how.

IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("name", out toSet);
toSet.SetValue("James Bond");
fields.TryGetValue("language", out toSet);
toSet.SetValue("English");
fields.TryGetValue("experience1", out toSet);
toSet.SetValue("Off");
fields.TryGetValue("experience2", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience3", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("shift", out toSet);
toSet.SetValue("Any");
fields.TryGetValue("info", out toSet);
toSet.SetValue("I was 38 years old when I became an MI6 agent.");
doc.Close();

We ask the 'PdfAcroForm', to which we’ve added all the form fields, for its fields, and we get an 'IDictionary' consisting of key-value pairs with the name and the 'PdfFormField' object of each field. We can get the 'PdfFormField' instances one by one, and set their value. Granted, this doesn’t make much sense. It would probably have been smarter to set the correct value right away the moment you create each field. A more common use case is to pre-fill an existing form.
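Instead of one TryGetValue() call per field, you could keep the values in a dictionary and apply them in a loop. To keep the sketch runnable without iText, 'FillSketch.Fill()' is a hypothetical generic helper. It takes any field dictionary and a setter delegate, and it returns the names for which no field exists. This avoids calling SetValue() on a null reference when a name is missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of iText): applies each value to the field
// with the same name and collects the names for which no field exists.
public static class FillSketch
{
    public static List<string> Fill<TField>(
        IDictionary<string, TField> fields,
        IDictionary<string, string> values,
        Action<TField, string> setValue)
    {
        var missing = new List<string>();
        foreach (var entry in values)
        {
            if (fields.TryGetValue(entry.Key, out TField field))
                setValue(field, entry.Value);
            else
                missing.Add(entry.Key);
        }
        return missing;
    }
}
```

With iText, the wiring would presumably be 'FillSketch.Fill(form.GetFormFields(), values, (field, value) => field.SetValue(value));'. Treat this as an assumption to check against your iText version.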

Pre-filling an existing form

In the next example, we'll take an existing form, job_application.pdf, get a PdfAcroForm object from that form, and use the very same code to fill out that existing document. See the FillForm example.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("name", out toSet);
toSet.SetValue("James Bond");
fields.TryGetValue("language", out toSet);
toSet.SetValue("English");
fields.TryGetValue("experience1", out toSet);
toSet.SetValue("Off");
fields.TryGetValue("experience2", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience3", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("shift", out toSet);
toSet.SetValue("Any");
fields.TryGetValue("info", out toSet);
toSet.SetValue("I was 38 years old when I became an MI6 agent.");
pdf.Close();

We introduce a new object in line 2. 'PdfReader' is a class that allows iText to access a PDF file and read the different PDF objects stored in a PDF file. In this case, 'src' holds the path to an existing form.

I/O is handled by two classes in iText.

  • PdfReader is the input class;
  • PdfWriter is the output class.

In the same line, we also create a PdfWriter that will write a new version of the source file. This line is different from what we did before: we now create a PdfDocument object using the reader and the writer object as parameters. In line 3, we obtain a PdfAcroForm instance using the same GetAcroForm() method as before. Lines 4 to 19 are identical to the lines we used to fill out the values of the fields we created from scratch. When we close the PdfDocument (line 20), we have a PDF that is identical to the one shown in Figure 4.6.

The form is still interactive: people can still change values if they want to. iText has been used in many applications to pre-fill forms. For instance, when people log in to an online service, a lot of information (e.g. name, address, phone number) is already known about them on the server side. When they need to fill out a form online, it doesn't make much sense to present them with a blank form where they have to fill out their name, address and phone number all over again. Plenty of time can be saved if these values are already present in the form. This can be achieved by pre-filling the form with iText. People can check if the information is correct and if it isn't (for instance because their phone number changed), they can still change the content of the field.

Sometimes you don't want an end user to change information on a PDF. For instance: if the form is a voucher with a specific date and time, you don't want the end user to change that date and time. In that case, you'll flatten the form.

Flattening a form

When we add a single line to the previous code snippet, we get a PDF that is no longer interactive. The bar with the message "This file includes fillable form fields" has disappeared in Figure 4.7. When you click the name "James Bond", you can no longer manually change it.

Figure 4.7: a flattened form

This extra line was added in the FlattenForm example.

//Initialize PDF document
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("name", out toSet);
toSet.SetValue("James Bond");
fields.TryGetValue("language", out toSet);
toSet.SetValue("English");
fields.TryGetValue("experience1", out toSet);
toSet.SetValue("Off");
fields.TryGetValue("experience2", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("experience3", out toSet);
toSet.SetValue("Yes");
fields.TryGetValue("shift", out toSet);
toSet.SetValue("Any");
fields.TryGetValue("info", out toSet);
toSet.SetValue("I was 38 years old when I became an MI6 agent.");
form.FlattenFields();
pdf.Close();

Summary

We started this chapter by looking at a handful of annotation types:

  • a text annotation,

  • a link annotation,

  • a line annotation, and

  • a text markup annotation.

We also mentioned widget annotations. This led us to the subject of interactive forms. We learned how to create a form, but more importantly how to fill out and flatten a form.

In the fill and flatten examples, we encountered a new class, PdfReader. In the next chapter, we'll take a look at some more examples that use this class.

Chapter 2: Adding low-level content

When we talk about low-level content in iText documentation, we always refer to PDF syntax that is written to a PDF content stream. PDF defines a series of operators such as m, for which we created the MoveTo() method in iText, l, for which we created the LineTo() method, and S, for which we created the Stroke() method. By combining these operators in a PDF (or by combining these methods in iText), you can draw paths and shapes.

Let’s take a look at a small example:

-406 20 m
406 20 l
S

This is PDF syntax that says: move to position ( X = -406 ; Y = 20 ), then construct a path to position ( X = 406 ; Y = 20 ); finally stroke that line – in this context, “stroking” means drawing. If we want to create this snippet of PDF syntax with iText, it goes like this:

canvas.MoveTo(-406, 20)
    .LineTo(406, 20)
    .Stroke();
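To make the mapping between the iText methods and the PDF operators concrete, here is a toy sketch that builds the same content-stream snippet as text. 'ContentStreamSketch' is a hypothetical class written for illustration only. iText's PdfCanvas does the real work and may format numbers differently.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical toy class (not part of iText): each method appends one PDF
// operator with its operands, mimicking the fluent style of PdfCanvas.
public class ContentStreamSketch
{
    private readonly StringBuilder sb = new StringBuilder();

    public ContentStreamSketch MoveTo(double x, double y) => Append($"{Num(x)} {Num(y)} m");
    public ContentStreamSketch LineTo(double x, double y) => Append($"{Num(x)} {Num(y)} l");
    public ContentStreamSketch Stroke() => Append("S");

    private ContentStreamSketch Append(string line)
    {
        sb.Append(line).Append('\n');
        return this;
    }

    // The invariant culture keeps "." as the decimal separator, as PDF syntax requires.
    private static string Num(double value) => value.ToString(CultureInfo.InvariantCulture);

    public override string ToString() => sb.ToString();
}
```

Calling 'new ContentStreamSketch().MoveTo(-406, 20).LineTo(406, 20).Stroke()' produces the three lines of PDF syntax shown above.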

That looks easy, doesn’t it? But what is that canvas object we’re using? Let’s take a look at a couple of examples to find out.

Drawing lines on a canvas

Suppose that we would like to create a PDF that looks like Figure 2.1.

Figure 2.1: drawing an X and Y axis

This PDF showing an X and Y axis was created with the Axes example.

Let’s examine this example step by step.

var fos = new FileStream(dest, FileMode.Create);
var writer = new PdfWriter(fos);
var pdf = new PdfDocument(writer);
var ps = PageSize.A4.Rotate();
var page = pdf.AddNewPage(ps);
var canvas = new PdfCanvas(page);
// Draw the axes
pdf.Close();

The first thing that jumps out is that we no longer use a Document object. Just like in the previous chapter, we create a FileStream (line 1), a PdfWriter (line 2) and a PdfDocument (line 3), but instead of creating a Document with a default or specific page size, we create a PdfPage (line 5) with a specific PageSize (line 4). In this case, we use an A4 page with landscape orientation. Once we have a PdfPage instance, we use it to create a PdfCanvas (line 6). We’ll use this canvas object to create a sequence of PDF operators and operands. As soon as we’ve finished drawing and painting whatever paths and shapes we want to add to the page, we close the PdfDocument (line 8).

In the previous chapter, we closed the Document object with document.Close(); this implicitly closed the PdfDocument object. Now that there is no Document object, we have to close the PdfDocument object ourselves.

In PDF, all measurements are done in user units. By default one user unit corresponds with one point. This means that there are 72 user units in one inch. In PDF, the X axis points to the right and the Y axis points upwards. If you use the PageSize object to create the page size, the origin of the coordinate system is located in the lower-left corner of the page. All the coordinates that we use as operands for operators such as m or l use this coordinate system. We can change the coordinate system by changing the current transformation matrix.
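Because one user unit corresponds with one point by default, converting physical sizes is simple arithmetic. Multiply inches by 72. For millimeters, divide by 25.4 first, then multiply by 72. A minimal sketch, using a hypothetical 'UserUnits' helper that is not part of iText:

```csharp
// Hypothetical helper (not part of iText): converts physical sizes into
// default user units, assuming one user unit equals one point (1/72 inch).
public static class UserUnits
{
    public const double PointsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    public static double FromInches(double inches) => inches * PointsPerInch;

    public static double FromMillimeters(double millimeters) =>
        millimeters / MillimetersPerInch * PointsPerInch;
}
```

An A4 page measures 210 by 297 mm, which gives roughly 595.3 by 841.9 user units. iText's PageSize.A4 rounds this to 595 by 842.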

The coordinate system and the transformation matrix

If you’ve followed a class in analytical geometry, you know that you can move objects around in space by applying a transformation matrix. In PDF, we don’t move the objects, but we move the coordinate system and we draw the objects in the new coordinate system. Suppose that we want to move the coordinate system in such a way that the origin of the coordinate system is positioned in the exact middle of the page. In that case, we’d need to use the ConcatMatrix() method:

canvas.ConcatMatrix(1, 0, 0, 1, ps.GetWidth() / 2, ps.GetHeight() / 2);

The parameters of the ConcatMatrix() method are elements of a transformation matrix. This matrix consists of three columns and three rows:

a  b  0 
c  d  0 
e  f  1 

The values of the elements in the third column are always fixed (0, 0, and 1), because we’re working in a two-dimensional space. The values a, b, c, and d can be used to scale, rotate, and skew the coordinate system. There is no reason why we are confined to a coordinate system where the axes are orthogonal or where the progress in the X direction needs to be identical to the progress in the Y direction. But let’s keep things simple and use 1, 0, 0, and 1 as values for a, b, c, and d. The elements e and f define the translation. We take the page size ps and we divide its width and height by two to get the values for e and f.
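Multiplying the row vector [x y 1] by this matrix maps a point expressed in the new coordinate system back to the original one: x' = a*x + c*y + e and y' = b*x + d*y + f. The landscape A4 page measures 842 by 595 user units, so the translation is e = 421 and f = 297.5. 'MatrixSketch' below is a hypothetical helper that only performs this arithmetic.

```csharp
// Hypothetical helper (not part of iText): applies the transformation matrix
// with rows (a b 0), (c d 0), (e f 1) to a point (x, y) expressed in the
// transformed coordinate system, returning its position in the original one.
public static class MatrixSketch
{
    public static (double X, double Y) Transform(
        double a, double b, double c, double d, double e, double f,
        double x, double y)
    {
        // Row vector [x y 1] multiplied by the 3x3 matrix.
        return (a * x + c * y + e, b * x + d * y + f);
    }
}
```

With (1, 0, 0, 1, 421, 297.5), the new origin (0, 0) lands at (421, 297.5) in default user space, the center of the page. The point (-406, 20) lands at (15, 317.5). The same arithmetic explains the later ConcatMatrix(1, 0, 0, 1, 0, ps.GetHeight()) call: (70, -40) becomes (70, 555) on a page that is 595 units high.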

The graphics state

The current transformation matrix is part of the graphics state of the page. Other values that are defined in the graphics state are the line width, the stroke color (for lines), the fill color (for shapes), and so on. In another tutorial, we’ll go in more depth, describing each value of the graphics state in great detail. For now it’s sufficient to know that the default line width is 1 user unit and that the default stroke color is black. Let’s draw those axes we saw in Figure 2.1:

//Store a "backup" of the current graphical state
canvas.SaveState();
 
//Change the page's coordinate system so that 0,0 is at the center
canvas.ConcatMatrix(1, 0, 0, 1, ps.GetWidth() / 2, ps.GetHeight() / 2);
 
//When joining lines we want them to use a rounded corner
canvas.SetLineJoinStyle(PdfCanvasConstants.LineJoinStyle.ROUND);
 
//Draw X axis
canvas.MoveTo(-(ps.GetWidth() / 2 - 15), 0)
        .LineTo(ps.GetWidth() / 2 - 15, 0)
        .Stroke();
 
//Draw Y axis
canvas.MoveTo(0, -(ps.GetHeight() / 2 - 15))
        .LineTo(0, ps.GetHeight() / 2 - 15)
        .Stroke();
 
//Draw X axis arrow
canvas.MoveTo(ps.GetWidth() / 2 - 25, -10)
        .LineTo(ps.GetWidth() / 2 - 15, 0)
        .LineTo(ps.GetWidth() / 2 - 25, 10)
        .Stroke();
 
//Draw Y axis arrow
canvas.MoveTo(-10, ps.GetHeight() / 2 - 25)
        .LineTo(0, ps.GetHeight() / 2 - 15)
        .LineTo(10, ps.GetHeight() / 2 - 25)
        .Stroke();
 
//Draw X serif
for (int i = -((int)ps.GetWidth() / 2 - 61); i < ((int)ps.GetWidth() / 2 - 60); i += 40) {
    canvas.MoveTo(i, 5).LineTo(i, -5);
}
//Draw Y serif
for (int j = -((int)ps.GetHeight() / 2 - 57); j < ((int)ps.GetHeight() / 2 - 56); j += 40) {
    canvas.MoveTo(5, j).LineTo(-5, j);
}
canvas.Stroke();
 
//"Restore" our "backup" which resets any changes that the above made
canvas.RestoreState();

This code snippet consists of different parts:

  • Lines 2 and 43 show a best practice that we should use whenever we change the graphics state. First we save the current graphics state with the SaveState() method, then we change the state and draw whatever lines or shapes we want to draw, and finally we use the RestoreState() method to return to the original graphics state. All the changes that we applied after SaveState() will be undone. This is especially interesting if you change multiple values (line width, color,…) or when it’s difficult to calculate the reverse change (returning to the original coordinate system).

  • Line 5 changes the current coordinate system so that (0, 0) is in the middle of the page instead of the bottom-left corner.

  • Line 8 changes the way two or more lines are joined when they meet as a result of multiple LineTo() calls. By default, lines are mitered (the lines join in a sharp point), but this can be changed to bevel or round. We want our arrows to join using a rounded corner, so we change the default line join value to ROUND.

  • Lines 11-13, 16-18 shouldn’t have any secrets for you anymore. We move to a coordinate, we construct a line to another coordinate, and we stroke the line.

  • Lines 21-24 and 27-30 are very similar to what we did previously, but these show you how to draw multiple lines in one operation. We construct the path of each arrow head with one MoveTo() and two LineTo() calls.

  • In lines 32-40, we construct small serifs to be drawn on both axes every 40 user units. Observe that we don’t stroke them immediately. Only when we’ve constructed the complete path, we call the Stroke() method.
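The serif loops rely on a detail of C#: the cast to int binds more tightly than the division, so (int)ps.GetHeight() / 2 becomes 595 / 2 = 297 (integer division). The sketch below replays those loop bounds for the landscape A4 page (842 by 595 user units) so you can see where the ticks end up. 'SerifSketch' is a hypothetical helper and draws nothing.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of iText): reproduces the loop bounds used
// for the serifs and returns the positions at which a tick would be drawn.
public static class SerifSketch
{
    public static List<int> Positions(float size, int startMargin, int endMargin)
    {
        var positions = new List<int>();
        // Same expression as in the example: cast first, then integer division.
        for (int i = -((int)size / 2 - startMargin); i < ((int)size / 2 - endMargin); i += 40)
            positions.Add(i);
        return positions;
    }
}
```

'Positions(842, 61, 60)' gives 19 ticks from -360 to 360, and 'Positions(595, 57, 56)' gives 13 ticks from -240 to 240. Both series include 0, so a serif coincides with the origin.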

There’s usually more than one way to draw lines and shapes to the canvas. It would lead us too far to explain the advantages and disadvantages of different approaches with respect to the speed of production of the PDF file, the impact on the file size, and the speed of rendering the document in a PDF viewer. That’s something that needs to be further discussed in another tutorial.

There are also specific rules that need to be taken into account. For instance: sequences of SaveState() and RestoreState() need to be balanced. Every SaveState() needs a RestoreState(); it’s forbidden to have a RestoreState() that wasn’t preceded by a SaveState().
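In the content stream, SaveState() and RestoreState() are written as the PDF operators q and Q. The balancing rule can therefore be checked with a simple depth counter. 'StateBalanceSketch' is a hypothetical helper that ignores every other operator. It is meant to illustrate the rule, not to validate real content streams.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of iText): checks that q (save) and Q (restore)
// operators in a sequence of operator tokens are properly balanced.
public static class StateBalanceSketch
{
    public static bool IsBalanced(IEnumerable<string> operators)
    {
        int depth = 0;
        foreach (var op in operators)
        {
            if (op == "q")
            {
                depth++;                       // SaveState()
            }
            else if (op == "Q")
            {
                if (depth == 0) return false;  // restore without a matching save
                depth--;                       // RestoreState()
            }
        }
        return depth == 0;                     // every save must be restored
    }
}
```

A sequence such as q, cm, m, l, S, Q is balanced. A sequence that starts with Q, or ends with an unmatched q, is not.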

For now let’s adapt the first example of this chapter by changing line widths, introducing a dash pattern, and applying different stroke colors so that we get a PDF as shown in Figure 2.2.

Figure 2.2: drawing a grid

In the Gridlines example, we first define a series of Color objects:

//DeviceCmyk lives in iText.Kernel.Colors
var grayColor = new DeviceCmyk(0, 0, 0, 0.875f);
var greenColor = new DeviceCmyk(1, 0, 1, 0.176f);
var blueColor = new DeviceCmyk(1, 0.156f, 0, 0.118f);

The PDF specification (ISO-32000) defines many different color spaces, each of which has been implemented in a separate class in iText. The most commonly used color spaces are DeviceGray (a color defined by a single intensity parameter), DeviceRgb (defined by three parameters: red, green, and blue) and DeviceCmyk (defined by four parameters: cyan, magenta, yellow and black). In our example, we use three CMYK colors.
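To get a feel for these CMYK values, you can use a naive conversion that ignores color management: R = 255 * (1 - C) * (1 - K), and likewise G with M and B with Y. PDF viewers typically apply color management, so the actual on-screen colors may differ. The hypothetical 'CmykSketch' helper below is meant only for intuition.

```csharp
using System;

// Hypothetical helper (not part of iText): naive, non-color-managed
// approximation of a CMYK color (components from 0 to 1) as 8-bit RGB.
public static class CmykSketch
{
    public static (int R, int G, int B) ToRgb(float c, float m, float y, float k)
    {
        // Each RGB channel is reduced by its own ink and by the black ink.
        int Channel(float ink) => (int)Math.Round(255 * (1 - ink) * (1 - k));
        return (Channel(c), Channel(m), Channel(y));
    }
}
```

By this approximation, the gray color (0, 0, 0, 0.875) is roughly RGB (32, 32, 32), a very dark gray. The green color (1, 0, 1, 0.176) is roughly (0, 210, 0).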

Be aware that we’re not working with the System.Drawing.Color class. We’re working with iText’s Color class that can be found in iText.Kernel.Colors.

We want to create a grid that consists of thin blue lines:

canvas.SetLineWidth(0.5f).SetStrokeColor(blueColor);
for (var i = -((int)ps.GetHeight() / 2 - 57); i < ((int)ps.GetHeight() / 2 - 56); i += 40) {
    canvas.MoveTo(-(ps.GetWidth() / 2 - 15), i)
            .LineTo(ps.GetWidth() / 2 - 15, i);
}
for (var j = -((int)ps.GetWidth() / 2 - 61); j < ((int)ps.GetWidth() / 2 - 60); j += 40) {
    canvas.MoveTo(j, -(ps.GetHeight() / 2 - 15))
            .LineTo(j, ps.GetHeight() / 2 - 15);
}
canvas.Stroke();

In line 1, we set the line width to half a user unit and the color to blue. In lines 2-9, we construct the paths of the grid lines, and we stroke them in line 10.

We reuse the code to draw the axes from the previous example, but we precede it with a line that changes the line width and stroke color.

canvas.SetLineWidth(3).SetStrokeColor(grayColor);

After we’ve drawn the axes, we draw a dashed green line that is 2 user units wide:

canvas.SetLineWidth(2)
    .SetStrokeColor(greenColor)
    .SetLineDash(10, 10, 8)
    .MoveTo(-(ps.GetWidth() / 2 - 15), -(ps.GetHeight() / 2 - 15))
    .LineTo(ps.GetWidth() / 2 - 15, ps.GetHeight() / 2 - 15)
    .Stroke();

There are many possible variations to define a line dash, but in this case, we are defining the line dash using three parameters. The length of the dash is 10 user units; the length of the gap is 10 user units; the phase is 8 user units (the phase defines the distance in the dash pattern to start the dash).
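With a phase of 8, the pattern starts 8 units into its first dash. The line therefore opens with a 2-unit dash, followed by a 10-unit gap, after which the regular 10-on, 10-off rhythm continues. The hypothetical 'DashSketch' helper below expresses this rule for a simple pattern with one dash and one gap. It only covers non-negative distances along the path.

```csharp
// Hypothetical helper (not part of iText): decides whether the point at a
// given distance along a path falls on a dash for the pattern [on off] phase.
public static class DashSketch
{
    public static bool IsOnDash(double distance, double on, double off, double phase)
    {
        double period = on + off;
        // Shift by the phase, then find the position within one period.
        double position = (distance + phase) % period;
        return position < on;
    }
}
```

For SetLineDash(10, 10, 8), the points at distance 0 and 12 fall on a dash, and the point at distance 2 falls in a gap.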

Feel free to experiment with some of the other methods that are available in the PdfCanvas class. You can construct curves with the CurveTo() method, rectangles with the Rectangle() method, and so on. Instead of stroking paths with the Stroke() method using the stroke color, you can also fill paths with the Fill() method using the fill color. The PdfCanvas class offers much more than a .NET version of the PDF operators. It also introduces a number of convenience methods to construct specific paths for which there are no operators available in PDF, such as ellipses or circles.

In our next example, we’ll look at a subset of the graphics state that will allow us to add text at absolute positions.

The text state

In Figure 2.3, we see the opening titles of Episode V of Star Wars: The Empire Strikes Back.

Figure 2.3: adding text at absolute positions

The best way to create such a PDF would be to use a sequence of Paragraph objects with different alignments (centered for the title, left-aligned for the body text), and to add these paragraphs to a Document object. The high-level approach distributes the text over several lines, introducing line breaks automatically if the content doesn’t fit the width of the page, and page breaks if the remaining content doesn’t fit the height of the page.

All of this doesn’t happen when we add text using low-level methods. We need to break up the content into small chunks of text ourselves as is done in the StarWars example:

var text = new List<string>();
text.Add("         Episode V         ");
text.Add("  THE EMPIRE STRIKES BACK  ");
text.Add("It is a dark time for the");
text.Add("Rebellion. Although the Death");
text.Add("Star has been destroyed,");
text.Add("Imperial troops have driven the");
text.Add("Rebel forces from their hidden");
text.Add("base and pursued them across");
text.Add("the galaxy.");
text.Add("Evading the dreaded Imperial");
text.Add("Starfleet, a group of freedom");
text.Add("fighters led by Luke Skywalker");
text.Add("has established a new secret"); 
text.Add("base on the remote ice world");
text.Add("of Hoth...");

For reasons of convenience, we change the coordinate system so that its origin lies in the top-left corner instead of the bottom-left corner. We then create a text object with the BeginText() method, and we change the text state:

canvas.ConcatMatrix(1, 0, 0, 1, 0, ps.GetHeight());
canvas.BeginText()
    .SetFontAndSize(PdfFontFactory.CreateFont(iText.IO.Font.FontConstants.COURIER_BOLD), 14)
    .SetLeading(14 * 1.2f)
    .MoveText(70, -40);

We create a PdfFont to show the text in Courier Bold and we change the text state so that all text that is drawn will use this font with font size 14. We also define a leading of 1.2 times this font size. The leading is the distance between the baselines of two subsequent lines of text. Finally, we change the text matrix so that the cursor moves 70 user units to the right and 40 user units down.

Next, we loop over the different String values in our text list and show each String on a new line, moving the cursor down 16.8 user units (the leading: 14 × 1.2) before each line. Finally, we close the text object with the EndText() method.

foreach( var s in text) {
    //Add text and move to the next line
    canvas.NewlineShowText(s);
}
canvas.EndText();
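The leading and the resulting baselines can be checked with a bit of arithmetic: 14 × 1.2 = 16.8 user units. Because NewlineShowText() moves to the next line before it shows the string, the n-th string lands 40 + n × 16.8 user units below the top of the page. The sketch below (hypothetical helper names) assumes, as in the example, that the only change to the coordinate system is the translation introduced by ConcatMatrix().

```csharp
using System;

public static class LeadingMath
{
    // Leading = font size x factor (14 x 1.2 = 16.8 user units in the example)
    public static float Leading(float fontSize, float factor) => fontSize * factor;

    // Baseline y of line n (1-based), measured from the top-left origin.
    // MoveText(70, -40) puts the start of the current line at y = startY,
    // and each NewlineShowText() first moves down by one leading, then shows text.
    public static float BaselineY(float startY, float leading, int n) => startY - n * leading;
}
```

For the first string, BaselineY(-40, 16.8f, 1) gives -56.8, so the text doesn't start at y = -40 but one leading below it.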

It’s important not to show any text outside of a text object — which is delimited by the BeginText()/EndText() methods. It’s also forbidden to nest BeginText()/EndText() sequences.

What if we pimped this example and changed it in such a way that it produces the PDF shown in figure 2.4?

Figure 2.4: adding skewed and colored text at absolute positions

Changing the color of the background is the easy part in the StarWarsCrawl example:

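// Set the fill color before constructing the path: PDF only allows path
// construction operators between a path and its painting operator
canvas.SetColor(iText.Kernel.Colors.Color.BLACK, true)
    .Rectangle(0, 0, ps.GetWidth(), ps.GetHeight())
    .Fill();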

We create a rectangle whose lower-left corner is at X = 0, Y = 0, and whose width and height correspond to the width and height of the page. We set the fill color to black. We could have used SetFillColor(iText.Kernel.Colors.Color.BLACK), but we used the more generic SetColor() method instead. The boolean indicates whether we want to change the stroke color (false) or the fill color (true). Finally, we fill the rectangle's path using the fill color as paint.
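Each of these calls ends up as a PDF operator in the page's content stream: a color operator for the fill color (rg when the color is an RGB color), re to append a rectangle to the current path, and f to fill that path. The sketch below writes those operators as plain text to show roughly what the content stream contains. The class and method names are hypothetical, it assumes an RGB fill color, and it isn't how iText itself builds content streams. It sets the color before constructing the path, because the PDF specification only allows path construction operators between re and the painting operator f.

```csharp
using System.Globalization;
using System.Text;

public static class ContentStreamSketch
{
    // PDF numbers always use '.' as decimal separator, whatever the current culture
    static string N(float v) => v.ToString(CultureInfo.InvariantCulture);

    // Operators that fill a rectangle with an RGB color:
    // set the fill color, construct the path, then paint it
    public static string FilledRectangle(float x, float y, float w, float h, float r, float g, float b)
    {
        var sb = new StringBuilder();
        sb.Append($"{N(r)} {N(g)} {N(b)} rg\n");       // non-stroking (fill) color
        sb.Append($"{N(x)} {N(y)} {N(w)} {N(h)} re\n"); // rectangle path
        sb.Append("f\n");                               // fill, nonzero winding rule
        return sb.ToString();
    }
}
```

For an A4 page (595 × 842 user units) and black, FilledRectangle(0, 0, 595, 842, 0, 0, 0) returns the three lines "0 0 0 rg", "0 0 595 842 re" and "f".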

Now comes the less trivial part of the code: how do we add the text?

//Get the length of the longest string
var maxStringWidth = text.Max(s => s.Length); // Max() requires using System.Linq;
 
canvas.ConcatMatrix(1, 0, 0, 1, 0, ps.GetHeight());
 
var yellowColor = new DeviceCmyk(0, 0.0537f, 0.769f, 0.051f);
 
float lineHeight = 5;
float yOffset = -40;
 
canvas.BeginText()
        .SetFontAndSize(PdfFontFactory.CreateFont(iText.IO.Font.FontConstants.COURIER_BOLD), 1)
        .SetColor(yellowColor, true);
 
for (var j = 0; j < text.Count; j++) {
    var line = text[j];
    var xOffset = ps.GetWidth() / 2 - 45 - 8 * j;
    var fontSizeCoeff = 6 + j;
    var lineSpacing = (lineHeight + j) * j / 1.5f;
    var stringWidth = line.Length;
    for (var i = 0; i < stringWidth; i++) {
        float angle = (maxStringWidth / 2 - i) / 2f;
        float charXOffset = (4 + (float)j / 2) * i;
        canvas.SetTextMatrix(fontSizeCoeff, 0,
                angle, fontSizeCoeff / 1.5f,
                xOffset + charXOffset, yOffset - lineSpacing)
          .ShowText(line[i].ToString());
    }
}
canvas.EndText();

First, we find the length of the longest string to use in our calculations later (line 2). Next, we once again move the origin of the coordinate system to the top of the page (line 4). We define a CMYK color for the text (line 6). We initialize a value for the line height (line 8) and the offset in the Y-direction (line 9). We begin a text object (line 11), using Courier Bold as the font with a font size of 1 user unit (line 12). The font size is only 1, but we’ll scale the text to a readable size by changing the text matrix. We don’t define a leading; we won’t need one because we won’t use NewlineShowText(). Instead, we’ll calculate the starting position of each individual character and draw the text character by character. We also set a fill color (line 13).

Every glyph in a font is defined as a path. By default, the paths of the glyphs that make up a text are filled. That’s why we set the fill color to change the color of the text.

We start looping over the text (line 15) and read each line into a String (line 16). We’ll need plenty of math to define the different elements of the text matrix that will be used to position each glyph. We define an xOffset for every line (line 17). Our font size was defined as 1 user unit, but we’ll multiply it by a fontSizeCoeff that depends on the index of the line in the text list (line 18). We also define where the line will start relative to the yOffset (line 19).

We calculate the number of characters in each line (line 20) and we loop over all the characters (line 21). We define an angle depending on the position of the character in the line (line 22). The charXOffset depends on both the index of the line and the position of the character (line 23).

Now we’re ready to set the text matrix (lines 24-26). Parameters a and d define the scaling factors; we use them to change the font size. With parameter c, we introduce a skew factor. Finally, we calculate the coordinate of the character to determine parameters e and f. Now that the exact position of the character is determined, we show the character using the ShowText() method (line 27). This method doesn’t introduce any new lines. Once we’ve finished looping over all the characters in all the lines, we close the text object with the EndText() method (line 30).
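To see what the six values do, it helps to compute them for a single character and map points from text space to user space, using x' = a·x + c·y + e and y' = b·x + d·y + f. The sketch below repeats the loop's calculations for one character; the class and method names are hypothetical, and the page width is passed in as a parameter (595, the width of an A4 page, in the usage note) instead of being read from ps.GetWidth().

```csharp
using System;

public static class CrawlMatrix
{
    // Text matrix {a, b, c, d, e, f} that the StarWarsCrawl loop
    // passes to SetTextMatrix() for character i on line j
    public static float[] TextMatrix(int j, int i, int maxStringWidth, float pageWidth)
    {
        const float lineHeight = 5;
        const float yOffset = -40;
        float xOffset = pageWidth / 2 - 45 - 8 * j;
        int fontSizeCoeff = 6 + j;
        float lineSpacing = (lineHeight + j) * j / 1.5f;
        float angle = (maxStringWidth / 2 - i) / 2f; // integer division, as in the example
        float charXOffset = (4 + (float)j / 2) * i;
        return new float[] { fontSizeCoeff, 0, angle, fontSizeCoeff / 1.5f,
                             xOffset + charXOffset, yOffset - lineSpacing };
    }

    // Maps a point from text space to user space (row-vector convention used by PDF)
    public static (float X, float Y) Transform(float[] m, float x, float y) =>
        (m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]);
}
```

The longest string in the crawl text has 31 characters. For the first character of the first line, TextMatrix(0, 0, 31, 595) yields {6, 0, 7.5, 4, 252.5, -40}: the glyph is 6 units wide, 4 units high, and the point (0, 1) at the top of the glyph box is moved 7.5 units to the right, landing at (260, -36). Because angle becomes negative once i passes the middle of the longest line, characters on the right lean the other way, which creates the perspective effect.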

If you think this example was rather complex, you are absolutely right. I used it just to show that iText allows you to create content in whatever way you want. If it’s possible in PDF, it’s possible with iText. But rest assured, the upcoming examples will be much easier to understand.

Summary

In this chapter, we’ve been experimenting with PDF operators and operands and the corresponding iText methods. We’ve learned about a concept called graphics state that keeps track of properties such as the current transformation matrix, line width, color, and so on. Text state is a subset of the graphics state that covers the properties related to text, such as the font and size of the text, the leading, and many other properties we haven’t discussed yet. The text matrix, which positions the text, only exists inside a text object, between BeginText() and EndText(). We’ll get into much more detail in another tutorial.

One might wonder why a developer would need access to the low-level API knowing that there’s so much high-level functionality in iText. That question will be answered in the next chapter.

Digital Signatures in a Universal Application (UWP) with iText - reblog

Recently we found Paul Madary's blog post about digitally signing PDF files in a UWP app with iText, and we wanted to share it. Paul graciously agreed to let us do that, and as a bonus we upgraded the code to work out of the box with iText 7.1.3. The only method that needed to change is SignDocumentSignature.

What does it take to make electronic signatures legally binding?

In the United States, the main law governing electronic signatures is the Electronic Signatures in Global and National Commerce (ESIGN) Act of 2000. There is also the Uniform Electronic Transactions Act (UETA), which has been adopted by 47 states. These laws require the following criteria to be met for an electronic signature to be legally binding:

  • Intent to sign

  • Consent to do business electronically

    • Received UETA Consumer Consent Disclosures

    • Affirmatively agreed to use electronic records for the transaction

    • Has not withdrawn such consent

  • Association of signature with the record

    • System used to capture the transaction must keep an associated record that reflects the process by which the signature was created or generate a textual or graphic statement (which is added to the signed record) proving that it was executed with an electronic signature.

  • Record retention

    • Electronic signature records must be capable of retention and accurate reproduction for reference by all parties or persons entitled to retain the contract or record.

  • Opt-out clause

  • Signed copy of fully executed agreement

Electronic signatures and digital signatures...is there a difference?

In a word, YES! Although the terms are often used interchangeably, there is a difference between an electronic and digital signature.

Electronic Signature

It is the equivalent of a hand-written signature. It can be a client typing their name in a text box, checking an “I accept” checkbox, or some other process that records a person’s agreement to be bound by a contract. In the past, contracts had to be printed out, signed, and stored, which could take days or weeks; with electronic signatures, the process can be completed in minutes.

Digital Signature

It is more like having a notary public stamp a document assuring that the signatures are valid and that the document hasn’t been tampered with. Simply put, the electronic signature captures the person’s intent to enter into the agreement, and the digital signature is used to secure the data and verify the authenticity of the signed document.

Tell me more about a digital signature!

Digital signatures are typically used in Adobe PDF documents by individuals and organizations to prove who verified the document on a specific date/time and that the document hasn’t been modified since being signed.

Depending on the requirements and workflow associated with signing a contract, there can be one or more signatures on a single PDF document. The actual digital signature is invisible: it is a hash of the document’s contents, signed with the private key that belongs to the signer’s certificate, and embedded in the document together with that certificate. In addition to the actual digital signature, an optional visible representation of the signature can also be included in the PDF document. The visible portion of the signature can include an image of the signature and/or a description of the signing certificate.
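The hash-then-sign principle is what makes tampering detectable: anyone with the public key can recompute the hash and check it against the signature. The sketch below illustrates only that principle with .NET’s own RSA class; the class name is hypothetical. In a real PDF signature, the hashed data is the whole file except the space reserved for the signature itself, and the result is stored in a CMS (PKCS #7) container rather than as a bare byte array.

```csharp
using System;
using System.Security.Cryptography;

public static class HashAndSign
{
    // Hashes the data with SHA-256 and signs the hash with the RSA private key
    public static byte[] Sign(byte[] data, RSA privateKey) =>
        privateKey.SignData(data, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

    // Recomputes the SHA-256 hash and checks it against the signature with the
    // public key; any change to the data after signing makes this return false
    public static bool Verify(byte[] data, byte[] signature, RSA publicKey) =>
        publicKey.VerifyData(data, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
}
```

Signing "Agreement: one year of service" and then verifying the signature against "Agreement: two years of service" fails, even though only a few characters changed.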

What type of certificate do I need?

Depending on your needs, there are varying certificate options available:

Self-signed Certificate

They’re free to create and can be used to sign PDF documents. They’re great for testing, but they are the least secure option: if they are ever compromised, they can’t be revoked, which could potentially invalidate all contracts signed with the certificate.
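For testing, a self-signed certificate can be generated in code with .NET’s CertificateRequest class. This is a minimal sketch; the class name, subject name and key-usage choice are assumptions for illustration. The listing later in this post works with BouncyCastle certificates (Org.BouncyCastle.X509.X509Certificate[]), so a .NET certificate would still need converting, for example with BouncyCastle’s DotNetUtilities.FromX509Certificate().

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class TestCertificates
{
    // Creates a self-signed RSA certificate for testing only: subject and issuer
    // are the same, so no CA vouches for it and nobody can revoke it
    public static X509Certificate2 CreateSelfSigned(string subjectName, DateTimeOffset notBefore, DateTimeOffset notAfter)
    {
        RSA key = RSA.Create(2048); // deliberately not disposed while the certificate is in use
        var request = new CertificateRequest($"CN={subjectName}", key,
            HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        // Mark the key as intended for signing documents
        request.CertificateExtensions.Add(new X509KeyUsageExtension(
            X509KeyUsageFlags.DigitalSignature | X509KeyUsageFlags.NonRepudiation, critical: false));
        // The returned certificate carries the private key
        return request.CreateSelfSigned(notBefore, notAfter);
    }
}
```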

Client Certificate

For about $10 per year, a certificate authority (CA) such as DigiCert, GlobalSign, etc. can issue a certificate used for signing PDF documents. The upsides of this solution are that they’re cheap, the certificate is issued by a trusted CA, and it can be revoked if it ever gets compromised. The downsides of this option are that client certificates aren’t included in Adobe’s Approved Trust List (AATL) so the PDF document would display a default warning in Adobe, and (since client certificates have fewer security requirements compared to AATL certificates) they are less secure.

AATL Certificate

This is the most secure option, but because of the added security and certification requirements imposed by Adobe, it can cost thousands of dollars per year (pricing is typically based on the number of signatures needed). For our project, costs for this solution were estimated between $4,000 and $13,000 per year. It requires two-factor authentication via a hardware key per device, a Hardware Security Module (HSM), or a cloud-based solution. Some AATL solutions offer a new certificate per signature, which would allow for a unique certificate per signer.

Note: If you need to sign a document directly in Adobe, MS Word, or other similar application, the AATL certificate is the only solution available. Finally, this is the only option that will show up as a valid & trusted signature in Adobe by default.

So, I’ve heard of another topic called Long Term Validation (LTV). What is it?

Say you need to store the signed contract for years or decades but are concerned that the signature may become invalid if the certificate expires, is revoked, or (worse yet) the CA goes out of business. Any of these scenarios, which are external to the signed document, put the validity of the contract in jeopardy. This is where Long Term Validation comes in. LTV adds extra information to the digital signature: a timestamp from a trusted Timestamp Authority and the revocation status of the certificate (OCSP responses or CRLs) at the time of signing. With this information added, everything needed to prove that the certificate and signature were valid at the time the contract was signed is self-contained within the PDF document, with no external dependencies that may change over time.
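The key idea is that validity is judged at the trusted signing time, not at the moment someone opens the document. The sketch below shows only that part of the check, with a hypothetical class name: it tests whether a certificate was inside its validity period at a given (timestamped) signing time. Real validation also checks the chain of trust and the embedded OCSP/CRL revocation data.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

public static class SigningTimeCheck
{
    // Simplified: was the certificate inside its validity period at the signing time?
    // NotBefore/NotAfter are returned as local times, so compare everything in UTC.
    public static bool WasValidAt(X509Certificate2 cert, DateTime signingTimeUtc)
    {
        return signingTimeUtc >= cert.NotBefore.ToUniversalTime()
            && signingTimeUtc <= cert.NotAfter.ToUniversalTime();
    }
}
```

A certificate that expired at the start of 2021 fails the check for the current date but passes it for a contract timestamped in June 2020, which is exactly the situation LTV is designed for.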

Let’s get into the details, starting with the workflow...

In our specific case,

  1. A client comes into the business and signs an agreement for a year of service.

  2. We use a tablet device running a custom UWP application to first populate the agreement with the specific terms of the contract for the client (name, price, service plan, etc.). Then we display the filled-in PDF document on the screen for the client to review.

  3. Once the client reviews the document, they are given the option to either print the agreement and hand-sign it manually or electronically sign the document on the tablet.

  4. If they choose to electronically sign the document, a “Capture Signature” screen is displayed where the client can sign their name and choose whether they want a copy of the signed contract emailed to them and/or have a physical copy of the contract printed.

  5. When they click the “Yes, I Agree” button, the PDF contract is digitally signed, a copy of the contract is saved in the database with additional details related to the document, and then the contract is emailed and/or printed based on what the client selected when signing the contract.

Can I see some code?

You bet!

  • On the UI, I used an InkCanvas control to capture the client’s signature: <InkCanvas x:Name="signatureInkCanvas" Height="125" Width="750" />

  • In the code, the InkCanvas gets initialized in the constructor.

  • Add a check to make sure the client has made at least some sort of signature before clicking the “Yes, I Agree” button.

  • And finally, capture the image: render the ink on the canvas to a bitmap and convert that bitmap to a byte array.

  • The digital signature magic happens in the SaveClientSignature method. First, we fill in the PDF document with the name, price, service plan, etc., and then flatten and save the PDF to a file [not shown]. Once we have the flattened PDF, we apply the signature using the SignDocumentSignature method below:

    private async Task SignDocumentSignature(string filePath, AgreementElectronicSignatureParametersDTO agreementParameters, ElectronicSignatureInfoDTO signatureInfo)
    {
        if (agreementParameters != null && signatureInfo != null)
        {
            //Maintain the same ratio as the height/width of the client's signature image
            const int signatureHeight = 25;
            const int signatureWidth = 150;

            string clientSignaturePath = filePath.Replace(".pdf", "_ClientSignature.jpg");
            string filePathSigned = filePath.Replace(".pdf", "_Signed.pdf");

            try
            {
                PdfReader pdfReader = new PdfReader(filePath);

                // Dispose the output stream before replacing the original file,
                // so the signed PDF is completely written and no longer locked
                using (FileStream signedStream = new FileStream(filePathSigned, FileMode.Create))
                {
                    PdfSigner pdfSigner = new PdfSigner(pdfReader, signedStream, false);

                    IExternalSignature pks = GetPrivateKeySignature();
                    Org.BouncyCastle.X509.X509Certificate[] chain = GetCertificateChain();

                    // OCSP and CRL clients plus a timestamp client, so that revocation
                    // data and a timestamp are embedded in the signature (for LTV)
                    OCSPVerifier ocspVerifier = new OCSPVerifier(null, null);
                    OcspClientBouncyCastle ocspClient = new OcspClientBouncyCastle(ocspVerifier);
                    CrlClientOnline crlClient = new CrlClientOnline();

                    TSAClientBouncyCastle tsa = new TSAClientBouncyCastle(GetTimeStampAuthorityURL());

                    //Show image of the client's signature on the pdf
                    SaveBase64AsImage(clientSignaturePath, agreementParameters.ClientSignature);
                    ImageData clientSignatureImage = ImageDataFactory.Create(clientSignaturePath);

                    pdfSigner.SetCertificationLevel(PdfSigner.CERTIFIED_NO_CHANGES_ALLOWED);
                    pdfSigner.SetFieldName("signature");

                    PdfSignatureAppearance signatureAppearance = pdfSigner.GetSignatureAppearance();
                    signatureAppearance.SetRenderingMode(PdfSignatureAppearance.RenderingMode.GRAPHIC);
                    signatureAppearance.SetReason("");
                    signatureAppearance.SetLocationCaption("");
                    signatureAppearance.SetSignatureGraphic(clientSignatureImage);
                    signatureAppearance.SetPageNumber(signatureInfo.PageNumber);
                    signatureAppearance.SetPageRect(new Rectangle(signatureInfo.Left, signatureInfo.Bottom,
                        signatureWidth, signatureHeight));

                    pdfSigner.SignDetached(pks, chain, new List<ICrlClient>() { crlClient }, ocspClient, tsa, 0,
                        PdfSigner.CryptoStandard.CMS);
                }

                // Replace the original agreement with the signed version
                File.Delete(filePath);
                File.Move(filePathSigned, filePath);
            }
            finally
            {
                //Remove the signature image if it exists
                if (!String.IsNullOrEmpty(clientSignaturePath) && File.Exists(clientSignaturePath))
                    File.Delete(clientSignaturePath);
            }
        }
    }

    And there you have it: an electronically and digitally signed document.
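The listing calls a SaveBase64AsImage() helper that the post doesn’t show. Judging from its name and the .jpg path it writes to, it decodes the captured signature (agreementParameters.ClientSignature, presumably a Base64-encoded JPEG) and saves it to disk so that ImageDataFactory.Create() can load it. A hypothetical reconstruction using only the .NET base class library might look like this; the handling of a data-URI prefix is an assumption, in case the image arrives in that form.

```csharp
using System;
using System.IO;

public static class SignatureImageFiles
{
    // Hypothetical stand-in for the SaveBase64AsImage helper used in the listing:
    // decodes a Base64 string and writes the raw image bytes to a file
    public static void SaveBase64AsImage(string path, string base64)
    {
        if (string.IsNullOrEmpty(base64))
            throw new ArgumentException("No signature image data.", nameof(base64));

        // Strip an optional data-URI prefix such as "data:image/jpeg;base64,"
        int comma = base64.IndexOf(',');
        if (base64.StartsWith("data:", StringComparison.OrdinalIgnoreCase) && comma >= 0)
            base64 = base64.Substring(comma + 1);

        File.WriteAllBytes(path, Convert.FromBase64String(base64));
    }
}
```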

Where to Learn More

What makes an electronic signature legally binding:
