Working with Documents

Passwords & Security

A protected document may require a password. To check whether it does, use the needsPassword method as follows:

EXAMPLE

let needsPassword = document.needsPassword()

To provide a password use the authenticatePassword method as follows:

EXAMPLE

let auth = document.authenticatePassword("abracadabra")

The return value indicates whether, and how, authentication succeeded; see the authenticatePassword return values in the API reference for the meaning of each value.
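The check and the authentication call can be combined into one helper. The following is a minimal sketch, not part of the library: unlockDocument is a hypothetical name, and the result values it relies on (0 for failure, bit 2 for the user password, bit 4 for the owner password) are an assumption that should be verified against the authenticatePassword return values reference.

EXAMPLE

```javascript
// Assumed bit flags in the authenticatePassword result (verify against the API reference).
const AUTH_USER_PASSWORD = 2
const AUTH_OWNER_PASSWORD = 4

// Hypothetical helper: reports whether, and how, the document was unlocked.
function unlockDocument(document, password) {
    if (!document.needsPassword()) {
        // Unprotected documents need no authentication at all.
        return { unlocked: true, asUser: false, asOwner: false }
    }
    const auth = document.authenticatePassword(password)
    return {
        unlocked: auth !== 0, // assumed: 0 means authentication failed
        asUser: (auth & AUTH_USER_PASSWORD) !== 0,
        asOwner: (auth & AUTH_OWNER_PASSWORD) !== 0,
    }
}
```

Returning a small object rather than the raw number keeps the bit tests in one place, so calling code only has to check unlocked.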

Document Metadata

Get Metadata

You can get metadata for a document using the getMetaData method, passing the key of the value you want.

Common keys include format, encryption, info:ModDate, and info:Title.

EXAMPLE

const format = document.getMetaData("format")
const modificationDate = document.getMetaData("info:ModDate")
const author = document.getMetaData("info:Author")
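When several keys are needed at once, the individual calls can be wrapped in a loop. This is a minimal sketch: readMetadata is a hypothetical helper, not a library function, and it simply stores whatever getMetaData returns for each key, including for keys the document does not define.

EXAMPLE

```javascript
// Hypothetical helper: reads several metadata keys into one plain object.
function readMetadata(document, keys) {
    const result = {}
    for (const key of keys) {
        // Keep the value exactly as getMetaData returns it.
        result[key] = document.getMetaData(key)
    }
    return result
}
```

Calling readMetadata(document, ["format", "info:Title"]) would return an object with those two keys.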

Set Metadata

You can set metadata for a document using the setMetaData method.

EXAMPLE

document.setMetaData("info:Author", "Jane Doe")

Get the Document Page Count

Use the countPages method to get the number of pages in the document.

EXAMPLE

const numPages = document.countPages()

Load a Page of a Document

To load a page of a document, use the PDFDocument loadPage method, which takes a zero-based page index and returns a page instance.

EXAMPLE

// load the 1st page of the document
let page = document.loadPage(0)
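Because the index is zero-based, off-by-one mistakes are easy to make when page numbers come from a user. The sketch below shows one way to guard against that; loadPageByNumber is a hypothetical helper that accepts a 1-based page number and relies only on countPages and loadPage.

EXAMPLE

```javascript
// Hypothetical helper: loads a page by its 1-based page number.
function loadPageByNumber(document, pageNumber) {
    const pageCount = document.countPages()
    if (!Number.isInteger(pageNumber) || pageNumber < 1 || pageNumber > pageCount) {
        throw new RangeError(`Page ${pageNumber} is outside 1..${pageCount}`)
    }
    // loadPage expects a zero-based index, so page 1 is index 0.
    return document.loadPage(pageNumber - 1)
}
```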

Extracting Document Text

To get the text for an entire document, retrieve a StructuredText object for each page and serialize it as JSON:

EXAMPLE

const pageCount = document.countPages()
for (let i = 0; i < pageCount; i++) {
    const page = document.loadPage(i)
    const json = page.toStructuredText("preserve-whitespace").asJSON()
    console.log(`json=${json}`)
}

StructuredText contains objects from a page that have been analyzed and grouped into blocks, lines, and spans. The returned JSON is therefore structured and carries positional and font data alongside the text values, e.g.:

EXAMPLE

{
    "blocks": [
        {
            "type": "text",
            "bbox": {
                "x": 30,
                "y": 32,
                "w": 216,
                "h": 13
            },
            "lines": [
                {
                    "wmode": 0,
                    "bbox": {
                        "x": 30,
                        "y": 32,
                        "w": 216,
                        "h": 13
                    },
                    "font": {
                        "name": "FKGYDX+Arial",
                        "family": "sans-serif",
                        "weight": "normal",
                        "style": "normal",
                        "size": 12
                    },
                    "x": 30,
                    "y": 43,
                    "text": "Welcome to the Node server test.pdf file."
                }
            ]
        },
        {
            "type": "text",
            "bbox": {
                "x": 30,
                "y": 68,
                "w": 190,
                "h": 13
            },
            "lines": [
                {
                    "wmode": 0,
                    "bbox": {
                        "x": 30,
                        "y": 68,
                        "w": 190,
                        "h": 13
                    },
                    "font": {
                        "name": "FKGYDX+Arial",
                        "family": "sans-serif",
                        "weight": "normal",
                        "style": "normal",
                        "size": 12
                    },
                    "x": 30,
                    "y": 79,
                    "text": "Sorry there is not much to see here!"
                }
            ]
        },
        {
            "type": "text",
            "bbox": {
                "x": 568,
                "y": 31,
                "w": 6,
                "h": 13
            },
            "lines": [
                {
                    "wmode": 0,
                    "bbox": {
                        "x": 568,
                        "y": 31,
                        "w": 6,
                        "h": 13
                    },
                    "font": {
                        "name": "YDTIJL+Arial",
                        "family": "sans-serif",
                        "weight": "normal",
                        "style": "normal",
                        "size": 12
                    },
                    "x": 568,
                    "y": 42,
                    "text": "1"
                }
            ]
        },
        {
            "type": "text",
            "bbox": {
                "x": 28,
                "y": 744,
                "w": 84,
                "h": 19
            },
            "lines": [
                {
                    "wmode": 0,
                    "bbox": {
                        "x": 28,
                        "y": 744,
                        "w": 84,
                        "h": 19
                    },
                    "font": {
                        "name": "Arial",
                        "family": "sans-serif",
                        "weight": "normal",
                        "style": "normal",
                        "size": 14
                    },
                    "x": 28,
                    "y": 759,
                    "text": "Page 1 footer"
                }
            ]
        }
    ]
}
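If only the plain text is needed, the JSON can be flattened by walking its blocks and lines. The following is a minimal sketch that assumes the shape shown above (a top-level blocks array whose text blocks each carry a lines array with a text field); extractLines is a hypothetical helper, not a library function.

EXAMPLE

```javascript
// Hypothetical helper: flattens StructuredText JSON into an array of line strings.
function extractLines(json) {
    // Accept either the JSON string or an already parsed object.
    const data = typeof json === "string" ? JSON.parse(json) : json
    const lines = []
    for (const block of data.blocks) {
        // Skip any block that is not a text block with lines.
        if (block.type !== "text" || !Array.isArray(block.lines)) continue
        for (const line of block.lines) {
            lines.push(line.text)
        }
    }
    return lines
}
```

Applied to the sample output above, this would yield the four strings in document order, starting with "Welcome to the Node server test.pdf file." and ending with "Page 1 footer".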

Extracting Document Annotations

We can retrieve Annotation objects from document pages by querying each page with getAnnotations.

EXAMPLE

let i = 0
while (i < document.countPages()) {
    // load page i (not always page 0) so every page is visited
    const page = document.loadPage(i)
    const annots = page.getAnnotations()
    console.log(`Page=${i}, Annotations=${annots}`)
    i++
}
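To keep the annotations rather than just log them, the same loop can gather them per page. This is a minimal sketch: collectAnnotations is a hypothetical helper, and it assumes getAnnotations returns an array.

EXAMPLE

```javascript
// Hypothetical helper: gathers the annotations of every page, keyed by page index.
function collectAnnotations(document) {
    const byPage = []
    const pageCount = document.countPages()
    for (let i = 0; i < pageCount; i++) {
        // Assumes getAnnotations returns an array of Annotation objects.
        const annotations = document.loadPage(i).getAnnotations()
        byPage.push({ pageIndex: i, count: annotations.length, annotations })
    }
    return byPage
}
```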

“Baking” a Document

Flattening a document’s annotations and/or widgets into the page content is known as “baking”.

You can use the bake method as follows:

EXAMPLE

document.bake()

Removing a File from a Document

Use the deleteEmbeddedFile method on a document instance to remove an attached file.

EXAMPLE

document.deleteEmbeddedFile("test.txt")

Searching a Document

To search a document, load each page and call its search method as follows:

EXAMPLE

let results = page.search("my search phrase")

Note

The resulting array holds one entry per result. Each entry is a list of rectangles, and each rectangle is a sequence of eight numbers, [ulx, uly, urx, ury, llx, lly, lrx, lry], giving its upper-left, upper-right, lower-left, and lower-right corners. Rectangles of this type are known as QuadPoints in the PDF specification.

For example, the following represents a search with two results, each containing a single “QuadPoint” (or “Quad”):

EXAMPLE

[
    [
        [
            97.44780731201172,
            32.626708984375,
            114.12963104248047,
            32.626708984375,
            97.44780731201172,
            46.032958984375,
            114.12963104248047,
            46.032958984375
        ]
    ],
    [
        [
            62.767799377441406,
            68.626708984375,
            79.44963073730469,
            68.626708984375,
            62.767799377441406,
            82.032958984375,
            79.44963073730469,
            82.032958984375
        ]
    ]
]
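The nested arrays above can be turned into something easier to work with, such as bounding boxes tagged with a page index. The following is a minimal sketch under the array layout described in the note; quadToRect and searchDocument are hypothetical helpers, not library functions, and they use only countPages, loadPage, and search.

EXAMPLE

```javascript
// Hypothetical helper: bounding box of one quad given as
// [ulx, uly, urx, ury, llx, lly, lrx, lry].
function quadToRect(quad) {
    const xs = [quad[0], quad[2], quad[4], quad[6]]
    const ys = [quad[1], quad[3], quad[5], quad[7]]
    const x = Math.min(...xs)
    const y = Math.min(...ys)
    return { x, y, w: Math.max(...xs) - x, h: Math.max(...ys) - y }
}

// Hypothetical helper: searches every page and tags each result with its page index.
function searchDocument(document, needle) {
    const hits = []
    const pageCount = document.countPages()
    for (let i = 0; i < pageCount; i++) {
        // Each result is a list of quads; convert every quad to a rectangle.
        for (const quads of document.loadPage(i).search(needle)) {
            hits.push({ pageIndex: i, rects: quads.map(quadToRect) })
        }
    }
    return hits
}
```

Taking the minimum and maximum of the corner coordinates, rather than trusting the corner order, keeps the bounding box valid even for rotated text.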