Working with Documents#
Passwords & Security#
A protected document may require a password. To check this, use the needsPassword method as follows:
EXAMPLE
let needsPassword = document.needsPassword()
To provide a password use the authenticatePassword method as follows:
EXAMPLE
let auth = document.authenticatePassword("abracadabra")
See the authenticatePassword return values reference for what the result means.
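The two calls above can be combined into a single unlock step. The following is a minimal sketch rather than part of the library: unlockDocument is a hypothetical helper, and the meaning attached to each return code (0 failed, 1 no password needed, 2 user password, 4 owner password, 6 both) is an assumption to confirm against the authenticatePassword reference.

```javascript
// Hypothetical helper (not part of the library): unlock a document if needed.
// `document` is any object exposing needsPassword() and authenticatePassword().
function unlockDocument(document, password) {
  if (!document.needsPassword()) {
    return { ok: true, access: "none required" }
  }
  const code = document.authenticatePassword(password)
  // Assumed return-code meanings; confirm against the API reference.
  const meanings = {
    0: "failed",
    1: "none required",
    2: "user",
    4: "owner",
    6: "user and owner",
  }
  return { ok: code !== 0, access: meanings[code] ?? `unknown (${code})` }
}
```

Returning both a boolean and a label lets the caller stop early on failure while still being able to report which level of access was granted.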
Document Metadata#
Get Metadata#
You can get metadata for a document using the getMetaData method.
Common keys include format, encryption, info:ModDate, info:Title, and info:Author.
EXAMPLE
const format = document.getMetaData("format")
const modificationDate = document.getMetaData("info:ModDate")
const author = document.getMetaData("info:Author")
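When several keys are needed at once, it can be convenient to gather them into one object. The sketch below assumes that getMetaData returns undefined or an empty string for a key the document does not define; collectMetadata is a hypothetical helper name, not a library function.

```javascript
// Hypothetical helper: read several metadata keys into a plain object.
// Keys with no value are skipped. This assumes getMetaData returns
// undefined (or an empty string) for a key the document does not define.
function collectMetadata(document, keys) {
  const result = {}
  for (const key of keys) {
    const value = document.getMetaData(key)
    if (value !== undefined && value !== null && value !== "") {
      result[key] = value
    }
  }
  return result
}

// Usage, given an open `document`:
// const meta = collectMetadata(document, ["format", "info:Title", "info:Author"])
```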
Set Metadata#
You can set metadata for a document using the setMetaData method.
EXAMPLE
document.setMetaData("info:Author", "Jane Doe")
Get the Document Page Count#
Use the countPages method to get the number of pages in the document.
EXAMPLE
const numPages = document.countPages()
Load a Page of a Document#
To load a page of a document, call the PDFDocument loadPage method with a zero-based page index. It returns a page instance.
EXAMPLE
// load the 1st page of the document
let page = document.loadPage(0)
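Because page indexes start at 0, the last valid index is countPages() minus one. The sketch below wraps that rule in a generator so later per-page work can use a simple for...of loop; eachPage is a hypothetical helper name, not a library function.

```javascript
// Hypothetical helper: iterate over every page of a document.
// Page indexes are zero-based, so the last valid index is countPages() - 1.
function* eachPage(document) {
  const count = document.countPages() // read once rather than on every iteration
  for (let i = 0; i < count; i++) {
    yield [i, document.loadPage(i)]
  }
}

// Usage, given an open `document`:
// for (const [index, page] of eachPage(document)) { /* work with page */ }
```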
Extracting Document Text#
To get the text for an entire document we can retrieve StructuredText objects as JSON for each page as follows:
EXAMPLE
let i = 0
while (i < document.countPages()) {
    const page = document.loadPage(i)
    const json = page.toStructuredText("preserve-whitespace").asJSON()
    console.log(`json=${json}`)
    i++
}
StructuredText contains objects from a page that have been analyzed and grouped into blocks, lines and spans. As such the JSON returned is structured and contains positional data and font data alongside text values, e.g.:
EXAMPLE
{
  "blocks": [
    {
      "type": "text",
      "bbox": {
        "x": 30,
        "y": 32,
        "w": 216,
        "h": 13
      },
      "lines": [
        {
          "wmode": 0,
          "bbox": {
            "x": 30,
            "y": 32,
            "w": 216,
            "h": 13
          },
          "font": {
            "name": "FKGYDX+Arial",
            "family": "sans-serif",
            "weight": "normal",
            "style": "normal",
            "size": 12
          },
          "x": 30,
          "y": 43,
          "text": "Welcome to the Node server test.pdf file."
        }
      ]
    },
    {
      "type": "text",
      "bbox": {
        "x": 30,
        "y": 68,
        "w": 190,
        "h": 13
      },
      "lines": [
        {
          "wmode": 0,
          "bbox": {
            "x": 30,
            "y": 68,
            "w": 190,
            "h": 13
          },
          "font": {
            "name": "FKGYDX+Arial",
            "family": "sans-serif",
            "weight": "normal",
            "style": "normal",
            "size": 12
          },
          "x": 30,
          "y": 79,
          "text": "Sorry there is not much to see here!"
        }
      ]
    },
    {
      "type": "text",
      "bbox": {
        "x": 568,
        "y": 31,
        "w": 6,
        "h": 13
      },
      "lines": [
        {
          "wmode": 0,
          "bbox": {
            "x": 568,
            "y": 31,
            "w": 6,
            "h": 13
          },
          "font": {
            "name": "YDTIJL+Arial",
            "family": "sans-serif",
            "weight": "normal",
            "style": "normal",
            "size": 12
          },
          "x": 568,
          "y": 42,
          "text": "1"
        }
      ]
    },
    {
      "type": "text",
      "bbox": {
        "x": 28,
        "y": 744,
        "w": 84,
        "h": 19
      },
      "lines": [
        {
          "wmode": 0,
          "bbox": {
            "x": 28,
            "y": 744,
            "w": 84,
            "h": 19
          },
          "font": {
            "name": "Arial",
            "family": "sans-serif",
            "weight": "normal",
            "style": "normal",
            "size": 14
          },
          "x": 28,
          "y": 759,
          "text": "Page 1 footer"
        }
      ]
    }
  ]
}
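If only the text is needed, the structured JSON can be flattened by walking blocks, then lines, and collecting each line's text value. The sketch below relies only on the shape shown in the sample above; plainTextFromStructuredJSON is a hypothetical helper name, and blocks without a lines array are simply skipped.

```javascript
// Hypothetical helper: flatten StructuredText JSON into plain text.
// Assumes the shape shown in the sample: blocks -> lines -> text.
function plainTextFromStructuredJSON(json) {
  // Accept either the JSON string or an already parsed object.
  const data = typeof json === "string" ? JSON.parse(json) : json
  const lines = []
  for (const block of data.blocks ?? []) {
    // Blocks without a `lines` array are skipped.
    for (const line of block.lines ?? []) {
      lines.push(line.text)
    }
  }
  return lines.join("\n")
}

// Usage, given a loaded `page`:
// const text = plainTextFromStructuredJSON(
//   page.toStructuredText("preserve-whitespace").asJSON()
// )
```

Applied to the sample above, this yields the four text values in block order, one per line, starting with "Welcome to the Node server test.pdf file." and ending with "Page 1 footer".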
Extracting Document Annotations#
We can retrieve Annotation objects from document pages by querying each page with getAnnotations.
EXAMPLE
let i = 0
while (i < document.countPages()) {
    // load page i (not always page 0) so every page is queried
    const page = document.loadPage(i)
    const annots = page.getAnnotations()
    console.log(`Page=${i}, Annotations=${annots}`)
    i++
}
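Logging annotation objects directly is rarely informative, so it can help to reduce each page to a short summary. The sketch below lists annotation types per page. It assumes each annotation exposes a getType() method, which should be confirmed against the Annotation reference; annotationTypesByPage is a hypothetical helper name.

```javascript
// Hypothetical helper: list annotation types per page.
// Assumes each annotation exposes getType(); confirm against the Annotation reference.
function annotationTypesByPage(document) {
  const summary = []
  const count = document.countPages()
  for (let i = 0; i < count; i++) {
    const page = document.loadPage(i) // use the loop index, not a fixed page
    const types = page.getAnnotations().map((annot) => annot.getType())
    summary.push({ page: i, types })
  }
  return summary
}
```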
“Baking” a Document#
Flattening a document’s annotations and/or widgets into the page content is known as “baking”.
You can use the bake method as follows:
EXAMPLE
document.bake()
Removing a File from a Document#
Use the deleteEmbeddedFile method on a document instance to remove an attached file.
EXAMPLE
document.deleteEmbeddedFile("test.txt")
Searching a Document#
To search a document we can look at each page and use the search method as follows:
EXAMPLE
let results = page.search("my search phrase")
Note
The resulting array holds one entry per result. Each entry is a list of rectangles, and each rectangle is a sequence of eight numbers [ulx, uly, urx, ury, llx, lly, lrx, lry]. Rectangles of this type are known as QuadPoints in the PDF specification.
For example, the following represents two search results, each with a single “QuadPoint” (or “Quad”):
EXAMPLE
[
  [
    [
      97.44780731201172,
      32.626708984375,
      114.12963104248047,
      32.626708984375,
      97.44780731201172,
      46.032958984375,
      114.12963104248047,
      46.032958984375
    ]
  ],
  [
    [
      62.767799377441406,
      68.626708984375,
      79.44963073730469,
      68.626708984375,
      62.767799377441406,
      82.032958984375,
      79.44963073730469,
      82.032958984375
    ]
  ]
]
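To search the whole document rather than one page, the same call can be repeated per page, and each quad can be reduced to a plain bounding box for highlighting or logging. The sketch below assumes the nested shape shown above (results, then quads, then eight numbers); quadToRect and searchDocument are hypothetical helper names.

```javascript
// Hypothetical helper: axis-aligned bounding box of one quad.
// A quad is [ulx, uly, urx, ury, llx, lly, lrx, lry].
function quadToRect(quad) {
  const xs = [quad[0], quad[2], quad[4], quad[6]]
  const ys = [quad[1], quad[3], quad[5], quad[7]]
  const x = Math.min(...xs)
  const y = Math.min(...ys)
  return { x, y, w: Math.max(...xs) - x, h: Math.max(...ys) - y }
}

// Hypothetical helper: search every page and tag each hit with its page index.
function searchDocument(document, needle) {
  const hits = []
  const count = document.countPages()
  for (let i = 0; i < count; i++) {
    const results = document.loadPage(i).search(needle)
    for (const quads of results) {
      hits.push({ page: i, rects: quads.map(quadToRect) })
    }
  }
  return hits
}
```

Taking the minimum and maximum over all four corners keeps quadToRect correct even when a quad is rotated, at the cost of returning a box slightly larger than the text.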
Getting Document Links#
To get document links (if any) we can look at each page and use the getLinks method as follows:
EXAMPLE
let links = page.getLinks()
Note
The resulting array contains Link objects, each with its own bounds and uri.
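Links can be gathered across the whole document in the same per-page fashion. The sketch below assumes that Link objects expose their bounds and uri through getBounds() and getURI() accessors, which should be confirmed against the Link reference; collectLinks is a hypothetical helper name.

```javascript
// Hypothetical helper: gather link targets across all pages.
// Assumes Link objects expose getURI() and getBounds(); confirm against the Link reference.
function collectLinks(document) {
  const found = []
  const count = document.countPages()
  for (let i = 0; i < count; i++) {
    for (const link of document.loadPage(i).getLinks()) {
      found.push({ page: i, uri: link.getURI(), bounds: link.getBounds() })
    }
  }
  return found
}
```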