Working with Pages#
A Page is an instance of the PDFPage class.
Loading a Page#
To load a page of a document use the document.loadPage()
method to return a page instance.
EXAMPLE
// load the 1st page of the document
let page = document.loadPage(0)
Getting the Page Bounds#
To get the bounds of page do the following:
EXAMPLE
let rect = page.getBounds()
This returns a numerical array object in the following format: [ulx,uly,lrx,lry]
.
Convert a Page to an Image#
To convert a page to an image use the toPixmap()
method, after this the Pixmap data can be converted to the image format you require.
The parameters for the method define:
the resolution via a matrix
the ColorSpace for rendering
background transparency
whether to render any annotations on the page.
EXAMPLE
let pixmap = page.toPixmap(mupdfjs.Matrix.identity, mupdfjs.ColorSpace.DeviceRGB, false, true)
let pngImage = pixmap.asPNG()
let base64Image = Buffer.from(pngImage, 'binary').toString('base64')
Extracting Page Text#
There are two methods for extracting text, one which simply gives the plain text and another which delivers a more detailed text object.
Basic Text#
To get the plain text for a page we can retrieve a string
as follows:
EXAMPLE
const text = page.getText()
console.log(`text=${text}`)
Advanced Text#
To get a more advanced representation of the page text we can retrieve a StructuredText object as JSON
as follows:
EXAMPLE
const json = page.toStructuredText("preserve-whitespace").asJSON()
console.log(`json=${json}`)
StructuredText contains objects from a page that have been analyzed and grouped into blocks, lines and spans. As such the JSON
returned is structured and contains positional data and font data alongside text values, e.g.:
EXAMPLE
{
"blocks": [
{
"type": "text",
"bbox": {
"x": 30,
"y": 32,
"w": 216,
"h": 13
},
"lines": [
{
"wmode": 0,
"bbox": {
"x": 30,
"y": 32,
"w": 216,
"h": 13
},
"font": {
"name": "FKGYDX+Arial",
"family": "sans-serif",
"weight": "normal",
"style": "normal",
"size": 12
},
"x": 30,
"y": 43,
"text": "Welcome to the Node server test.pdf file."
}
]
},
{
"type": "text",
"bbox": {
"x": 30,
"y": 68,
"w": 190,
"h": 13
},
"lines": [
{
"wmode": 0,
"bbox": {
"x": 30,
"y": 68,
"w": 190,
"h": 13
},
"font": {
"name": "FKGYDX+Arial",
"family": "sans-serif",
"weight": "normal",
"style": "normal",
"size": 12
},
"x": 30,
"y": 79,
"text": "Sorry there is not much to see here!"
}
]
},
{
"type": "text",
"bbox": {
"x": 568,
"y": 31,
"w": 6,
"h": 13
},
"lines": [
{
"wmode": 0,
"bbox": {
"x": 568,
"y": 31,
"w": 6,
"h": 13
},
"font": {
"name": "YDTIJL+Arial",
"family": "sans-serif",
"weight": "normal",
"style": "normal",
"size": 12
},
"x": 568,
"y": 42,
"text": "1"
}
]
},
{
"type": "text",
"bbox": {
"x": 28,
"y": 744,
"w": 84,
"h": 19
},
"lines": [
{
"wmode": 0,
"bbox": {
"x": 28,
"y": 744,
"w": 84,
"h": 19
},
"font": {
"name": "Arial",
"family": "sans-serif",
"weight": "normal",
"style": "normal",
"size": 14
},
"x": 28,
"y": 759,
"text": "Page 1 footer"
}
]
}
]
}
Extracting Page Images#
To get the images for a page we can use the getImages()
method as follows:
EXAMPLE
var result = page.getImages()
This returns an array of objects which includes the image (Image) along with the bounding box and matrix transform.
The following example would extract all the images from a page and save them as individual files:
var imageStack = page.getImages()
for (var i in imageStack) {
var image = imageStack[i].image;
var pixmap = image.toPixmap();
let raster = pixmap.asJPEG(80);
fs.writeFileSync('image-'+i+'.jpg', raster);
}
Extracting Page Annotations#
We can retrieve Annotation objects from pages by querying with getAnnotations()
.
EXAMPLE
const annots = page.getAnnotations()
console.log(`Annotations=${annots}`)
Adding Text to Pages#
The following script creates a blank PDF document, adds some styled text to the top of the document using the insertText()
method, and then saves the result to a file.
EXAMPLE
let document = mupdfjs.PDFDocument.createBlankDocument()
let page = document.loadPage(0) // get the 1st page of the document
page.insertText("HELLO WORLD",
[0,0],
"Times-Roman",
20,
{
strokeColor:[0,0,0,1],
fillColor:[1,0,0,0.75],
strokeThickness:0.5
}
)
fs.writeFileSync("output.pdf", document.saveToBuffer("").asUint8Array())
Adding Images to Pages#
The following script creates a blank PDF document, adds an Image to the top of the document using the insertImage()
method, and then saves the result to a file.
EXAMPLE
let image = new mupdfjs.Image(fs.readFileSync("logo.png"))
page.insertImage({image:image, name:"MyLogo"})
fs.writeFileSync("output.pdf", document.saveToBuffer("").asUint8Array())
Note
See coordinate space and PDFObject for more about how the image is sized and positioned with the addStream
method.
Adding Pages#
Use the newPage()
method to add pages to a document, you can choose where to insert the page in the document and the metrics for the new page.
EXAMPLE
The code below creates a blank document with a default A4 sized page and then adds a new 300x500 point sized page at the end of the document.
// Create a blank document with a blank page
let document = mupdfjs.PDFDocument.createBlankDocument()
// Add a page to the end of the document
document.newPage(-1, 300, 500)
Copying Pages#
To copy a page we can use the copyPage()
method and insert it as a new page of the document.
EXAMPLE
document.copyPage(0,-1)
Copying pages from another document#
The following script uses graftPage()
to copy the first page (0
) of another document to the end (-1
) of the current document:
EXAMPLE
let anotherDocument = mupdfjs.PDFDocument.openDocument(fs.readFileSync("test.pdf"), "application/pdf")
document.graftPage(-1, anotherDocument, 0)
Deleting Pages#
To delete a page from a document use the deletePage()
method on the Document()
instance.
EXAMPLE
// delete the first page of a document
document.deletePage(0)
Note
The page number is zero-indexed.
Rotating Pages#
Rotating a page with rotate()
allows for 90 increment rotations on a page.
EXAMPLE
// rotate a page 90 degrees anti-clockwise
page.rotate(-90)
Note
Positive rotation values are clockwise, negative are anti-clockwise.
Cropping Pages#
To crop a page we just need to set its “CropBox” value with setPageBox()
and an associated rectangle.
EXAMPLE
page.setPageBox("CropBox", [ 0, 0, 500, 500 ])
Implement a Device to print out PDF page contents#
If you need to invesigate the internals of a PDF page then you can run a Device on a page to detect objects.
EXAMPLE
const Q = JSON.stringify
function print(...args) {
console.log(args.join(" "))
}
var pathPrinter = {
moveTo: function (x,y) { print("moveTo", x, y) },
lineTo: function (x,y) { print("lineTo", x, y) },
curveTo: function (x1,y1,x2,y2,x3,y3) { print("curveTo", x1, y1, x2, y2, x3, y3) },
closePath: function () { print("closePath") },
}
var textPrinter = {
beginSpan: function (f,m,wmode, bidi, dir, lang) {
print("beginSpan",f,m,wmode,bidi,dir,Q(lang));
},
showGlyph: function (f,m,g,u,v,b) { print("glyph",f,m,g,String.fromCodePoint(u),v,b) },
endSpan: function () { print("endSpan"); }
}
var traceDevice = {
fillPath: function (path, evenOdd, ctm, colorSpace, color, alpha) {
print("fillPath", evenOdd, ctm, colorSpace, color, alpha)
path.walk(pathPrinter)
},
clipPath: function (path, evenOdd, ctm) {
print("clipPath", evenOdd, ctm)
path.walk(pathPrinter)
},
strokePath: function (path, stroke, ctm, colorSpace, color, alpha) {
print("strokePath", Q(stroke), ctm, colorSpace, color, alpha)
path.walk(pathPrinter)
},
clipStrokePath: function (path, stroke, ctm) {
print("clipStrokePath", Q(stroke), ctm)
path.walk(pathPrinter)
},
fillText: function (text, ctm, colorSpace, color, alpha) {
print("fillText", ctm, colorSpace, color, alpha)
text.walk(textPrinter)
},
clipText: function (text, ctm) {
print("clipText", ctm)
text.walk(textPrinter)
},
strokeText: function (text, stroke, ctm, colorSpace, color, alpha) {
print("strokeText", Q(stroke), ctm, colorSpace, color, alpha)
text.walk(textPrinter)
},
clipStrokeText: function (text, stroke, ctm) {
print("clipStrokeText", Q(stroke), ctm)
text.walk(textPrinter)
},
ignoreText: function (text, ctm) {
print("ignoreText", ctm)
text.walk(textPrinter)
},
fillShade: function (shade, ctm, alpha) {
print("fillShade", shade, ctm, alpha)
},
fillImage: function (image, ctm, alpha) {
print("fillImage", image, ctm, alpha)
},
fillImageMask: function (image, ctm, colorSpace, color, alpha) {
print("fillImageMask", image, ctm, colorSpace, color, alpha)
},
clipImageMask: function (image, ctm) {
print("clipImageMask", image, ctm)
},
beginMask: function (area, luminosity, colorspace, color) {
print("beginMask", area, luminosity, colorspace, color)
},
endMask: function () {
print("endMask")
},
popClip: function () {
print("popClip")
},
beginGroup: function (area, isolated, knockout, blendmode, alpha) {
print("beginGroup", area, isolated, knockout, blendmode, alpha)
},
endGroup: function () {
print("endGroup")
},
beginTile: function (area, view, xstep, ystep, ctm, id) {
print("beginTile", area, view, xstep, ystep, ctm, id)
return 0
},
endTile: function () {
print("endTile")
},
beginLayer: function (name) {
print("beginLayer", name)
},
endLayer: function () {
print("endLayer")
},
beginStructure: function (structure, raw, uid) {
print("beginStructure", structure, raw, uid)
},
endStructure: function () {
print("endStructure")
},
beginMetatext: function (meta, metatext) {
print("beginMetatext", meta, metatext)
},
endMetatext: function () {
print("endMetatext")
},
renderFlags: function (set, clear) {
print("renderFlags", set, clear)
},
setDefaultColorSpaces: function (colorSpaces) {
print("setDefaultColorSpaces", colorSpaces.getDefaultGray(),
colorSpaces.getDefaultRGB(), colorSpaces.getDefaultCMYK(),
colorSpaces.getOutputIntent())
},
close: function () {
print("close")
},
}
var doc = mupdfjs.PDFDocument.openDocument(fs.readFileSync("test.pdf"), "application/pdf")
var page = doc.loadPage(0)
var device = new mupdfjs.Device(traceDevice)
page.run(device, mupdfjs.Matrix.identity)
Code samples
Code samples are in TypeScript and assume that the following requirements are defined in your TypeScript file header as follows:
import * as fs from "fs"
import * as mupdfjs from "mupdf/mupdfjs"