Objects Reference

Alongside the basic object types documented in the COS Objects Reference, pdfnaut implements high-level objects mainly for use with PdfDocument.

Action Objects

class pdfnaut.objects.actions.Action[source]

Bases: PdfDictionary

An action instructs the PDF reader to perform an action such as opening an application, going to a page in the document, or playing a sound, when activating an annotation or outline item.

See ISO 32000-2:2020 § 12.6 “Actions” for details.

__init__(subtype: Literal['GoTo', 'GoToR', 'GoToE', 'GoToDPart', 'Launch', 'Thread', 'URI', 'Sound', 'Movie', 'Hide', 'Named', 'SubmitForm', 'ResetForm', 'ImportData', 'SetOCGState', 'Rendition', 'Trans', 'GoTo3DView', 'JavaScript', 'RichMediaExecute'], next_action: list[Action] | Action | None = None) None[source]
property next_action: list[Action] | Action | None

The next action or sequence of actions that shall be performed after this action.

subtype: Annotated[Literal['GoTo', 'GoToR', 'GoToE', 'GoToDPart', 'Launch', 'Thread', 'URI', 'Sound', 'Movie', 'Hide', 'Named', 'SubmitForm', 'ResetForm', 'ImportData', 'SetOCGState', 'Rendition', 'Trans', 'GoTo3DView', 'JavaScript', 'RichMediaExecute'], 'name']

The type of action.

Refer to ISO 32000-2:2020 “Table 201 - Action types” for available types.

class pdfnaut.objects.actions.GoToAction[source]

Bases: Action

A go-to action changes the view to a specified destination.

See ISO 32000-2:2020 § 12.6.4.2 “Go-To actions” for details.

__init__(destination: PdfName | PdfHexString | bytes | Destination, next_action: list[Action] | Action | None = None) None[source]
property destination: PdfName | PdfHexString | bytes | Destination

The destination to jump to.

subtype: Annotated[Literal['GoTo', 'GoToR', 'GoToE', 'GoToDPart', 'Launch', 'Thread', 'URI', 'Sound', 'Movie', 'Hide', 'Named', 'SubmitForm', 'ResetForm', 'ImportData', 'SetOCGState', 'Rendition', 'Trans', 'GoTo3DView', 'JavaScript', 'RichMediaExecute'], 'name']

The type of action.

Refer to ISO 32000-2:2020 “Table 201 - Action types” for available types.

class pdfnaut.objects.actions.URIAction[source]

Bases: Action

A URI action causes a uniform resource identifier (URI) to be resolved.

See ISO 32000-2:2020 § 12.6.4.8 “URI actions” for details.

__init__(uri: str, is_map: bool = False, next_action: list[Action] | Action | None = None) None[source]
is_map: bool

Whether to track the mouse position when the URI is resolved.

subtype: Annotated[Literal['GoTo', 'GoToR', 'GoToE', 'GoToDPart', 'Launch', 'Thread', 'URI', 'Sound', 'Movie', 'Hide', 'Named', 'SubmitForm', 'ResetForm', 'ImportData', 'SetOCGState', 'Rendition', 'Trans', 'GoTo3DView', 'JavaScript', 'RichMediaExecute'], 'name']

The type of action.

Refer to ISO 32000-2:2020 “Table 201 - Action types” for available types.

uri: str

The uniform resource identifier (URI) to resolve.

pdfnaut.objects.actions.action_into(mapping: PdfDictionary) Action[source]

Converts a dictionary mapping into a corresponding Action subclass.

Annotation Objects

class pdfnaut.objects.annotations.Annotation[source]

Bases: PdfDictionary

An annotation associates an object such as a note, link, or multimedia element with a location on a page of a PDF document.

See ISO 32000-2:2020 § 12.5 “Annotations” for details.

__init__(kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia'], rect: Iterable[float], contents: str, name: str, *, indirect_ref: PdfReference | None = None) None[source]
color: PdfArray[float] | None

An array of 0 to 4 numbers in the range 0.0 to 1.0, representing a color used for the following purposes:

  • The background of the annotation’s icon when closed.

  • The title bar of the annotation’s popup window.

  • The border of a link annotation.

The number of array elements determines the color space in which the color shall be defined: 0 is no color or transparent; 1 is grayscale; 3 is RGB; and 4 is CMYK.
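The rule relating array length to color space can be sketched as a small standalone helper. This is only an illustration and not part of pdfnaut: the function name color_space_for and its error messages are assumptions made for this example.

```python
def color_space_for(components):
    """Return the color space implied by the number of color components.

    Hypothetical helper (not a pdfnaut API) mirroring the rule above:
    0 components is transparent, 1 is grayscale, 3 is RGB, and 4 is CMYK.
    """
    spaces = {0: "transparent", 1: "grayscale", 3: "RGB", 4: "CMYK"}
    values = list(components)
    if len(values) not in spaces:
        raise ValueError(f"expected 0, 1, 3, or 4 components, got {len(values)}")
    # Each component must lie in the documented range of 0.0 to 1.0.
    if any(not 0.0 <= value <= 1.0 for value in values):
        raise ValueError("each component must be in the range 0.0 to 1.0")
    return spaces[len(values)]
```

For instance, an array of three numbers would be read as RGB, while an empty array means no color.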

contents: str

The text contents that shall be displayed when the annotation is open or, if this annotation kind does not display text, an alternate description of the annotation’s contents.

flags: AnnotationFlags

Flags specifying various characteristics of the annotation.

kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia']

The kind of annotation. See ISO 32000-2:2020 “Table 171 - Annotation types” for details.

language: str | None

(PDF 2.0) A language identifier specifying the natural language for all text in the annotation except where overridden by other explicit language specifications.

See ISO 32000-2:2020 § 14.9.2 “Natural language specification” for details.

last_modified: str | None

The date and time the annotation was most recently modified. This value should be a PDF date string but PDF processors are expected to accept and display a string in any format.
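Because this value is exposed as a plain string, callers may want to attempt their own conversion. The following is a minimal sketch that assumes the common PDF date form D:YYYYMMDDHHmmSSOHH'mm (every field after the year optional). The function parse_pdf_date is a hypothetical helper, not a pdfnaut API, and it returns None for strings in any other format.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Year is required; month, day, hour, minute, second, and offset are optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(text: str) -> Optional[datetime]:
    """Parse a PDF date string, or return None if it does not match."""
    match = _PDF_DATE.match(text)
    if match is None:
        return None
    # Missing fields fall back to the start of the period (month 1, day 1, ...).
    defaults = (0, 1, 1, 0, 0, 0)
    parts = [int(g) if g else d for g, d in zip(match.groups()[:6], defaults)]
    tzinfo = None
    if match.group(7):
        tzinfo = timezone.utc
    elif match.group(8):
        offset = timedelta(
            hours=int(match.group(9)), minutes=int(match.group(10) or 0)
        )
        tzinfo = timezone(offset if match.group(8) == "+" else -offset)
    try:
        return datetime(*parts, tzinfo=tzinfo)
    except ValueError:
        # Out-of-range fields such as month 13 are treated as unparseable.
        return None
```

Returning None rather than raising reflects the note above that processors are expected to accept a string in any format.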

name: str

An annotation name uniquely identifying the annotation among others in its page.

rect: PdfArray[float]

A rectangle specifying the location of the annotation in the page.

class pdfnaut.objects.annotations.AnnotationBorderStyle[source]

Bases: PdfDictionary

The border style for the outline that surrounds an annotation.

See ISO 32000-2:2020 § 12.5.4 “Border styles” for details.

__init__(width=1, style='S', dash_pattern=None)
dash_pattern: list[int | float] | None

The dash pattern that will be used for the border if the style specified is dashed. The array consists of alternating dashes and gaps. The dash phase is not specified and is assumed to be zero.

style: Literal['S', 'D', 'B', 'I', 'U']

The border style. May be one of the following:

  • S: A solid rectangle.

  • D: A dashed rectangle specified by AnnotationBorderStyle.dash_pattern.

  • B: A simulated embossed (beveled) rectangle.

  • I: A simulated engraved (inset) rectangle.

  • U: An underline.

width: float

The border width in points.

class pdfnaut.objects.annotations.AnnotationFlags[source]

Bases: IntFlag

Flags for a particular annotation.

See ISO 32000-2:2020 § 12.5.3 “Annotation flags” for details.

HIDDEN = 2

Do not render the annotation or allow user interaction with it.

INVISIBLE = 1

If the annotation is non-standard, do not render or print the annotation.

If this flag is clear, the annotation shall be rendered according to its appearance stream.

LOCKED = 128

Do not allow the annotation to be removed or its properties to be modified but still allow its contents to be modified.

LOCKED_CONTENTS = 512

Do not allow the contents of the annotation to be modified.

NO_ROTATE = 16

Do not rotate the annotation to match the page’s rotation.

NO_VIEW = 32

Do not render the annotation or allow user interaction with it, but still allow printing according to the AnnotationFlags.PRINT flag.

NO_ZOOM = 8

Do not scale the annotation’s appearance to the page’s zoom factor.

NULL = 0

A default value meaning that no flags are set.

PRINT = 4

Print the annotation when the page is printed unless AnnotationFlags.HIDDEN is set. If clear, do not print the annotation.

READ_ONLY = 64

Do not allow user interaction with the annotation. This is ignored for Widget annotations.

TOGGLE_NO_VIEW = 256

Toggle the AnnotationFlags.NO_VIEW flag when selecting or hovering over the annotation.

__new__(value)
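Since the flags are an IntFlag, they combine with the usual bitwise operators. The sketch below uses a local stand-in enum that copies the values documented above so that it runs without pdfnaut installed. The helper should_print is hypothetical and only illustrates the interaction between PRINT and HIDDEN described for the PRINT flag.

```python
from enum import IntFlag


class AnnotationFlags(IntFlag):
    # Local stand-in mirroring the documented values; not imported from pdfnaut.
    NULL = 0
    INVISIBLE = 1
    HIDDEN = 2
    PRINT = 4
    NO_ZOOM = 8
    NO_ROTATE = 16
    NO_VIEW = 32
    READ_ONLY = 64
    LOCKED = 128
    TOGGLE_NO_VIEW = 256
    LOCKED_CONTENTS = 512


def should_print(flags):
    """Return True if PRINT is set and HIDDEN is clear (hypothetical helper)."""
    return bool(flags & AnnotationFlags.PRINT) and not flags & AnnotationFlags.HIDDEN


# Combine flags with | and test membership with `in` or &.
flags = AnnotationFlags.PRINT | AnnotationFlags.NO_ZOOM
```

Here flags carries the integer value 12, which is what would be stored in the annotation dictionary.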
class pdfnaut.objects.annotations.AnnotationList[source]

Bases: MutableSequence[Annotation]

A mutable sequence representing the list of annotations (the Annots key) in a page object.

__init__(array: PdfArray, pdf: PdfParser | None = None) None[source]
append(value: Annotation) None[source]

Appends an annotation value to the list.

clear() None[source]

Clears the annotation list.

extend(values: Iterable[Annotation]) None[source]

Extends the annotation list by appending values to its end.

insert(index: int, value: Annotation) None[source]

Inserts an annotation value at index.

pop(index: int = -1) Annotation[source]

Pops an annotation at index.

remove(value: Annotation) None[source]

Removes an annotation value from the list.

reverse() None[source]

Reverses the annotation list.

class pdfnaut.objects.annotations.AnnotationReplyType[source]

Bases: Enum

The reply type, or relationship, between an annotation and the annotation referenced by its MarkupAnnotation.in_reply_to value.

GROUP = 0

The annotation shall be grouped with the annotation replied to.

REPLY = 0

The annotation is considered a reply to another annotation.

class pdfnaut.objects.annotations.LinkAnnotation[source]

Bases: Annotation

A link annotation represents either a hypertext link to a location within the document or an action to perform.

See ISO 32000-2:2020 § 12.5.6.5 “Link annotations” for details.

__init__(rect: Iterable[float], contents: str, name: str, action: Action | None = None, destination: PdfName | PdfHexString | bytes | Destination | None = None, *, indirect_ref: PdfReference | None = None) None[source]
property action: Action | None

The action that shall be performed when the link annotation is triggered.

property border_style: AnnotationBorderStyle | None

The border style specifying the line width and dash pattern that shall be used when drawing the annotation outline.

color: PdfArray[float] | None

An array of 0 to 4 numbers in the range 0.0 to 1.0, representing a color used for the following purposes:

  • The background of the annotation’s icon when closed.

  • The title bar of the annotation’s popup window.

  • The border of a link annotation.

The number of array elements determines the color space in which the color shall be defined: 0 is no color or transparent; 1 is grayscale; 3 is RGB; and 4 is CMYK.

contents: str

The text contents that shall be displayed when the annotation is open or, if this annotation kind does not display text, an alternate description of the annotation’s contents.

property destination: PdfName | PdfHexString | bytes | Destination | None

The destination that shall be displayed when the link annotation is triggered.

flags: AnnotationFlags

Flags specifying various characteristics of the annotation.

highlight_mode: Literal['N', 'I', 'O', 'P']

The annotation’s highlight mode. May be one of the following:

  • N: No highlight.

  • I: Invert the contents of the annotation rectangle (default).

  • O: Invert the annotation’s border/outline.

  • P: Display the annotation as if it were being pushed below the surface of the page.

kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia']

The kind of annotation. See ISO 32000-2:2020 “Table 171 - Annotation types” for details.

language: str | None

(PDF 2.0) A language identifier specifying the natural language for all text in the annotation except where overridden by other explicit language specifications.

See ISO 32000-2:2020 § 14.9.2 “Natural language specification” for details.

last_modified: str | None

The date and time the annotation was most recently modified. This value should be a PDF date string but PDF processors are expected to accept and display a string in any format.

name: str

An annotation name uniquely identifying the annotation among others in its page.

quad_points: PdfArray[float] | None

A sequence of n quadrilaterals, each described by 8 numbers giving its corner coordinates in default user space. Together they define the region in which the link should be activated.

Item order: x1, y1, x2, y2, x3, y3, x4, y4
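The flat layout can be regrouped into one list of four (x, y) corners per quadrilateral. The helper below is a hypothetical sketch and not a pdfnaut function; it assumes only the item order given above.

```python
def split_quads(quad_points):
    """Group a flat quad_points sequence into quadrilaterals of four corners.

    Hypothetical helper: each run of 8 numbers (x1, y1, ..., x4, y4)
    becomes a list of four (x, y) tuples.
    """
    values = list(quad_points)
    if len(values) % 8 != 0:
        raise ValueError("quad_points must contain a multiple of 8 numbers")
    quads = []
    for start in range(0, len(values), 8):
        chunk = values[start:start + 8]
        # Pair up consecutive numbers as (x, y) corner coordinates.
        quads.append([(chunk[i], chunk[i + 1]) for i in range(0, 8, 2)])
    return quads
```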

rect: PdfArray[float]

A rectangle specifying the location of the annotation in the page.

class pdfnaut.objects.annotations.MarkupAnnotation[source]

Bases: Annotation

A markup annotation is a type of annotation used primarily to mark PDF documents.

See ISO 32000-2:2020 § 12.5.6.2 “Markup annotations” for details.

__init__(kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia'], rect: Iterable[float], contents: str, name: str, *, indirect_ref: PdfReference | None = None) None[source]
color: PdfArray[float] | None

An array of 0 to 4 numbers in the range 0.0 to 1.0, representing a color used for the following purposes:

  • The background of the annotation’s icon when closed.

  • The title bar of the annotation’s popup window.

  • The border of a link annotation.

The number of array elements determines the color space in which the color shall be defined: 0 is no color or transparent; 1 is grayscale; 3 is RGB; and 4 is CMYK.

contents: str

The text contents that shall be displayed when the annotation is open or, if this annotation kind does not display text, an alternate description of the annotation’s contents.

creation_date: datetime | None

The datetime the annotation was created.

flags: AnnotationFlags

Flags specifying various characteristics of the annotation.

property in_reply_to: Annotation | None

The annotation that this annotation is in reply to.

kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia']

The kind of annotation. See ISO 32000-2:2020 “Table 171 - Annotation types” for details.

language: str | None

(PDF 2.0) A language identifier specifying the natural language for all text in the annotation except where overridden by other explicit language specifications.

See ISO 32000-2:2020 § 14.9.2 “Natural language specification” for details.

last_modified: str | None

The date and time the annotation was most recently modified. This value should be a PDF date string but PDF processors are expected to accept and display a string in any format.

name: str

An annotation name uniquely identifying the annotation among others in its page.

rect: PdfArray[float]

A rectangle specifying the location of the annotation in the page.

property reply_type: AnnotationReplyType | str | None

The relationship or reply type between this annotation and the one in in_reply_to.

subject: str | None

A short description of the subject being addressed by the annotation.

title: str | None

The text label to display as the title of the annotation’s popup window. This shall identify the user who added the annotation.

class pdfnaut.objects.annotations.TextAnnotation[source]

Bases: MarkupAnnotation

A text annotation represents a sticky note attached to a point in the PDF document. When closed, it shall appear as an icon (defined by TextAnnotation.icon); when open, it shall display a popup window containing the text of the note.

See ISO 32000-2:2020 § 12.5.6.4 “Text annotations” for details.

__init__(rect: Iterable[float], contents: str, name: str, is_open: bool = False, icon: str = 'Note', *, indirect_ref: PdfReference | None = None) None[source]
color: PdfArray[float] | None

An array of 0 to 4 numbers in the range 0.0 to 1.0, representing a color used for the following purposes:

  • The background of the annotation’s icon when closed.

  • The title bar of the annotation’s popup window.

  • The border of a link annotation.

The number of array elements determines the color space in which the color shall be defined: 0 is no color or transparent; 1 is grayscale; 3 is RGB; and 4 is CMYK.

contents: str

The text contents that shall be displayed when the annotation is open or, if this annotation kind does not display text, an alternate description of the annotation’s contents.

creation_date: datetime | None

The datetime the annotation was created.

flags: AnnotationFlags

Flags specifying various characteristics of the annotation.

icon: Annotated[str, 'name']

The name of an icon that shall be used when displaying the annotation.

The icon name may be any of the following standard names or any other supported value.

Standard names: Comment, Key, Note, Help, NewParagraph, Paragraph, and Insert.

is_open: bool

Whether the annotation is initially displayed open.

kind: Literal['Text', 'Link', 'FreeText', 'Line', 'Square', 'Circle', 'Polygon', 'PolyLine', 'Highlight', 'Underline', 'Squiggly', 'StrikeOut', 'Caret', 'Stamp', 'Ink', 'Popup', 'FileAttachment', 'Sound', 'Movie', 'Screen', 'Widget', 'PrinterMark', 'TrapNet', 'Watermark', '3D', 'Redact', 'Projection', 'RichMedia']

The kind of annotation. See ISO 32000-2:2020 “Table 171 - Annotation types” for details.

language: str | None

(PDF 2.0) A language identifier specifying the natural language for all text in the annotation except where overridden by other explicit language specifications.

See ISO 32000-2:2020 § 14.9.2 “Natural language specification” for details.

last_modified: str | None

The date and time the annotation was most recently modified. This value should be a PDF date string but PDF processors are expected to accept and display a string in any format.

name: str

An annotation name uniquely identifying the annotation among others in its page.

rect: PdfArray[float]

A rectangle specifying the location of the annotation in the page.

subject: str | None

A short description of the subject being addressed by the annotation.

title: str | None

The text label to display as the title of the annotation’s popup window. This shall identify the user who added the annotation.

pdfnaut.objects.annotations.annotation_into(annot: PdfDictionary, *, indirect_ref: PdfReference | None = None) Annotation[source]

Converts a mapping annot into an instance of Annotation or one of its subclasses according to the annotation subtype.

Catalog Objects

class pdfnaut.objects.catalog.DeveloperExtension[source]

Bases: PdfDictionary

An entry in an extension dictionary.

See ISO 32000-2:2020 § 7.12.3 “Developer extensions dictionary” for details.

__init__(base_version, level, url=None, revision=None)
base_version: Annotated[str, 'name']

The PDF version to which this extension applies. This value shall be consistent with the syntax used for the Version entry of the document catalog dictionary.

level: int

A developer-defined integer denoting the extension being used.

If the developer introduces more than one extension to a given base version, the extension level assigned by the developer should increase over time.

revision: str | None

(PDF 2.0) Additional revision information on the extension level being used.

url: str | None

(PDF 2.0) A URL referring to the documentation for this extension.

class pdfnaut.objects.catalog.ExtensionMap[source]

Bases: PdfDictionary

A map defining developer extensions in a document.

See ISO 32000-2:2020 § 7.12 “Extensions dictionary” for details.

query(key: str) DeveloperExtension | list[DeveloperExtension][source]

Returns a developer-defined extension (or a sequence of them) for a base prefix key.

class pdfnaut.objects.catalog.MarkInfo[source]

Bases: PdfDictionary

Information relevant to specialized uses of structured PDF documents.

See ISO 32000-2:2020 § 14.7 “Logical structure” for details.

__init__(marked=False, suspects=False, user_properties=False)
marked: bool

Whether the document claims to conform to tagged PDF conventions.

suspects: bool

(PDF 1.6; deprecated in PDF 2.0) Whether the document includes tag suspects which are applied for marked content elements whose page content order could not be determined.

In such case, the document may not fully conform to tagged PDF conventions.

user_properties: bool

(PDF 1.6) Whether structure elements including user properties are present in the document.

See ISO 32000-2:2020 § 14.7.6.4 “User properties” for details.

class pdfnaut.objects.catalog.UserAccessPermissions[source]

Bases: IntFlag

User access permissions as specified in the P entry of the document’s standard encryption dictionary.

See ISO 32000-2:2020 “Table 22 - Standard security handler user access permissions” for details.

ACCESSIBILITY = 512

(deprecated in PDF 2.0) Extract content for the purposes of accessibility.

This bit should always be set for compatibility with processors supporting earlier specifications.

ASSEMBLE_DOCUMENT = 1024

For security revision 3 or greater, assemble the document (i.e. insert, rotate, and delete pages, create outlines, etc.), even if MODIFY is clear.

COPY_CONTENT = 16

Copy or extract text and graphics. Assistive technology should assume this bit as set for its purposes, as per ACCESSIBILITY.

FAITHFUL_PRINT = 2048

For security revision 3 or greater, print the document in such a way that a faithful digital representation of the PDF can be generated.

If this bit is not set (and PRINT is set), printing shall be limited to a low-level representation, possibly of lower quality.

FILL_FORM_FIELDS = 256

For security revision 3 or greater, fill existing interactive form fields, even if MANAGE_ANNOTATIONS is clear.

MANAGE_ANNOTATIONS = 32

Add or modify text annotations, fill interactive form fields and, depending on whether MODIFY is set, create and modify form fields.

MODIFY = 8

Modify the contents of the document. May be influenced by MANAGE_ANNOTATIONS, FILL_FORM_FIELDS, and ASSEMBLE_DOCUMENT.

PRINT = 4

For security revision 2 or greater, print the document. If the document uses revision 3 or greater, print quality may be influenced by FAITHFUL_PRINT.

__new__(value)
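The P entry is a 32-bit value in which the unused high bits are normally set, so it usually appears as a negative integer in a PDF file. The sketch below shows one way such a value could be masked down to the permission bits documented above. It uses a local stand-in enum so it runs without pdfnaut, and decode_permissions is a hypothetical helper rather than a library function.

```python
from enum import IntFlag


class UserAccessPermissions(IntFlag):
    # Local stand-in mirroring the documented values; not imported from pdfnaut.
    PRINT = 4
    MODIFY = 8
    COPY_CONTENT = 16
    MANAGE_ANNOTATIONS = 32
    FILL_FORM_FIELDS = 256
    ACCESSIBILITY = 512
    ASSEMBLE_DOCUMENT = 1024
    FAITHFUL_PRINT = 2048


# Bitmask covering every permission bit defined above (0xF3C).
KNOWN_BITS = sum(int(member) for member in UserAccessPermissions)


def decode_permissions(p):
    """Drop reserved bits from a raw P value and return the flags (hypothetical)."""
    return UserAccessPermissions(p & KNOWN_BITS)


# -3884 has only the PRINT and COPY_CONTENT permission bits set.
perms = decode_permissions(-3884)
```

Masking first keeps reserved bits out of the resulting flag value, so comparisons against combinations of the named members behave as expected.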
class pdfnaut.objects.catalog.ViewerPreferences[source]

Bases: PdfDictionary

The viewer preferences dictionary specifying the way a PDF viewer shall display a document on the screen.

See ISO 32000-2:2020 § 12.2 “Viewer preferences” for details.

__init__(hide_toolbar=False, hide_menubar=False, hide_window_ui=False, fit_window=False, center_window=False, display_doc_title=False, non_full_screen_page_mode='UseNone', direction='L2R', view_area='CropBox', view_clip='CropBox', print_area='CropBox', print_clip='CropBox', print_scaling='AppDefault', duplex=None, pick_tray_by_pdf_size=None, print_page_range=None, num_copies=None)
center_window: bool

Whether to center the document’s window position on the screen.

direction: Literal['L2R', 'R2L']

The predominant logical content order for text. Either ‘L2R’ (left to right, default) or ‘R2L’ (right to left). This is effectively a display hint and has no direct effect on the contents of the document.

display_doc_title: bool

(PDF 1.4) Whether the document’s window title should display the title described in the document’s metadata. If False, the title bar should instead display the name of the PDF file containing the document.

duplex: Literal['Simplex', 'DuplexFlipShortEdge', 'DuplexFlipLongEdge'] | None

The paper handling option to use when printing the document. Should be one of:

  • Simplex: Print single-sided

  • DuplexFlipShortEdge: Duplex, flip on the short edge of the sheet

  • DuplexFlipLongEdge: Duplex, flip on the long edge of the sheet

If this value is none, the document producer may choose their own default setting.

property enforce: list[Literal['PrintScaling']] | None

(PDF 2.0) An array of names of viewer preferences that shall be enforced by PDF processors and that shall not be overridden by subsequent selections in the application user interface.

fit_window: bool

Whether to resize the document’s window to fit the size of the page.

hide_menubar: bool

Whether to hide the interactive PDF processor’s menubar when the document is active.

hide_toolbar: bool

Whether to hide the interactive PDF processor’s toolbars when the document is active.

hide_window_ui: bool

Whether to hide UI elements in the document’s window (such as scroll bars or navigation controls), leaving only the document’s contents displayed.

non_full_screen_page_mode: Literal['UseNone', 'UseOutlines', 'UseThumbs', 'UseOC']

The document’s page mode displayed when exiting full-screen mode. This property is only relevant if the PageMode entry in the catalog is set to ‘FullScreen’ and should be ignored otherwise. Accepted values are ‘UseNone’, ‘UseOutlines’, ‘UseThumbs’, and ‘UseOC’.

num_copies: int | None

The number of copies that shall be printed when the print dialog is opened for this file.

If this value is none, the document producer may choose their own default setting, though this setting is usually 1.

pick_tray_by_pdf_size: bool | None

Whether the PDF page size shall be used to select the input paper tray. This setting influences only the preset values used to populate the print dialog. This setting has no effect on systems that do not provide the ability to pick the input tray by size.

If this value is none, the document producer may choose their own default setting.

print_area: Literal['MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox']

(deprecated in PDF 2.0) The name of the page boundary representing the area of a page that shall be rendered when printing the document. Similar to ViewArea, the value should be the key of the relevant page boundary in a page object.

print_clip: Literal['MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox']

(deprecated in PDF 2.0) The name of the page boundary to which the contents of a page shall be clipped when printing the document. Similar to ViewArea, the value should be the key of the relevant page boundary in a page object.

print_page_range: PdfArray[int] | None

The page numbers used to initialize the print dialog box. The array should contain an even number of values interpreted as pairs, with each pair specifying the first and last pages in a sub-range of pages to be printed (the first page being denoted by the number 1).

If this value is none, the document producer may choose their own default setting.
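The pairwise layout of this array can be illustrated by expanding it into Python ranges. This is a hypothetical helper written for this example, not a pdfnaut API; it treats both ends of each pair as inclusive and keeps the 1-based page numbering described above.

```python
def page_ranges(values):
    """Expand a flat [first, last, first, last, ...] array into ranges.

    Hypothetical helper: page numbers are 1-based and both ends of each
    pair are inclusive.
    """
    values = list(values)
    if len(values) % 2 != 0:
        raise ValueError("print_page_range must contain an even number of values")
    # Pair even-indexed values (first pages) with odd-indexed ones (last pages).
    return [range(first, last + 1) for first, last in zip(values[::2], values[1::2])]
```

An array such as [1, 3, 7, 7] would therefore select pages 1 through 3 and page 7.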

print_scaling: Literal['None', 'AppDefault']

The page scaling option to select when a print dialog is displayed for this document.

Accepted values are ‘None’ meaning no page scaling or ‘AppDefault’ (default) indicating that the interactive PDF processor should select its default print scaling value.

view_area: Literal['MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox']

(deprecated in PDF 2.0) The name of the page boundary representing the area of a page that shall be displayed when viewing the document on the screen. The value should be the key of the relevant page boundary in a page object. If no such boundary is defined, the default value (‘CropBox’) is used.

Accepted values are ‘CropBox’, ‘MediaBox’, ‘BleedBox’, ‘TrimBox’, and ‘ArtBox’.

view_clip: Literal['MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox']

(deprecated in PDF 2.0) The name of the page boundary to which the contents of a page shall be clipped when viewing the document. Similar to ViewArea, the value should be the key of the relevant page boundary in a page object.

Outline Objects

class pdfnaut.objects.outlines.OutlineItem[source]

Bases: PdfDictionary

An outline item within the outline tree.

See ISO 32000-2:2020 “Table 151 - Entries in an outline item dictionary” for details.

__init__(text: str, flags: OutlineItemFlags = OutlineItemFlags.NULL, destination: PdfName | PdfHexString | bytes | Destination | None = None, action: Action | None = None, color: PdfArray[int | float] | None = None, *, pdf: PdfParser | None = None, indirect_ref: PdfReference | None = None) None[source]
property action: Action | None

The action that shall be triggered when the item is activated.

property children: OutlineList

The immediate children of the outline item.

close() None[source]

If the item has children, closes the outline item and hides the immediate children.

property color: PdfArray[int | float]

The color that shall be used for the outline item text, as an array of RGB color components in the range 0 to 1.

property destination: PdfName | PdfHexString | bytes | Destination | None

The destination that shall be displayed when the item is activated, either a named destination (a name or byte string) or an explicit destination (a Destination object).

property first: OutlineItem | None

The first child item of the outline if any.

flags: OutlineItemFlags

A set of bit flags describing characteristics of the outline item text.

property last: OutlineItem | None

The last child item of the outline if any.

property next: OutlineItem | None

The next item at the current outline level if any.

open() None[source]

If the item has children, opens the outline item and displays the immediate children (and its descendants if they are also visible).

property parent: OutlineItem | OutlineTree

The parent outline item or tree containing this outline.

property previous: OutlineItem | None

The previous item at the current outline level if any.

text: str

The display text for this outline item.

property visible_items: int

The visible item count for this outline item:

  • If the outline item is open, the number of visible descendant outline items.

  • If the outline item is closed, a negative number representing the number of descendants that would be visible if the item were opened.

  • If the outline item has no children, zero.
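The three cases above can be sketched on a simplified tree. The Node class and both functions below are stand-ins invented for this example and do not reflect how pdfnaut stores or computes the count. The sketch assumes that a child is always visible when its parent is open, and that a child's own descendants count only if that child is open as well.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Simplified stand-in for an outline item; not a pdfnaut class.
    is_open: bool = True
    children: list = field(default_factory=list)


def visible_descendants(node):
    """Count descendants that would be visible if `node` itself were open."""
    total = 0
    for child in node.children:
        total += 1  # the child itself is visible
        if child.is_open:
            total += visible_descendants(child)
    return total


def visible_items(node):
    """Positive count when open, negative when closed, zero without children."""
    count = visible_descendants(node)
    return count if node.is_open else -count
```

An open item with two children, one of which is open and has a single child of its own, would yield 3; closing the top item would turn that into -3.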

class pdfnaut.objects.outlines.OutlineItemFlags[source]

Bases: IntFlag

Flags specifying style characteristics for an outline item. See ISO 32000-2:2020 “Table 152 - Outline item flags” for details.

BOLD = 2

Display the outline item text in bold.

ITALIC = 1

Display the outline item text in italic.

NULL = 0

A default value meaning that no flags are set.

__new__(value)
class pdfnaut.objects.outlines.OutlineList[source]

Bases: MutableSequence[OutlineItem]

The outline list representing the children of an outline tree or item.

Warning

This class is not designed to be constructed by a user. Using the outline list should be done via OutlineTree and OutlineItem.

__init__(pdf: PdfParser, parent: OutlineItem | OutlineTree) None[source]
append(value: OutlineItem) None[source]

Appends an outline item value to the immediate children of the list.

clear() None[source]

Removes all children in the outline item.

count(value: Any) int[source]

Returns the number of times outline item value appears in the outline list.

extend(values: Iterable[OutlineItem]) None[source]

Appends the outline items in values to the end of the outline list.

index(value: Any, start: int = 0, stop: int = sys.maxsize) int[source]

Returns the index at which outline item value was first found in the range of start included to stop excluded.

insert(index: int, value: OutlineItem) None[source]

Inserts outline item value before index.

pop(index: int = -1) OutlineItem[source]

Removes the outline item at index from the immediate children of this outline list.

Raises:

IndexError – The outline list is empty or the item is not in the list.

Returns:

The outline item that was popped.

Return type:

OutlineItem

remove(value: OutlineItem) None[source]

Removes the first occurrence of outline item value in the immediate children of this tree.

Raises:

IndexError – The outline list is empty or the item is not in the list.

reverse() None[source]

Reverses the outline list in place.

class pdfnaut.objects.outlines.OutlineTree[source]

Bases: PdfDictionary

The document outline tree containing a hierarchy of outline items that allow navigating throughout the document.

See ISO 32000-2:2020 § 12.3.3 “Document outline” for details.

Warning

This class is not designed to be constructed by a user. To add an outline tree to a document, PdfDocument.new_outline() should be used.

__init__(pdf: PdfParser, tree: PdfDictionary, tree_ref: PdfReference) None[source]
property children: OutlineList

The immediate children of the outline tree.

close() None[source]

If the tree has children, closes all outline items within the tree.

property first: OutlineItem | None

The first outline item in the tree.

property last: OutlineItem | None

The last outline item in the tree.

open() None[source]

If the tree has children, opens all outline items within the tree.

property visible_items: int

The total number of visible outline items at all levels of the tree.

pdfnaut.objects.outlines.flatten_outlines(item: OutlineItem | OutlineTree) Generator[OutlineItem, None, None][source]

Yields the immediate children of the outline item.

pdfnaut.objects.outlines.get_count(item: OutlineTree | OutlineItem) int[source]

Calculates the count of visible items within an outline item or tree.
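As a rough illustration of what "visible items" means, the sketch below counts the descendants that a reader would show under an open node: every immediate child, plus the visible descendants of each child that is itself open. Node and visible_descendants are hypothetical names introduced here; this is not the implementation of get_count.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical outline node (not a pdfnaut class)."""

    title: str
    is_open: bool = True
    children: list = field(default_factory=list)


def visible_descendants(node: Node) -> int:
    """Counts the descendants shown when ``node`` itself is open."""
    total = 0
    for child in node.children:
        total += 1  # an immediate child is always listed under an open parent
        if child.is_open:
            # Only open children expose their own descendants.
            total += visible_descendants(child)
    return total


chapter_1 = Node("Chapter 1", is_open=True, children=[Node("1.1"), Node("1.2")])
chapter_2 = Node("Chapter 2", is_open=False, children=[Node("2.1")])
tree = Node("root", children=[chapter_1, chapter_2])

visible_total = visible_descendants(tree)
```

In the PDF file itself, ISO 32000-2:2020 § 12.3.3 stores this number in the Count entry and makes it negative for a closed item, so the value for chapter_2 here would be written as -1.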

pdfnaut.objects.outlines.is_outline_tree(item: PdfDictionary) bool[source]

Reports whether a dictionary item is an outline tree.

pdfnaut.objects.outlines.update_ancestor_count(item: OutlineTree | OutlineItem) None[source]

Recalculates the visible item count for the outline item, reflecting this count in the ancestors.

Page Objects

class pdfnaut.objects.page.Page[source]

Bases: PdfDictionary

A page in a PDF document (see ISO 32000-2:2020 § 7.7.3.3 “Page objects”).

Parameters:
  • size (tuple[float, float]) – The width and height of the physical medium on which the page should be printed or displayed. Values shall be provided in multiples of 1/72 of an inch (points).

  • pdf (PdfParser, optional) –

    The PDF document that this page belongs to.

    In typical usage, this value need not be specified. pdfnaut will take care of populating it.

  • indirect_ref (PdfReference, optional) –

    The indirect reference that this page object is referred to by.

    As with pdf, this value need not be specified in typical usage.

__init__(size: tuple[float, float], *, pdf: PdfParser | None = None, indirect_ref: PdfReference | None = None) None[source]
property annotations: AnnotationList | None

All annotations associated with this page. If a page does not specify a list of annotations, this property is None.

artbox: PdfArray[float] | None

A rectangle defining the extent of the page’s meaningful content as intended by the page’s creator.

If None, the artbox is the same as the cropbox.

bleedbox: PdfArray[float] | None

A rectangle defining the region to which the contents of the page shall be clipped when output in a production environment.

If None, the bleedbox is the same as the cropbox.

property content_stream: ContentStreamTokenizer | None

An iterator over the instructions producing the contents of this page.

cropbox: PdfArray[float] | None

A rectangle defining the visible region of the page.

If None, the cropbox is the same as the mediabox.

mediabox: PdfArray[float]

A rectangle defining the boundaries of the physical medium on which the page should be printed or displayed.

metadata: PdfStream | None

A metadata stream, generally written in XMP, containing information about this page.

new_annotations() None[source]

Creates a new annotation list.

resources: PdfDictionary | None

Resources required by the page contents.

If the page requires no resources, this should return an empty resource dictionary. If the page inherits its resources from an ancestor, this should return None.

rotation: int

The number of degrees by which the page shall be visually rotated clockwise. The value is a multiple of 90 (by default, 0).

tab_order: Literal['R', 'C', 'S', 'A', 'W'] | None

(optional; PDF 1.5) The tab order to be used for annotations on the page. If present, it shall be one of the following values:

  • R: Row order

  • C: Column order

  • S: Logical structure order

  • A: Annotations array order (PDF 2.0)

  • W: Widget order (PDF 2.0)

trimbox: PdfArray[float] | None

A rectangle defining the intended dimensions of the finished page after trimming.

If None, the trimbox is the same as the cropbox.
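The fallback rules for the page boundaries (artbox, bleedbox and trimbox default to the cropbox, which in turn defaults to the mediabox) can be sketched as follows. The example uses a plain dictionary in place of a Page, and effective_box is a hypothetical helper, not part of pdfnaut.

```python
def effective_box(page: dict, name: str) -> list:
    """Resolves a page boundary, applying the documented fallbacks."""
    mediabox = page["mediabox"]  # the only box that is always present
    cropbox = page.get("cropbox") or mediabox
    if name == "mediabox":
        return mediabox
    if name == "cropbox":
        return cropbox
    if name in ("artbox", "bleedbox", "trimbox"):
        return page.get(name) or cropbox
    raise KeyError(name)


letter_page = {
    "mediabox": [0, 0, 612, 792],
    "cropbox": None,
    "trimbox": [10, 10, 602, 782],
}

trim = effective_box(letter_page, "trimbox")    # explicitly set
bleed = effective_box(letter_page, "bleedbox")  # falls back to cropbox, then mediabox
```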

user_unit: float

The size of a user space unit, in multiples of 1/72 of an inch (by default, 1).
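Since page dimensions are expressed in units of 1/72 of an inch scaled by user_unit, a small conversion helper is often handy. The sketch below is illustrative only; mm_to_units and units_to_inches are hypothetical names and are not provided by pdfnaut.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def mm_to_units(mm: float, user_unit: float = 1.0) -> float:
    """Converts millimetres to user space units (points when user_unit is 1)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH / user_unit


def units_to_inches(units: float, user_unit: float = 1.0) -> float:
    """Converts user space units back to physical inches."""
    return units * user_unit / POINTS_PER_INCH


# An A4 sheet (210 mm x 297 mm) expressed in points.
a4_size = (mm_to_units(210), mm_to_units(297))

# A width of 612 units is 8.5 inches by default, or 17 inches when user_unit is 2.
default_width = units_to_inches(612)
scaled_width = units_to_inches(612, user_unit=2.0)
```

A tuple such as a4_size is the kind of value the size parameter of Page expects.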

Trailer Objects

class pdfnaut.objects.trailer.Info[source]

Bases: PdfDictionary

Document-level metadata representing the structure described in ISO 32000-2:2020 § 14.3.3 “Document information dictionary”.

Since PDF 2.0, most of the attributes here have been deprecated in favor of their equivalents in the document-level metadata stream (see PdfDocument.xmp_info), with the exception of Info.creation_date and Info.modify_date.

__init__(title=None, author=None, subject=None, keywords=None, creator=None, producer=None, creation_date=None, modify_date=None, trapped=None)
author: str | None

The name of the person who created the document.

creation_date: datetime | None

The date and time the document was created, in human-readable form.

creation_date_raw: str | None

The date and time the document was created, as a text string.
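The raw text string follows the PDF date format, D:YYYYMMDDHHmmSSOHH'mm', in which every field after the year is optional (ISO 32000-2:2020 § 7.9.4 "Dates"). The sketch below shows one way such a string could be turned into a datetime using only the standard library. parse_pdf_date is a hypothetical helper and does not necessarily reflect how pdfnaut performs the conversion.

```python
import re
from datetime import datetime, timedelta, timezone

_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[Z+-])(?:(?P<off_h>\d{2})'?(?:(?P<off_m>\d{2})'?)?)?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Parses a PDF date string; omitted fields take their default values."""
    match = _PDF_DATE.match(raw)
    if match is None:
        raise ValueError(f"not a PDF date string: {raw!r}")
    parts = match.groupdict()

    tzinfo = None  # no offset given: return a naive datetime
    if parts["sign"] == "Z":
        tzinfo = timezone.utc
    elif parts["sign"] in ("+", "-"):
        offset = timedelta(
            hours=int(parts["off_h"] or 0), minutes=int(parts["off_m"] or 0)
        )
        tzinfo = timezone(offset if parts["sign"] == "+" else -offset)

    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        tzinfo=tzinfo,
    )


created = parse_pdf_date("D:20240305143000+02'00'")
year_only = parse_pdf_date("D:2024")
```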

creator: str | None

If the document was converted to PDF from another format (e.g. DOCX), the name of the PDF processor that created the original document from which it was converted (e.g. Microsoft Word).

keywords: str | None

Keywords associated with the document.

modify_date: datetime | None

The date and time the document was most recently modified, in human-readable form.

modify_date_raw: str | None

The date and time the document was most recently modified, as a text string.

producer: str | None

If the document was converted to PDF from another format (e.g. PostScript), the name of the PDF processor that converted it to PDF (e.g. Adobe Distiller).

subject: str | None

The subject or topic of the document.

title: str | None

The document’s title.

trapped: Literal['True', 'False', 'Unknown'] | None

A value reporting whether the document has been modified to include trapping information (see ISO 32000-2:2020 § 14.11.6 “Trapping support”).

XMP Objects

class pdfnaut.objects.xmp.XMPDateProperty[source]

Bases: XMPProperty

An XMP Date property – an ISO 8601 date string, or specifically, the subset specified in https://www.w3.org/TR/NOTE-datetime.

See https://developer.adobe.com/xmp/docs/XMPNamespaces/XMPDataTypes/#date.
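The W3C date subset allows progressively more precise forms: YYYY, YYYY-MM, YYYY-MM-DD, and a full date followed by a time with an optional time zone designator. The sketch below parses these forms with the standard library. parse_w3c_date is a hypothetical helper, not pdfnaut's parser, and it assumes that a missing time zone should produce a naive datetime.

```python
import re
from datetime import datetime, timedelta, timezone

_W3C_DATE = re.compile(
    r"^(?P<year>\d{4})(?:-(?P<month>\d{2})(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d+))?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?)?)?)?$"
)


def parse_w3c_date(text: str) -> datetime:
    """Parses a date in the W3C NOTE-datetime subset of ISO 8601."""
    match = _W3C_DATE.match(text)
    if match is None:
        raise ValueError(f"not a W3C date string: {text!r}")
    parts = match.groupdict()

    tzinfo = None
    if parts["tz"] == "Z":
        tzinfo = timezone.utc
    elif parts["tz"]:
        sign = 1 if parts["tz"][0] == "+" else -1
        hours, minutes = parts["tz"][1:].split(":")
        tzinfo = timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))

    # Pad or truncate the fractional part to microsecond precision.
    microsecond = int((parts["fraction"] or "0").ljust(6, "0")[:6])

    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        microsecond,
        tzinfo=tzinfo,
    )


month_only = parse_w3c_date("2024-03")
with_fraction = parse_w3c_date("2024-03-05T14:30:00.5Z")
with_offset = parse_w3c_date("2024-03-05T14:30-05:00")
```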

class pdfnaut.objects.xmp.XMPLangAltProperty[source]

Bases: XMPProperty

An XMP Language Alternative property – an alternative array of simple text items facilitating the selection of a text item based on a desired language.

In this case, this array is represented as a mapping of language names to text items corresponding to each language. The language name should be a value as defined in RFC 3066, composed of a primary language subtag and an optional series of subsequent subtags.

The default value, if known, should be the first item in the dictionary. A default value may also be explicitly marked by setting its language to ‘x-default’.

See https://developer.adobe.com/xmp/docs/XMPNamespaces/XMPDataTypes/#language-alternative.
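To make the mapping concrete, the sketch below selects a text item from a language alternative represented as a plain dictionary. pick_text is a hypothetical helper and its lookup order (exact match, then primary subtag, then 'x-default', then the first item) is an assumption made for illustration; pdfnaut may resolve alternatives differently.

```python
from typing import Optional


def pick_text(alternatives: dict, language: Optional[str] = None) -> Optional[str]:
    """Selects the text item best matching ``language``."""
    if not alternatives:
        return None
    if language in alternatives:
        return alternatives[language]  # exact match on the language name
    if language:
        # Fall back to any entry sharing the primary subtag ("es" for "es-MX").
        primary = language.split("-")[0].lower()
        for tag, text in alternatives.items():
            if tag.split("-")[0].lower() == primary:
                return text
    if "x-default" in alternatives:
        return alternatives["x-default"]
    # Otherwise the first item is treated as the default.
    return next(iter(alternatives.values()))


titles = {"x-default": "Annual Report", "en-US": "Annual Report", "es": "Informe anual"}

spanish = pick_text(titles, "es-MX")
fallback = pick_text(titles, "fr")
first_item = pick_text({"de": "Bericht", "en": "Report"})
```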

class pdfnaut.objects.xmp.XMPListProperty[source]

Bases: XMPProperty

An array-valued XMP property – in this context, either an RDF sequence, used for ordered arrays, or an RDF bag, used for unordered arrays.

See § 7.7 “Array valued XMP properties” in Part 1 of the XMP specification.
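The difference between the two containers is easiest to see in the RDF/XML itself. The sketch below reads an rdf:Seq and an rdf:Bag from a small hand-written fragment using xml.etree.ElementTree. The fragment and the read_array helper are illustrative assumptions and do not use pdfnaut.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="">
    <dc:creator><rdf:Seq><rdf:li>Ada</rdf:li><rdf:li>Grace</rdf:li></rdf:Seq></dc:creator>
    <dc:subject><rdf:Bag><rdf:li>pdf</rdf:li><rdf:li>metadata</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def read_array(root, namespace_uri, local_name):
    """Returns (container kind, items) for an array-valued property, or None."""
    prop = root.find(f".//{{{namespace_uri}}}{local_name}")
    if prop is None:
        return None
    for kind in ("Seq", "Bag"):
        container = prop.find(f"{{{RDF}}}{kind}")
        if container is not None:
            items = [li.text or "" for li in container.findall(f"{{{RDF}}}li")]
            return kind, items
    return None


root = ET.fromstring(FRAGMENT)
creators = read_array(root, DC, "creator")  # ordered: the sequence of authors matters
subjects = read_array(root, DC, "subject")  # unordered: a set of keywords
```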

class pdfnaut.objects.xmp.XMPProperty[source]

Bases: object

An XMP property included in an XMP packet.

__init__(namespace_uri: str, local_name: str, **extra: Any) None[source]
extra

Any additional property-specific values.

local_name

The local name of this property.

namespace_uri

The namespace URI of this property.

class pdfnaut.objects.xmp.XMPTextProperty[source]

Bases: XMPProperty

An XMP Text property – a possibly empty Unicode string.

class pdfnaut.objects.xmp.XmpMetadata[source]

Bases: object

An object representing Extensible Metadata Platform (XMP) metadata, either pertaining to an entire document or to a particular resource.

For information about XMP, see https://developer.adobe.com/xmp/docs/.

Parameters:

stream (PdfStream, optional) – The XMP packet to parse as a PDF stream. If stream is None, a new stream containing a packet will be created.

Raises:

PdfParseError – If stream does not contain a valid XMP packet.

__init__(stream: PdfStream | None = None) None[source]
dc_creator

The entities primarily responsible for creating this resource.

dc_description

Textual descriptions of this resource as a mapping of language names to items.

dc_format

The MIME type of this resource.

dc_rights

Rights statements pertaining to this resource.

dc_subject

The topics or descriptions specifying the content of this resource.

dc_title

The titles or names given to this resource as a mapping of language names to titles.

packet

The XMP packet as an XML document.

pdf_keywords

Keywords associated with the document.

pdf_pdfversion

The PDF file version. For example, ‘1.0’ or ‘1.3’.

pdf_producer

The name of the tool that produced this PDF document.

pdf_trapped

Whether the document has been modified to include trapping information (see ISO 32000-2:2020 § 14.11.6 “Trapping support”).

rdf_root

The RDF root of the packet being parsed.

stream: PdfStream

The PDF stream containing the XMP packet.

xmp_create_date

The datetime this resource was created. This need not match the file system creation date.

xmp_creator_tool

The name of the first known tool that created this resource.

xmp_metadata_date

The datetime this metadata was last modified. It should be the same as or more recent than xmp_modify_date.

xmp_modify_date

The datetime this resource was last modified.

pdfnaut.objects.xmp.get_full_text(element: Element) str[source]

Returns the full text content within element.
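The idea of "full text content" can be sketched as concatenating every text node beneath an element, in document order. The example below assumes an xml.dom.minidom element, which is a guess based on the Node type in the neighbouring signature; full_text is a hypothetical re-creation, not pdfnaut's code.

```python
from xml.dom import Node, minidom


def full_text(node) -> str:
    """Concatenates all text found in ``node`` and its descendants."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(full_text(child))  # recurse into nested elements
    return "".join(parts)


document = minidom.parseString("<p>Hello <b>XMP</b> world</p>")
text = full_text(document.documentElement)
```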

pdfnaut.objects.xmp.lookup_prefix_for_ns(node: Node, namespace: str) tuple[str, Node] | None[source]

Locates a namespace prefix matching the namespace URI in node. Returns either a tuple of two items containing, in order, the prefix of the namespace URI and the node where it was found, or None, if no prefix is registered for the namespace URI.

This is an implementation of https://dom.spec.whatwg.org/#locate-a-namespace-prefix.
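The DOM algorithm walks from the starting element up through its ancestors, checking each element's own prefix and then its xmlns declarations. The sketch below re-creates those steps with xml.dom.minidom for illustration. locate_prefix is a hypothetical name, and the code is not taken from pdfnaut.

```python
from xml.dom import Node, minidom

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def locate_prefix(node, namespace):
    """Returns (prefix, element where it was found), or None if not declared."""
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        # Step 1: the element itself is in the namespace and has a prefix.
        if node.namespaceURI == namespace and node.prefix:
            return node.prefix, node
        # Step 2: the element declares the namespace through an xmlns:* attribute.
        attributes = node.attributes
        for i in range(attributes.length):
            attribute = attributes.item(i)
            if attribute.prefix == "xmlns" and attribute.value == namespace:
                return attribute.localName, node
        # Step 3: continue with the parent element.
        node = node.parentNode
    return None


document = minidom.parseString(
    f'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}"><rdf:Description/></rdf:RDF>'
    f"</x:xmpmeta>"
)
description = document.getElementsByTagName("rdf:Description")[0]

dc_result = locate_prefix(description, DC)
meta_result = locate_prefix(description, "adobe:ns:meta/")
missing = locate_prefix(description, "http://example.com/unknown")
```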