COS Objects Reference

The PDF 2.0 specification defines the following basic object types:

pdfnaut Object Mapping

PDF Object                                 Python Object
-----------------------------------------  -------------
Booleans (true/false)                      bool
Integers (123)                             int
Real numbers (123.456)                     float
Literal strings ((hello world))            bytes
Hexadecimal strings (<616263>)             PdfHexString
Names (/Type)                              PdfName
Arrays ([1 2 3])                           PdfArray
Dictionaries (<< /Type /Catalog ... >>)    PdfDictionary
Streams                                    PdfStream
Null                                       PdfNull
Indirect references (1 0 R)                PdfReference

The spec also defines general-purpose data structures built from the basic object types.

  • Strings are divided into:

    • ASCII strings.

    • Byte strings: hex strings or literal strings containing binary data.

    • PDFDocEncoded strings.

    • Text strings: encoded in PDFDocEncoding, UTF-16BE, or UTF-8 (the last of which was introduced in PDF 2.0).

  • Dates: implemented via encode_iso8824() and parse_iso8824().

  • Name trees and number trees: see NameTree and NumberTree.

  • The following data structures do not currently have a dedicated type:

    • File specifications

    • Functions

    • Rectangles

    • Text streams
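The date helpers mentioned in the list above deal with the PDF date string from ISO 32000-2:2020 § 7.9.4, which has the form D:YYYYMMDDHHmmSSOHH'mm. The following is a minimal standalone sketch of the encoding direction using only the standard library. The name sketch_encode_pdf_date is hypothetical, and this is not pdfnaut's encode_iso8824() implementation.

```python
from datetime import datetime, timedelta, timezone


def sketch_encode_pdf_date(dt: datetime) -> str:
    """Hypothetical helper: format a datetime as a PDF date string."""
    base = dt.strftime("D:%Y%m%d%H%M%S")
    offset = dt.utcoffset()
    if offset is None:
        # Naive datetime: the relationship to UT is left unspecified.
        return base
    if offset == timedelta(0):
        # "Z" signifies that local time equals UT.
        return base + "Z"
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    # No trailing apostrophe, following the PDF 2.0 form of the syntax.
    return f"{base}{sign}{hours:02d}'{minutes:02d}"


pacific = timezone(timedelta(hours=-8))
print(sketch_encode_pdf_date(datetime(2024, 1, 2, 3, 4, 5, tzinfo=pacific)))
```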

Base Objects

pdfnaut.cos.objects.base.PdfObject

alias of bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull

class pdfnaut.cos.objects.base.PdfComment[source]

Bases: object

A comment introduced by the presence of the percent sign (%) outside a string or inside a content stream. Comments have no syntactical meaning and shall be interpreted as whitespace (see ISO 32000-2:2020 § 7.2.4 “Comments”).

__init__(value: bytes) None
value: bytes

The value of this comment.

class pdfnaut.cos.objects.base.PdfHexString[source]

Bases: object

A string of characters encoded in hexadecimal, useful for including arbitrary binary data in a PDF (see ISO 32000-2:2020 § 7.3.4.3 “Hexadecimal Strings”).

__init__(raw: bytes) None
classmethod from_raw(data: bytes) Self[source]

Creates a hexadecimal string from data.

raw: bytes

The hex value of the string.

property value: bytes

The decoded value of the hex string.
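To illustrate the relationship between raw and value, here is a standalone sketch of hexadecimal string decoding as the specification describes it: whitespace is ignored and a missing final digit is treated as 0. The function name is hypothetical, the input is assumed to exclude the enclosing angle brackets, and pdfnaut's own implementation may differ.

```python
def sketch_decode_hex_string(raw: bytes) -> bytes:
    """Hypothetical helper: decode the digits of a PDF hex string."""
    # bytes.split() with no arguments splits on runs of ASCII whitespace.
    digits = b"".join(raw.split())
    if len(digits) % 2:
        # An odd number of digits means the final digit is assumed to be 0.
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))


print(sketch_decode_hex_string(b"616263"))
```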

class pdfnaut.cos.objects.base.PdfInlineImage[source]

Bases: object

A PDF inline image within a content stream (see ISO 32000-2:2020 § 8.9.7 “Inline images”).

__init__(details: PdfDictionary, raw: bytes) None
details: PdfDictionary

Details about the inline image.

raw: bytes

The raw contents of the inline image.

class pdfnaut.cos.objects.base.PdfName[source]

Bases: Generic[T]

An atomic symbol uniquely defined by a sequence of 8-bit characters (see ISO 32000-2:2020 § 7.3.5 “Name Objects”).

__init__(value: T) None
value: T

The value of this name.

class pdfnaut.cos.objects.base.PdfNull[source]

Bases: object

A PDF ‘null’ object, distinct from all other PDF objects (see ISO 32000-2:2020 § 7.3.9 “Null Object”).

class pdfnaut.cos.objects.base.PdfOperator[source]

Bases: object

A PDF operator within a content stream (see ISO 32000-2:2020 § 7.8.2 “Content streams”).

__init__(name: bytes, args: list[PdfObject] | list[PdfInlineImage]) None
args: list[PdfObject] | list[PdfInlineImage]

The arguments or operands provided to this operator.

name: bytes

The name of this operator.

class pdfnaut.cos.objects.base.PdfReference[source]

Bases: Generic[T]

A reference to a PDF indirect object (see ISO 32000-2:2020 § 7.3.10 “Indirect objects”).

__init__(object_number: int, generation: int) None
generation: int

The generation of the object being referenced.

get() T[source]

Returns the object this reference points to. Raises PdfResolutionError if the reference cannot be resolved.

object_number: int

The object number of the object being referenced.

with_resolver(resolver: Callable[[PdfReference], T]) Self[source]

Sets a resolution method resolver for this reference.
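The resolver pattern described here can be sketched in isolation. The class below is a simplified stand-in, not pdfnaut's PdfReference: SketchReference and SketchResolutionError are hypothetical names, and the object table is a plain dictionary assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SketchResolutionError(Exception):
    """Hypothetical stand-in for a resolution failure."""


@dataclass
class SketchReference(Generic[T]):
    object_number: int
    generation: int
    _resolver: Optional[Callable[["SketchReference[T]"], T]] = field(
        default=None, repr=False, compare=False
    )

    def with_resolver(self, resolver: Callable[["SketchReference[T]"], T]) -> "SketchReference[T]":
        # Store the callback and return self so calls can be chained.
        self._resolver = resolver
        return self

    def get(self) -> T:
        if self._resolver is None:
            raise SketchResolutionError(
                f"no resolver set for {self.object_number} {self.generation} R"
            )
        return self._resolver(self)


# Assumed object table keyed by (object number, generation).
objects = {(1, 0): "catalog"}
ref = SketchReference(1, 0).with_resolver(
    lambda r: objects[(r.object_number, r.generation)]
)
print(ref.get())
```

Keeping the resolver as a callback means the reference itself stays a small value object and does not need to know how the document stores its objects.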

pdfnaut.cos.objects.base.encode_text_string(text: str, *, utf8: bool = False) bytes[source]

Encodes a text string to either PDFDocEncoding or UTF-16BE. PDFDocEncoding is tried first; UTF-16BE is used only if text cannot be represented in PDFDocEncoding.

If utf8 is True, text will be encoded in UTF-8 as fallback instead of UTF-16BE. Note that UTF-8 text strings are a PDF 2.0 feature which may not be supported by all PDF processors.
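The fallback order described above can be sketched with the standard library. This is not pdfnaut's implementation: the function name is hypothetical, and because Python ships no PDFDocEncoding codec, Latin-1 is used as a stand-in, which is only accurate for the code points the two encodings share.

```python
import codecs


def sketch_encode_text_string(text: str, *, utf8: bool = False) -> bytes:
    """Hypothetical helper showing the single-byte-first fallback order."""
    try:
        # ASSUMPTION: Latin-1 approximates PDFDocEncoding in this sketch.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        pass
    if utf8:
        # PDF 2.0 only: UTF-8 prefixed with its byte-order mark.
        return codecs.BOM_UTF8 + text.encode("utf-8")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


print(sketch_encode_text_string("abc"))
print(sketch_encode_text_string("日本"))
```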

pdfnaut.cos.objects.base.parse_text_string(encoded: PdfHexString | bytes) str[source]

Parses a text string as described in ISO 32000-2:2020 § 7.9.2.2 “Text string type”.

Text strings may either be encoded in PDFDocEncoding, UTF-16BE, or (PDF 2.0) UTF-8. Each encoding is indicated by a byte-order mark at the beginning (FE FF for UTF-16BE and EF BB BF for UTF-8). PDFDocEncoded strings have no such mark.
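The byte-order-mark check described above can be sketched as follows. The function name is hypothetical and this is not pdfnaut's parser. Latin-1 again stands in for PDFDocEncoding, which is an approximation valid only for the code points both encodings share.

```python
import codecs


def sketch_parse_text_string(encoded: bytes) -> str:
    """Hypothetical helper: pick a decoder based on the leading bytes."""
    if encoded.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return encoded[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if encoded.startswith(codecs.BOM_UTF8):  # EF BB BF
        return encoded[len(codecs.BOM_UTF8):].decode("utf-8")
    # No mark: treat as PDFDocEncoding (ASSUMPTION: approximated by Latin-1).
    return encoded.decode("latin-1")


print(sketch_parse_text_string(b"\xfe\xff\x00h\x00i"))
```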

Container Objects

class pdfnaut.cos.objects.containers.PdfArray[source]

Bases: UserList[ArrVal]

A heterogeneous collection of sequentially arranged items (see ISO 32000-2:2020 § 7.3.6 “Array objects”).

PdfArray is effectively a Python list. The main difference from a typical list is that PdfArray automatically resolves references when indexing.

The underlying data in unresolved form is stored in PdfArray.data.

class pdfnaut.cos.objects.containers.PdfDictionary[source]

Bases: UserDict[DictKey, DictVal]

An associative table containing pairs of objects or entries where each entry is composed of a key which is a name object and a value which is any PDF object (see ISO 32000-2:2020 § 7.3.7 “Dictionary objects”).

PdfDictionary is effectively a Python dictionary. Its keys are strings and its values are any PDF object. The main difference from a typical dictionary is that PdfDictionary automatically resolves references on key access.

The underlying data in unresolved form is stored in PdfDictionary.data.
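The resolve-on-access behavior can be illustrated with a small UserDict subclass. This is a simplified sketch, not pdfnaut's code: SketchRef and ResolvingDict are hypothetical names, and the reference here simply wraps its target instead of looking it up in a document. The same idea applies to list indexing in an array type.

```python
from collections import UserDict


class SketchRef:
    """Hypothetical reference that already knows its target."""

    def __init__(self, target):
        self.target = target

    def get(self):
        return self.target


class ResolvingDict(UserDict):
    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Resolve references on access; other values pass through unchanged.
        return value.get() if isinstance(value, SketchRef) else value


catalog = ResolvingDict({"Pages": SketchRef({"Count": 3}), "Version": 2})
print(catalog["Pages"])       # the resolved object
print(catalog.data["Pages"])  # the unresolved reference
```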

Stream Objects

class pdfnaut.cos.objects.stream.PdfStream[source]

Bases: object

A sequence of bytes that may be of unlimited length. Objects with a large amount of data like images or fonts are usually represented by streams (see ISO 32000-2:2020 § 7.3.8 “Stream objects”).

__init__(details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull], raw: bytes, _crypt_params: PdfDictionary[str, Any] = <factory>) None
classmethod create(raw: bytes, details: PdfDictionary | None = None, crypt_params: PdfDictionary | None = None) Self[source]

Creates a stream from unencoded data raw applying the filter(s) specified in details. The length of the encoded output will automatically be appended to details.

Raises pdfnaut.exceptions.PdfFilterError if a filter used is unsupported.

decode() bytes[source]

Returns the decoded contents of the stream. If no filter is defined, it returns the original contents.

Raises pdfnaut.exceptions.PdfFilterError if a filter used is unsupported.

details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull]

The stream extent dictionary as described in ISO 32000-2:2020 § 7.3.8.2 “Stream extent”.

modify(raw: bytes) None[source]

Modifies this stream in place by encoding the raw data according to the parameters specified in the stream’s extent.

raw: bytes

The raw data in the stream.
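The round trip between create() and decode() can be sketched with zlib for the FlateDecode filter. This standalone example is an assumption-laden illustration, not pdfnaut's implementation: it uses plain dictionaries and string keys, supports a single filter, and raises ValueError where the library documents PdfFilterError. Both function names are hypothetical.

```python
import zlib
from typing import Optional, Tuple


def sketch_create_stream(raw: bytes, details: Optional[dict] = None) -> Tuple[dict, bytes]:
    """Hypothetical helper: encode raw data and record the encoded length."""
    details = dict(details or {})
    filter_name = details.get("Filter")
    if filter_name is None:
        encoded = raw
    elif filter_name == "FlateDecode":
        encoded = zlib.compress(raw)
    else:
        raise ValueError(f"unsupported filter: {filter_name}")
    # Length always describes the encoded bytes, not the original data.
    details["Length"] = len(encoded)
    return details, encoded


def sketch_decode_stream(details: dict, encoded: bytes) -> bytes:
    """Hypothetical helper: reverse the filter named in details."""
    filter_name = details.get("Filter")
    if filter_name is None:
        return encoded
    if filter_name == "FlateDecode":
        return zlib.decompress(encoded)
    raise ValueError(f"unsupported filter: {filter_name}")


details, encoded = sketch_create_stream(b"hello " * 10, {"Filter": "FlateDecode"})
print(details["Length"], sketch_decode_stream(details, encoded) == b"hello " * 10)
```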

Tree Objects

class pdfnaut.cos.objects.trees.NameTree[source]

Bases: Generic[_V], _NNTree[bytes, _V]

A tree object associating string keys with values. See ISO 32000-2:2020 § 7.9.6 “Name trees”.

Initializes a tree object.

All arguments are optional. items and kids are mutually exclusive and shall not be specified both at once. If limits is not provided, it is computed based on the provided items or children, where applicable.

Parameters:
  • items – The mapping of items contained in this node.

  • kids – The immediate children nodes of this node.

  • limits – The least and greatest keys of this node.

__init__(items: dict[_K, _V] | None = None, kids: MutableSequence[Self] | None = None, limits: tuple[_K, _V] | None = None, *, parent: _NNTree[_K, _V] | None = None) None

clear() None

Removes all items from the mapping.

find_leaf(target_key: _K) Self

Finds the leaf node that contains target_key or otherwise the closest leaf node that can contain target_key.

classmethod from_dict(data: PdfDictionary, *, parent: _NNTree[_K, _V] | None = None) Self
get(k[, d])

Returns the value for k if k is in the mapping, otherwise d. d defaults to None.

items()

Returns a set-like object providing a view on the mapping's items.

keys()

Returns a set-like object providing a view on the mapping's keys.

property kids: _NNTreeKidList[Self] | None

The immediate children of this node.

property limits: tuple[_K, _K] | None

Two items representing the least and greatest keys included in the key-value pairs of the tree and any of its descendants.

property names: MutableMapping[bytes, _V] | None

The key-value pairs of this tree node.

pop(k[, d])

Removes the specified key and returns the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem()

Removes and returns some (key, value) pair as a 2-tuple. Raises KeyError if the mapping is empty.

setdefault(k[, d])

Returns D.get(k, d) and also sets D[k] = d if k is not in the mapping D.

update([E, ]**F) None

Updates the mapping D from the mapping or iterable E and the keyword arguments F.

  • If E is present and has a .keys() method: for k in E: D[k] = E[k]

  • If E is present and lacks a .keys() method: for (k, v) in E: D[k] = v

  • In either case, this is followed by: for k, v in F.items(): D[k] = v

values()

Returns an object providing a view on the mapping's values.

walk(compare_key: _K | None = None) Generator[tuple[_K, _V]]

Walks the tree and yields the key-value pairs as found.

When compare_key is specified, trees will be skipped if the comparison key does not fall within the range of the tree’s limits value.

Raises ValueError if the tree contains both nodes and key-value pairs.
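The traversal that walk() describes can be sketched over plain dictionaries shaped like the name tree nodes in the specification, where intermediate nodes carry Kids, leaves carry a flat Names array of alternating keys and values, and non-root nodes carry Limits. The function below is a hypothetical standalone illustration, not pdfnaut's implementation.

```python
from typing import Any, Iterator, Optional, Tuple


def sketch_walk(node: dict, compare_key: Optional[bytes] = None) -> Iterator[Tuple[bytes, Any]]:
    """Hypothetical helper: yield key-value pairs from a name tree node."""
    if "Kids" in node and "Names" in node:
        raise ValueError("node contains both kids and key-value pairs")
    if compare_key is not None and "Limits" in node:
        least, greatest = node["Limits"]
        if not least <= compare_key <= greatest:
            # The key cannot be in this subtree, so skip it entirely.
            return
    flat = node.get("Names", [])
    for i in range(0, len(flat), 2):
        yield flat[i], flat[i + 1]
    for kid in node.get("Kids", []):
        yield from sketch_walk(kid, compare_key)


tree = {
    "Kids": [
        {"Limits": [b"a", b"b"], "Names": [b"a", 1, b"b", 2]},
        {"Limits": [b"c", b"d"], "Names": [b"c", 3, b"d", 4]},
    ]
}
print(list(sketch_walk(tree)))
print(list(sketch_walk(tree, b"c")))
```

Because the function is a generator, the ValueError for a malformed node is raised when iteration reaches that node, not when the function is first called.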

class pdfnaut.cos.objects.trees.NumberTree[source]

Bases: Generic[_V], _NNTree[int, _V]

A tree object associating integer keys with values. See ISO 32000-2:2020 § 7.9.7 “Number trees”.

Initializes a tree object.

All arguments are optional. items and kids are mutually exclusive and shall not be specified both at once. If limits is not provided, it is computed based on the provided items or children, where applicable.

Parameters:
  • items – The mapping of items contained in this node.

  • kids – The immediate children nodes of this node.

  • limits – The least and greatest keys of this node.

__init__(items: dict[_K, _V] | None = None, kids: MutableSequence[Self] | None = None, limits: tuple[_K, _V] | None = None, *, parent: _NNTree[_K, _V] | None = None) None

clear() None

Removes all items from the mapping.

find_leaf(target_key: _K) Self

Finds the leaf node that contains target_key or otherwise the closest leaf node that can contain target_key.

classmethod from_dict(data: PdfDictionary, *, parent: _NNTree[_K, _V] | None = None) Self
get(k[, d])

Returns the value for k if k is in the mapping, otherwise d. d defaults to None.

items()

Returns a set-like object providing a view on the mapping's items.

keys()

Returns a set-like object providing a view on the mapping's keys.

property kids: _NNTreeKidList[Self] | None

The immediate children of this node.

property limits: tuple[_K, _K] | None

Two items representing the least and greatest keys included in the key-value pairs of the tree and any of its descendants.

property nums: MutableMapping[int, _V] | None

The key-value pairs of this tree node.

pop(k[, d])

Removes the specified key and returns the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem()

Removes and returns some (key, value) pair as a 2-tuple. Raises KeyError if the mapping is empty.

setdefault(k[, d])

Returns D.get(k, d) and also sets D[k] = d if k is not in the mapping D.

update([E, ]**F) None

Updates the mapping D from the mapping or iterable E and the keyword arguments F.

  • If E is present and has a .keys() method: for k in E: D[k] = E[k]

  • If E is present and lacks a .keys() method: for (k, v) in E: D[k] = v

  • In either case, this is followed by: for k, v in F.items(): D[k] = v

values()

Returns an object providing a view on the mapping's values.

walk(compare_key: _K | None = None) Generator[tuple[_K, _V]]

Walks the tree and yields the key-value pairs as found.

When compare_key is specified, trees will be skipped if the comparison key does not fall within the range of the tree’s limits value.

Raises ValueError if the tree contains both nodes and key-value pairs.

XRef Objects

pdfnaut.cos.objects.xref.PdfXRefEntry

alias of FreeXRefEntry | InUseXRefEntry | CompressedXRefEntry

class pdfnaut.cos.objects.xref.CompressedXRefEntry[source]

Bases: object

A Type 2 or compressed entry. Compressed entries refer to objects stored within an object stream.

__init__(objstm_number: int, index_within: int) None
index_within: int

The index of the object within the object stream.

objstm_number: int

The object number of the object stream containing this object.

class pdfnaut.cos.objects.xref.FreeXRefEntry[source]

Bases: object

A Type 0 (f) or free entry. Free entries are entries not currently in use and are members of the linked list of free objects.

__init__(next_free_object: int, gen_if_used_again: int) None
gen_if_used_again: int

The generation to apply to an object if this entry is used again.

next_free_object: int

The object number of the next free object in the linked list.

class pdfnaut.cos.objects.xref.InUseXRefEntry[source]

Bases: object

A Type 1 (n) or in-use entry. In-use entries refer to the objects stored in a document.

__init__(offset: int, generation: int) None
generation: int

The generation of the object.

offset: int

The byte offset of the object in the file (starting after the %PDF marker).
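In a cross-reference stream, the three entry types above correspond to the type field values 0, 1, and 2, and each row is split into three big-endian fields whose byte widths come from the stream's W array. The sketch below decodes one such row using only the standard library. The function name and the returned tuple shape are hypothetical, not pdfnaut's API.

```python
from typing import List, Tuple

KINDS = {0: "free", 1: "in_use", 2: "compressed"}


def sketch_decode_xref_row(row: bytes, widths: List[int]) -> Tuple[str, int, int]:
    """Hypothetical helper: split one xref stream row using the W widths."""
    fields = []
    pos = 0
    for width in widths:
        # A width of 0 means the field is absent and takes its default.
        fields.append(int.from_bytes(row[pos:pos + width], "big") if width else None)
        pos += width
    kind, second, third = fields
    if kind is None:
        kind = 1  # the type field defaults to 1 (in use) when omitted
    return KINDS[kind], second or 0, third or 0


# W = [1 2 1]: type 2, object stream number 10, index 3 within that stream.
print(sketch_decode_xref_row(b"\x02\x00\x0a\x03", [1, 2, 1]))
```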

class pdfnaut.cos.objects.xref.PdfXRefSection[source]

Bases: object

A cross-reference section in an XRef table representing an incremental update.

Each section comprises one or more subsections containing XRef entries.

__init__(subsections: list[PdfXRefSubsection], trailer: PdfDictionary) None
subsections: list[PdfXRefSubsection]

The subsections that make up this XRef section.

trailer: PdfDictionary

The trailer dictionary specified within this XRef section.

class pdfnaut.cos.objects.xref.PdfXRefSubsection[source]

Bases: object

A cross-reference subsection in an XRef section.

__init__(first_obj_number: int, count: int, entries: list[PdfXRefEntry]) None
count: int

The number of entries in this subsection.

entries: list[PdfXRefEntry]

The entries contained in this subsection.

first_obj_number: int

The object number of the first entry in this subsection. Entries are numbered consecutively from this value, each one incrementing the object number by one.
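To show how first_obj_number, count, and the entries relate, here is a standalone sketch that parses the text of a classic cross-reference subsection. The function name and the tuple layout are hypothetical, and real parsers work on bytes with stricter validation than this example performs.

```python
from typing import List, Tuple


def sketch_parse_xref_subsection(text: str) -> Tuple[int, int, List[Tuple[int, str, int, int]]]:
    """Hypothetical helper: parse a subsection header and its entries."""
    lines = text.strip().splitlines()
    first_obj_number, count = (int(part) for part in lines[0].split())
    entries = []
    for index, line in enumerate(lines[1:1 + count]):
        first, second, kind = line.split()
        # Object numbers are implied: they count up from first_obj_number.
        entries.append((first_obj_number + index, kind, int(first), int(second)))
    return first_obj_number, count, entries


sample = "3 2\n0000000017 00000 n \n0000000000 00001 f \n"
print(sketch_parse_xref_subsection(sample))
```

For an "n" entry the two numbers are the byte offset and generation; for an "f" entry they are the next free object number and the generation to use if the entry is reused.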