COS Objects Reference
The PDF 2.0 specification defines the following basic object types:
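| PDF Object | Python Object |
|---|---|
| Booleans (`true`/`false`) | `bool` |
| Integers (`123`) | `int` |
| Real numbers (`123.456`) | `float` |
| Literal strings (`(...)`) | `bytes` |
| Hexadecimal strings (`<...>`) | `PdfHexString` |
| Names (`/Name`) | `PdfName` |
| Arrays (`[...]`) | `PdfArray` |
| Dictionaries (`<< ... >>`) | `PdfDictionary` |
| Streams | `PdfStream` |
| Null | `PdfNull` |
| Indirect references (`1 0 R`) | `PdfReference` |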
The spec also defines general-purpose data structures built from the basic object types.
- Strings are divided into:
  - ASCII strings.
  - Byte strings: hex strings or literal strings containing binary data.
  - PDFDocEncoded strings.
  - Text strings: encoded in PDFDocEncoding, UTF-16BE, or UTF-8 (UTF-8 was introduced in PDF 2.0).
- Dates: implemented via `encode_iso8824()` and `parse_iso8824()`.
- Name trees and number trees: see `NameTree` and `NumberTree`.

The following data structures do not currently have a dedicated type:

- File specifications
- Functions
- Rectangles
- Text streams
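For the Dates entry above, PDF date strings follow the pattern `D:YYYYMMDDHHmmSSOHH'mm`, where `O` is the relationship to UTC. As an illustration only, here is a minimal formatter for timezone-aware datetimes. `format_pdf_date` is a hypothetical helper, not the signature or behavior of `encode_iso8824()`, and it assumes the form without a trailing apostrophe after the minutes, always writing an explicit `+` or `-` offset.

```python
from datetime import datetime, timedelta, timezone


def format_pdf_date(moment: datetime) -> str:
    """Format an aware datetime as D:YYYYMMDDHHmmSSOHH'mm (hypothetical helper)."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    # Split the UTC offset into a sign plus absolute hours and minutes.
    sign = "-" if offset < timedelta(0) else "+"
    total_minutes = abs(int(offset.total_seconds())) // 60
    hours, minutes = divmod(total_minutes, 60)
    return moment.strftime("D:%Y%m%d%H%M%S") + f"{sign}{hours:02d}'{minutes:02d}"


ist = timezone(timedelta(hours=5, minutes=30))
print(format_pdf_date(datetime(2024, 1, 2, 3, 4, 5, tzinfo=ist)))
# D:20240102030405+05'30
```

Naive datetimes are rejected rather than guessed at, since a PDF date without an offset is ambiguous about local time.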
Base Objects
- pdfnaut.cos.objects.base.PdfObject
alias of `bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull`
- class pdfnaut.cos.objects.base.PdfComment
Bases: object
A comment introduced by the presence of the percent sign (`%`) outside a string or inside a content stream. Comments have no syntactical meaning and shall be interpreted as whitespace (see ISO 32000-2:2020 § 7.2.4 “Comments”).
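To illustrate the rule that a comment runs from `%` to the end of the line and counts as whitespace, here is a minimal sketch that replaces each comment with a single space while leaving `%` inside literal strings untouched. `strip_comments` is a hypothetical helper, not part of pdfnaut, and it assumes the input contains object syntax only (no raw stream bodies, which may legitimately contain `%` bytes).

```python
def strip_comments(data: bytes) -> bytes:
    """Replace each PDF comment with a single space (hypothetical helper)."""
    out = bytearray()
    depth = 0  # nesting depth of literal strings: ( ... ( ... ) ... )
    i = 0
    while i < len(data):
        byte = data[i]
        if depth > 0:
            if byte == 0x5C:  # backslash: copy it and the escaped byte verbatim
                out += data[i : i + 2]
                i += 2
                continue
            if byte == 0x28:  # "(" opens a nested balanced pair
                depth += 1
            elif byte == 0x29:  # ")" closes one level
                depth -= 1
            out.append(byte)
            i += 1
        elif byte == 0x28:  # "(" starts a literal string
            depth = 1
            out.append(byte)
            i += 1
        elif byte == 0x25:  # "%" outside a string starts a comment
            # Skip up to, but not including, the end-of-line marker.
            while i < len(data) and data[i] not in (0x0A, 0x0D):
                i += 1
            out.append(0x20)  # the comment is treated as whitespace
        else:
            out.append(byte)
            i += 1
    return bytes(out)


print(strip_comments(b"1 0 obj % a note\n(50% off)\nendobj"))
# b'1 0 obj  \n(50% off)\nendobj'
```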
- class pdfnaut.cos.objects.base.PdfHexString
Bases: object
A string of characters encoded in hexadecimal, useful for including arbitrary binary data in a PDF (see ISO 32000-2:2020 § 7.3.4.3 “Hexadecimal Strings”).
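Decoding a hexadecimal string follows two rules from § 7.3.4.3: whitespace between digits is ignored, and an odd number of digits is completed with a final `0` (so `<901FA>` equals `<901FA0>`). The sketch below applies both with the standard library. `decode_hex_string` and `encode_hex_string` are hypothetical helpers rather than `PdfHexString` methods, and the decoder assumes its input is the text between `<` and `>`.

```python
def decode_hex_string(body: bytes) -> bytes:
    """Decode the contents of a <...> hex string (hypothetical helper)."""
    # Drop PDF whitespace: NUL, HT, LF, FF, CR, and space.
    digits = bytes(b for b in body if b not in b"\x00\t\n\x0c\r ")
    if len(digits) % 2:
        digits += b"0"  # a missing final digit is assumed to be 0
    return bytes.fromhex(digits.decode("ascii"))


def encode_hex_string(data: bytes) -> bytes:
    """Encode raw bytes as a complete <...> hex string (hypothetical helper)."""
    return b"<" + data.hex().upper().encode("ascii") + b">"


print(decode_hex_string(b"48 65 6C 6C 6F"))  # b'Hello'
```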
- class pdfnaut.cos.objects.base.PdfInlineImage
Bases: object
A PDF inline image within a content stream (see ISO 32000-2:2020 § 8.9.7 “Inline images”).
- __init__(details: PdfDictionary, raw: bytes) -> None
- details: PdfDictionary
Details about the inline image.
- class pdfnaut.cos.objects.base.PdfName
Bases: Generic[T]
An atomic symbol uniquely defined by a sequence of 8-bit characters (see ISO 32000-2:2020 § 7.3.5 “Name Objects”).
- value: T
The value of this name.
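In name syntax, any byte can be written as `#` followed by two hexadecimal digits, so `/A#20B` denotes the name `A B`. The decoder below is a minimal sketch of that escape rule; `decode_name` is a hypothetical helper, not pdfnaut's tokenizer, and it assumes a single, already-delimited name token.

```python
def decode_name(token: bytes) -> bytes:
    """Decode a name token such as b"/A#20B" into its raw bytes (hypothetical helper)."""
    if not token.startswith(b"/"):
        raise ValueError("name tokens start with a solidus (/)")
    body = token[1:]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x23:  # "#" introduces a two-digit hex escape
            out += bytes.fromhex(body[i + 1 : i + 3].decode("ascii"))
            i += 3
        else:
            out.append(body[i])
            i += 1
    return bytes(out)


print(decode_name(b"/paired#28#29parentheses"))  # b'paired()parentheses'
```

A malformed escape such as `#2` at the end of the token makes `bytes.fromhex` raise `ValueError`, which keeps the sketch strict rather than guessing.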
- class pdfnaut.cos.objects.base.PdfNull
Bases: object
A PDF ‘null’ object, distinct from all other PDF objects (see ISO 32000-2:2020 § 7.3.9 “Null Object”).
- class pdfnaut.cos.objects.base.PdfOperator
Bases: object
A PDF operator within a content stream (see ISO 32000-2:2020 § 7.8.2 “Content streams”).
- args: list[PdfObject] | list[PdfInlineImage]
The arguments or operands provided to this operator.
- class pdfnaut.cos.objects.base.PdfReference
Bases: Generic[T]
A reference to a PDF indirect object (see ISO 32000-2:2020 § 7.3.10 “Indirect objects”).
- get() -> T
Returns the object this reference points to. If the reference cannot be resolved, raises `PdfResolutionError`.
- with_resolver(resolver: Callable[[PdfReference], T]) -> Self
Sets `resolver` as the resolution method for this reference.
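The `with_resolver()`/`get()` pair separates a reference from the code that knows how to look it up. The self-contained sketch below mirrors that pattern with a hypothetical `Ref` class and a plain dictionary standing in for a document's object table. It is not pdfnaut's implementation, and the `LookupError` it raises is only a stand-in for `PdfResolutionError`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Ref:
    """Hypothetical stand-in for an indirect reference such as `1 0 R`."""

    object_number: int
    generation: int
    _resolver: Optional[Callable[[Ref], Any]] = field(default=None, repr=False, compare=False)

    def with_resolver(self, resolver: Callable[[Ref], Any]) -> Ref:
        self._resolver = resolver
        return self  # returning self allows chaining: Ref(1, 0).with_resolver(f).get()

    def get(self) -> Any:
        if self._resolver is None:
            raise LookupError("no resolution method set for this reference")
        return self._resolver(self)


# A toy object table keyed by (object number, generation).
objects = {(1, 0): {"Type": "Catalog"}}
ref = Ref(1, 0).with_resolver(lambda r: objects[(r.object_number, r.generation)])
print(ref.get())  # {'Type': 'Catalog'}
```

Because the resolver is just a callable, the same reference type works whether objects come from a parsed file, an in-memory builder, or a test fixture.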
- pdfnaut.cos.objects.base.encode_text_string(text: str, *, utf8: bool = False) -> bytes
Encodes a text string to either PDFDocEncoding or UTF-16BE. Strings are encoded with PDFDocEncoding first, then with UTF-16BE if `text` cannot be encoded with PDFDocEncoding.
If `utf8` is True, `text` will be encoded in UTF-8 as the fallback instead of UTF-16BE. Note that UTF-8 text strings are a PDF 2.0 feature which may not be supported by all PDF processors.
- pdfnaut.cos.objects.base.parse_text_string(encoded: PdfHexString | bytes) -> str
Parses a text string as described in ISO 32000-2:2020 § 7.9.2.2 “Text string type”.
Text strings may be encoded in PDFDocEncoding, UTF-16BE, or (PDF 2.0) UTF-8. The two Unicode encodings are indicated by a byte order mark at the beginning of the string (`FE FF` for UTF-16BE and `EF BB BF` for UTF-8). PDFDocEncoded strings have no such mark.
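The encoding fallback and byte order mark rules described for `encode_text_string()` and `parse_text_string()` can be sketched as an encode/decode pair. Python has no built-in PDFDocEncoding codec, so this hypothetical sketch assumes a conservative subset: printable ASCII (0x20 to 0x7E), where PDFDocEncoding agrees with ASCII. Anything else, including characters PDFDocEncoding could actually represent, falls back to UTF-16BE, or to UTF-8 when `utf8=True`. The function names are illustrative and not pdfnaut's.

```python
UTF16BE_BOM = b"\xfe\xff"
UTF8_BOM = b"\xef\xbb\xbf"


def sketch_encode_text(text: str, *, utf8: bool = False) -> bytes:
    # Assumption: printable ASCII stands in for the PDFDocEncoding-safe subset.
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        return text.encode("ascii")
    if utf8:
        return UTF8_BOM + text.encode("utf-8")  # PDF 2.0 only
    return UTF16BE_BOM + text.encode("utf-16-be")


def sketch_decode_text(data: bytes) -> str:
    # The byte order mark selects the Unicode encoding; no mark means PDFDocEncoding.
    if data.startswith(UTF16BE_BOM):
        return data[2:].decode("utf-16-be")
    if data.startswith(UTF8_BOM):
        return data[3:].decode("utf-8")
    # Assumption: only the ASCII subset is decoded; a full PDFDocEncoding table is omitted.
    return data.decode("ascii")


print(sketch_encode_text("\u03c0", utf8=True))  # b'\xef\xbb\xbf\xcf\x80'
```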
Container Objects
- class pdfnaut.cos.objects.containers.PdfArray
Bases: UserList[ArrVal]
A heterogeneous collection of sequentially arranged items (see ISO 32000-2:2020 § 7.3.6 “Array objects”).
`PdfArray` is effectively a Python list. The main difference from a typical list is that `PdfArray` automatically resolves references when indexing. The underlying data in unresolved form is stored in `PdfArray.data`.
- class pdfnaut.cos.objects.containers.PdfDictionary
Bases: UserDict[DictKey, DictVal]
An associative table containing pairs of objects, or entries, where each entry is composed of a key, which is a name object, and a value, which can be any PDF object (see ISO 32000-2:2020 § 7.3.7 “Dictionary objects”).
`PdfDictionary` is effectively a Python dictionary. Its keys are strings and its values are any PDF object. The main difference from a typical dictionary is that `PdfDictionary` automatically resolves references on key access. The underlying data in unresolved form is stored in `PdfDictionary.data`.
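The auto-resolving behavior described for `PdfArray` and `PdfDictionary` can be illustrated with `collections.UserList` and `collections.UserDict`. In this hypothetical sketch, `LazyRef` is a stand-in reference type whose `get()` calls a callback; raw, unresolved values stay in `.data`, matching the description above. It is a simplification rather than pdfnaut's code, and only indexing and key access are overridden.

```python
from collections import UserDict, UserList
from typing import Any, Callable


class LazyRef:
    """Hypothetical reference: resolves itself through a callback."""

    def __init__(self, resolve: Callable[[], Any]) -> None:
        self._resolve = resolve

    def get(self) -> Any:
        return self._resolve()


def _resolved(value: Any) -> Any:
    # Follow a reference if the stored value is one; otherwise return it unchanged.
    return value.get() if isinstance(value, LazyRef) else value


class ResolvingList(UserList):
    def __getitem__(self, index):
        if isinstance(index, slice):
            return self.__class__(self.data[index])  # slices stay unresolved
        return _resolved(self.data[index])


class ResolvingDict(UserDict):
    def __getitem__(self, key):
        return _resolved(self.data[key])


catalog = {"Type": "Catalog"}
arr = ResolvingList([1, LazyRef(lambda: catalog)])
d = ResolvingDict({"Root": LazyRef(lambda: catalog)})
print(arr[1], d["Root"])  # {'Type': 'Catalog'} {'Type': 'Catalog'}
print(type(arr.data[1]).__name__)  # LazyRef
```

Keeping the unresolved values in `.data` lets a writer serialize the original `1 0 R` style references instead of inlining the objects they point to.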
Stream Objects
- class pdfnaut.cos.objects.stream.PdfStream
Bases: object
A sequence of bytes that may be of unlimited length. Objects with a large amount of data, like images or fonts, are usually represented by streams (see ISO 32000-2:2020 § 7.3.8 “Stream objects”).
- __init__(details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull], raw: bytes, _crypt_params: PdfDictionary[str, Any] = <factory>) -> None
- classmethod create(raw: bytes, details: PdfDictionary | None = None, crypt_params: PdfDictionary | None = None) -> Self
Creates a stream from the unencoded data `raw`, applying the filter(s) specified in `details`. The length of the encoded output is automatically added to `details`.
Raises `pdfnaut.exceptions.PdfFilterError` if a filter used is unsupported.
- decode() -> bytes
Returns the decoded contents of the stream. If no filter is defined, the original contents are returned.
Raises `pdfnaut.exceptions.PdfFilterError` if a filter used is unsupported.
- details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfStream | PdfHexString | PdfName | PdfReference | PdfNull]
The stream extent dictionary as described in ISO 32000-2:2020 § 7.3.8.2 “Stream extent”.
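As a rough model of what `create()` and `decode()` do for the common `FlateDecode` filter, the sketch below compresses data with `zlib`, records the encoded length under `Length`, and reverses the process. It uses a plain `dict` in place of `PdfDictionary`, supports only that one filter, and raises `ValueError` where pdfnaut raises `PdfFilterError`; both function names are hypothetical.

```python
import zlib


def create_flate_stream(raw: bytes) -> tuple[dict, bytes]:
    """Encode raw bytes with FlateDecode (hypothetical helper)."""
    encoded = zlib.compress(raw)
    # The stream dictionary records the filter and the *encoded* length.
    details = {"Filter": "FlateDecode", "Length": len(encoded)}
    return details, encoded


def decode_stream(details: dict, encoded: bytes) -> bytes:
    """Undo the filters named in the stream dictionary (FlateDecode only)."""
    filters = details.get("Filter", [])
    if isinstance(filters, str):
        filters = [filters]  # a single filter may be a name rather than an array
    data = encoded
    for name in filters:
        if name != "FlateDecode":
            raise ValueError(f"unsupported filter: {name}")
        data = zlib.decompress(data)
    return data  # with no filter, the original contents are returned


details, body = create_flate_stream(b"BT /F1 12 Tf (Hello) Tj ET")
assert details["Length"] == len(body)
assert decode_stream(details, body) == b"BT /F1 12 Tf (Hello) Tj ET"
```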
Tree Objects
- class pdfnaut.cos.objects.trees.NameTree
Bases: Generic[_V], _NNTree[bytes, _V]
A tree object associating string keys with values (see ISO 32000-2:2020 § 7.9.6 “Name trees”).
- __init__(items: dict[_K, _V] | None = None, kids: MutableSequence[Self] | None = None, limits: tuple[_K, _V] | None = None, *, parent: _NNTree[_K, _V] | None = None) -> None
Initializes a tree object.
All arguments are optional. `items` and `kids` are mutually exclusive and shall not both be specified. If `limits` is not provided, it is computed from the provided items or children, where applicable.
Parameters:
items – The mapping of items contained in this node.
kids – The immediate child nodes of this node.
limits – The least and greatest keys of this node.
- clear() -> None. Remove all items from D.
- find_leaf(target_key: _K) -> Self
Finds the leaf node that contains `target_key`, or otherwise the closest leaf node that can contain `target_key`.
- classmethod from_dict(data: PdfDictionary, *, parent: _NNTree[_K, _V] | None = None) -> Self
- get(k[, d]) -> D[k] if k in D, else d. d defaults to None.
- items() -> a set-like object providing a view on D's items
- keys() -> a set-like object providing a view on D's keys
- property limits: tuple[_K, _K] | None
Two items representing the least and greatest keys included in the key-value pairs of the tree and any of its descendants.
- property names: MutableMapping[bytes, _V] | None
The key-value pairs of this tree node.
- pop(k[, d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) -> D.get(k,d), also set D[k]=d if k not in D
- update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() -> an object providing a view on D's values
- walk(compare_key: _K | None = None) -> Generator[tuple[_K, _V]]
Walks the tree and yields the key-value pairs as they are found.
When `compare_key` is specified, any subtree whose `limits` range does not include the comparison key is skipped.
Raises `ValueError` if the tree contains both child nodes and key-value pairs.
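The pruning done by `walk(compare_key)` can be sketched with a small standalone node type. In this hypothetical model, which is not pdfnaut's `_NNTree`, a node holds either `items` or `kids`, limits are computed once at construction, and subtrees whose limits do not bracket the comparison key are never visited.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Node:
    """Hypothetical name-tree node: leaves carry items, intermediate nodes carry kids."""

    items: dict = field(default_factory=dict)
    kids: list = field(default_factory=list)
    limits: Optional[tuple] = None

    def __post_init__(self) -> None:
        # Compute (least key, greatest key) when limits were not supplied.
        if self.limits is None:
            if self.items:
                self.limits = (min(self.items), max(self.items))
            else:
                bounds = [kid.limits for kid in self.kids if kid.limits is not None]
                if bounds:
                    self.limits = (min(b[0] for b in bounds), max(b[1] for b in bounds))

    def walk(self, compare_key: Any = None) -> Iterator[tuple]:
        if self.items and self.kids:
            raise ValueError("a node cannot hold both items and kids")
        yield from sorted(self.items.items())
        for kid in self.kids:
            if compare_key is not None and kid.limits is not None:
                low, high = kid.limits
                if not low <= compare_key <= high:
                    continue  # this subtree cannot contain compare_key
            yield from kid.walk(compare_key)


left = Node(items={b"apple": 1, b"banana": 2})
right = Node(items={b"cherry": 3, b"date": 4})
root = Node(kids=[left, right])
print(root.limits)  # (b'apple', b'date')
print(list(root.walk(b"cherry")))  # [(b'cherry', 3), (b'date', 4)]
```

Note that the walk yields every pair in a matching subtree, not just the comparison key; an exact lookup would filter the results or descend to a single leaf.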
- class pdfnaut.cos.objects.trees.NumberTree
Bases: Generic[_V], _NNTree[int, _V]
A tree object associating integer keys with values (see ISO 32000-2:2020 § 7.9.7 “Number trees”).
- __init__(items: dict[_K, _V] | None = None, kids: MutableSequence[Self] | None = None, limits: tuple[_K, _V] | None = None, *, parent: _NNTree[_K, _V] | None = None) -> None
Initializes a tree object.
All arguments are optional. `items` and `kids` are mutually exclusive and shall not both be specified. If `limits` is not provided, it is computed from the provided items or children, where applicable.
Parameters:
items – The mapping of items contained in this node.
kids – The immediate child nodes of this node.
limits – The least and greatest keys of this node.
- clear() -> None. Remove all items from D.
- find_leaf(target_key: _K) -> Self
Finds the leaf node that contains `target_key`, or otherwise the closest leaf node that can contain `target_key`.
- classmethod from_dict(data: PdfDictionary, *, parent: _NNTree[_K, _V] | None = None) -> Self
- get(k[, d]) -> D[k] if k in D, else d. d defaults to None.
- items() -> a set-like object providing a view on D's items
- keys() -> a set-like object providing a view on D's keys
- property limits: tuple[_K, _K] | None
Two items representing the least and greatest keys included in the key-value pairs of the tree and any of its descendants.
- property nums: MutableMapping[int, _V] | None
The key-value pairs of this tree node.
- pop(k[, d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) -> D.get(k,d), also set D[k]=d if k not in D
- update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() -> an object providing a view on D's values
- walk(compare_key: _K | None = None) -> Generator[tuple[_K, _V]]
Walks the tree and yields the key-value pairs as they are found.
When `compare_key` is specified, any subtree whose `limits` range does not include the comparison key is skipped.
Raises `ValueError` if the tree contains both child nodes and key-value pairs.
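On disk, a number tree leaf stores its pairs in a flat `Nums` array of the form `[key1 value1 key2 value2 ...]`, with keys in ascending order, and a `Limits` array holds the least and greatest keys. The hypothetical helpers below convert between that flat form and a Python dict using plain lists and dicts; they only illustrate the layout and are not how `from_dict()` is implemented.

```python
def nums_to_dict(nums: list) -> dict[int, object]:
    """Convert a flat [k1 v1 k2 v2 ...] Nums array into a dict (hypothetical helper)."""
    if len(nums) % 2:
        raise ValueError("a Nums array must contain whole key-value pairs")
    keys, values = nums[0::2], nums[1::2]
    if any(not isinstance(k, int) for k in keys):
        raise TypeError("number tree keys must be integers")
    return dict(zip(keys, values))


def dict_to_leaf(pairs: dict[int, object]) -> dict[str, list]:
    """Build a leaf node dictionary with sorted Nums and matching Limits."""
    if not pairs:
        raise ValueError("a leaf node needs at least one pair")
    ordered = sorted(pairs.items())  # keys must appear in ascending order
    nums = [item for pair in ordered for item in pair]
    return {"Nums": nums, "Limits": [ordered[0][0], ordered[-1][0]]}


print(dict_to_leaf({10: "b", 2: "a"}))  # {'Nums': [2, 'a', 10, 'b'], 'Limits': [2, 10]}
```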
XRef Objects
- pdfnaut.cos.objects.xref.PdfXRefEntry
alias of `FreeXRefEntry | InUseXRefEntry | CompressedXRefEntry`
- class pdfnaut.cos.objects.xref.CompressedXRefEntry
Bases: object
A Type 2 or compressed entry. Compressed entries refer to objects stored within an object stream.
- class pdfnaut.cos.objects.xref.FreeXRefEntry
Bases: object
A Type 0 (`f`) or free entry. Free entries are entries not currently in use and are members of the linked list of free objects.
- class pdfnaut.cos.objects.xref.InUseXRefEntry
Bases: object
A Type 1 (`n`) or in-use entry. In-use entries refer to objects stored directly in the document at a given byte offset, rather than inside an object stream.
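In a cross-reference stream, each entry is a row of big-endian fields whose byte widths come from the stream's `W` array, and the first field selects one of the entry types described above. The sketch below decodes one row into a plain tuple. The tuple layout and `decode_xref_row` are hypothetical and do not correspond to the constructors of `FreeXRefEntry`, `InUseXRefEntry`, or `CompressedXRefEntry`; it also assumes exactly three widths and reads any zero-width field as 0.

```python
from typing import Optional


def decode_xref_row(row: bytes, widths: list[int]) -> Optional[tuple[str, int, int]]:
    """Decode one cross-reference stream row using the W widths (hypothetical helper)."""
    if len(widths) != 3 or len(row) != sum(widths):
        raise ValueError("expected three widths whose sum equals the row length")
    fields = []
    offset = 0
    for width in widths:
        # Fields are big-endian; a zero-width (absent) field reads as 0 here.
        fields.append(int.from_bytes(row[offset : offset + width], "big"))
        offset += width
    # When W[0] is 0 the type field is absent and the type defaults to 1.
    entry_type = fields[0] if widths[0] else 1
    second, third = fields[1], fields[2]
    if entry_type == 0:
        return ("free", second, third)  # next free object number, generation
    if entry_type == 1:
        return ("in_use", second, third)  # byte offset, generation
    if entry_type == 2:
        return ("compressed", second, third)  # object stream number, index within it
    return None  # other types are treated as references to the null object


print(decode_xref_row(bytes([1, 0x01, 0x0F, 0]), [1, 2, 1]))  # ('in_use', 271, 0)
print(decode_xref_row(bytes([2, 0x00, 0x05, 3]), [1, 2, 1]))  # ('compressed', 5, 3)
```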
- class pdfnaut.cos.objects.xref.PdfXRefSection
Bases: object
A cross-reference section in an XRef table, representing an incremental update.
Each section is composed of one or more subsections containing XRef entries.
- __init__(subsections: list[PdfXRefSubsection], trailer: PdfDictionary) -> None
- subsections: list[PdfXRefSubsection]
The subsections that make up this XRef section.
- trailer: PdfDictionary
The trailer dictionary specified within this XRef section.
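In a classic (non-stream) cross-reference table, a section starts with the `xref` keyword and contains subsections, each introduced by a line giving the first object number and the entry count, followed by fixed-width entries of the form `nnnnnnnnnn ggggg n` plus an end-of-line. The hypothetical parser below groups entries into `(first_object_number, entries)` pairs, roughly the shape `PdfXRefSection.subsections` models. It tokenizes on whitespace instead of relying on the 20-byte entry width, stops at `trailer` without parsing the trailer dictionary, and is not pdfnaut's parser.

```python
def parse_xref_table(text: str) -> list[tuple[int, list[tuple[int, int, str]]]]:
    """Group classic xref entries into (first object number, entries) subsections."""
    tokens = text.split()
    if not tokens or tokens[0] != "xref":
        raise ValueError("a cross-reference table starts with the 'xref' keyword")
    subsections = []
    pos = 1
    while pos < len(tokens) and tokens[pos] != "trailer":
        start, count = int(tokens[pos]), int(tokens[pos + 1])
        pos += 2
        entries = []
        for _ in range(count):
            offset, generation, kind = tokens[pos : pos + 3]
            if kind not in ("n", "f"):
                raise ValueError(f"unexpected entry keyword {kind!r}")
            # For "n" entries the first field is a byte offset; for "f" entries
            # it is the object number of the next free object.
            entries.append((int(offset), int(generation), kind))
            pos += 3
        subsections.append((start, entries))
    return subsections


sample = (
    "xref\n"
    "0 2\n"
    "0000000000 65535 f\r\n"
    "0000000015 00000 n\r\n"
    "3 1\n"
    "0000000120 00002 n\r\n"
    "trailer\n"
)
print(parse_xref_table(sample))
# [(0, [(0, 65535, 'f'), (15, 0, 'n')]), (3, [(120, 2, 'n')])]
```

Each subsection covers a contiguous run of object numbers, so object 3's entry above is found by its position relative to the subsection's first object number rather than being stored explicitly.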