COS Objects Reference
The PDF 2.0 specification defines the following basic object types:
| PDF Object | Python Object |
|---|---|
| Booleans (true/false) | bool |
| Integers (123) | int |
| Real numbers (123.456) | float |
| Literal strings | bytes |
| Hexadecimal strings | PdfHexString |
| Names | PdfName |
| Arrays | PdfArray |
| Dictionaries | PdfDictionary |
| Streams | PdfStream |
| Null | PdfNull |
| Indirect references (1 0 R) | PdfReference |
The spec also defines general-purpose data structures built from the basic object types.
Strings are divided into:
ASCII strings.
Byte strings: hex strings or literal strings containing binary data.
PDFDocEncoded strings.
Text strings: encoded in PDFDocEncoding, UTF-16BE, or UTF-8. The latter was introduced in PDF 2.0.
Dates: implemented via encode_iso8824() and parse_iso8824().
The following data structures do not currently have a dedicated type:
File specifications
Functions
Name trees
Number trees
Rectangles
Text streams
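To illustrate the date type mentioned above, here is a minimal sketch of parsing a PDF date string of the form D:YYYYMMDDHHmmSSOHH'mm using only the standard library. The function name parse_pdf_date_sketch is hypothetical and this is not the implementation behind parse_iso8824(); it assumes that omitted month and day default to 1 and omitted time fields default to 0.

```python
import re
from datetime import datetime, timedelta, timezone

# Every field after the year is optional; the apostrophes in the UTC
# offset are treated as optional as well.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[Z+-])(?:(?P<off_h>\d{2})'?(?:(?P<off_m>\d{2})'?)?)?)?$"
)


def parse_pdf_date_sketch(text: str) -> datetime:
    """Hypothetical helper: parse a PDF date string into a datetime."""
    match = _PDF_DATE.match(text)
    if match is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    parts = match.groupdict()

    sign = parts["sign"]
    if sign is None:
        tzinfo = None  # no offset given: return a naive datetime
    elif sign == "Z":
        tzinfo = timezone.utc
    else:
        delta = timedelta(
            hours=int(parts["off_h"] or 0), minutes=int(parts["off_m"] or 0)
        )
        tzinfo = timezone(delta if sign == "+" else -delta)

    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        tzinfo=tzinfo,
    )


stamp = parse_pdf_date_sketch("D:20240131120000+05'30'")
```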
Base Objects
- pdfnaut.cos.objects.base.PdfObject
alias of bool | int | float | bytes | PdfArray | PdfDictionary | PdfHexString | PdfName | PdfReference | PdfNull
- class pdfnaut.cos.objects.base.PdfComment
Bases: object
A comment introduced by the presence of the percent sign (%) outside a string or inside a content stream. Comments have no syntactical meaning and shall be interpreted as whitespace (see ISO 32000-2:2020 § 7.2.4 “Comments”).
- class pdfnaut.cos.objects.base.PdfHexString
Bases: object
A string of characters encoded in hexadecimal, useful for including arbitrary binary data in a PDF (see ISO 32000-2:2020 § 7.3.4.3 “Hexadecimal Strings”).
- class pdfnaut.cos.objects.base.PdfInlineImage
Bases: object
A PDF inline image within a content stream (see ISO 32000-2:2020 § 8.9.7 “Inline images”).
- __init__(details: PdfDictionary, raw: bytes) -> None
- details: PdfDictionary
Details about the inline image.
- class pdfnaut.cos.objects.base.PdfName
Bases: Generic[T]
An atomic symbol uniquely defined by a sequence of 8-bit characters (see ISO 32000-2:2020 § 7.3.5 “Name Objects”).
- value: T
The value of this name.
- class pdfnaut.cos.objects.base.PdfNull
Bases: object
A PDF ‘null’ object, distinct from all other PDF objects (see ISO 32000-2:2020 § 7.3.9 “Null Object”).
- class pdfnaut.cos.objects.base.PdfOperator
Bases: object
A PDF operator within a content stream (see ISO 32000-2:2020 § 7.8.2 “Content streams”).
- args: list[PdfObject] | list[PdfInlineImage]
The arguments or operands provided to this operator.
- class pdfnaut.cos.objects.base.PdfReference
Bases: Generic[T]
A reference to a PDF indirect object (see ISO 32000-2:2020 § 7.3.10 “Indirect objects”).
- get() -> T
Returns the object this reference points to. If the reference cannot be resolved, raises PdfResolutionError.
- with_resolver(resolver: Callable[[PdfReference], T]) -> Self
Sets a resolution method resolver for this reference.
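The resolver pattern described above can be sketched with a small standalone class. LazyRef and its fields are hypothetical names used only for illustration and are not pdfnaut's implementation. Raising LookupError when no resolver has been set is this sketch's own choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class LazyRef(Generic[T]):
    """Hypothetical stand-in for an indirect reference such as `1 0 R`."""

    object_number: int
    generation: int
    _resolver: Optional[Callable[["LazyRef[T]"], T]] = field(
        default=None, repr=False, compare=False
    )

    def with_resolver(self, resolver: Callable[["LazyRef[T]"], T]) -> "LazyRef[T]":
        # Store the callback and return self so calls can be chained.
        self._resolver = resolver
        return self

    def get(self) -> T:
        # Resolution is deferred until the target is actually needed.
        if self._resolver is None:
            raise LookupError("no resolver set for this reference")
        return self._resolver(self)


# A plain dict keyed by (object number, generation) plays the role of the document.
objects = {(1, 0): "catalog"}
ref = LazyRef(1, 0).with_resolver(lambda r: objects[(r.object_number, r.generation)])
resolved = ref.get()
```

Keeping the resolver as a callback means the reference itself stays a lightweight value, and whatever owns the objects decides how they are looked up.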
- pdfnaut.cos.objects.base.encode_text_string(text: str, *, utf8: bool = False) -> bytes
Encodes a text string to either PDFDocEncoding or UTF-16BE. PDFDocEncoding is tried first; UTF-16BE is used if text cannot be encoded with PDFDocEncoding.
If utf8 is True, text will be encoded in UTF-8 as the fallback instead of UTF-16BE. Note that UTF-8 text strings are a PDF 2.0 feature which may not be supported by all PDF processors.
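The fallback order described above can be sketched as follows. This is not the library's code: Python ships no PDFDocEncoding codec, so the sketch assumes ASCII as a conservative stand-in for it, and the function name encode_text_string_sketch is hypothetical.

```python
import codecs


def encode_text_string_sketch(text: str, *, utf8: bool = False) -> bytes:
    """Hypothetical helper mirroring the documented fallback order."""
    # Assumption: ASCII stands in for PDFDocEncoding, which covers more characters.
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        pass

    if utf8:
        # PDF 2.0 only: UTF-8 prefixed with the bytes EF BB BF.
        return codecs.BOM_UTF8 + text.encode("utf-8")

    # Default fallback: UTF-16BE prefixed with the bytes FE FF.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


plain = encode_text_string_sketch("Title")
wide = encode_text_string_sketch("日本語")
compact = encode_text_string_sketch("日本語", utf8=True)
```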
- pdfnaut.cos.objects.base.parse_text_string(encoded: PdfHexString | bytes) -> str
Parses a text string as described in ISO 32000-2:2020 § 7.9.2.2 “Text string type”.
Text strings may be encoded in PDFDocEncoding, UTF-16BE, or (PDF 2.0) UTF-8. The two Unicode encodings are indicated by a byte-order mark at the beginning (FE FF for UTF-16BE and EF BB BF for UTF-8). PDFDocEncoded strings have no such mark.
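Detection by byte-order mark can be sketched as shown here. The function name parse_text_string_sketch is hypothetical, and Latin-1 is used as a rough approximation of PDFDocEncoding, which differs from it for a number of code points.

```python
import codecs


def parse_text_string_sketch(encoded: bytes) -> str:
    """Hypothetical helper: choose a decoder from the leading byte-order mark."""
    if encoded.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return encoded[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if encoded.startswith(codecs.BOM_UTF8):  # EF BB BF
        return encoded[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: no mark means PDFDocEncoding, approximated here by Latin-1.
    return encoded.decode("latin-1")


from_utf16 = parse_text_string_sketch(b"\xfe\xff\x00H\x00i")
from_utf8 = parse_text_string_sketch(b"\xef\xbb\xbfcaf\xc3\xa9")
from_pdfdoc = parse_text_string_sketch(b"Title")
```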
Stream Objects
- class pdfnaut.cos.objects.stream.PdfStream
Bases: object
A sequence of bytes that may be of unlimited length. Objects with a large amount of data like images or fonts are usually represented by streams (see ISO 32000-2:2020 § 7.3.8 “Stream objects”).
- __init__(details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfHexString | PdfName | PdfReference | PdfNull], raw: bytes, _crypt_params: PdfDictionary[str, Any] = <factory>) -> None
- classmethod create(raw: bytes, details: PdfDictionary | None = None, crypt_params: PdfDictionary | None = None) -> Self
Creates a stream from unencoded data raw, applying the filter(s) specified in details. The length of the encoded output will automatically be appended to details.
Raises pdfnaut.exceptions.PdfFilterError if a filter used is unsupported.
- decode() -> bytes
Returns the decoded contents of the stream. If no filter is defined, it returns the original contents.
Raises pdfnaut.exceptions.PdfFilterError if a filter used is unsupported.
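The decoding behaviour described above can be sketched for a single filter using the standard library. This is an illustration rather than pdfnaut's code: decode_stream_sketch is a hypothetical name, only FlateDecode is handled, and ValueError stands in for PdfFilterError.

```python
import zlib
from typing import Optional


def decode_stream_sketch(raw: bytes, filter_name: Optional[str] = None) -> bytes:
    """Hypothetical helper: undo one stream filter, or pass the data through."""
    if filter_name is None:
        # No filter defined: the raw bytes are already the decoded contents.
        return raw
    if filter_name == "FlateDecode":
        # FlateDecode data is zlib/deflate compressed.
        return zlib.decompress(raw)
    # Assumption: any other filter is reported as unsupported.
    raise ValueError(f"unsupported filter: {filter_name}")


content = b"BT /F1 12 Tf (Hello) Tj ET"
encoded = zlib.compress(content)
decoded = decode_stream_sketch(encoded, "FlateDecode")
```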
- details: PdfDictionary[str, bool | int | float | bytes | PdfArray | PdfDictionary | PdfHexString | PdfName | PdfReference | PdfNull]
The stream extent dictionary as described in ISO 32000-2:2020 § 7.3.8.2 “Stream extent”.
Container Objects
- class pdfnaut.cos.objects.containers.PdfArray
Bases: UserList[ArrVal]
A heterogeneous collection of sequentially arranged items (see ISO 32000-2:2020 § 7.3.6 “Array objects”).
PdfArray is effectively a Python list. The main difference from a typical list is that PdfArray automatically resolves references when indexing.
The underlying data in unresolved form is stored in PdfArray.data.
- class pdfnaut.cos.objects.containers.PdfDictionary
Bases: UserDict[DictKey, DictVal]
An associative table containing pairs of objects or entries, where each entry is composed of a key, which is a name object, and a value, which is any PDF object (see ISO 32000-2:2020 § 7.3.7 “Dictionary objects”).
PdfDictionary is effectively a Python dictionary. Its keys are strings and its values are any PDF object. The main difference from a typical dictionary is that PdfDictionary automatically resolves references on key access.
The underlying data in unresolved form is stored in PdfDictionary.data.
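The resolve-on-access behaviour shared by both containers can be sketched with collections.UserDict. The names Pointer and ResolvingDict are hypothetical and the sketch makes no claim about how pdfnaut implements it. It only shows how a lookup can resolve a reference while .data keeps the unresolved value.

```python
from collections import UserDict


class Pointer:
    """Hypothetical reference type that knows how to fetch its target."""

    def __init__(self, target):
        self._target = target

    def get(self):
        return self._target


class ResolvingDict(UserDict):
    """Hypothetical dictionary that resolves Pointer values on key access."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Resolve references transparently; other values pass through unchanged.
        return value.get() if isinstance(value, Pointer) else value


catalog = ResolvingDict({"Type": "Catalog", "Pages": Pointer({"Count": 3})})
pages = catalog["Pages"]            # resolved target
raw_pages = catalog.data["Pages"]   # unresolved Pointer
```

The same override on collections.UserList would give the list-like variant.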
XRef Objects
- pdfnaut.cos.objects.xref.PdfXRefEntry
alias of FreeXRefEntry | InUseXRefEntry | CompressedXRefEntry
- class pdfnaut.cos.objects.xref.CompressedXRefEntry
Bases: object
A Type 2 or compressed entry. Compressed entries refer to objects stored within an object stream.
- class pdfnaut.cos.objects.xref.FreeXRefEntry
Bases: object
A Type 0 (f) or free entry. Free entries are entries not currently in use and are members of the linked list of free objects.
- class pdfnaut.cos.objects.xref.InUseXRefEntry
Bases: object
A Type 1 (n) or in-use entry. In-use entries refer to the objects stored in a document.
- class pdfnaut.cos.objects.xref.PdfXRefSection
Bases: object
A cross-reference section in an XRef table, representing an incremental update.
Each section consists of one or more subsections containing XRef entries.
- __init__(subsections: list[PdfXRefSubsection], trailer: PdfDictionary) -> None
- subsections: list[PdfXRefSubsection]
The subsections that make up this XRef section.
- trailer: PdfDictionary
The trailer dictionary specified within this XRef section.
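To make the free (f) and in-use (n) entry types above concrete, here is a minimal sketch that parses one line of a classic cross-reference table. Each line holds a 10-digit number, a 5-digit generation number, and a keyword. The dataclasses and their field names are hypothetical and do not mirror the attributes of FreeXRefEntry or InUseXRefEntry.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FreeEntry:
    """Hypothetical model of a Type 0 (f) entry."""

    next_free_object: int
    generation: int


@dataclass
class InUseEntry:
    """Hypothetical model of a Type 1 (n) entry."""

    offset: int
    generation: int


def parse_xref_line(line: str) -> Union[FreeEntry, InUseEntry]:
    # The first field is a byte offset for in-use entries and the number of
    # the next free object for free entries.
    first, generation, keyword = line.split()
    if keyword == "n":
        return InUseEntry(int(first), int(generation))
    if keyword == "f":
        return FreeEntry(int(first), int(generation))
    raise ValueError(f"unknown xref entry keyword: {keyword!r}")


head = parse_xref_line("0000000000 65535 f")
entry = parse_xref_line("0000000017 00000 n")
```

Compressed (Type 2) entries only appear in cross-reference streams, so they have no counterpart in this text form.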