Filters

Filters let PDF authors encode or compress the contents of streams, either into a more compact form (for example, FlateDecode) or into an ASCII-safe representation (for example, ASCIIHexDecode).

pdfnaut can encode and/or decode the following formats:

  • The ASCII family: ASCII85Decode (Adobe’s implementation) and ASCIIHexDecode

  • The Crypt filter (decode only, requires dependency, untested)

  • FlateDecode (uses zlib/deflate)

  • RunLengthDecode (byte-oriented scheme similar to PackBits)

class pdfnaut.filters.ASCII85Filter

Bases: PdfFilter

Filter for Adobe’s ASCII85 implementation. EOD is ‘~>’.

See ISO 32000-2:2020 § 7.4.3 “ASCII85Decode Filter” for details.

This filter does not take any parameters. params will be ignored.
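As a rough illustration of the format (not of pdfnaut's internals), the sketch below uses Python's standard base64 module. The helper names ascii85_encode_pdf and ascii85_decode_pdf are hypothetical. It assumes the PDF stream form carries only the trailing ‘~>’ EOD marker, so the leading ‘<~’ that base64 emits in Adobe mode is dropped on encode and restored on decode.

```python
import base64


def ascii85_encode_pdf(data: bytes) -> bytes:
    """Encode data as ASCII85, terminated by the '~>' EOD marker."""
    framed = base64.a85encode(data, adobe=True)  # framed as b"<~...~>"
    return framed[2:]  # drop the leading "<~"; keep the "~>" EOD marker


def ascii85_decode_pdf(data: bytes) -> bytes:
    """Decode ASCII85 data that ends with the '~>' EOD marker."""
    body = data.strip()
    if not body.startswith(b"<~"):
        # Restore the opening marker so the stdlib sees a fully framed string.
        body = b"<~" + body
    return base64.a85decode(body, adobe=True)


encoded = ascii85_encode_pdf(b"hello world")
decoded = ascii85_decode_pdf(encoded)
```

The round trip through both helpers should return the original bytes.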

decode(contents: bytes, *, params: PdfDictionary | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary | None = None) → bytes

class pdfnaut.filters.ASCIIHexFilter

Bases: PdfFilter

Filter for hexadecimal strings. EOD is ‘>’.

See ISO 32000-2:2020 § 7.4.2 “ASCIIHexDecode Filter” for details.

This filter does not take any parameters. params will be ignored.
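The format can be sketched with the standard library alone. The helpers below are hypothetical and are not pdfnaut's implementation. They assume the behavior described in the ISO section cited above: whitespace is ignored, decoding stops at ‘>’, and an odd number of digits is treated as if a trailing 0 were present.

```python
def asciihex_encode(data: bytes) -> bytes:
    """Encode bytes as uppercase hex digits followed by the '>' EOD marker."""
    return data.hex().upper().encode("ascii") + b">"


def asciihex_decode(data: bytes) -> bytes:
    """Decode hex digits up to the '>' EOD marker, ignoring whitespace."""
    body = data.split(b">", 1)[0]    # everything before EOD
    digits = b"".join(body.split())  # remove ASCII whitespace
    if len(digits) % 2:
        digits += b"0"               # odd digit count: assume a trailing 0
    return bytes.fromhex(digits.decode("ascii"))


encoded = asciihex_encode(b"Hello")             # b"48656C6C6F>"
decoded = asciihex_decode(b"48 65 6C 6C 6F>")   # b"Hello"
```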

decode(contents: bytes, *, params: PdfDictionary | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary | None = None) → bytes

class pdfnaut.filters.CryptFetchFilter

Bases: PdfFilter

Filter for encrypted streams (see ISO 32000-2:2020 § 7.4.10 “Crypt Filter”).

This filter takes two optional parameters: Type, which identifies the decode parameters as belonging to this filter; and Name, which names the crypt filter that should be used to decrypt the stream.

This filter also requires three additional parameters. They are for use exclusively within the PDF processor and shall not be written to the document.

  • Handler: An instance of the security handler.

  • EncryptionKey: The encryption key generated from the security handler.

  • Reference: The indirect reference of the object to decrypt.

decode(contents: bytes, *, params: PdfDictionary | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary | None = None) → bytes

class pdfnaut.filters.FlateFilter

Bases: PdfFilter

Filter for zlib/deflate compression (see ISO 32000-2:2020 § 7.4.4 “LZWDecode and FlateDecode Filters”).

This filter supports predictors, which can make the data more predictable and hence improve compression. The spec defines two predictor groups: the PNG filters described in § 9 “Filtering” of the PNG spec, and TIFF Predictor 2 described in the TIFF 6.0 spec. TIFF Predictor 2 is currently unimplemented.

The predictor is specified by means of the Predictor key in params (default: 1). If the Predictor is not 1, the following parameters can be provided:

  • Colors: Number of color components per sample. Can be any value of 1 or greater (default: 1).

  • BitsPerComponent: Bit length of each of the color components. Possible values are: 1, 2, 4, 8 (default), and 16.

  • Columns: Number of samples per row. Can be any value of 1 or greater (default: 1).

Given these values, the length of a sample in bytes is given by

Length(Sample) = ceil((Colors * BitsPerComponent) / 8)

and the length of a row is given by

Length(Row) = Length(Sample) * Columns
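The two formulas above can be checked with a few lines of Python. The function names are hypothetical and chosen only for this illustration; the defaults mirror the parameter defaults listed earlier.

```python
def sample_length(colors: int = 1, bits_per_component: int = 8) -> int:
    """Length(Sample) = ceil((Colors * BitsPerComponent) / 8), in bytes."""
    return (colors * bits_per_component + 7) // 8  # integer ceiling division


def row_length(colors: int = 1, bits_per_component: int = 8, columns: int = 1) -> int:
    """Length(Row) = Length(Sample) * Columns, in bytes."""
    return sample_length(colors, bits_per_component) * columns


# 8-bit RGB with 4 samples per row: 3 bytes per sample, 12 bytes per row.
rgb_sample = sample_length(colors=3, bits_per_component=8)
rgb_row = row_length(colors=3, bits_per_component=8, columns=4)
```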

decode(contents: bytes, *, params: PdfDictionary[str, int] | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary[str, int] | None = None) → bytes
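To make the role of the PNG predictors more concrete, here is a minimal sketch built on the standard zlib module. It is not pdfnaut's implementation: undo_png_rows is a hypothetical helper, and it handles only PNG filter types 0 (None), 1 (Sub), and 2 (Up). It assumes each row in the inflated data is prefixed by one filter-type byte, as in the PNG spec.

```python
import zlib


def undo_png_rows(data: bytes, row_len: int, sample_len: int) -> bytes:
    """Undo PNG filter types 0 (None), 1 (Sub), and 2 (Up), row by row."""
    out = bytearray()
    prev = bytes(row_len)  # the row above the first row is treated as zeros
    # Each stored row is one filter-type byte followed by row_len data bytes.
    for start in range(0, len(data), row_len + 1):
        filter_type = data[start]
        row = bytearray(data[start + 1 : start + 1 + row_len])
        if filter_type == 1:    # Sub: add the byte one sample to the left
            for i in range(sample_len, len(row)):
                row[i] = (row[i] + row[i - sample_len]) & 0xFF
        elif filter_type == 2:  # Up: add the byte directly above
            for i in range(len(row)):
                row[i] = (row[i] + prev[i]) & 0xFF
        elif filter_type != 0:
            raise NotImplementedError(f"filter type {filter_type} is not handled here")
        out += row
        prev = bytes(row)
    return bytes(out)


# Two rows of three 1-byte samples: [1, 2, 3] stored with Sub, [4, 5, 6] with Up.
filtered = bytes([1, 1, 1, 1, 2, 3, 3, 3])
compressed = zlib.compress(filtered)
pixels = undo_png_rows(zlib.decompress(compressed), row_len=3, sample_len=1)
```

After inflating and undoing the filters, pixels holds the original bytes 1 through 6.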

class pdfnaut.filters.PdfFilter

Bases: Protocol

__init__(*args, **kwargs)
decode(contents: bytes, *, params: PdfDictionary | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary | None = None) → bytes

class pdfnaut.filters.RunLengthFilter

Bases: PdfFilter

Filter for a byte-oriented run-length encoding (RLE) scheme resembling the Apple PackBits format (see ISO 32000-2:2020 § 7.4.5 “RunLengthDecode Filter”).

In this filter, data is formatted as a sequence of runs. Each run starts with a length byte and is followed by 1 to 128 bytes of data.

  • If the length byte is in the range 0 to 127, the next (length + 1) bytes shall be copied exactly.

  • If the length byte is in the range 129 to 255, the next single byte shall be repeated (257 - length) times.

  • A length byte of 128 means EOD.
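The three rules above translate almost directly into a decoder. The function below is a minimal sketch for illustration only; run_length_decode is a hypothetical name, not part of pdfnaut's API, and it assumes well-formed input.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode a RunLengthDecode stream using the three rules listed above."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:   # EOD marker: stop decoding
            break
        if length < 128:    # literal run: copy the next (length + 1) bytes
            out += data[i + 1 : i + 2 + length]
            i += length + 2
        else:               # repeated run: repeat the next byte (257 - length) times
            out += data[i + 1 : i + 2] * (257 - length)
            i += 2
    return bytes(out)


# Literal run "abc" (length byte 2), then "x" repeated 3 times (length byte 254), then EOD.
stream = bytes([2]) + b"abc" + bytes([254]) + b"x" + bytes([128])
decoded = run_length_decode(stream)  # b"abcxxx"
```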

Implementation note: encoding uses a threshold determined by the average of the lengths of each run. Runs shorter than this threshold are copied literally. Runs longer than this threshold are repeated.

This filter does not take any parameters. params will be ignored.

decode(contents: bytes, *, params: PdfDictionary | None = None) → bytes
encode(contents: bytes, *, params: PdfDictionary | None = None) → bytes