pdfnaut

Warning

This library is currently in a very early stage of development. It has only been tested with a small set of documents known to be spec-compliant.

pdfnaut aims to become a PDF processor for Python – a library capable of reading and writing PDF documents.

pdfnaut can currently read and write PDF documents at a low level. No high-level APIs are currently provided.

Features

  • Low level PDF manipulation

  • Encryption (AES/ARC4)

  • Document building/serialization

Install

pdfnaut can be installed from PyPI:

python3 -m pip install pdfnaut
python -m pip install pdfnaut

Important

While pdfnaut supports encryption, it does not implement the underlying algorithms (AES and ARC4) itself. You must either supply your own implementations or, preferably, install a package such as pycryptodome that includes them.
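Before working with encrypted documents, it can help to confirm that a crypto package is importable. The sketch below only checks whether the `Crypto` package, the import name used by pycryptodome, is present. The helper name `has_pycryptodome` is hypothetical and not part of pdfnaut, and the check says nothing about how pdfnaut itself locates a backend.

```python
import importlib.util


def has_pycryptodome() -> bool:
    """Return True if the `Crypto` package (installed by pycryptodome) is importable.

    Hypothetical helper for illustration; not part of pdfnaut.
    """
    # find_spec returns None when no top-level package with this name exists.
    return importlib.util.find_spec("Crypto") is not None


if __name__ == "__main__":
    if has_pycryptodome():
        print("pycryptodome found")
    else:
        print("pycryptodome missing; try: python -m pip install pycryptodome")
```

Note that the legacy pycrypto package uses the same `Crypto` import name, so a positive result does not by itself prove that pycryptodome is the installed distribution.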

Examples

The following example shows how pdfnaut can currently be used to read an existing PDF. Because pdfnaut is low-level, reading and extracting data from a document requires prior knowledge of its structure.

from pdfnaut import PdfParser

with open("tests/docs/sample.pdf", "rb") as doc:
    pdf = PdfParser(doc.read())
    pdf.parse()

    # Get the page tree: trailer -> Root (document catalog) -> Pages
    root = pdf.resolve_reference(pdf.trailer["Root"])
    page_tree = pdf.resolve_reference(root["Pages"])

    # Get the content stream of the first page and print it decompressed
    page = pdf.resolve_reference(page_tree["Kids"][0])
    page_stream = pdf.resolve_reference(page["Contents"])
    print(page_stream.decompress())
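The example above follows a chain of indirect references: trailer, then Root, then Pages, then the first entry of Kids, then Contents. The sketch below models that chain with plain Python dictionaries to make the structure explicit. It does not use pdfnaut; `Ref`, `objects`, `resolve`, and `first_page_contents` are hypothetical names, and the object table is a hand-written toy rather than data parsed from a real file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as `3 0 R`."""
    object_number: int


# Toy object table: object number -> object. A real parser builds this from the file.
objects = {
    1: {"Type": "Catalog", "Pages": Ref(2)},
    2: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: {"Type": "Page", "Parent": Ref(2), "Contents": Ref(4)},
    4: b"BT /F1 12 Tf (Hello) Tj ET",  # already-decoded content stream data
}
trailer = {"Root": Ref(1)}


def resolve(value):
    """Follow a Ref to the object it points at; return any other value unchanged."""
    return objects[value.object_number] if isinstance(value, Ref) else value


def first_page_contents(trailer):
    """Walk trailer -> Root -> Pages -> Kids[0] -> Contents, resolving each step."""
    root = resolve(trailer["Root"])
    page_tree = resolve(root["Pages"])
    page = resolve(page_tree["Kids"][0])
    return resolve(page["Contents"])


print(first_page_contents(trailer))
```

Real documents are often less regular than this toy: an entry in Kids may itself be an intermediate page tree node rather than a page, and Contents may be an array of streams instead of a single stream, so the traversal usually needs to check what each resolved object is.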
