pdfnaut

Warning

This library is currently in a very early stage of development. Expect bugs or issues.

pdfnaut aims to become a PDF processor for Python capable of reading and writing PDF documents.

pdfnaut provides high-level APIs for performing the following actions:

  • Reading compressed & encrypted PDF documents (AES/ARC4; see the note under Install).

  • Inspecting PDF structure.

  • Viewing and editing document metadata.

  • Viewing and editing document outlines.

  • Appending, inserting, and removing pages.

  • Building PDFs from scratch.

Install

pdfnaut can be installed from PyPI. It requires Python 3.10 or later.

python3 -m pip install pdfnaut
python -m pip install pdfnaut
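If it is unclear which interpreter pip will install into, a quick check against the 3.10 minimum stated above can help. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and is not part of pdfnaut.

```python
import sys

# Minimum version stated in the install notes above.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) pair is at least `minimum`.

    Hypothetical helper for illustration; not a pdfnaut function.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum():
    print("This interpreter satisfies the Python 3.10+ requirement.")
else:
    print("This interpreter is older than Python 3.10; pdfnaut will not install.")
```

Run it with the same `python3` or `python` command you intend to use for `pip`, since the two may point at different interpreters.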

Important

To use pdfnaut for reading encrypted or protected documents, you must also install a crypt provider as described in Standard Security Handler.
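Before opening an encrypted document, it can be useful to confirm that a cryptography package is importable in the current environment. The sketch below uses only the standard library. The module names in `CANDIDATE_MODULES` are an assumption (two widely used Python crypto packages), not a list taken from this page; the Standard Security Handler page is the authority on which providers pdfnaut actually supports. The helper `installed_modules` is hypothetical.

```python
from importlib.util import find_spec

# Assumption: top-level import names of common Python crypto packages.
# Check the Standard Security Handler page for the providers pdfnaut supports.
CANDIDATE_MODULES = ("cryptography", "Crypto")


def installed_modules(names=CANDIDATE_MODULES):
    """Return the subset of top-level module names that can be imported.

    Hypothetical helper for illustration; not a pdfnaut function.
    """
    return [name for name in names if find_spec(name) is not None]


found = installed_modules()
if found:
    print("Importable crypto modules:", ", ".join(found))
else:
    print("No candidate crypto module found; install a crypt provider first.")
```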

Examples

pdfnaut exposes its core API through the PdfDocument class, which covers the most common operations on a PDF. For example, to iterate over the content stream of the first page in a document:

from pdfnaut import PdfDocument

pdf = PdfDocument.from_filename("tests/docs/sample.pdf")

for operator in pdf.pages[0].content_stream:
    print(operator)

Reading document information from a PDF is also simple:

from pdfnaut import PdfDocument

pdf = PdfDocument.from_filename("tests/docs/sample.pdf")
print(pdf.doc_info.title)
print(pdf.doc_info.author)
