pdfnaut
Warning
This library is currently in a very early stage of development. It has only been tested with a small set of documents known to be spec-compliant.
pdfnaut aims to become a PDF processor for Python – a library capable of reading and writing PDF documents.
pdfnaut can currently read and write PDF documents at a low level. No high-level APIs are currently provided.
Features
Low level PDF manipulation
Encryption (AES/ARC4)
Document building/serialization
Install
pdfnaut can be installed from PyPI:
python3 -m pip install pdfnaut
python -m pip install pdfnaut
Important
While pdfnaut supports encryption, it does not implement the underlying algorithms (AES/ARC4) itself. You must either supply your own implementations or, preferably, install a package such as pycryptodome that includes them.
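Before working with encrypted documents, it can help to confirm that a crypto backend is actually importable. The following is a minimal sketch using only the standard library; `has_pycryptodome` is a hypothetical helper, not part of pdfnaut, and it assumes only that pycryptodome installs under the top-level package name `Crypto`. How pdfnaut itself locates or uses the package is not covered here.

```python
import importlib.util


def has_pycryptodome() -> bool:
    """Return True if a top-level ``Crypto`` package can be found.

    Hypothetical helper (not a pdfnaut API). Note that the legacy
    PyCrypto package uses the same ``Crypto`` name, so this check
    cannot tell the two apart.
    """
    return importlib.util.find_spec("Crypto") is not None


if not has_pycryptodome():
    print("pycryptodome not found; install it with: python -m pip install pycryptodome")
```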
Examples
The next example illustrates how pdfnaut can currently be used to read an existing PDF. Note that, due to the low-level nature of pdfnaut, reading and extracting data from a document requires prior knowledge of its structure.
from pdfnaut import PdfParser

with open("tests/docs/sample.pdf", "rb") as doc:
    pdf = PdfParser(doc.read())
    pdf.parse()

    # Get the pages object from the trailer
    root = pdf.resolve_reference(pdf.trailer["Root"])
    page_tree = pdf.resolve_reference(root["Pages"])

    # Get the contents of the first page
    page = pdf.resolve_reference(page_tree["Kids"][0])
    page_stream = pdf.resolve_reference(page["Contents"])
    print(page_stream.decompress())
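The example above takes the first entry of "Kids" directly, which works when the page tree is flat. In general, a PDF page tree can nest: an intermediate node has its own "Kids", and only the leaves are pages. The sketch below shows one way to walk such a tree. It is illustrative only: `iter_pages` is a hypothetical helper, not a pdfnaut API, and it assumes resolved objects behave like dictionaries and that any node with a "Kids" key is an intermediate node. To keep it self-contained, the demo uses plain dictionaries with integer keys standing in for indirect references.

```python
from typing import Any, Callable, Iterator


def iter_pages(node: Any, resolve: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield leaf page dictionaries under ``node`` in document order.

    ``resolve`` turns a reference into the object it points to.
    Assumption: a node with a "Kids" key is an intermediate node;
    anything else is treated as a page.
    """
    if "Kids" in node:
        for kid in node["Kids"]:
            yield from iter_pages(resolve(kid), resolve)
    else:
        yield node


# Stand-in object table: integers play the role of indirect references.
objects = {
    1: {"Kids": [2, 3]},          # root of the page tree
    2: {"Kids": [4, 5]},          # intermediate node
    3: {"Contents": "page C"},
    4: {"Contents": "page A"},
    5: {"Contents": "page B"},
}

pages = list(iter_pages(objects[1], objects.__getitem__))
print([page["Contents"] for page in pages])
```

With a parsed document, something like `iter_pages(page_tree, pdf.resolve_reference)` might work in the same way, provided the resolved objects support `in` and key lookup as assumed. The sketch does not guard against cycles, so a malformed document could make it recurse indefinitely.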