PDF Comparison Tools: How to Find Changes in Documents
How to compare PDF documents and find changes — legal contracts, financial reports, policy documents. Tools, workflows, and gotchas.
Priya Sharma
Platform Engineer
The PDF Comparison Problem
PDFs are designed for display, not comparison. The same text can be stored as different byte sequences depending on the PDF producer. Two visually identical PDFs can be binary-different. Scanned PDFs are usually just page images, with no extractable text unless an OCR text layer was added. These factors make PDF comparison significantly harder than text or code comparison.
Despite these challenges, PDF comparison is critical in many domains: legal teams tracking contract revisions, compliance teams auditing policy documents, finance teams comparing quarterly reports, and developers verifying generated PDF output.
Approaches to PDF Comparison
1. Text Extraction + Text Diff
Extract text from both PDFs and compare the extracted content:
# Extract text with pdftotext (part of poppler)
pdftotext -layout contract-v1.pdf - > v1.txt
pdftotext -layout contract-v2.pdf - > v2.txt
diff -u v1.txt v2.txt
# Or with Python, using pdfplumber
pip install pdfplumber
python3 -c "
import pdfplumber
with pdfplumber.open('contract.pdf') as pdf:
    # extract_text() may return None for pages without a text layer
    text = '\n'.join(p.extract_text() or '' for p in pdf.pages)
print(text)"
Limitation: text extraction loses formatting, tables become garbled, and footnotes may appear in unexpected positions.
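Diffing one long text dump makes it hard to tell which page a change came from, and re-wrapped lines show up as spurious changes. The following is a minimal sketch, not a feature of pdftotext or pdfplumber: it assumes each document has already been extracted into a list of per-page strings (for example `[p.extract_text() or '' for p in pdf.pages]` with pdfplumber) and uses Python's standard `difflib` to emit a unified diff per page. The helper names `normalize` and `diff_pages` are hypothetical.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Reduce extraction noise: rejoin words hyphenated across line breaks,
    collapse runs of spaces and tabs, and drop blank lines."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return [line for line in lines if line]


def diff_pages(old_pages: list[str], new_pages: list[str]) -> list[str]:
    """Unified diff of two documents, page by page, so every hunk header
    names the page it came from. A page missing on one side counts as empty."""
    report: list[str] = []
    for n in range(max(len(old_pages), len(new_pages))):
        old = normalize(old_pages[n]) if n < len(old_pages) else []
        new = normalize(new_pages[n]) if n < len(new_pages) else []
        report.extend(
            difflib.unified_diff(
                old,
                new,
                fromfile=f"v1 page {n + 1}",
                tofile=f"v2 page {n + 1}",
                lineterm="",
            )
        )
    return report


if __name__ == "__main__":
    v1 = ["Term: 12 months.\nFee:  $100", "Signed."]
    v2 = ["Term: 24 months.\nFee: $100", "Signed."]
    print("\n".join(diff_pages(v1, v2)))
```

Rejoining hyphenated line breaks is a trade-off: it removes noise from re-wrapped paragraphs but also merges genuine compounds such as "non-compete" when they happen to break at the hyphen.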
2. Visual / Pixel Comparison
Render both PDFs to images and compare visually:
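# Convert PDF pages to PNG images with pdftoppm (poppler)
pdftoppm -r 150 -png contract-v1.pdf page-v1
pdftoppm -r 150 -png contract-v2.pdf page-v2
# Compare corresponding pages with ImageMagick's compare:
# -metric AE prints the number of differing pixels, and the
# third argument is an image with the differences highlighted
for f in page-v1-*.png; do
  n="${f#page-v1-}"
  printf '%s: ' "$n"
  compare -metric AE "$f" "page-v2-$n" "diff-$n" 2>&1
  echo
done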
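pdftoppm and ImageMagick do the heavy lifting in practice, but the core idea behind a pixel comparison is small enough to sketch. The function below is a hypothetical helper that assumes both pages were rendered at the same resolution and are supplied as rows of 0-255 grayscale values (for example, loaded from the PNGs with an imaging library such as Pillow). A per-pixel tolerance absorbs anti-aliasing differences between renderers, and the bounding box tells a reviewer where on the page to look.

```python
def pixel_diff(a, b, tolerance=16):
    """Compare two same-sized grayscale page images given as lists of rows of
    0-255 ints. Pixels whose values differ by more than `tolerance` count as
    changed. Returns (changed_count, box), where box is the inclusive
    (left, top, right, bottom) region containing every changed pixel, or
    None when nothing changed."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages were rendered at different sizes")
    changed = 0
    left = top = right = bottom = None
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if abs(pa - pb) > tolerance:
                changed += 1
                # rows are scanned top to bottom, so the first hit fixes `top`
                if top is None:
                    top = y
                bottom = y
                left = x if left is None else min(left, x)
                right = x if right is None else max(right, x)
    box = None if top is None else (left, top, right, bottom)
    return changed, box


if __name__ == "__main__":
    white = [[255] * 4 for _ in range(3)]
    edited = [row[:] for row in white]
    edited[1][2] = 0    # a dark mark where the page used to be blank
    edited[2][3] = 250  # a faint anti-aliasing difference, within tolerance
    print(pixel_diff(white, edited))
```

A pure-Python loop is slow on a 150 dpi US Letter page (1275 × 1650, about two million pixels); real tools use vectorized or native code, but the logic is the same.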
3. Dedicated PDF Diff Tools
For production use, dedicated tools handle the complexity:
- DiffChecker Pro — Upload two PDFs, get a side-by-side visual diff with text change highlighting and page navigation
- Adobe Acrobat Pro — Built-in "Compare Files" feature, excellent for legal/compliance use
- draftable.com — Online tool specialized for legal document comparison
- diff-pdf — Open-source CLI tool that renders pages to images and highlights pixel differences
Handling Scanned PDFs
Scanned PDFs require OCR before text comparison:
# pdf2image needs poppler installed; pytesseract needs the tesseract binary
pip install pytesseract pdf2image
python3 -c "
from pdf2image import convert_from_path
import pytesseract
pages = convert_from_path('scanned.pdf', dpi=300)
text = '\n'.join(pytesseract.image_to_string(p) for p in pages)
print(text)"
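OCR output contains character-level errors (for example, "l" misread as "1"), so expect noise in character-level comparisons. Compare at line or word level and use a similarity threshold that treats near-identical lines as unchanged, rather than flagging every mismatched character.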
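One way to apply a similarity threshold to OCR output is sketched below using only `difflib`. This is a hypothetical helper, not part of pytesseract: it aligns the two documents line by line, pairs up replaced lines positionally (a simplification), and ignores pairs whose character similarity is at or above `threshold`. Because a short line with a changed number can still score high (`"Term: 12 months."` vs `"Term: 24 months."` scores about 0.94), it also reports any pair whose standalone numbers differ, which matters for contracts and financial documents.

```python
import difflib
import re

# Standalone numbers such as "12", "5", "1,000.00". Digits embedded in a word
# (an OCR slip like "sha1l") do not match because \b requires a word boundary.
NUMBER = re.compile(r"\b\d[\d,.]*\b")


def ocr_tolerant_diff(old_lines, new_lines, threshold=0.9):
    """Report changes between two OCR'd documents while ignoring small
    character-level noise. Returns a list of (kind, old_line, new_line)."""
    changes = []
    line_matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in line_matcher.get_opcodes():
        old_block, new_block = old_lines[i1:i2], new_lines[j1:j2]
        if tag == "equal":
            continue
        if tag == "delete":
            changes.extend(("removed", o, None) for o in old_block)
        elif tag == "insert":
            changes.extend(("added", None, n) for n in new_block)
        else:  # "replace": pair lines positionally (a simplification)
            for o, n in zip(old_block, new_block):
                similarity = difflib.SequenceMatcher(a=o, b=n, autojunk=False).ratio()
                if similarity < threshold or NUMBER.findall(o) != NUMBER.findall(n):
                    changes.append(("changed", o, n))
            # leftover lines on the longer side are real removals/additions
            changes.extend(("removed", o, None) for o in old_block[len(new_block):])
            changes.extend(("added", None, n) for n in new_block[len(old_block):])
    return changes


if __name__ == "__main__":
    v1 = ["The Lessee shall pay rent monthly.", "Term: 12 months."]
    v2 = ["The Lessee sha1l pay rent monthly.", "Term: 24 months.", "Late fee: 5%."]
    for change in ocr_tolerant_diff(v1, v2):
        print(change)
```

The threshold is a tuning knob: raise it toward 1.0 to surface more differences, lower it on poor-quality scans where OCR noise dominates.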
Workflow for Contract Review
- Upload both PDF versions to DiffChecker Pro's PDF diff mode
- Navigate to changed pages using the page change summary
- Use text diff mode for precise word-level changes
- Use visual mode to verify formatting changes (margins, fonts, table layout)
- Export the diff report as PDF for audit trail
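The workflow above relies on the tool's text diff mode. To illustrate what a word-level comparison produces independently of any tool, here is a minimal standard-library sketch (the `redline` helper is hypothetical). It splits both versions on whitespace, so re-wrapped lines do not register as changes, and marks deletions as `[-...-]` and insertions as `{+...+}`, similar to `git diff --word-diff=plain`.

```python
import difflib


def redline(old: str, new: str) -> str:
    """Word-level comparison: deleted words are wrapped as [-...-] and
    inserted words as {+...+}. Splitting on whitespace means line wrapping
    and extra spaces are not reported as changes."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words removed (delete or replace)
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words added (insert or replace)
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    # prints: The fee is due within [-30-] {+45+} days.
    print(redline("The fee is due within 30 days.",
                  "The fee is due within\n45 days."))
```

`difflib` aims for a readable alignment rather than a guaranteed minimal one, which usually suits contract redlines.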
Automating PDF Comparison in CI
For teams that generate PDFs (invoices, reports, documents), add visual regression tests:
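# Install diff-pdf (Homebrew on macOS; elsewhere use a distro package or build from source)
brew install diff-pdf
# Compare PDF outputs; diff-pdf exits non-zero when the files differ visually
if ! diff-pdf --output-diff=diff.pdf expected.pdf actual.pdf; then
  echo "PDF output changed: review diff.pdf"
  exit 1
fi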
Priya Sharma writes about developer tools, software engineering best practices, and productivity for the DiffChecker Pro blog. With extensive experience in software development, Priya focuses on practical guides that help developers work more effectively.