PDF Optimization: Errata, Table of Contents, Sharpening, Blank Pages, Deskewing, Shadow Removal, OCR, Cover and Back Cover, Splitting Double Pages into Single Pages
This article was translated by AI.
Preface
This article introduces methods for optimizing PDFs. It covers errata, table-of-contents bookmarks, sharpening, blank pages, deskewing, shadow removal, OCR, and covers and back covers. It also gives a simple classification of PDFs.
PDF Classification
In my experience, PDFs fall roughly into two categories based on how they were produced. The first is the scanned PDF, made with a scanner or a similar device, much like taking photos of the pages. The second is the True PDF, produced with professional typesetting tools such as LaTeX, which I also call a text-based PDF.
The quality of scanned PDFs varies widely, and the worst ones are very poor.
Here are a few bad examples.

The page above is skewed and has bleed-through from the reverse side, black borders, and unclear text. It is practically a catalogue of problems.

This page looks fine at first glance, but zoom in and the text looks like this.

So the page looks blurry overall.
Of course, there are good examples too.

In this one, the text stays sharp even at 1600% magnification.
True PDFs have a high quality floor. Their typical problems are missing table-of-contents bookmarks, missing blank pages, a missing cover and back cover, and uncorrected errata.
You can recognize a True PDF by how it behaves under magnification. However far you zoom in, the text stays sharp and the background stays clean white. Figures are often vector graphics that remain sharp at any zoom level.

Criteria for a Good PDF
What makes a good PDF? I use the following criteria.
- Table-of-contents bookmarks
- Clear text
- Clean background with no bleed-through from the reverse side
- No skewed pages
- No wrong, missing, or duplicate pages
- Copyable text (after OCR)
- Corrections applied according to the errata list
- Consistent page sizes
- Working hyperlinks that jump to their targets when clicked
If the PDF will also be printed:
- Moderate page margins
- A cover and a back cover
Recommended Tools
QuickOutline
Used to add table-of-contents bookmarks.
Adobe Acrobat Pro
The main tool, with a rich feature set.
PDF24
Free. It can partly replace Adobe Acrobat Pro and has some features that Acrobat lacks.
Xournal++
Used to edit mathematical formulas.
Google Gemini
Gemini 2.5 Pro has strong vision capabilities and can be used to OCR table-of-contents pages.
Quark Mobile App
A mobile app with a wide range of document-processing features.
Optimization
Next, I address the problems one by one, following the criteria above.
Table-of-Contents Bookmarks
See an answer I wrote earlier:
Automatically Generating Table-of-Contents Bookmarks for PDF Files
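Whatever tool adds the bookmarks, the core step is the same. You turn the contents listing (typed by hand or OCR'd, for example with Gemini) into (level, title, PDF page) entries, shifting every printed page number by a constant offset. The sketch below assumes a hypothetical plain-text format: two spaces of indentation per level and the page number at the end of each line. `parse_toc` is an illustrative name and is not part of any tool. The rows use the [level, title, page] shape (1-based pages) that PyMuPDF's `Document.set_toc` accepts, although nothing here calls that library.

```python
import re
from typing import List

# Hypothetical input format (an assumption): one entry per line, two spaces
# of indentation per level, printed page number as the last token.
ENTRY = re.compile(r"^(?P<indent> *)(?P<title>.+?)\s+(?P<page>\d+)\s*$")


def parse_toc(text: str, offset: int, indent_width: int = 2) -> List[list]:
    """Turn an indented contents listing into [level, title, pdf_page] rows.

    `offset` is the PDF page number minus the printed page number, i.e. how
    many pages (cover, front matter) precede printed page 1.
    Levels start at 1 and PDF pages are 1-based.
    """
    toc = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"cannot parse TOC line: {line!r}")
        level = len(m.group("indent")) // indent_width + 1
        toc.append([level, m.group("title"), int(m.group("page")) + offset])
    return toc


sample = "Preface 1\nChapter 1 Groups 5\n  1.1 Definitions 5\n  1.2 Examples 9\n"
print(parse_toc(sample, offset=12))
```

For example, if printed page 1 is PDF page 13, the offset is 12. A single offset only holds when no pages are missing or duplicated, which is why the page checks later in this article matter.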
Text Clarity, Page Skew, Background Impurities
The main tool is the "Enhance Scanned PDF" function of Adobe Acrobat Pro. No single "Text Sharpening" intensity works everywhere, so you have to tune it by trial and error for each document.
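I don't know how Acrobat deskews internally. For illustration only, here is the classic projection-profile method, which is not necessarily what Acrobat does. It is sketched in plain Python under simplifying assumptions. First, the dark pixels of a page have already been extracted as (x, y) points. Second, y increases upward as in PDF coordinates. Image libraries usually point y downward, which flips the sign of the result. The function rotates the points by each candidate angle and keeps the angle at which the row histogram is most sharply peaked. The names `profile_score` and `estimate_skew` are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (x, y) of a dark pixel, y pointing up (assumption)


def profile_score(points: Iterable[Point], angle_deg: float) -> int:
    """Rotate the points by angle_deg and score the horizontal row histogram.

    When text lines are horizontal, dark pixels pile up in few rows, so the
    sum of squared row counts is largest.
    """
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows = Counter(round(x * sin_a + y * cos_a) for x, y in points)
    return sum(c * c for c in rows.values())


def estimate_skew(points: List[Point], max_deg: float = 5.0, step: float = 0.5) -> float:
    """Return the counterclockwise rotation (degrees) that levels the text lines."""
    n = int(round(max_deg / step))
    candidates = [i * step for i in range(-n, n + 1)]  # integer steps avoid float drift
    return max(candidates, key=lambda ang: profile_score(points, ang))


# Five synthetic text lines tilted upward by 2 degrees.
tilt = math.tan(math.radians(2.0))
pts = [(x, 20 * k + x * tilt) for k in range(5) for x in range(200)]
print(estimate_skew(pts))
```

A coarse grid followed by a finer search around the best angle is a common refinement. The 0.5 degree grid here is just a simple default.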

Note that the "Output" setting of "Recognize Text" offers an "Editable Text and Images" option. In my tests, it works well for English PDFs but poorly for Chinese PDFs.


This is the result on an English PDF.
On Chinese PDFs, however, the output looks odd and the overall appearance suffers, so it is better to leave this option off.

The "Editable Text and Images" option is rather unpredictable. When used from "Optimize Scanned Pages", it may recognize display formulas as images, as shown below.


Used from "Scan & OCR", however, the same option does not have this problem, and it also deskews pages automatically. Batch OCR there is also much faster than processing files one at a time.
Wrong Pages, Missing Pages, Duplicate Pages
Wrong pages cannot be found quickly; you have to check the PDF page by page.
For example, take Dummit D.S., Foote R.M. Abstract Algebra[M/OL]. Wiley, 2003. https://books.google.com/books?id=KJDBQgAACAAJ. One electronic version of this book contains the wrong page below, with an incomplete table that is easy to overlook.

You can repair such a page by taking the corresponding page from another electronic version and substituting it.

To detect missing and duplicate pages, track the difference between the PDF page number and the book's printed page number. Wherever that difference changes, a page has been dropped or repeated. You can repair missing pages from another electronic version, or scan the pages yourself and add them to the PDF. For duplicate pages, simply delete the copy of poorer quality.
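The page-number bookkeeping can be sketched as follows. It assumes you note (PDF page, printed page) pairs at regular intervals, say every 20 pages, and `find_offset_changes` is a hypothetical helper. A drop in the offset means pages are missing in that interval, and a rise means extra pages, usually duplicates.

```python
from typing import List, Tuple


def find_offset_changes(samples: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Locate intervals where (PDF page - printed page) changes.

    samples: (pdf_page, printed_page) pairs read off at intervals.
    Returns (pdf_page_before, pdf_page_after, delta) for each change:
    delta < 0 means -delta pages are missing in that interval,
    delta > 0 means delta extra (e.g. duplicate) pages lie there.
    """
    samples = sorted(samples)
    changes = []
    for (p0, n0), (p1, n1) in zip(samples, samples[1:]):
        delta = (p1 - n1) - (p0 - n0)
        if delta != 0:
            changes.append((p0, p1, delta))
    return changes


print(find_offset_changes([(13, 1), (40, 28), (60, 49), (80, 69), (100, 88)]))
```

If a flagged interval contains a single fault, bisection finds the faulty page quickly, because the offset is constant on either side of it. One missing page and one duplicate in the same interval cancel out, so denser sampling is safer.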
In extreme cases of missing or wrong pages, you can create the pages yourself and insert them into the original PDF.
Take Isaacs Irving Martin. Character theory of finite groups[M]. New York: Academic Press, Inc., 1976. I found two electronic versions. One is a clear scan but lacks the contents page: not just the bookmarks, but the printed table-of-contents page itself. The other is a poor scan but, fortunately, includes the contents page.


So I typeset the contents page myself in LaTeX.


Missing blank pages are a special case. Even the copy of Number Theory (Henri Cohen) downloaded from the official Springer website lacks some blank pages. This shifts the page offset and misaligns the table-of-contents bookmarks, so it needs repairing too.
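In some print editions the pagination counts the blank pages, which carry an implied number even though none is printed. In that case the missing blanks show up as jumps in the page-number sequence. A minimal sketch, assuming you know the printed page number of every PDF page; `restore_blank_pages` is an illustrative name. You would still insert the blank pages themselves with Acrobat or PDF24.

```python
from typing import List, Optional


def restore_blank_pages(printed: List[int]) -> List[Optional[int]]:
    """Insert None placeholders wherever printed page numbers skip.

    printed: the printed page number of each PDF page, in PDF order.
    Returns the repaired sequence; each None marks a blank page to insert.
    A non-positive gap (e.g. a duplicate) inserts nothing.
    """
    out: List[Optional[int]] = []
    for i, n in enumerate(printed):
        if i > 0:
            gap = n - printed[i - 1] - 1
            out.extend([None] * gap)  # [None] * negative == []
        out.append(n)
    return out


print(restore_blank_pages([1, 2, 3, 5, 6, 9]))
```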
One electronic version of Zhang Zhusheng's "New Lectures on Mathematical Analysis" has every problem described above. I repaired it by scanning the book myself.
OCR
Use the "Recognize Text" function under "Scan & OCR" in Adobe Acrobat Pro.

Errata
There are two main cases.
For plain text, the "Edit PDF" function of Adobe Acrobat Pro is enough.
Mathematical formulas are harder, and I have not yet found a fully satisfactory method.
One workable approach is Xournal++. See the tutorial on its official website for details.
Consider the following item in the errata list of Richter Birgit. From categories to homotopy theory[M]. Cambridge: Cambridge University Press, 2020.

Here is the worked example.

This is the page before the correction.

This is the page after the correction.
Another workable approach is to typeset the corrected page in LaTeX and substitute it into the original PDF. This is time-consuming and laborious, so I treat it as a last resort.
Page Size
Use the "Change PDF Page Size" function in PDF24.
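To make mixed page sizes consistent without distortion, you have to scale each page uniformly and center it on the target sheet. The sketch below shows only that arithmetic, not PDF24's actual implementation. A4 is taken as 595 × 842 points (210 × 297 mm, rounded), and `fit_to_page` is an illustrative name.

```python
from typing import Tuple

A4_PT = (595.0, 842.0)  # A4 in PostScript points, rounded


def fit_to_page(w: float, h: float, target: Tuple[float, float] = A4_PT) -> Tuple[float, float, float]:
    """Scale a w x h page to fit inside `target` without distortion.

    Returns (scale, dx, dy): the uniform scale factor and the offsets
    that center the scaled page on the target sheet.
    """
    tw, th = target
    scale = min(tw / w, th / h)  # the tighter dimension decides
    dx = (tw - w * scale) / 2
    dy = (th - h * scale) / 2
    return scale, dx, dy


print(fit_to_page(612, 792))  # a US Letter page
```

For a US Letter page (612 × 792 pt), the scale is 595/612 ≈ 0.972. The scaled page is 770 pt tall, which leaves a 36 pt band at the top and bottom.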

Hyperlinks
I have not found a satisfactory way to restore lost hyperlinks.
Page Margins
Margins that are too wide or too narrow are bad for printing. If they are too wide, the text block prints too small. If they are too narrow, the page feels cramped and leaves little room for notes.
For example, the page below has, in my opinion, too much margin.

The solution is to use the "Crop Pages" function in "Organize Pages" of Adobe Acrobat Pro.
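Trimming margins to a fixed width is simple arithmetic once you know the bounding box of the text block, whether you measure it by hand or detect it programmatically. The sketch assumes PDF coordinates in points. The 36 pt (half-inch) margin in the example is an arbitrary choice, and `crop_to_margin` is an illustrative name. The resulting box is what would be written to each page's CropBox.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in points


def crop_to_margin(page: Box, content: Box, margin: float) -> Box:
    """Shrink `page` so that `margin` points remain around `content`.

    The result never extends past the original page. A side whose margin
    is already narrower than `margin` is left unchanged.
    """
    px0, py0, px1, py1 = page
    cx0, cy0, cx1, cy1 = content
    return (
        max(px0, cx0 - margin),
        max(py0, cy0 - margin),
        min(px1, cx1 + margin),
        min(py1, cy1 + margin),
    )


print(crop_to_margin((0, 0, 612, 792), (150, 150, 462, 642), 36))
```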

Cover and Back Cover
Covers are usually easy to find with a Google search, but back covers are much harder.
If your cover or back cover images are low resolution, the following two websites can upscale them:
- Free AI Image Upscaler: Increase Image Resolution
- AI Image Upscaler - Enlarge and Enhance Your Photos for Free
For a book in a series, you can build its back cover from the back covers of other books in the same series, mainly with Adobe Acrobat Pro. If color picking is involved, you may also need Adobe Illustrator.
For a standalone book with no back cover to use as a reference, I design the back cover myself. First I pick the background color from the cover. Next I add the book description from the book's official website. Finally I add the author's photo and biography from the author's homepage.
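You can also pick the background color programmatically instead of using an eyedropper. A minimal sketch, assuming the cover's pixels are available as RGB tuples, for example from Pillow's `Image.getdata()` on an RGB image. It sorts colors into coarse bins so that scan noise does not split one color into many, then reports the average of the most populated bin. `dominant_color` and the bin width of 16 are illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]


def dominant_color(pixels: Iterable[RGB], bucket: int = 16) -> str:
    """Return the most common color (as #rrggbb) after coarse quantization.

    Assumes at least one pixel. Near-identical shades fall into the same
    bin; the reported color is the mean of the largest bin.
    """
    sums: Dict[RGB, Tuple[int, int, int]] = {}
    counts: Counter = Counter()
    for r, g, b in pixels:
        key = (r // bucket, g // bucket, b // bucket)
        counts[key] += 1
        sr, sg, sb = sums.get(key, (0, 0, 0))
        sums[key] = (sr + r, sg + g, sb + b)
    key, n = counts.most_common(1)[0]
    sr, sg, sb = sums[key]
    return "#{:02x}{:02x}{:02x}".format(round(sr / n), round(sg / n), round(sb / n))


cover = [(200, 20, 40)] * 50 + [(202, 22, 42)] * 50 + [(255, 255, 255)] * 30
print(dominant_color(cover))
```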
Here are some covers and back covers I made myself.
Richter Birgit. From categories to homotopy theory[M]. Cambridge: Cambridge University Press, 2020.

Modern Real Analysis (William P. Ziemer).

Dummit D.S., Foote R.M. Abstract Algebra[M/OL]. Wiley, 2003.

Li Wenwei "Lectures on Algebra".

Non-scanned Copy to Scanned Copy
Use the document-scanning function of the Quark mobile app.


Split Double Page into Single Page
In the Quark mobile app, choose Scan File, then Test Paper, then A3 to A4.
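Without the app, you can do the same split on a computer by duplicating each spread page and cropping each copy to one half. The sketch covers only the box arithmetic and assumes the gutter lies exactly in the middle. Real scans often need a small per-page adjustment. A3 landscape is taken as 1190 × 842 points (rounded), and `split_spread` is an illustrative name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in points


def split_spread(page: Box) -> Tuple[Box, Box]:
    """Split a two-page spread into (left, right) single-page boxes.

    Assumes the gutter is exactly at the horizontal midpoint.
    """
    x0, y0, x1, y1 = page
    mid = (x0 + x1) / 2
    return (x0, y0, mid, y1), (mid, y0, x1, y1)


print(split_spread((0, 0, 1190, 842)))  # A3 landscape into two A4 halves
```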


A Request
I encourage everyone to share their optimized PDFs, for example by uploading them to Z-Library, so that others can benefit.
