How to Extract Text from Scanned PDFs

You open a scanned PDF, try to select some text, and... nothing happens. You can't copy it, can't search it, can't do anything with it. The words are right there on the screen, but your computer treats them as a picture of words, not actual text.

This is one of the most common frustrations with scanned documents. But there's a straightforward fix: OCR.

Why You Can't Copy Text from Scanned PDFs

When you scan a paper document, the scanner takes a photograph of each page. That photograph gets saved inside a PDF file. But it's still just an image — a grid of colored pixels.

Your computer doesn't "see" the letters. It sees patterns of light and dark. That's why:

Ctrl+F doesn't work: No text to search
Copy-paste fails: There's nothing to select
Screen readers can't read it: The document is inaccessible
File size is huge: Images take far more space than text

What Is OCR?

OCR (Optical Character Recognition) is technology that looks at images of text and converts them into actual, selectable, searchable text.

It works like this:

OCR examines the image pixel by pixel
Identifies shapes that look like letters and numbers
Determines which character each shape represents
Outputs real text you can copy, search, and edit

Modern OCR is remarkably accurate: on clean scans of printed text, it typically gets 95-99% of characters right.
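To put those percentages in perspective, here is a minimal Python sketch of one common way to measure character accuracy: count the edits needed to turn the OCR output back into the correct text, then divide by the length of the correct text. The function names and the 2,000-characters-per-page figure are illustrative assumptions, not part of any particular OCR tool.

```python
def edit_distance(reference: str, ocr_output: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(ocr_output) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, ocr_char in enumerate(ocr_output, start=1):
            cost = 0 if ref_char == ocr_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of the reference text recovered correctly, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def expected_errors(characters_on_page: int, accuracy: float) -> int:
    """Rough number of wrong characters on a page at a given accuracy."""
    return round(characters_on_page * (1.0 - accuracy))


# Two misread characters out of twelve: "1" read as "l", "0" read as "O".
print(f"{character_accuracy('Invoice 1001', 'Invoice l0O1'):.1%}")

# Assumed page length of 2,000 characters (hypothetical figure).
print(expected_errors(2000, 0.98))
```

At 98% accuracy, a page of roughly 2,000 characters still contains about 40 wrong characters, which is why a quick proofread is worth the time.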

How to Extract Text from a Scanned PDF

Method 1: Use Our OCR Tool (Recommended)

The fastest way to make a scanned PDF searchable:

Go to our PDF OCR tool
Upload your scanned PDF
The tool processes each page with OCR
Download your searchable PDF

What you get: A PDF that looks identical to the original, but with an invisible text layer underneath. You can select text, search with Ctrl+F, and copy-paste.

Time: 5-30 seconds depending on page count.

Privacy: Everything runs in your browser. Your document never leaves your device.

Method 2: Convert to Word

If you need to edit the text (not just search/copy):

Use our PDF to Word converter
Upload the scanned PDF
Download the editable Word document
Edit as needed

Best for: When you need to modify the content, not just read it.

Method 3: Google Drive (Free Alternative)

Google Drive has built-in OCR:

Upload your scanned PDF to Google Drive
Right-click → Open with → Google Docs
Google automatically runs OCR
Text appears in a Google Doc

Pros: Free, decent accuracy.

Cons: Formatting gets destroyed. You get raw text, not a nicely formatted document. Also, your file goes to Google's servers.

Method 4: Adobe Acrobat

If you have Acrobat Pro:

Open the scanned PDF
Tools → Enhance Scans → Recognize Text
Choose language and output settings
Run OCR

Pros: Excellent accuracy, preserves formatting.

Cons: Requires an expensive subscription.

What Affects OCR Accuracy?

Not all scans are created equal. Here's what impacts how well OCR works:

Scan Quality

Quality	Expected Accuracy	Notes
300 DPI, clean	98-99%	Ideal for OCR
200 DPI, clean	95-98%	Good enough
150 DPI or less	85-95%	Accuracy drops
Blurry/skewed	70-85%	May need manual correction
Poor photocopy	60-80%	OCR struggles significantly

Rule of thumb: if you haven't scanned yet, scan at 300 DPI. It gives OCR enough detail to work with without inflating file size.
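The reason resolution matters comes down to how many pixels each letter gets. The short Python sketch below works through the arithmetic for a US Letter page (8.5 x 11 inches, an assumed page size) using the standard 72 points per inch. The helper names are made up for illustration, and the font size is the nominal point size, so actual letter shapes are somewhat smaller than the figures shown.

```python
POINTS_PER_INCH = 72  # typographic points in one inch


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scanned page at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_pixels(font_size_pt: float, dpi: int) -> float:
    """Nominal height in pixels of text set at a given point size."""
    return font_size_pt / POINTS_PER_INCH * dpi


for dpi in (150, 200, 300):
    width, height = page_pixels(8.5, 11, dpi)
    print(
        f"{dpi} DPI: {width} x {height} px, "
        f"10pt text is about {text_height_pixels(10, dpi):.0f} px tall, "
        f"8pt text is about {text_height_pixels(8, dpi):.0f} px tall"
    )
```

At 300 DPI, a Letter page is 2550 x 3300 pixels and 10pt text is about 42 pixels tall. At 150 DPI, the same text gets only about 21 pixels, which leaves far less detail for telling similar letters apart.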

Document Characteristics

OCR works best with:

Printed text (not handwriting)
Standard fonts (Times New Roman, Arial, etc.)
Black text on white background
Clean, straight pages
Common languages (English, Spanish, French, German, etc.)

OCR struggles with:

Handwritten text (accuracy drops to 60-80%)
Decorative or unusual fonts
Colored backgrounds or watermarks
Skewed or rotated pages
Mixed languages on one page
Very small text (under 8pt)

Page Orientation

If pages are skewed (slightly rotated during scanning), OCR accuracy drops. Many OCR tools auto-correct slight skew, but if your pages are noticeably tilted, sideways, or upside down, rotate them before running OCR.
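A small tilt has a bigger effect than it looks like it should. The Python sketch below estimates how far a line of text drifts up or down across the page at a given skew angle. The 1950-pixel line width (6.5 inches at 300 DPI) and 50-pixel line spacing (12pt at 300 DPI) are assumed values for a typical page, and the function names are illustrative.

```python
import math


def baseline_drift_pixels(skew_degrees: float, line_width_px: float) -> float:
    """Vertical drift of a text line from its left end to its right end."""
    return math.tan(math.radians(skew_degrees)) * line_width_px


def drift_in_lines(skew_degrees: float, line_width_px: float, line_pitch_px: float) -> float:
    """The same drift, expressed as a multiple of the line spacing."""
    return baseline_drift_pixels(skew_degrees, line_width_px) / line_pitch_px


LINE_WIDTH_PX = 1950  # assumed: 6.5 inch text block at 300 DPI
LINE_PITCH_PX = 50    # assumed: 12pt line spacing at 300 DPI

for angle in (0.5, 1, 2, 5):
    drift = baseline_drift_pixels(angle, LINE_WIDTH_PX)
    lines = drift_in_lines(angle, LINE_WIDTH_PX, LINE_PITCH_PX)
    print(f"{angle} degrees: {drift:.0f} px drift, {lines:.1f} line heights")
```

Under these assumptions, a 2 degree tilt moves the end of a line more than one full line height away from where it started, so a tool that reads in straight horizontal strips would start mixing adjacent lines.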

Common OCR Use Cases

Digitizing Old Documents

Situation: You have boxes of paper documents that need to be searchable.

Approach:

Scan everything at 300 DPI
Run OCR in batches
File the searchable PDFs in your document management system

Result: Decades of paper documents become instantly searchable.

Making Legal Documents Searchable

Situation: Court filings, contracts, or case files scanned as images.

Approach:

OCR the documents
Use Ctrl+F to find specific clauses, dates, or names
Copy relevant text for briefs or summaries

Time saved: Hours of manual reading replaced by seconds of searching.

Processing Receipts and Invoices

Situation: Scanned receipts for expense reports or tax filings.

Approach:

Scan or photograph receipts
Convert to PDF if needed
Run OCR to extract amounts, dates, and vendor names

Result: Searchable financial records.
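Once a receipt has been through OCR, pulling out the key fields is mostly pattern matching. Here is a rough Python sketch that assumes US-style dollar amounts and MM/DD/YYYY dates. The sample receipt is made up, and two shortcuts are heuristics rather than rules: the vendor is taken to be the first non-empty line, and the total is taken to be the largest amount.

```python
import re

# Dollar amounts such as $4.50 or $1,017.49 (assumed US formatting).
AMOUNT = re.compile(r"\$\s?(\d+(?:,\d{3})*\.\d{2})")
# Dates such as 03/14/2024 (assumed MM/DD/YYYY).
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_receipt_fields(ocr_text: str) -> dict:
    """Pull vendor, dates, and total out of OCR text from a receipt."""
    amounts = [float(value.replace(",", "")) for value in AMOUNT.findall(ocr_text)]
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    return {
        "vendor": lines[0] if lines else None,       # heuristic: first line
        "dates": DATE.findall(ocr_text),
        "total": max(amounts) if amounts else None,  # heuristic: largest amount
    }


sample = """ACME HARDWARE
03/14/2024
Hammer      $12.99
Nails        $4.50
TOTAL       $17.49
"""

print(extract_receipt_fields(sample))
```

Treat the output as a draft. A misread digit in an amount will pass straight through the pattern, so figures still need a manual check against the original.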

Academic Research

Situation: Older journal articles or books only available as scanned PDFs.

Approach:

Download the scanned PDF
Run OCR
Search for keywords, copy quotes with proper citations

Time saved: Instead of reading 50 pages to find one quote, search in 2 seconds.

Accessibility Compliance

Situation: Your organization needs documents to be accessible to screen readers.

Approach:

OCR all scanned documents
The text layer makes documents screen-reader compatible
Meet accessibility requirements (ADA, WCAG, Section 508)

Result: Inclusive documents that everyone can access.

OCR Tips for Better Results

Before Scanning

Use 300 DPI: The sweet spot for file size vs. OCR accuracy
Use a flatbed scanner: Better quality than phone cameras for multi-page documents
Ensure clean glass: Dust and smudges cause OCR errors
Align pages straight: Skew reduces accuracy

Before Running OCR

Check page orientation: Rotate any sideways or upside-down pages first
Remove blank pages: They waste processing time
Crop margins: Large dark borders can confuse OCR

After OCR

Spot-check accuracy: Read a few paragraphs and compare to the original
Check numbers carefully: OCR sometimes confuses 0/O, 1/l, 5/S
Verify special characters: Symbols like @, #, & can be misread
Compress the result: OCR adds a text layer, which slightly increases file size — compression offsets this

OCR Accuracy by Content Type

Content	Typical Accuracy	Notes
Typed business letters	99%+	Best-case scenario
Book pages	97-99%	Very reliable
Magazine/newspaper	95-98%	Column layouts can cause issues
Tables and spreadsheets	90-95%	Structure may need manual fixing
Forms with checkboxes	85-95%	Checkmarks sometimes misread
Handwritten notes	60-80%	Highly variable
Faded or aged documents	70-90%	Depends on contrast
Receipts (thermal paper)	80-90%	Fading is the main problem

Troubleshooting Common OCR Problems

Problem: OCR Returns Gibberish

Cause: Image is too low quality, heavily compressed, or in a script the OCR engine doesn't support.

Fix:

Re-scan at higher DPI if possible
Increase image contrast before OCR
Make sure you've selected the correct language

Problem: Text Is Extracted but Formatting Is Wrong

Cause: OCR reads text in the wrong order (e.g., reading across columns instead of down).

Fix:

Use OCR tools that understand document layout
For complex layouts, try converting to Word first, then fix formatting
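To see why column layouts trip up OCR, it helps to look at how reading order is decided. The Python sketch below is a simplified illustration, not how any specific OCR engine works: it assumes each recognized word comes with a position, that the page has equal-width columns, and that words on the same line share the same vertical position. The Word type and reading_order function are hypothetical names.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge, in pixels from the left of the page
    y: float  # top edge, in pixels from the top of the page


def reading_order(words: list[Word], page_width: float, columns: int = 2) -> list[str]:
    """Order words column by column, then top to bottom, then left to right."""
    column_width = page_width / columns

    def sort_key(word: Word) -> tuple[int, float, float]:
        column = min(int(word.x // column_width), columns - 1)
        return (column, word.y, word.x)

    return [word.text for word in sorted(words, key=sort_key)]


words = [
    Word("Left", 50, 100), Word("Right", 650, 100),
    Word("column", 50, 130), Word("column", 650, 130),
]

# Naive order: top to bottom across the full page width, ignoring columns.
naive = [word.text for word in sorted(words, key=lambda w: (w.y, w.x))]
print(naive)

# Column-aware order for a 1200-pixel-wide, two-column page.
print(reading_order(words, page_width=1200))
```

The naive version interleaves the two columns line by line, which is exactly the jumbled output described above. Layout-aware tools detect the columns first and only then read down each one.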

Problem: Numbers Are Wrong

Cause: OCR commonly confuses similar characters (0/O, 1/l/I, 8/B).

Fix:

Always proofread numbers manually
For financial documents, double-check every figure
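Proofreading goes faster if the likely trouble spots are flagged first. This Python sketch scans OCR output for tokens that mix digits with letters that resemble digits, and suggests a corrected version for review. The lookalike table covers only the pairs mentioned in this article, and the function name and sample text are illustrative.

```python
import re

# Letters that OCR commonly produces in place of digits.
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

TOKEN = re.compile(r"\S+")


def suspicious_numbers(ocr_text: str) -> list[tuple[str, str]]:
    """Return (token, suggested_fix) pairs for tokens worth a second look."""
    findings = []
    for token in TOKEN.findall(ocr_text):
        has_digit = any(ch.isdigit() for ch in token)
        has_lookalike = any(ch in LOOKALIKES for ch in token)
        has_other_letters = any(ch.isalpha() and ch not in LOOKALIKES for ch in token)
        # Flag only tokens that look numeric apart from the lookalike letters.
        if has_digit and has_lookalike and not has_other_letters:
            fixed = "".join(LOOKALIKES.get(ch, ch) for ch in token)
            findings.append((token, fixed))
    return findings


for token, fix in suspicious_numbers("Invoice 2O24-0l5 total $1,2S0.00 due in 30 days"):
    print(f"check {token!r}: did the original say {fix!r}?")
```

The suggestions are prompts for a human, not automatic corrections. A legitimate code such as "B12" would be flagged too, so always compare against the scanned page before changing anything.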

Problem: OCR Is Very Slow

Cause: Large files with many pages, or low-powered device.

Fix:

Process in smaller batches (split, OCR, then merge)
Close other browser tabs to free up memory
Use a desktop/laptop instead of a phone
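Working out the batches for a long document is simple arithmetic. The sketch below, in Python, computes the page ranges to split on. The function name and the batch size of 50 are arbitrary choices for illustration, so pick whatever size your device handles comfortably.

```python
def page_batches(total_pages: int, batch_size: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges for each batch."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


# A 230-page scan split into batches of up to 50 pages.
for first, last in page_batches(230, 50):
    print(f"pages {first}-{last}")
```

Split the PDF at these page ranges, run OCR on each piece, then merge the results back together in the same order.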

OCR vs. Manual Retyping

When does OCR beat manual data entry?

Factor	OCR	Manual Retyping
Speed	1-30 seconds per page	5-15 minutes per page
Accuracy	95-99% (clean scans)	95-99% (human error exists too)
Cost	Free with our tool	Your time, or hiring a typist
Formatting	Mostly preserved	Requires recreation
Best for	Any volume	Very short documents (<1 page)

Bottom line: OCR wins for anything longer than a paragraph.

Ready to Extract Text?

Stop squinting at scanned PDFs and manually retyping content. OCR handles it in seconds.

Extract Text with OCR — upload your scanned PDF, get searchable text back. Free, private, no account needed.

Need to edit the extracted text? Convert your scanned PDF directly to Word for full editing capabilities.


