How to Extract Text from Scanned PDFs
Scanned PDFs are just images, so you can't copy text from them. Learn how OCR extracts text from scanned documents quickly and accurately to make them searchable.
You open a scanned PDF, try to select some text, and... nothing happens. You can't copy it, can't search it, can't do anything with it. The words are right there on the screen, but your computer treats them as a picture of words, not actual text.
This is one of the most common frustrations with scanned documents. But there's a straightforward fix: OCR.
Why You Can't Copy Text from Scanned PDFs
When you scan a paper document, the scanner takes a photograph of each page. That photograph gets saved inside a PDF file. But it's still just an image: a grid of colored pixels.
Your computer doesn't "see" the letters. It sees patterns of light and dark. That's why:
- Ctrl+F doesn't work: No text to search
- Copy-paste fails: There's nothing to select
- Screen readers can't read it: The document is inaccessible
- File size is huge: Images take far more space than text
What Is OCR?
OCR (Optical Character Recognition) is technology that looks at images of text and converts them into actual, selectable, searchable text.
It works like this:
- OCR examines the image pixel by pixel
- Identifies shapes that look like letters and numbers
- Determines which character each shape represents
- Outputs real text you can copy, search, and edit
Modern OCR is remarkably accurate: typically 95-99% for clean scans of printed text.
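To make the "identify the shape, pick the character" steps concrete, here's a deliberately tiny sketch. It is not how production OCR engines work (they use much more robust models); it just compares a 3x5 bitmap against a handful of made-up templates and picks the closest one. The `TEMPLATES` table and function names are invented for this illustration.

```python
# Toy illustration only: each glyph is a 3x5 bitmap ("#" = dark pixel).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_glyph(bitmap):
    """Return the template character that differs from `bitmap` in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(bitmap, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Recognize a sequence of glyph bitmaps and join the results into text."""
    return "".join(recognize_glyph(g) for g in glyphs)


# Simulated scan noise: one pixel is wrong in the "T" and one in the "L".
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
clean_o = TEMPLATES["O"]
noisy_l = ["#..", "#..", "#..", "#.#", "###"]

print(recognize_line([noisy_t, clean_o, noisy_l]))  # prints "TOL"
```

Even with a flipped pixel, the closest template still wins. That's the basic reason clean scans give better results: fewer wrong pixels means fewer ambiguous shapes.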
How to Extract Text from a Scanned PDF
Method 1: Use Our OCR Tool (Recommended)
The fastest way to make a scanned PDF searchable:
- Go to our PDF OCR tool
- Upload your scanned PDF
- The tool processes each page with OCR
- Download your searchable PDF
What you get: A PDF that looks identical to the original, but with an invisible text layer underneath. You can select text, search with Ctrl+F, and copy-paste.
Time: 5-30 seconds depending on page count.
Privacy: Everything runs in your browser. Your document never leaves your device.
Method 2: Convert to Word
If you need to edit the text (not just search/copy):
- Use our PDF to Word converter
- Upload the scanned PDF
- Download the editable Word document
- Edit as needed
Best for: When you need to modify the content, not just read it.
Method 3: Google Drive (Free Alternative)
Google Drive has built-in OCR:
- Upload your scanned PDF to Google Drive
- Right-click → Open with → Google Docs
- Google automatically runs OCR
- Text appears in a Google Doc
Pros: Free, decent accuracy.
Cons: Formatting gets destroyed. You get raw text, not a nicely formatted document. Also, your file goes to Google's servers.
Method 4: Adobe Acrobat
If you have Acrobat Pro:
- Open the scanned PDF
- Tools → Enhance Scans → Recognize Text
- Choose language and output settings
- Run OCR
Pros: Excellent accuracy, preserves formatting.
Cons: Requires an expensive subscription.
What Affects OCR Accuracy?
Not all scans are created equal. Here's what impacts how well OCR works:
Scan Quality
| Quality | Expected Accuracy | Notes |
|---|---|---|
| 300 DPI, clean | 98-99% | Ideal for OCR |
| 200 DPI, clean | 95-98% | Good enough |
| 150 DPI or less | 85-95% | Accuracy drops |
| Blurry/skewed | 70-85% | May need manual correction |
| Poor photocopy | 60-80% | OCR struggles significantly |
Rule of thumb: 300 DPI produces the best OCR results. If you haven't scanned yet, use 300 DPI.
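If you're wondering why resolution matters so much, a rough back-of-the-envelope calculation helps. The sketch below converts a font size in points (1 point = 1/72 inch) into pixels at a given DPI. The US Letter page size and the helper names are assumptions for illustration, and a font's point size is its nominal height, so the actual letter shapes are somewhat smaller than these numbers.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(font_size_pt, dpi):
    """Approximate pixel height of text with the given nominal font size."""
    return font_size_pt / POINTS_PER_INCH * dpi


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scanned page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (150, 200, 300):
    width, height = page_pixels(8.5, 11, dpi)
    print(
        f"{dpi} DPI: page {width}x{height} px, "
        f"8pt text ~{text_height_px(8, dpi):.1f} px, "
        f"12pt text ~{text_height_px(12, dpi):.1f} px"
    )
```

At 150 DPI, 8pt text gets under 17 pixels of height to work with; at 300 DPI it gets about twice that. More pixels per letter gives the OCR engine more detail to tell similar shapes apart.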
Document Characteristics
OCR works best with:
- Printed text (not handwriting)
- Standard fonts (Times New Roman, Arial, etc.)
- Black text on white background
- Clean, straight pages
- Common languages (English, Spanish, French, German, etc.)
OCR struggles with:
- Handwritten text (accuracy drops to 60-80%)
- Decorative or unusual fonts
- Colored backgrounds or watermarks
- Skewed or rotated pages
- Mixed languages on one page
- Very small text (under 8pt)
Page Orientation
If pages are skewed (slightly rotated during scanning), OCR accuracy drops. Many OCR tools auto-correct slight skew, but if your pages are significantly rotated, rotate them before running OCR.
Common OCR Use Cases
Digitizing Old Documents
Situation: You have boxes of paper documents that need to be searchable.
Approach:
- Scan everything at 300 DPI
- Run OCR in batches
- File the searchable PDFs in your document management system
Result: Decades of paper documents become instantly searchable.
Making Legal Documents Searchable
Situation: Court filings, contracts, or case files scanned as images.
Approach:
- OCR the documents
- Use Ctrl+F to find specific clauses, dates, or names
- Copy relevant text for briefs or summaries
Time saved: Hours of manual reading replaced by seconds of searching.
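If you've exported the OCR text, the same kind of search is easy to script. Here's a minimal sketch, assuming you already have the text of each page in a dictionary keyed by page number (the `pages` sample and the `search_pages` helper are hypothetical, not part of any tool mentioned here).

```python
import re


def search_pages(pages, term, width=25):
    """Return (page_number, snippet) for each case-insensitive match of `term`."""
    hits = []
    for number, text in sorted(pages.items()):
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((number, text[start:end].strip()))
    return hits


# Hypothetical OCR output, keyed by page number.
pages = {
    1: "This Agreement is made on 12 March 2021 between the parties.",
    2: "TERMINATION. Either party may terminate this Agreement with 30 days notice.",
}

for page, snippet in search_pages(pages, "terminat"):
    print(f"page {page}: ...{snippet}...")
```

Searching for a word stem like "terminat" catches both "TERMINATION" and "terminate". Keep in mind that a search only finds what OCR read correctly, so a misread character can hide a match.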
Processing Receipts and Invoices
Situation: Scanned receipts for expense reports or tax filings.
Approach:
- Scan or photograph receipts
- Convert to PDF if needed
- Run OCR to extract amounts, dates, and vendor names
Result: Searchable financial records.
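Once a receipt is plain text, pulling out the key fields can be sketched with a couple of regular expressions. This example makes several assumptions you'd need to adapt: amounts look like `$12.99`, dates are US-style `MM/DD/YYYY`, the vendor is the first non-empty line, and the largest amount is the total. The sample receipt is made up.

```python
import re

AMOUNT_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*\.\d{2})")
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # assumes MM/DD/YYYY


def parse_receipt(text):
    """Extract vendor, dates (as YYYY-MM-DD), and a guessed total from receipt text."""
    amounts = [float(a.replace(",", "")) for a in AMOUNT_RE.findall(text)]
    dates = ["{2}-{0:0>2}-{1:0>2}".format(*m.groups()) for m in DATE_RE.finditer(text)]
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "vendor": lines[0] if lines else None,      # heuristic: first non-empty line
        "dates": dates,
        "total": max(amounts) if amounts else None,  # heuristic: largest amount
    }


# Made-up OCR output from a receipt.
receipt_text = """
ACME HARDWARE
03/07/2024
Hammer      $12.99
Nails        $4.50
TOTAL       $17.49
"""

print(parse_receipt(receipt_text))
```

Treat the output as a first draft. Receipts are one of the harder content types for OCR, so check the extracted amounts against the original image before they go into an expense report.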
Academic Research
Situation: Older journal articles or books only available as scanned PDFs.
Approach:
- Download the scanned PDF
- Run OCR
- Search for keywords, copy quotes with proper citations
Time saved: Instead of reading 50 pages to find one quote, search in 2 seconds.
Accessibility Compliance
Situation: Your organization needs documents to be accessible to screen readers.
Approach:
- OCR all scanned documents
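- The text layer lets screen readers read the document's content
- Use this as a step toward meeting accessibility requirements (ADA, WCAG, Section 508); full compliance usually also depends on document tagging and a correct reading order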
Result: Inclusive documents that everyone can access.
OCR Tips for Better Results
Before Scanning
- Use 300 DPI: The sweet spot for file size vs. OCR accuracy
- Use a flatbed scanner: Better quality than phone cameras for multi-page documents
- Ensure clean glass: Dust and smudges cause OCR errors
- Align pages straight: Skew reduces accuracy
Before Running OCR
- Check page orientation: Rotate any sideways or upside-down pages first
- Remove blank pages: They waste processing time
- Crop margins: Large dark borders can confuse OCR
After OCR
- Spot-check accuracy: Read a few paragraphs and compare to the original
- Check numbers carefully: OCR sometimes confuses 0/O, 1/l, 5/S
- Verify special characters: Symbols like @, #, & can be misread
- Compress the result: OCR adds a text layer, which slightly increases file size; compression offsets this
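If you want to put a number on a spot-check, one common approach is to retype a short passage by hand and measure how far the OCR output is from it. The sketch below uses Levenshtein edit distance (the count of single-character insertions, deletions, and substitutions) and reports accuracy as one minus distance divided by reference length. The function names and sample strings are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters OCR got right, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))


reference = "Invoice total: 1050"   # what the page actually says
ocr_output = "Invoice total: 1O5O"  # OCR read two zeros as the letter O

print(edit_distance(reference, ocr_output))
print(f"{character_accuracy(reference, ocr_output):.1%}")
```

Notice that a line can score close to 90% and still have a completely wrong number in it, which is why the figures deserve a separate check.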
OCR Accuracy by Content Type
| Content | Typical Accuracy | Notes |
|---|---|---|
| Typed business letters | 99%+ | Best-case scenario |
| Book pages | 97-99% | Very reliable |
| Magazine/newspaper | 95-98% | Column layouts can cause issues |
| Tables and spreadsheets | 90-95% | Structure may need manual fixing |
| Forms with checkboxes | 85-95% | Checkmarks sometimes misread |
| Handwritten notes | 60-80% | Highly variable |
| Faded or aged documents | 70-90% | Depends on contrast |
| Receipts (thermal paper) | 80-90% | Fading is the main problem |
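Percentages can hide how much cleanup is involved, so it helps to turn them into errors per page. The sketch below assumes a page holds roughly 2,000 characters, which is an illustrative round figure rather than a measurement; plug in your own estimate.

```python
def expected_errors(chars_per_page, accuracy):
    """Approximate number of wrong characters on a page at a given accuracy."""
    return round(chars_per_page * (1 - accuracy))


CHARS_PER_PAGE = 2000  # assumed for illustration

for accuracy in (0.99, 0.97, 0.95, 0.90, 0.80, 0.60):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors} wrong characters per page")
```

Under that assumption, 99% accuracy still means about 20 characters to fix on every page, and 80% means about 400. That's why the lower rows of the table usually need manual correction.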
Troubleshooting Common OCR Problems
Problem: OCR Returns Gibberish
Cause: Image is too low quality, heavily compressed, or in a script the OCR engine doesn't support.
Fix:
- Re-scan at higher DPI if possible
- Increase image contrast before OCR
- Make sure you've selected the correct language
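One common way to boost contrast is to binarize the image: turn every pixel fully black or fully white. The sketch below picks the cut-off automatically with Otsu's method, a standard thresholding technique. It's shown here as an example and isn't necessarily what any particular OCR tool does internally. The image is represented as a flat list of grayscale values (0 = black, 255 = white) to keep the example free of dependencies.

```python
def otsu_threshold(pixels):
    """Pick the grayscale cut-off (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    count_dark = 0  # pixels at or below the candidate threshold
    sum_dark = 0    # sum of their intensities
    for t in range(256):
        count_dark += hist[t]
        sum_dark += t * hist[t]
        count_light = total - count_dark
        if count_dark == 0:
            continue
        if count_light == 0:
            break
        mean_dark = sum_dark / count_dark
        mean_light = (sum_all - sum_dark) / count_light
        # Between-class variance (up to a constant factor): larger = cleaner split.
        between = count_dark * count_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, threshold):
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Simulated faded scan: gray "ink" (90-100) on a light-gray page (160-170).
faded = [90] * 20 + [100] * 10 + [160] * 10 + [170] * 60
threshold = otsu_threshold(faded)
cleaned = binarize(faded, threshold)
print(threshold, cleaned.count(0), cleaned.count(255))
```

After binarizing, the faded gray ink becomes solid black and the grayish page becomes pure white, which is much closer to the "black text on white background" that OCR handles best.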
Problem: Text Is Extracted but Formatting Is Wrong
Cause: OCR reads text in the wrong order (e.g., reading across columns instead of down).
Fix:
- Use OCR tools that understand document layout
- For complex layouts, try converting to Word first, then fix formatting
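Here's a small sketch of what "reading in the wrong order" means. Each piece of text has a position on the page; sorting purely top-to-bottom interleaves the two columns, while grouping by column first keeps each column together. The coordinates, the sample text, and the known gutter position are all assumptions for illustration. Real layout analysis has to detect the columns itself.

```python
def naive_order(blocks):
    """Read strictly top-to-bottom, then left-to-right (ignores columns)."""
    return [text for x, y, text in sorted(blocks, key=lambda b: (b[1], b[0]))]


def column_order(blocks, gutter_x):
    """Read the whole left column first, then the right column."""
    left = sorted((b for b in blocks if b[0] < gutter_x), key=lambda b: (b[1], b[0]))
    right = sorted((b for b in blocks if b[0] >= gutter_x), key=lambda b: (b[1], b[0]))
    return [text for x, y, text in left + right]


# Hypothetical text blocks as (x, y, text); y grows downward on the page.
blocks = [
    (50, 100, "The first column"),
    (350, 100, "The second column"),
    (50, 120, "continues here."),
    (350, 120, "ends here."),
]

print(" ".join(naive_order(blocks)))
print(" ".join(column_order(blocks, gutter_x=300)))
```

The first line comes out scrambled ("The first column The second column continues here. ends here."), while the second reads correctly. If your extracted text looks like the first line, the layout step is the culprit, not the character recognition.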
Problem: Numbers Are Wrong
Cause: OCR commonly confuses similar characters (0/O, 1/l/I, 8/B).
Fix:
- Always proofread numbers manually
- For financial documents, double-check every figure
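A script can't replace proofreading, but it can point you at the suspicious spots. This sketch flags tokens that contain digits mixed with look-alike letters and suggests what the number probably was. The confusion table and the "looks like a number" pattern are assumptions. Because a code like "B12" would also be flagged, treat the output as suggestions to review, not automatic corrections.

```python
import re

# Letters commonly misread in place of digits (assumed mapping for illustration).
CONFUSABLE_TO_DIGIT = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)
NUMERIC_RE = re.compile(r"^\$?\d[\d,]*(\.\d+)?$")


def suggest_number_fixes(text):
    """Return (token, suggestion) pairs for tokens that look like misread numbers."""
    suggestions = []
    for token in text.split():
        candidate = token.translate(CONFUSABLE_TO_DIGIT)
        has_digit = any(ch.isdigit() for ch in token)
        if candidate != token and has_digit and NUMERIC_RE.match(candidate):
            suggestions.append((token, candidate))
    return suggestions


ocr_line = "Total due: $1,2O5.5O on invoice l04B by Bob"
for token, suggestion in suggest_number_fixes(ocr_line):
    print(f"check {token!r}: did the page say {suggestion!r}?")
```

Words like "Total" and "Bob" are left alone because they contain no digits. Only tokens that are already partly numeric get flagged.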
Problem: OCR Is Very Slow
Cause: Large files with many pages, or low-powered device.
Fix:
- Process in smaller batches (split, OCR, then merge)
- Close other browser tabs to free up memory
- Use a desktop/laptop instead of a phone
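For the split-then-merge approach, the only real planning is deciding which pages go in which batch. A minimal sketch, with a hypothetical helper name and a batch size you'd tune for your own device:

```python
def page_batches(total_pages, batch_size):
    """Return inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


# Example: a 95-page scan processed 25 pages at a time.
for first, last in page_batches(95, 25):
    print(f"OCR pages {first}-{last}")
```

Split the PDF along these ranges, run OCR on each piece, then merge the results back in the same order.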
OCR vs. Manual Retyping
When does OCR beat manual data entry?
| Factor | OCR | Manual Retyping |
|---|---|---|
| Speed | 1-30 seconds per page | 5-15 minutes per page |
| Accuracy | 95-99% (clean scans) | 95-99% (human error exists too) |
| Cost | Free with our tool | Your time, or hiring a typist |
| Formatting | Mostly preserved | Requires recreation |
| Best for | Any volume | Very short documents (<1 page) |
Bottom line: OCR wins for anything longer than a paragraph.
Ready to Extract Text?
Stop squinting at scanned PDFs and manually retyping content. OCR handles it in seconds.
Extract Text with OCR: upload your scanned PDF, get searchable text back. Free, private, no account needed.
Need to edit the extracted text? Convert your scanned PDF directly to Word for full editing capabilities.