Copying OCR'd hidden text from one PDF to another while retaining original images?

Arthur Murray
Is there an example snippet that can help with this or a pointer on
how to approach this?

I have a scanned book as a PDF, for example this Google one:
http://ia600307.us.archive.org/21/items/lightsandshadow00whipgoog/lightsandshadow00whipgoog.pdf

When I OCR this in Acrobat X, the file size grows from 12 MB to 54 MB
(the images get bigger even though I use Searchable Image "Exact").
I'd like to open the original non-OCR'd PDF and copy the hidden OCR
text from the second, larger OCR'd PDF into it, hopefully keeping the
smaller image file size while gaining the ability to search and
highlight the PDF.

Thanks.

------------------------------------------------------------------------------
RSA(R) Conference 2012
Mar 27 - Feb 2
Save $400 by Jan. 27
Register now!
http://p.sf.net/sfu/rsa-sfdev2dev2
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
Re: Copying OCR'd hidden text from one PDF to another while retaining original images?

Leonard Rosenthol-3
You could do a Save as Optimized PDF in Acrobat to get the size back down.

There isn't anything in iText that makes what you want easy, though it would be possible with a detailed understanding of PDF constructs.
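
To give a rough idea of the kind of PDF construct involved: searchable-image OCR output normally stores the recognized text as text objects (`BT` ... `ET`) drawn in text rendering mode 3 (`3 Tr`, invisible) on top of the page image. The sketch below is plain Java and uses no iText API. It assumes you already have the decoded page content streams as strings, and the class and method names (`HiddenTextSketch`, `extractHiddenText`, `mergeInto`) are hypothetical. It matches operators naively with regular expressions and only keeps text objects that set mode 3 themselves. A real solution would need a proper content stream parser, would have to account for any transformation set outside the text objects, and would have to copy the font resources the hidden text refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenTextSketch {
    // Naive match for one text object: a BT operator up to the next ET operator.
    // Assumption: no string literal inside the text contains a standalone "ET".
    private static final Pattern TEXT_OBJECT =
            Pattern.compile("(?s)(?<!\\S)BT(?!\\S).*?(?<!\\S)ET(?!\\S)");

    // "3 Tr" selects text rendering mode 3 (neither fill nor stroke: invisible).
    private static final Pattern INVISIBLE =
            Pattern.compile("(?<!\\S)3\\s+Tr(?!\\S)");

    /** Returns the text objects of a decoded content stream that set rendering mode 3. */
    public static List<String> extractHiddenText(String ocrPageContent) {
        List<String> hidden = new ArrayList<>();
        Matcher m = TEXT_OBJECT.matcher(ocrPageContent);
        while (m.find()) {
            String textObject = m.group();
            if (INVISIBLE.matcher(textObject).find()) {
                hidden.add(textObject);
            }
        }
        return hidden;
    }

    /**
     * Builds new page content: the original content isolated in q/Q so its
     * graphics state cannot leak, followed by the hidden text objects.
     */
    public static String mergeInto(String originalPageContent, List<String> hiddenText) {
        StringBuilder sb = new StringBuilder();
        sb.append("q\n").append(originalPageContent).append("\nQ\n");
        for (String textObject : hiddenText) {
            sb.append(textObject).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up content streams for illustration only.
        String ocrPage = "q 612 0 0 792 0 0 cm /BigIm0 Do Q\n"
                + "BT /F1 10 Tf 3 Tr 72 700 Td (Lights and Shadows) Tj ET\n";
        String originalPage = "q 612 0 0 792 0 0 cm /Im0 Do Q";
        System.out.print(mergeInto(originalPage, extractHiddenText(ocrPage)));
    }
}
```

The merged content still paints the original, smaller image (`/Im0` in the made-up example) and never references the re-encoded one from the OCR'd file, which is the point of the exercise.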

Leonard

-----Original Message-----
From: Arthur Murray [mailto:[hidden email]]
Sent: Thursday, January 12, 2012 6:07 PM
To: [hidden email]
Subject: [iText-questions] Copying OCR'd hidden text from one PDF to another while retaining original images?

Is there an example snippet that can help with this or a pointer on how to approach this?

I have a scanned book as a PDF, for example this Google one:
http://ia600307.us.archive.org/21/items/lightsandshadow00whipgoog/lightsandshadow00whipgoog.pdf

When I OCR this in Acrobat X, the file size grows from 12 MB to 54 MB (the images get bigger even though I use Searchable Image "Exact").
I'd like to open the original non-OCR'd PDF and copy the hidden OCR text from the second, larger OCR'd PDF into it, hopefully keeping the smaller image file size while gaining the ability to search and highlight the PDF.

Thanks.

Re: Copying OCR'd hidden text from one PDF to another while retaining original images?

JonyGreen
In reply to this post by Arthur Murray
You can try this free online PDF-to-text converter. It extracts only the OCR text from the PDF document; the images are not converted.