Substitute image in PDF with OCR text

Substitute image in PDF with OCR text

schildi
While looking for a way to substitute images in PDF documents, I was pointed to
the iText library. Please help me solve the following problem:

There are some scanned books containing black-and-white page images and the
corresponding OCR text.
* The images are JPGs of about 1 MB per page.
* I extracted them with "pdfimages"
* and converted them to PNG with
  "convert $JPG -colors 2 -transparent none -quality 90 $PNG".
* This reduces the size to about 110 KB.
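
The conversion step above could also be done without ImageMagick. What follows is only a minimal sketch in Java (the language iText is written in) using nothing but the standard library. The class name BilevelPng, the fixed threshold of 128, and the luma weights are illustrative assumptions, not part of the original workflow. It does not reproduce the "-transparent" option, and the resulting file size will depend on the scan.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: turns a scanned page image into a 1-bit PNG. */
final class BilevelPng {

    /** Maps every pixel to pure black or pure white by comparing its luma to the threshold. */
    static BufferedImage toBilevel(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        // TYPE_BYTE_BINARY defaults to a 1-bit black/white palette.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (assumed weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                dst.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }

    /** Reads an image file (e.g. a JPG from pdfimages) and writes it as a bilevel PNG. */
    static void convert(File in, File out) throws IOException {
        BufferedImage src = ImageIO.read(in);
        if (src == null) {
            throw new IOException("Unsupported image format: " + in);
        }
        if (!ImageIO.write(toBilevel(src, 128), "png", out)) {
            throw new IOException("No PNG writer available");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BilevelPng <input.jpg> <output.png>");
            return;
        }
        convert(new File(args[0]), new File(args[1]));
    }
}
```

This only covers producing the smaller image. Putting it back into the PDF while keeping the OCR text layer is the part that needs a PDF library such as iText.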

Now my question is:
How can I substitute the huge JPG image with the PNG one without losing the
OCR text?


Kind regards,
Reiner Miericke

------------------------------------------------------------------------------
What Every C/C++ and Fortran developer Should Know!
Read this article and learn how Intel has extended the reach of its
next-generation tools to help Windows* and Linux* C/C++ and Fortran
developers boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php

Re: Substitute image in PDF with OCR text

Paulo Soares-3
See chapter 16.1.2 in the book. See also http://www.javabeat.net/articles/327-resizing-an-image-in-an-existing-document-using-itext-1.html.

Paulo

-----Original Message-----
From: Reiner Miericke [mailto:[hidden email]]
Sent: Wednesday, May 18, 2011 12:14 PM
To: [hidden email]
Subject: [iText-questions] Sibstitute image in PDF with OCR text


Re: Substitute image in PDF with OCR text

wenbuyi
In reply to this post by schildi
You can try this free online OCR service to extract text from JPEG and PNG images.