Inserting searchable text layer beneath image

nycsingle3
Hi,

We use iText to produce a PDF document that combines data, user-supplied PDFs, and system-generated PDFs into one large PDF. Some of the user-supplied PDF files contain pages that are scanned documents with no searchable text, only an image of the scanned page. I have been looking for OCR products that can extract text from these image-only pages, with limited success. I have found one, from Asprise, which I am experimenting with. Does anyone have experience with OCR/extracting text from scanned pages inside a PDF? Supposing I can extract this text, can I use iText to insert a hidden text layer beneath these images so that the pages become text-searchable? I don't want to change the appearance of the page, only insert the text underneath so that it becomes a searchable page.

Thanks,
Jody

_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/

Re: Inserting searchable text layer beneath image

Leonard Rosenthol
You can do this, but you'll need more than just the text.  You'll also need the POSITION of each character/word on the page, so that you properly position the hidden text along with the image to ensure proper selection of the text by viewers (like Acrobat).

Leonard
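
A note on the positions mentioned above: OCR engines commonly report word boxes in image pixels with the origin at the top-left corner, while PDF page coordinates are in points (1/72 inch) with the origin, by default, at the bottom-left. The following is a minimal Java sketch of that conversion, not anything taken from iText or Asprise. The class and method names are made up for illustration, and it assumes that the scanned image covers the whole page, that the page is not rotated, and that the scan resolution (dpi) is known.

```java
import java.util.Locale;

/**
 * Sketch: map an OCR word box (pixels, top-left origin) to PDF user space
 * (points, bottom-left origin). Names are hypothetical, not an iText API.
 */
final class OcrToPdfCoords {
    private OcrToPdfCoords() {}

    /** Converts a pixel length to PDF points (72 points per inch) at the given scan resolution. */
    static double toPoints(double pixels, double dpi) {
        return pixels * 72.0 / dpi;
    }

    /**
     * Returns {x, y, width, height} in points, where (x, y) is the lower-left
     * corner of the box. Assumes the image fills an unrotated page exactly.
     */
    static double[] toPdfBox(double leftPx, double topPx, double widthPx, double heightPx,
                             double dpi, double pageHeightPt) {
        double x = toPoints(leftPx, dpi);
        double w = toPoints(widthPx, dpi);
        double h = toPoints(heightPx, dpi);
        // Flip the vertical axis: OCR measures down from the top, PDF measures up from the bottom.
        double y = pageHeightPt - toPoints(topPx, dpi) - h;
        return new double[] {x, y, w, h};
    }

    public static void main(String[] args) {
        // A word box at (300, 150) px, 600 x 50 px, on a 300 dpi scan of a page 792 pt tall.
        double[] box = toPdfBox(300, 150, 600, 50, 300, 792);
        System.out.println(String.format(Locale.ROOT,
                "x=%.1f y=%.1f w=%.1f h=%.1f", box[0], box[1], box[2], box[3]));
        // prints: x=72.0 y=744.0 w=144.0 h=12.0
    }
}
```

The box height in points is also a reasonable first guess for the font size of the hidden text, so that the selection highlight roughly matches the scanned word.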

On Dec 14, 2007, at 7:29 AM, [hidden email] wrote:

> Hi,
>
> We use iText to produce a PDF document that combines data, user-supplied PDFs, and system-generated PDFs into one large PDF. Some of the user-supplied PDF files contain pages that are scanned documents with no searchable text, only an image of the scanned page. I have been looking for OCR products that can extract text from these image-only pages, with limited success. I have found one, from Asprise, which I am experimenting with. Does anyone have experience with OCR/extracting text from scanned pages inside a PDF? Supposing I can extract this text, can I use iText to insert a hidden text layer beneath these images so that the pages become text-searchable? I don't want to change the appearance of the page, only insert the text underneath so that it becomes a searchable page.
>
> Thanks,
> Jody

Re: Inserting searchable text layer beneath image

Kavitha_Govindaraj
Hi Leonard,
I have the same question as Jody, and I have a text file with position information. Could you tell me which class or package I need to use to accomplish this?

Thanks in advance,
Kavitha.


Re: Inserting searchable text layer beneath image

iText mailing list
Kavitha_Govindaraj wrote:
> Hi Leonard,
> I have the same question as Jody, and I have a text file with position
> information. Could you tell me which class or package I need to use
> to accomplish this?

This will get you started:
http://www.1t3xt.info/examples/browse/?page=example&id=185
--
This answer is provided by 1T3XT BVBA

Re: Inserting searchable text layer beneath image

Kavitha_Govindaraj
Thank you Leonard for the helpful link.

Re: Inserting searchable text layer beneath image

Leonard Rosenthol
In reply to this post by Kavitha_Govindaraj
Use PdfStamper and associated classes to add the text.

Leonard
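
For reference, hidden text in a PDF is usually ordinary text drawn with text rendering mode 3 (neither filled nor stroked), which viewers can still search and select. The sketch below only builds the raw content-stream operators for one invisible word, using plain Java and no iText classes. The class name, method names, and the font resource name F1 are assumptions for illustration, and it handles only text that fits the font's single-byte encoding. In iText the equivalent would presumably go through the PdfContentByte returned by PdfStamper.getUnderContent(pageNumber), with beginText(), setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE), setFontAndSize(...), setTextMatrix(x, y), showText(...), and endText(); check the signatures against the iText version in use.

```java
import java.util.Locale;

/**
 * Sketch: build the PDF content-stream operators for one invisible word.
 * Names are hypothetical; this is not an iText API.
 */
final class HiddenTextOps {
    private HiddenTextOps() {}

    /** Escapes backslashes and parentheses so the text is a valid PDF literal string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Returns a text object that places the word at (x, y), in points, with
     * rendering mode 3 (invisible). fontName must match a font resource on the page.
     */
    static String invisibleWord(String fontName, double fontSize, double x, double y, String word) {
        return String.format(Locale.ROOT,
                "BT /%s %.2f Tf 3 Tr 1 0 0 1 %.2f %.2f Tm (%s) Tj ET",
                fontName, fontSize, x, y, escape(word));
    }

    public static void main(String[] args) {
        // One OCR word positioned 72 pt from the left edge and 744 pt from the bottom edge.
        System.out.println(invisibleWord("F1", 12, 72, 744, "Invoice (copy)"));
        // prints: BT /F1 12.00 Tf 3 Tr 1 0 0 1 72.00 744.00 Tm (Invoice \(copy\)) Tj ET
    }
}
```

Because the text is not painted at all in mode 3, it should not matter visually whether it ends up under or over the scanned image; placing it underneath simply matches what the original question asked for.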

On Apr 15, 2008, at 7:29 PM, Kavitha_Govindaraj wrote:

> Hi Leonard,
> I have the same question as Jody, and I have a text file with position
> information. Could you tell me which class or package I need to use
> to accomplish this?
>
> Thanks in advance,
> Kavitha.

Re: Inserting searchable text layer beneath image

mikelilin
In reply to this post by nycsingle3
There is a free online PDF OCR service that converts PDF to text and MS Word. If you want to extract text from an image, you can try this free online OCR; it can save the recognized text to a searchable PDF.