Re: ITextSharp doesn't read this pdf

David Lestón

Sorry, here is the PDF file.

Thank you

From: David Lestón [mailto:[hidden email]]
Sent: Wednesday, 29 May 2013 12:57
To: '[hidden email]'
Subject: ITextSharp doesn't read this pdf

Hi,

I have a problem with the PDF I have attached. With other PDF files I run this code and it works fine, giving me the text of the pages:

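// Needs: using System.Text; using System.Windows.Forms;
//        using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
PdfReader inputDocument = new PdfReader(fileName);
StringBuilder text = new StringBuilder();
try
{
    for (int page = 1; page <= inputDocument.NumberOfPages; page++)
    {
        // New strategy per page: SimpleTextExtractionStrategy accumulates text internally.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        // The result is already a .NET string; round-tripping it through
        // Encoding.Default would only lose characters, so that step is dropped.
        string currentText = PdfTextExtractor.GetTextFromPage(inputDocument, page, strategy);
        text.Append(currentText);
    }
}
finally
{
    inputDocument.Close(); // release the file even if extraction throws
}
MessageBox.Show(text.ToString());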

But with the attached PDF it doesn't work. The attached PDF comes from a scanner and was generated with Ghostscript, whereas the other examples were generated from Word.

Could anybody help me, please?

Thank you. Regards


------------------------------------------------------------------------------
Introducing AppDynamics Lite, a free troubleshooting tool for Java/.NET
Get 100% visibility into your production application - at no cost.
Code-level diagnostics for performance bottlenecks with <2% overhead
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap1
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php

Attachment: AcusesYcopias.pdf (684K)
Alexis Pigeon
Hi David,

Are you actually expecting this snippet of code to do OCR?

I've not inspected in detail the PDF you attached, but I think it only consists of images (resulting from the scanning), and no text at all.

Could it be that you are making some wrong assumptions about what the text extraction in iText does?

Cheers,
alexis
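
One rough way to check the suspicion above is to run the same extraction and see whether any page yields text at all. The sketch below assumes iTextSharp 5.x; the names `ScannedPdfCheck` and `HasNoExtractableText` are made up for illustration and are not part of the library. An empty result suggests, but does not prove, that the pages hold only images, since text drawn with fonts that lack a usable encoding can also extract poorly.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
static class ScannedPdfCheck
{
    // Returns true when no page yields any non-whitespace text.
    public static bool HasNoExtractableText(string fileName)
    {
        PdfReader reader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Fresh strategy per page, as in the original snippet.
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());

                if (pageText != null && pageText.Trim().Length > 0)
                {
                    return false; // at least one page carries real text
                }
            }
            return true; // nothing extractable: probably scanned images only
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If `ScannedPdfCheck.HasNoExtractableText(fileName)` returns true, the document most likely needs OCR before any text can be obtained from it.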

Iliadis Yannis
In reply to this post by David Lestón
Hi.

To extract text from a PDF, there must first be some text in it.

What you have is just a PDF with two scanned images.


Vishal S D
In reply to this post by Alexis Pigeon
I need help with ligature PDF examples using iTextSharp. Can anyone help me with this?

David Lestón
In reply to this post by Alexis Pigeon

Hi.

I assumed that an image could be read and the text extracted from it. If I want to obtain text from the images, what is the appropriate way to do it, or is it impossible?

Thank you

Alexis Pigeon
In reply to this post by Vishal S D
Hi Vishal,

On 29 May 2013 13:44, Vishal S D <[hidden email]> wrote:
I need help with ligature PDF examples using iTextSharp. Can anyone help me with this?

Please don't hijack threads from other issues, but rather create your own.

Thanks,
alexis

Alexis Pigeon
In reply to this post by David Lestón



On 29 May 2013 13:45, David Lestón <[hidden email]> wrote:

Hi.

I assumed that an image could be read and the text extracted from it. If I want to obtain text from the images, what is the appropriate way to do it, or is it impossible?


Then you supposed incorrectly :)
You'll have to use OCR software for that task.

http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software

HTH,
alexis

JonyGreen
This post has NOT been accepted by the mailing list yet.
In reply to this post by David Lestón
You can try this free online PDF-to-text converter to convert the PDF to text.