How to check if a PDF is OCR recognized

Bernhard
Dear all,

I have a lot of PDF files; some of them are plain bitmaps, and some have already been OCR-recognized.
I now plan to have all files OCR-recognized, but I don't want to reprocess every document if I can avoid it, because I think most of them have already been recognized.

Is there a way to check with the iText library if an existing PDF has an OCR layer or not?

Please let me know :-)
Maybe there is another possibility (other than iText) to solve my problem?

Thanks in advance
bernhard

Re: How to check if a PDF is OCR recognized

iText mailing list
On 19/07/2011 14:58, Bernhard Haslinger wrote:
> Is there a way to check with the iText library if an existing PDF has an OCR
> layer or not?
iText can extract plain text from a PDF, provided that the text is stored
as actual text and not only as an image of text.
- if you use iText to parse your PDFs, and there's no text, then the PDF
doesn't have an OCR layer.
- if you use iText to parse your PDFs, and most pages have text, then it
probably has an OCR layer.
Hope this helps.
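
The two rules above can be sketched in Java as a small check over per-page text. This is only an illustration, not code from the thread: the class name, the method names and the 0.5 cut-off for "most pages" are assumptions, and the page strings are expected to come from whatever extractor you use (in iText 5 that is, to my knowledge, PdfTextExtractor.getTextFromPage(reader, pageNumber); please verify against your version).

```java
import java.util.Arrays;
import java.util.List;

public class OcrLayerCheck {

    /** Fraction of pages whose extracted text has at least one non-whitespace character. */
    public static double textPageRatio(List<String> pageTexts) {
        if (pageTexts.isEmpty()) {
            return 0.0;
        }
        long withText = pageTexts.stream()
                .filter(t -> t != null && !t.trim().isEmpty())
                .count();
        return (double) withText / pageTexts.size();
    }

    /** "Most pages have text" is read here as: ratio >= minRatio (an assumed cut-off). */
    public static boolean probablyHasTextLayer(List<String> pageTexts, double minRatio) {
        return textPageRatio(pageTexts) >= minRatio;
    }

    public static void main(String[] args) {
        // Hypothetical extraction results: one string per page.
        List<String> scanned = Arrays.asList("", "  ", "");
        List<String> recognized = Arrays.asList("Invoice 2011", "Total due", "");

        System.out.println(probablyHasTextLayer(scanned, 0.5));    // false
        System.out.println(probablyHasTextLayer(recognized, 0.5)); // true
    }
}
```

Note that this only tells you whether extractable text exists; it cannot distinguish an OCR layer from text that was in the PDF from the start, which is fine for deciding what still needs OCR.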

------------------------------------------------------------------------------
Magic Quadrant for Content-Aware Data Loss Prevention
Research study explores the data loss prevention market. Includes in-depth
analysis on the changes within the DLP market, and the criteria used to
evaluate the strengths and weaknesses of these DLP solutions.
http://www.accelacomm.com/jaw/sfnl/114/51385063/
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php

Re: How to check if a PDF is OCR recognized

AJ Weber
In reply to this post by Bernhard
I use iText for everything I can.  For this specific case, I use pdfbox to
extract the text from the first few pages (I first check how many pages are
in the PDF), and if the number of words exceeds a preset threshold, I assume
the PDF is text-indexable.

It's not foolproof, but it's part of my OCR solution: if the PDF has fewer
than the threshold number of words, I send it for OCR. It's an optimization
more than anything. If the PDF really is text-based, and the first page or
two happens to be a cover page or something else that has very few words by
design, it won't hurt that I send it for OCR anyway; it just takes a little
longer.

-AJ
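
A minimal Java sketch of the word-count triage described above, for illustration only. The names (OcrTriage, needsOcr, pagesToSample, minWords) and the example threshold are assumptions, not AJ's actual code, and the page strings stand in for whatever pdfbox or iText returns for each page.

```java
import java.util.Arrays;
import java.util.List;

public class OcrTriage {

    /** Counts whitespace-separated tokens; null or blank text counts as zero words. */
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Looks only at the first pagesToSample pages (or fewer, if the document is shorter)
     * and reports true when the total word count stays below minWords.
     */
    public static boolean needsOcr(List<String> pageTexts, int pagesToSample, int minWords) {
        int limit = Math.min(pagesToSample, pageTexts.size());
        int words = 0;
        for (int i = 0; i < limit; i++) {
            words += countWords(pageTexts.get(i));
        }
        return words < minWords;
    }

    public static void main(String[] args) {
        // Hypothetical document: a sparse cover page followed by a normal text page.
        List<String> doc = Arrays.asList("Cover", "one two three four five six");

        System.out.println(needsOcr(doc, 1, 5)); // true: only the cover page is sampled
        System.out.println(needsOcr(doc, 2, 5)); // false: 7 words in the first two pages
    }
}
```

The first call shows the cover-page case from the post: a text-based PDF is sent for OCR anyway, which costs time but does no harm.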

Re: How to check if a PDF is OCR recognized

iText mailing list
On 19/07/2011 15:29, AJ Weber wrote:
> It's not foolproof
True, it's an "educated guess", but I'm pretty sure the error margin is
very low.
By the way: iText can extract text from a PDF too ;-)

Re: How to check if a PDF is OCR recognized

AJ Weber
Yeah, I'm starting to realize that.  I don't use the library every day, but
have used it in a bunch of projects.  The base of my knowledge is probably
dated back to v2.0 code and at that time, I think the standard message was
"iText doesn't do text extraction, we don't want to do it, use pdfbox or
jpedal to do that."

In my spare time, I will try and go back and convert my code to use the
latest iText classes to do that.  As I said, I always prefer to use iText
over other libraries whenever possible.

Thanks,
AJ


Re: How to check if a PDF is OCR recognized

iText mailing list
On 19/07/2011 15:47, AJ Weber wrote:
> I think the standard message was
> "iText doesn't do text extraction, we don't want to do it, use pdfbox or
> jpedal to do that."
Yes, that sounds like a literal quote from the first book.
But that was written 5 to 6 years ago. Time flies ;-)

Re: How to check if a PDF is OCR recognized

AJ Weber
I bought the second edition recently... I will catch up as soon as I can! :)

Re: How to check if a PDF is OCR recognized

Bernhard
In reply to this post by Bernhard
To all of you - thank you so much.
Your solutions seem to help me :-)

Best regards
bernhard

PDF to PDF/A (b)

AJ Weber
In reply to this post by iText mailing list
I have taken note of the FAQ inserted in the Second Ed of the iText In
Action book about this, but would like to ask and understand the
implications a little better...

If I wanted to try and automate the conversion of (various versions of)
existing PDF files to PDF/A (level B), shouldn't this be "doable" in an
automated fashion?  I can understand the limitations that "Level A"
might pose would probably need additional, manual intervention.  But
shouldn't we be able to parse an existing PDF file, strip any/all
javascript, embedded images/video, etc., and read the required fonts and
embed them into an output file to make it compliant?

I know that might take a few steps to parse the input file to a
legitimate, PDF/A output file, but it seems like there's no roadblock to
doing so with the toolset contained in iText.  Just a bit of work.

What am I missing?  What is there that iText "can't do" with regard to
this objective?

Thanks in advance,
AJ


Re: PDF to PDF/A (b)

Leonard Rosenthol-3
Is it doable - yes.

Is it doable with iText, "out of the box" - no.  iText doesn't have a lot
of the necessary machinery built in.

Could you build a tool to do the conversion using iText - pretty much.
You'd need to learn a LOT about PDF, PDF/A and relevant associated things
(like fonts, color management, etc.)

So yes, you could certainly do this - it's just going to require a lot of work.
(And be aware, it's usually the little things that will get you, such as
CIDToGIDMaps, and not the big ones.)

Leonard

Re: PDF to PDF/A (b)

AJ Weber
I suppose there isn't a "PDF Specification for Dummies" book written yet? ;)

Appreciate the detailed reply.

-AJ


Re: PDF to PDF/A (b)

Leonard Rosenthol-3
Nope - sorry :)

drawing a checkmark

Vahid
In reply to this post by AJ Weber
Hello,
I am using iTextSharp and I want to draw (move) and fit (scale) the following checkmark inside an arbitrary Rectangle with a given width & height anywhere on the page. Is it possible?
 
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

    class Program
    {
        static void Main(string[] args)
        {
            using (var pdfDoc = new Document(PageSize.A4))
            {
                var pdfWriter = PdfWriter.GetInstance(pdfDoc, new FileStream("Test.pdf", FileMode.Create));
                pdfDoc.Open();
 
                var cb = pdfWriter.DirectContent;
                cb.SaveState();
 
                cb.MoveTo(38.33889376f, 67.35513328f);
                cb.CurveTo(39.90689547f, 67.35509017f, 41.09296342f, 66.03921993f, 41.89711165f, 63.40748424f);
                cb.CurveTo(43.50531445f, 58.47289182f, 44.65118131f, 56.00562195f, 45.33470755f, 56.0056459f);
                cb.CurveTo(45.85735449f, 56.00562195f, 46.40013944f, 56.41682961f, 46.96305772f, 57.23928802f);
                cb.CurveTo(58.2608517f, 75.74384316f, 68.7143666f, 90.71198997f, 78.32362116f, 102.14379168f);
                cb.CurveTo(80.81631349f, 105.10443984f, 84.77658911f, 106.58480942f, 90.20445269f, 106.58489085f);
                cb.CurveTo(91.49097185f, 106.58480942f, 92.35539361f, 106.46145048f, 92.79773204f, 106.21480444f);
                cb.CurveTo(93.23991593f, 105.96799555f, 93.4610547f, 105.65958382f, 93.46113432f, 105.28956447f);
                cb.CurveTo(93.4610547f, 104.71379041f, 92.7976618f, 103.58294901f, 91.47094155f, 101.89705463f);
                cb.CurveTo(75.95141033f, 82.81670149f, 61.55772504f, 62.66726353f, 48.28984822f, 41.44869669f);
                cb.CurveTo(47.36506862f, 39.96831273f, 45.47540199f, 39.22812555f, 42.62081088f, 39.22813992f);
                cb.CurveTo(39.72597184f, 39.22812555f, 38.0172148f, 39.35149407f, 37.49457722f, 39.5982407f);
                cb.CurveTo(36.12755286f, 40.2150402f, 34.51931728f, 43.36081778f, 32.66987047f, 49.03557823f);
                cb.CurveTo(30.57914689f, 55.32711903f, 29.53378743f, 59.27475848f, 29.53381085f, 60.87852533f);
                cb.CurveTo(29.53378743f, 62.60558406f, 30.94099884f, 64.27099685f, 33.75542165f, 65.87476369f);
                cb.CurveTo(35.48425582f, 66.86164481f, 37.01207517f, 67.35509017f, 38.33889376f, 67.35513328f);
 
                cb.SetRGBColorFill(0, 0, 0);
                cb.Fill();
 
                cb.RestoreState();
            }
        }
    }
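
One way to approach the question is to compute a single scale-and-translate matrix that maps the path's bounding box into the target rectangle, and apply it before drawing the path. The sketch below is plain Java arithmetic with no iText dependency, and the same formulas carry over to C# unchanged. Everything in it is an assumption for illustration: the class and method names, the choice to keep the aspect ratio and center the shape, the example target rectangle, and the bounding box values, which were rounded outward from the coordinates in the posted code (the control points of a Bezier curve enclose the curve, so their box is a safe outer bound).

```java
public class FitToRect {

    /**
     * Returns {a, b, c, d, e, f} for the PDF-style transform
     *   x' = a*x + c*y + e,   y' = b*x + d*y + f
     * that scales the source box uniformly and centers it inside the destination box.
     */
    public static double[] fit(double srcX, double srcY, double srcW, double srcH,
                               double dstX, double dstY, double dstW, double dstH) {
        // Uniform scale: the largest factor that still fits in both directions.
        double s = Math.min(dstW / srcW, dstH / srcH);
        // Translate so the scaled box is centered in the destination rectangle.
        double e = dstX + (dstW - s * srcW) / 2 - s * srcX;
        double f = dstY + (dstH - s * srcH) / 2 - s * srcY;
        return new double[] {s, 0, 0, s, e, f};
    }

    public static void main(String[] args) {
        // Assumed bounding box of the checkmark: lower-left (29.53, 39.22), size 63.94 x 67.37.
        // Hypothetical target: a 20 x 20 box with its lower-left corner at (100, 700).
        double[] m = fit(29.53, 39.22, 63.94, 67.37, 100, 700, 20, 20);
        System.out.printf("%.4f %.4f %.4f %.4f %.4f %.4f%n",
                m[0], m[1], m[2], m[3], m[4], m[5]);
    }
}
```

The six numbers are in the order of the PDF `cm` operator. As far as I know, PdfContentByte exposes this as ConcatCTM in iTextSharp (concatCTM in Java); if so, calling it right after SaveState() and before the first MoveTo would place the unchanged path inside the rectangle, and RestoreState() would undo the transform afterwards. Please check the method name against your iTextSharp version.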

Re: PDF to PDF/A (b)

Balder VC
In reply to this post by Leonard Rosenthol-3
Only http://www.amazon.com/Adobe-Acrobat-6-PDF-Dummies/dp/0764537601
not what you're looking for lol

On 12/10/2011 21:06, Leonard Rosenthol wrote:
Nope - sorry :)

On 10/12/11 3:00 PM, "AJ Weber" [hidden email] wrote:

I suppose there isn't a "PDF Specification for Dummies" book written yet?
;)

Re: How to check if a PDF is OCR recognized

JonyGreen
This post has NOT been accepted by the mailing list yet.
In reply to this post by Bernhard
I found a free online PDF OCR tool that recognizes text and extracts it to an editable TXT file or MS Word document.