
Replacing and removing images


Replacing and removing images

A Cheung
Hello,
I have spent quite a while investigating the automated removal/replacement of images in a PDF (e.g. a 100-page PDF with one image per page that needs replacing, where each page also has text that I want to keep).
 

Strategy 1:
Find the XObject, KillIndirect() it, and remove it from the XObject dictionary.
But when you open this file in Acrobat, it says there are still references to it.
I don't know how to cleanly remove them from the content stream (I didn't add them, so they aren't marked the way that old message suggests). Conceptually, it (Im0 in this case) is somewhere in one of the arrays of the page's Contents, like "q 612.2400055 0 0 792 0 0 cm /Im0 Do Q". I don't know what is safe to remove, how to remove it, or what I'm looking for, besides the /Im0 in this one case.
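On the question of what is safe to remove: the minimal edit is to drop only the "/Im0 Do" pair. The surrounding "q ... cm ... Q" then paints nothing but still saves and restores the graphics state, so the text on the page is unaffected. Below is a minimal Java sketch of a parse, filter, and rewrite pass. It assumes the content stream has already been decompressed (for example with iText's PdfReader.getPageContent()) and that it contains no inline images (BI ... ID ... EI). The class and method names are hypothetical, and the tokenizer is a deliberately simplified stand-in, not iText's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: drops "/<name> Do" sections from a decoded content stream.
// Simplified tokenizer; it does NOT handle inline images (BI ... ID ... EI).
public class DoSectionFilter {

    private static boolean isWhite(char c) {
        return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f' || c == '\0';
    }

    private static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    // Splits the content into tokens, keeping (nested) string literals intact.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (isWhite(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '%') {                              // comment: skip to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
                continue;
            } else if (c == '(') {                       // literal string, may nest
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;                  // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
                if (depth > 0) throw new IllegalArgumentException("unterminated string");
            } else if (s.startsWith("<<", i) || s.startsWith(">>", i)) {
                i += 2;                                  // dictionary delimiters
            } else if (c == '<') {                       // hex string
                int end = s.indexOf('>', i);
                if (end < 0) throw new IllegalArgumentException("unterminated hex string");
                i = end + 1;
            } else if (c == '[' || c == ']' || c == '{' || c == '}') {
                i++;
            } else {                                     // name, number or operator
                i++;
                while (i < n && !isWhite(s.charAt(i)) && !isDelimiter(s.charAt(i))) i++;
            }
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    // Names, numbers, strings, arrays, dictionaries and true/false/null are operands;
    // any other token is an operator that ends the current section.
    private static boolean isOperator(String t) {
        char c = t.charAt(0);
        if ("/(<>[]{}".indexOf(c) >= 0) return false;
        if (t.equals("true") || t.equals("false") || t.equals("null")) return false;
        return !(Character.isDigit(c) || c == '+' || c == '-' || c == '.');
    }

    // Groups tokens into sections; each section ends with its operator.
    static List<List<String>> sections(String content) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String t : tokenize(content)) {
            current.add(t);
            if (isOperator(t)) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        return result; // trailing operands without an operator are dropped
    }

    // Rebuilds the content with every "/<name> Do" section removed.
    public static String dropDo(String content, String name) {
        StringBuilder out = new StringBuilder();
        for (List<String> section : sections(content)) {
            boolean target = section.size() == 2
                    && section.get(0).equals("/" + name)
                    && section.get(1).equals("Do");
            if (!target) out.append(String.join(" ", section)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "BT /F1 12 Tf 72 700 Td (Keep (this) text) Tj ET\n"
                    + "q 612.2400055 0 0 792 0 0 cm /Im0 Do Q\n";
        System.out.print(dropDo(page, "Im0"));
    }
}
```

Running main should print the text operators unchanged, followed by q, the cm line and Q, with the /Im0 Do section gone. The edited bytes still have to be written back as the page content. The C# code further down does this with SetPageContent.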
 
Strategy 2: Replace it, similar to http://1t3xt.info/examples/browse/?page=example&id=421

But the original image was a bilevel (1-bit black/white, CCITTFaxDecode) image, and the new image is a 256-color TIFF (FlateDecode), or maybe some other number of colors, or a DCTDecode filter type. This seems to need a custom ColorSpace that I don't know how to create.
PdfContentByte p = stamper.getOverContent(1); p.addImage(Image.getInstance("my.tiff"), width, 0, 0, height, 0, 0); creates the image object with this ColorSpace, but (a) I'm stuck with the original, which I can't remove (setting the original's width/height to 0 and its setData to an empty byte[] makes Acrobat complain when the file is opened), and (b) my images never seem to be the right size. I assume that's due to DPI (my images aren't 72 dpi), but even with img.setDpi(300,300) they don't show up right. I think I can eventually get this part right, though.
Is there a way to use this Image.getInstance object (and its ColorSpace and other fields) as a template to modify the XObject of the original image that I want to get rid of?
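On point (b), it helps to know how PDF sizes images. An image is drawn into a unit square that the current transformation matrix scales. So the width and height passed to addImage(image, width, 0, 0, height, x, y) should already be in points (1/72 inch), and the image's DPI setting should not change them. If pixel counts are passed there, a 300 dpi scan comes out about 4.2 (300/72) times too large. Below is a small Java sketch of the conversion, assuming you know the scan resolution. The class and method names are hypothetical.

```java
// Hypothetical helper for sizing a scanned image on a PDF page.
public class ImageSizing {

    static final double POINTS_PER_INCH = 72.0;

    // Physical size, in PDF points, of `pixels` pixels scanned at `dpi` dots per inch.
    static double pixelsToPoints(int pixels, double dpi) {
        if (dpi <= 0) throw new IllegalArgumentException("dpi must be positive");
        return pixels / dpi * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        int widthPx = 2550, heightPx = 3300;       // a US Letter page scanned at 300 dpi
        double w = pixelsToPoints(widthPx, 300);   // 612.0 pt = 8.5 in
        double h = pixelsToPoints(heightPx, 300);  // 792.0 pt = 11 in
        // These are the values to pass as width and height to
        // addImage(image, width, 0, 0, height, x, y).
        System.out.println(w + " x " + h);         // prints 612.0 x 792.0
    }
}
```

The result is close to the "612.2400055 0 0 792 0 0 cm" matrix quoted earlier, which is what a full Letter-size page image looks like.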
 
I hope I'm making sense. Spending so much time on this one problem has given me tunnel vision, and I might be leaving some things out.
 
Is there some newer process that makes this easy?  Snippets of existing code?
 

------------------------------------------------------------------------------

_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/

Re: Replacing and removing images

iText mailing list
A Cheung wrote:
> Hello,
> I have spent quite a while investigating the
> automated removal/replacement of images in a PDF. (ex: 100 page pdf, 1
> image per page needs replacing, but each page also has text which I want
> to keep)
>  
> Strategy 1:
> Find the XObject, KillIndirect() it, remove it from the XObject Dictionary.

Won't work.

You can't remove an image stream if it's referred to, for instance
from a Resources dictionary of a page.

You can't remove an object /Im0 from the Resources dictionary
if you don't remove the /Im0 Do from the content stream.
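The dependency runs in one direction. A "/Im0 Do" in the content stream needs the /Im0 entry in the Resources dictionary, and that entry keeps the image stream alive. The minimal Java sketch below lists which XObject resource names are no longer painted by any Do operator, so only those entries become candidates for removal. It is a regex over a decompressed content stream, so it assumes that no matching text appears inside string literals or inline image data. The class and method names are hypothetical. A Resources dictionary shared by several pages, or used by a form XObject, would have to be checked against all of those content streams.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: which XObject names does a content stream still paint?
public class XObjectUsage {

    // A name operand followed by the Do operator. Naive: it would also match
    // the same characters inside string literals or inline image data.
    private static final Pattern DO_CALL = Pattern.compile(
            "/([^\\s()<>\\[\\]{}/%]+)\\s+Do(?=[\\s()<>\\[\\]{}/%]|$)");

    // Names (without the leading slash) used with Do, in order of first use.
    public static Set<String> paintedXObjects(String content) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = DO_CALL.matcher(content);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Resource names that no Do operator in `content` refers to any more.
    public static List<String> unusedResources(List<String> resourceNames, String content) {
        Set<String> painted = paintedXObjects(content);
        List<String> unused = new ArrayList<>();
        for (String name : resourceNames) {
            if (!painted.contains(name)) {
                unused.add(name);
            }
        }
        return unused;
    }
}
```

Once a page's content no longer contains "/Im0 Do", paintedXObjects stops reporting Im0. Only then is it safe to drop /Im0 from that page's /XObject dictionary and let the image stream go.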

This is way too complicated for what you need.

> Strategy 2: Replace it,
>  
> Similar to http://1t3xt.info/examples/browse/?page=example&id=421 
> <http://1t3xt.info/examples/browse/?page=example&id=421>

This is a better strategy, but the example only works for JPEG images.

> Is there some newer process that makes this easy?  Snippets of existing
> code?

This is an update of the example you refer to:
http://itextpdf.com/examples/index.php?page=example&id=286

Instead of using the old image to "resize" the image, you need
to create the BufferedImage and use drawString() instead of
drawRenderedImage(). You may also want to create another type
of image, for instance a PNG instead of a JPEG, in which case
you'll need to use a different Filter and stuff.
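Below is a minimal sketch of that suggestion that uses only the JDK. It renders a replacement image as a 1-bit BufferedImage, so it stays as small as the black-and-white original. It can optionally draw a label with drawString(), and it encodes the result as PNG with ImageIO. The class name, the border, and the font size are assumptions made for illustration. Embedding the result in the PDF (the Filter and related dictionary entries) is left to the iText example linked above.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper that builds a black-and-white replacement image.
public class ReplacementImage {

    // Renders a 1-bit image with a thin black border and an optional text label.
    public static BufferedImage render(int width, int height, String label) {
        // TYPE_BYTE_BINARY stores one bit per pixel, like the CCITT original.
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            g.drawRect(0, 0, width - 1, height - 1);
            if (label != null && !label.isEmpty()) {
                // drawString() instead of drawRenderedImage(), as suggested above.
                g.setFont(new Font(Font.SANS_SERIF, Font.PLAIN, Math.max(8, height / 10)));
                g.drawString(label, width / 20, height / 2);
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    // Encodes the image as PNG; the JDK ships a PNG writer for ImageIO.
    public static byte[] toPng(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }
}
```

For example, ReplacementImage.toPng(ReplacementImage.render(2550, 3300, "REPLACED")) gives the PNG bytes for a Letter-size page at 300 dpi.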

Retrieving the text from the original image isn't possible.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


Re: Replacing and removing images

A Cheung


On Tue, Apr 27, 2010 at 8:59 AM, 1T3XT info <[hidden email]> wrote:
> A Cheung wrote:
>> Strategy 1:
>> Find the XObject, KillIndirect() it, remove it from the XObject Dictionary.
>
> Won't work.
>
> You can't remove an image stream if it's referred to, for instance
> from a Resources dictionary of a page.
>
> You can't remove an object /Im0 from the Resources dictionary
> if you don't remove the /Im0 Do from the content stream.
>
> This is way too complicated for what you need.
 
Complicated, yes, but is it theoretically possible with iText? Can it modify these streams and save the modifications?
GUI editors and applications that do allow images to be removed must be doing it somehow.

>> Strategy 2: Replace it,
>> similar to http://1t3xt.info/examples/browse/?page=example&id=421
>
> This is a better strategy, but the example only works for JPEG images.
>
>> Is there some newer process that makes this easy? Snippets of existing
>> code?
>
> This is an update of the example you refer to:
> http://itextpdf.com/examples/index.php?page=example&id=286
>
> Instead of using the old image to "resize" the image, you need
> to create the BufferedImage and use drawString() instead of
> drawRenderedImage(). You may also want to create another type
> of image, for instance a PNG instead of a JPEG, in which case
> you'll need to use a different Filter and stuff.
>
> Retrieving the text from the original image isn't possible.

Thanks, but then my question is about color spaces. I am trying to keep the PDF small, and making images with 16M colours makes it too large.
 
PdfContentByte p = stamper.getOverContent(1);
p.addImage(Image.getInstance("my.tiff"), width, 0, 0, height, 0, 0);
 
The above code adds an image (Xi0) to my file with the correct ColorSpace (it isn't DeviceGray or DeviceRGB). I use the Enfocus Browser to look at the PDF object tree, and this new "Xi0" image has a ColorSpace array of four items, where item [3] is a stream holding the palette of the 256 specific colors in use. What I'd like is for iText to create this ColorSpace structure for me from an image file (along with Filter/Width/BitsPerComponent), so I can replace my original Im0 structure with that data (as in the sample at your URL), but _without_ having to add an Xi0 to the PDF, since that just leaves me with another image that I'd want to remove.
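For reference, the four-item ColorSpace array described here matches PDF's Indexed color space, [/Indexed base hival lookup]. Here base is the color space of the palette entries (for example /DeviceRGB), and hival is the highest valid index (255 for 256 colors). The lookup holds (hival + 1) x 3 bytes for an RGB base and may be a string or a stream, such as the item [3] stream seen in Enfocus Browser. The minimal Java sketch below writes such an array as PDF syntax from a palette, using a hex string for the lookup. The class and method names are hypothetical, and this is not an iText API.

```java
// Hypothetical helper: PDF syntax for an Indexed color space over DeviceRGB.
public class IndexedColorSpace {

    // Palette entries are 0xRRGGBB; PDF allows at most 256 entries (hival <= 255).
    public static String indexedRgb(int[] palette) {
        if (palette.length == 0 || palette.length > 256) {
            throw new IllegalArgumentException("palette must have 1..256 entries");
        }
        StringBuilder lookup = new StringBuilder();
        for (int rgb : palette) {
            lookup.append(String.format("%06X", rgb & 0xFFFFFF)); // R, G, B bytes in hex
        }
        int hival = palette.length - 1;
        return "[/Indexed /DeviceRGB " + hival + " <" + lookup + ">]";
    }

    // BitsPerComponent for the image samples: smallest of 1, 2, 4, 8 that holds every index.
    public static int bitsPerIndex(int paletteSize) {
        for (int bits : new int[] {1, 2, 4, 8}) {
            if (paletteSize <= (1 << bits)) {
                return bits;
            }
        }
        throw new IllegalArgumentException("more than 256 colors");
    }

    public static void main(String[] args) {
        int[] blackAndWhite = {0x000000, 0xFFFFFF};
        System.out.println(indexedRgb(blackAndWhite)); // [/Indexed /DeviceRGB 1 <000000FFFFFF>]
        System.out.println(bitsPerIndex(256));         // 8
    }
}
```

The image's own samples are then palette indexes, one component per pixel, at the bit depth that bitsPerIndex returns.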
 
Possible?
 


Re: Replacing and removing images

iText mailing list
A Cheung wrote:
> Complicated yes, but is it theoretically possible with iText?  Can it
> modify these streams and save the modifications?

Yes, it's possible, but it's a lot of work.

> Thanks, but my question is then about Colorspaces. I am trying to keep
> the PDF small so making images of 16M colours makes it too large.

I didn't say you had to use an image of 16M colors.
You can easily use a black and white image.
You probably want to use CCITT.
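The color depth matters for file size because uncompressed PDF image data takes ceil(Width x BitsPerComponent x components / 8) x Height bytes, with each row padded to a whole byte. The small Java sketch below does that arithmetic for a hypothetical 2550 x 3300 pixel page (US Letter at 300 dpi). Compression filters (CCITT for 1-bit data, Flate or DCT otherwise) then shrink these figures by amounts that depend on the image content.

```java
// Uncompressed size of PDF image sample data (each row is padded to a whole byte).
public class ImageDataSize {

    static long uncompressedBytes(int width, int height, int bitsPerComponent, int components) {
        long bitsPerRow = (long) width * bitsPerComponent * components;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round each row up to whole bytes
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        int w = 2550, h = 3300; // hypothetical 8.5 x 11 in page scanned at 300 dpi
        System.out.println("1-bit black/white: " + uncompressedBytes(w, h, 1, 1)); // 1052700
        System.out.println("8-bit indexed:     " + uncompressedBytes(w, h, 8, 1)); // 8415000
        System.out.println("24-bit RGB:        " + uncompressedBytes(w, h, 8, 3)); // 25245000
    }
}
```

Before compression, 24-bit RGB (16M colors) is 24 times larger than the 1-bit version, and a 256-color indexed image is 8 times larger.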

> PdfContentByte p = stamper.getOverContent(1);
> p.addImage(Image.getInstance("my.tiff"), width,0,0,height,0,0)
>  
> The above code adds an image (Xi0) to my file with the Colorspace that
> is correct.  (It isn't DeviceGray or DeviceRGB). I use the Enfocus
> Browser to see the PDF object tree structure and this new "Xi0" image
> has a ColorSpace array[4] and item [3] is the stream data for the
> colorspace of 256 specific colors in use.

An Indexed color space is fine too.

>  What I'd like to do is have
> iText create this Colorspace structure for me based on an image
> file (along with Filter/Width/BitsPerComponent) so I can replace my
> original Im0 structure with that data (as in the sample at your URL),
> but _without_ having to add an Xi0 to the PDF to do it since that just
> leaves me with another image that I'd want to remove.
>  
> Possible?

Yes.
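One way to do this without adding a second image is to overwrite the stream of the original Im0 with the new compressed bytes. The entries of its image dictionary must then be updated to describe those bytes. The minimal sketch below lists those entries as PDF syntax. It assumes the new data is black-and-white CCITT Group 4 (/K -1 in DecodeParms), and the class and method names are hypothetical. Stale entries from the original that no longer apply (for example an old /Decode or /SMask) would also need to be removed. With iText, you would set the data and put these keys on the existing stream object rather than build text by hand, as the linked example does for a JPEG (DCTDecode) image. Check the exact calls for your iText version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: image dictionary entries that must describe the new data when
// the original image stream is overwritten in place with CCITT Group 4 bytes.
public class Group4ImageEntries {

    public static Map<String, String> entries(int width, int height, int length) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("/Type", "/XObject");
        d.put("/Subtype", "/Image");
        d.put("/Width", Integer.toString(width));
        d.put("/Height", Integer.toString(height));
        d.put("/ColorSpace", "/DeviceGray");
        d.put("/BitsPerComponent", "1");            // CCITT data is always 1 bit per pixel
        d.put("/Filter", "/CCITTFaxDecode");
        // K < 0 selects pure two-dimensional (Group 4) coding. Columns defaults to 1728,
        // so it must be set to the real width.
        d.put("/DecodeParms", "<< /K -1 /Columns " + width + " /Rows " + height + " >>");
        d.put("/Length", Integer.toString(length)); // byte count of the encoded data
        return d;
    }

    // Serializes the entries as a PDF dictionary, e.g. for inspection or debugging.
    public static String toPdf(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder("<<");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append(' ').append(e.getKey()).append(' ').append(e.getValue());
        }
        return sb.append(" >>").toString();
    }
}
```

If your library writes /Length itself when the stream data is replaced, leave that entry to the library.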
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info


Re: Replacing and removing images

distes
In reply to this post by A Cheung
I know this is an old post, but I wanted to share how I got this to work. None of the replies was very helpful.

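This is my code using iTextSharp. I hope it's helpful to someone. I needed to move an existing barcode on the page. Rather than moving it directly, you copy the barcode, delete the existing barcode, and place the copy back on the page at an absolute position. In theory, at least, this should work for anything; the tricky part is locating the element. The code below only looks at even pages and treats any CCITTFaxDecode image whose resource name contains "img" as the barcode, so adjust those checks for your own files.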

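using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

public class BarcodeMover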
    {
        public void MoveDuploBarcode(string inputfile, string outputfile)
        {
            using (FileStream outputstreampdf = new FileStream(outputfile, FileMode.Create))
            {
                using (PdfReader inputstreampdf = new PdfReader(inputfile))
                {
                    PdfStamper pdfstamper = new PdfStamper(inputstreampdf, outputstreampdf);

                    for (int pagenum = 1; pagenum <= inputstreampdf.NumberOfPages; pagenum++)
                    {
                        if (IsEven(pagenum))
                        {
                            using (MemoryStream pagememorystream = new MemoryStream())
                            {
                                PdfDictionary pagexobjects = GetAllXObjectsDictionaryFromPage(inputstreampdf, pagenum);
                                if (pagexobjects == null)
                                {
                                    continue; // no /XObject resources on this page, so no barcode image
                                }
                                PdfContentParser pagecontentparser = GetContentParserForPage(inputstreampdf, pagenum);

                                PdfName barcodestreamobject = null;

                                while (true)
                                {
                                    List<PdfObject> currentstreamobjects = GetNextSectionOfContent(pagecontentparser);
                                    if (currentstreamobjects.Count == 0)
                                    {
                                        break;
                                    }

                                    bool ismatrixbarcode = DoesStreamContainMatrixBarcode(currentstreamobjects, pagexobjects);

                                    if (ismatrixbarcode)
                                    {
                                        barcodestreamobject = (PdfName)currentstreamobjects.First();
                                    }
                                    else
                                    {
                                        WriteToMemoryStream(currentstreamobjects, pagememorystream);
                                    }
                                }

                                if (barcodestreamobject != null)
                                {
                                    PdfObject barcodeobject = pagexobjects.Get((PdfName)barcodestreamobject);
                                    PdfDictionary xobjectdictionary = (PdfDictionary)PdfReader.GetPdfObject(barcodeobject);

                                    int xrefIdx = ((PRIndirectReference)barcodeobject).Number;
                                    PdfObject pdfObj = inputstreampdf.GetPdfObject(xrefIdx);
                                    PdfStream streamobject = (PdfStream)pdfObj;
                                    byte[] imagestream = PdfReader.GetStreamBytesRaw((PRStream)streamobject);

                                    PdfReader.KillIndirect(barcodeobject);

                                    ImgCCITT timg = BuildNewImage(xobjectdictionary, imagestream, inputstreampdf, pagenum);

                                    PlaceNewImageOnPage(pdfstamper, pagenum, pagememorystream, timg);

                                    barcodestreamobject = null;
                                }
                            }
                        }
                    }

                    pdfstamper.Close();
                }
            }
        }

        private List<PdfObject> GetNextSectionOfContent(PdfContentParser pagecontentparser)
        {
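            // Parse() returns one section of the content stream: the operands followed by
            // their operator (e.g. "/Im0 Do"); the list is empty once the stream is exhausted.
            return pagecontentparser.Parse(null);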
        }

        private bool IsEven(int pagenum)
        {
            return pagenum % 2 == 0;
        }

        private void WriteToMemoryStream(List<PdfObject> pagecontentobjects, MemoryStream memoryStream)
        {
            foreach (PdfObject o in pagecontentobjects)
            {
                o.ToPdf(null, memoryStream);
                memoryStream.WriteByte((byte)'\n');
            }
        }

        private PdfDictionary GetAllXObjectsDictionaryFromPage(PdfReader pdfreader, int pagenum)
        {
            PdfDictionary pagedictionary = pdfreader.GetPageN(pagenum);
            PdfDictionary pageresources = (PdfDictionary)PdfReader.GetPdfObject(pagedictionary.Get(PdfName.RESOURCES));
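            // Return null instead of throwing when the page has no /Resources dictionary.
            return pageresources == null ? null : (PdfDictionary)PdfReader.GetPdfObject(pageresources.Get(PdfName.XOBJECT));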
        }

        private PdfContentParser GetContentParserForPage(PdfReader pdfReader, int pagenum)
        {
            byte[] pagecontentstream = pdfReader.GetPageContent(pagenum);
            return new PdfContentParser(new PRTokeniser(new RandomAccessFileOrArray(pagecontentstream)));
        }

        private void PlaceNewImageOnPage(PdfStamper pdfStamper, int i, MemoryStream memoryStream, ImgCCITT timg)
        {
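            // ToArray() returns only the bytes written; GetBuffer() also returns the
            // unused, zero-filled capacity at the end of the stream's buffer.
            pdfStamper.Reader.SetPageContent(i, memoryStream.ToArray());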
            pdfStamper.GetOverContent(i).AddImage(timg);
        }

        private ImgCCITT BuildNewImage(PdfDictionary tg, byte[] bytes, PdfReader pdfReader, int i)
        {
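            // Resolve possibly indirect /Width and /Height values to integers.
            int width = ((PdfNumber)PdfReader.GetPdfObject(tg.Get(PdfName.WIDTH))).IntValue;
            int height = ((PdfNumber)PdfReader.GetPdfObject(tg.Get(PdfName.HEIGHT))).IntValue;

            // Assumes Group 4 data (/K -1) with default DecodeParms; a different /K,
            // /BlackIs1 or /EncodedByteAlign would need different arguments here.
            ImgCCITT timg = new ImgCCITT(width, height, false, ImgCCITT.CCITTG4, ImgCCITT.CCITT_ENDOFBLOCK, bytes);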
            timg.ScaleToFit(24, 24);
            timg.SetAbsolutePosition(0, pdfReader.GetPageSize(i).Top - 140);

            return timg;
        }

        private bool DoesStreamContainMatrixBarcode(List<PdfObject> contentobjects, PdfDictionary pagexobjects)
        {
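            // A painting section looks like "/Name Do". The "img" test matches this file's
            // resource names only; adjust it for PDFs from other producers.
            PdfName name = contentobjects.First() as PdfName;
            if ("Do".Equals(contentobjects.Last().ToString()) && name != null && name.ToString().Contains("img"))
            {
                PdfObject possibleobject = pagexobjects.Get(name);
                // Check for null before calling IsIndirect() on the entry.
                if (possibleobject != null && possibleobject.IsIndirect())
                {
                    PdfDictionary xobjectdictionary = (PdfDictionary)PdfReader.GetPdfObject(possibleobject);
                    PdfName type = (PdfName)PdfReader.GetPdfObject(xobjectdictionary.Get(PdfName.SUBTYPE));
                    // Compare PdfName objects; a missing /Filter yields null instead of an exception.
                    PdfObject filter = PdfReader.GetPdfObject(xobjectdictionary.Get(PdfName.FILTER));
                    if (PdfName.IMAGE.Equals(type) && PdfName.CCITTFAXDECODE.Equals(filter))
                    {
                        return true;
                    }
                }
            }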

            return false;
        }
    }