Best way to alter text without losing formatting?

Brodster1281
Hi all,

I'm using iText 5.5.9, since it seems to have the most usage information available online, and I'm trying to figure out the best way to change text without losing the text's formatting. It's for a pretty basic puzzle project where every letter is the opposite (a is z, b is y, etc). I'm able to extract the text from the PDF as a string and then create a new string with the opposite letters, but since it's a plain string, I lose which letters are bold, underlined, etc. Right now my program creates a new PDF and writes the string into it, but is there any way to simply edit the text content of a PDF without losing everything else (bold text, underlined text, even pictures)? Also, it's all the same to me whether the program creates a new PDF with the data or changes the preexisting PDF.
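
For reference, the letter swap itself is the easy part. A minimal sketch in plain Java (no iText involved; the class name LetterMirror is made up for illustration, and only the letters a-z and A-Z are assumed to be affected):

```java
// Hypothetical helper, not part of iText: mirrors each Latin letter
// within the alphabet (a <-> z, b <-> y, ...), keeping upper/lower case.
final class LetterMirror {
    private LetterMirror() {}

    /** Returns the mirrored letter, or the input unchanged if it is not a-z or A-Z. */
    static char mirror(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) ('a' + 'z' - c);
        }
        if (c >= 'A' && c <= 'Z') {
            return (char) ('A' + 'Z' - c);
        }
        return c; // digits, punctuation, whitespace etc. stay as they are
    }

    /** Mirrors every letter of the given text. */
    static String mirror(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(mirror(text.charAt(i)));
        }
        return out.toString();
    }
}
```

With this, LetterMirror.mirror("Hello") gives "Svool". The open question is how to get the result back into the PDF with its formatting.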

Thanks for any help, it's much appreciated.

Re: Best way to alter text without losing formatting?

mkl
Brodster1281 wrote
I'm using iText 5.5.9, since it seems to have the most usage information available online, and I'm trying to figure out the best way to change text without losing the text's formatting. It's for a pretty basic puzzle project where every letter is the opposite (a is z, b is y, etc). I'm able to extract the text from the PDF as a string and then create a new string with the opposite letters, but since it's a plain string, I lose which letters are bold, underlined, etc.
First of all, public iText support has been moved to stackoverflow.com.

That being said, "only changing the text without losing its formatting" is one of the most complicated things in PDF processing, and your "pretty basic puzzle project where every letter is the opposite (a is z, b is y, etc)" is, for generic documents, no exception to that rule.

You have to be aware that

* fonts may be only partially embedded; thus, if you want to replace 'a' with 'z', you might find that the font in question does not contain a 'z';
* fonts may not contain the information required to recognize which glyph represents which character; thus, you may not even be able to recognize that the glyph being drawn has the form of an 'a';
* your replacement text may be narrower or wider than the original, which may result in gaps or overlapping text;
* some of the "text attributes" you want to keep aren't text attributes in PDF; e.g. underlined text in a PDF is text plus a line which, by the choice of coordinates in their respective drawing operations, happen to be positioned one above the other; differentiating a line underlining text from a line dividing text blocks can be impossible;
* ...

Your project might become implementable if you restrict the PDFs it has to work for, e.g. to PDFs that use only monospaced fonts with the full alphabet present and a single standard encoding.
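
Such a restriction could be checked up front before touching any content. The following is only a rough sketch in plain Java under stated assumptions: the glyph widths are assumed to have been read from the font into a map already (how to do that is not shown), and the class MirrorFeasibility and its methods are hypothetical names, not iText API. It covers only the missing-glyph and width issues from the list above, not the encoding or underline problems.

```java
import java.util.Map;

// Hypothetical pre-check, not iText API. glyphWidths is assumed to map every
// character the (possibly subset) font can draw to its advance width.
final class MirrorFeasibility {
    private MirrorFeasibility() {}

    /** Mirrors a-z and A-Z within the alphabet; other characters are returned unchanged. */
    static char mirror(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) ('a' + 'z' - c);
        }
        if (c >= 'A' && c <= 'Z') {
            return (char) ('A' + 'Z' - c);
        }
        return c;
    }

    /**
     * Returns true only if every letter in text can be swapped for its mirror
     * letter without a missing glyph and without changing the advance width.
     */
    static boolean canMirrorInPlace(String text, Map<Character, Integer> glyphWidths) {
        for (int i = 0; i < text.length(); i++) {
            char original = text.charAt(i);
            char replacement = mirror(original);
            if (replacement == original) {
                continue; // character is left untouched, nothing to check
            }
            Integer originalWidth = glyphWidths.get(original);
            Integer replacementWidth = glyphWidths.get(replacement);
            if (originalWidth == null || replacementWidth == null) {
                return false; // glyph missing, e.g. the font is only partially embedded
            }
            if (!originalWidth.equals(replacementWidth)) {
                return false; // different widths would cause gaps or overlapping text
            }
        }
        return true;
    }
}
```

For a monospaced font with the full alphabet in the map, the check passes for any text; for a subset font that lacks the 'z' glyph, or a proportional font where 'i' and 'r' differ in width, it fails, and the document would have to be rejected or handled differently.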

Regards,

Michael