I am looking for effective ways of storing text strings from a PDF document in a database. Are there effective PDF text string readers, or is there a way to "un-PDF" the document (into text strings)?
My Product Information:
Acrobat Pro 8.1 / Windows

To "un-PDF" the document, open it in Acrobat and choose File | Save As. Change the file type (dropdown) to .TXT and save. Done.
As for products, there is one called Text Extraction Toolkit.
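Once the document has been saved as .TXT, getting the strings into a database is mostly a matter of reading the file line by line. Here is a minimal sketch using Python and SQLite. The table name `pdf_text`, the function `store_lines`, and the file name in the usage comment are my own choices for illustration, not anything Acrobat produces, and I am assuming one row per non-empty line is the granularity you want.

```python
import sqlite3


def store_lines(text, source, db_path=":memory:"):
    """Store each non-empty line of extracted text as one row.

    `source` is a label for the document (hypothetical, e.g. the file name).
    Returns the open connection so the caller can query it.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text ("
        "source TEXT NOT NULL, line_no INTEGER NOT NULL, content TEXT NOT NULL)"
    )
    rows = [
        (source, n, line.strip())
        for n, line in enumerate(text.splitlines(), start=1)
        if line.strip()  # skip blank lines
    ]
    conn.executemany(
        "INSERT INTO pdf_text (source, line_no, content) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


# Usage, assuming Acrobat saved the text as "document.txt":
#   with open("document.txt", encoding="utf-8", errors="replace") as f:
#       conn = store_lines(f.read(), "document.txt", "documents.db")
```

The line numbers refer to lines in the saved text file, so they keep whatever order Acrobat wrote the text in.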
A word of warning: whenever possible, work with the data generator to have the output generated in a digestible format for later use. Most composition engines can also give a line-data version of the document, saving you this step. I have seen a lot of output where the resulting text was fragmented because of how the composition engine created the document in the first place. The text will appear in the order it was applied to the PDF page, not in the order you might normally read it.
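To illustrate the ordering problem: if your extraction tool can report where each fragment sits on the page, you can sometimes restore reading order by sorting on position. The sketch below is only that, a sketch. The `Fragment` type, the `reading_order` function, and the `line_tolerance` value are hypothetical, and it assumes a simple single-column page with PDF-style coordinates (y grows upward, so larger y is nearer the top).

```python
from collections import namedtuple

# Hypothetical record: position of a text fragment plus its string.
Fragment = namedtuple("Fragment", "x y text")


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by y (top to bottom), then sort each line by x."""
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        # Same line if y is within tolerance of the line's first fragment.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    ]
```

Multi-column layouts, tables, and rotated text will defeat this approach, which is another reason to get line data from the composition engine if you can.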