Acrobat User Community Forums


#1 2007-09-18 16:05:06

mh53
Member
Registered: 2007-09-18
Posts: 0

Text strings

I am looking for an effective way to store the text strings from a PDF document in a database. Are there good PDF text-extraction tools, or is there a way to "un-PDF" the document (convert it into plain text strings)?


My Product Information:
Acrobat Pro 8.1 / Windows


 

#2 2007-09-25 10:10:01

dthanna
Member

Registered: 2007-04-25
Posts: 43

Re: Text strings

To 'un-PDF' the document, open it in Acrobat and choose File | Save As. Change the file type (dropdown) to .TXT and save. Done.

As for third-party products, there is one called Text Extraction Toolkit.
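
Once the document has been saved as .TXT, getting the strings into a database is mostly a matter of choosing a row granularity. Here is a minimal sketch in Python, assuming SQLite as the database and one row per non-empty line. The table name `pdf_text`, its columns, and the function `store_lines` are made up for illustration and do not come from Acrobat or any other product.

```python
import sqlite3


def store_lines(text, db_path=":memory:", source="document.pdf"):
    """Store each non-empty line of exported text as one database row.

    Assumptions (hypothetical, adjust to your own schema):
    - `text` is the content of the .TXT file saved from Acrobat
    - one row per line is a useful granularity for later queries
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text ("
        "id INTEGER PRIMARY KEY, source TEXT, line_no INTEGER, content TEXT)"
    )
    # Keep the original line number so the document order can be rebuilt.
    rows = [
        (source, line_no, line.strip())
        for line_no, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
    # Parameterized inserts avoid quoting problems with arbitrary text.
    conn.executemany(
        "INSERT INTO pdf_text (source, line_no, content) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = store_lines("Invoice 1001\n\nTotal due: 42.00\n")
    query = "SELECT line_no, content FROM pdf_text ORDER BY line_no"
    for row in conn.execute(query):
        print(row)
```

For a real file you would read the exported text first, for example with `open(path, encoding=...).read()`. The right encoding depends on how the file was saved, so treat any particular value as a guess until you have checked the output.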

A word of warning: whenever possible, work with whoever generates the data to have the output produced in a digestible format for later use. Most composition engines can also produce a line-data version of the document, which saves you this step. I have seen a lot of output where the extracted text was fragmented because of how the composition engine created the document in the first place. The text comes out in the order it was placed on the PDF page, not necessarily in the order you would normally read it.
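
To illustrate the ordering problem: if your extraction tool reports a position for each text fragment, you can often rebuild a rough reading order by grouping fragments into lines and sorting each line left to right. The sketch below is in Python and rests on several assumptions. The `(x, y, text)` fragment shape and the `reading_order` function are hypothetical, `y` is taken to increase upward as in PDF page coordinates, and the page is assumed to have a single column. A plain .TXT export carries no positions, so this only applies to tools that expose them.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), with y increasing upward.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], line_tol: float = 2.0) -> List[str]:
    """Group fragments into lines by y, then sort each line by x.

    `line_tol` is an assumed tolerance: fragments whose y differs from the
    first fragment of the current line by at most this much share that line.
    """
    # Top of the page first: larger y means higher on the page.
    ordered = sorted(fragments, key=lambda frag: -frag[1])

    lines: List[List[Fragment]] = []
    for frag in ordered:
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within a line, read left to right.
    return [
        " ".join(text for _, _, text in sorted(line, key=lambda frag: frag[0]))
        for line in lines
    ]


if __name__ == "__main__":
    # Fragments listed in the order they were drawn, not the reading order.
    drawn = [
        (200.0, 700.0, "World"),
        (72.0, 650.0, "Second"),
        (72.0, 700.5, "Hello"),
        (130.0, 650.0, "line"),
    ]
    for line in reading_order(drawn):
        print(line)
```

Multi-column layouts, tables, and rotated text need more than this, which is another reason to get line data from the source system when you can.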


 
