Archive for March, 2007

Converting PDF to Word: Understanding the Problem

Monday, March 26th, 2007

I hear some version of the following question over and over:

“Which software accurately converts PDF to Word?”

Converting PDF to Word (or other word-processing applications, HTML or whatever) is not a simple, push-button affair, as almost everyone who has ever tried it knows (thus the questions).

Even so, most people are looking for a simple, push-button way to get the contents of a PDF into a Word file.  What’s the typical experience? Documents with layouts even slightly more complex than vanilla paragraphs routinely convert into junk. End-users expect this task to be pretty easy - which explains why the tone of the typical inquiry may be characterized as “pained”.

Let’s take a moment to understand why converting PDF to Word is so problematic.

The factors influencing the quality of conversion from PDF to Word are, in descending order of significance:

  1. The extent to which the document’s logical structures are represented within the PDF (tagging)
  2. The complexity of the objects on the page (mathematics, charts, graphs, etc.)
  3. The complexity of the document layout

Factor 1 is a property of the PDF file itself, not the software used to extract the contents to Word.  If the document is properly structured and tagged, predictable results may be had in converting to Word from Adobe Acrobat.
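For the curious, here is a rough Python sketch of how one might guess whether a PDF claims to be tagged before attempting conversion. It is only a heuristic of my own devising, not how Acrobat or any converter does it: the function name `looks_tagged` is hypothetical, and it simply scans the raw bytes for the `/StructTreeRoot` and `/Marked true` entries that a tagged PDF's catalog normally carries. If the catalog is stored in a compressed object stream the scan will miss it, so a negative answer is inconclusive, and a positive answer says nothing about whether the tagging is any good.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude guess at whether a PDF declares itself as tagged.

    Assumption: the document catalog is stored uncompressed, so its
    keys are visible in the raw bytes. A False result is inconclusive.
    """
    # A tagged PDF's catalog points at a structure tree...
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # ...and its MarkInfo dictionary sets /Marked to true.
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked


if __name__ == "__main__":
    import sys

    # Usage: python looks_tagged.py some-file.pdf
    with open(sys.argv[1], "rb") as handle:
        print(looks_tagged(handle.read()))
```

A file that passes this check is merely a candidate for predictable conversion; the quality of the tags still decides the outcome.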

Beyond Factor 1, different software will guess at logical structure via analysis and assessment of the layout, fonts and objects on the page.  There is no magic bullet. The more complex the document, the lower the chance of high-quality output, no matter what software is used.

Adobe Acrobat PDF Conference, May 9-10, 2007

Friday, March 23rd, 2007

AGI’s Adobe Acrobat PDF Conference 2007 is a premier opportunity to get an in-depth perspective on all that PDF has to offer.

Come join former US Vice-President Al Gore, executives from Google, Walt Disney and Adobe, lots and lots of PDF experts and (yes) yours truly, in Orlando, Florida, May 9-10.

Industry professionals will discuss a wide variety of topics in PDF technology and utilization. Some may even advocate paperless PDF files as one significant way to help control global warming!

In addition to joining the ever-popular PDF Power Panel, I will present on Accessible PDF:  From Section 508 to a PDA.

For more information, including how to register, visit www.pdf2007.com

For those who need to take a (little) break from the grind, I’m planning to go diving in central Florida (maybe the springs, maybe the coast) for two days just prior to the conference.  Who will join me?

Are we Connect-ing?

Thursday, March 22nd, 2007

Much touted in the new Acrobat release is “Acrobat Connect”, formerly Macromedia’s Breeze.

By now, I’ve participated in several Connect “sessions” as both presenter and presentee, so I thought I’d offer a few observations.

Connect hasn’t really got anything to do with Acrobat, and I’m really unsure why Connect occupies a prominent place in Acrobat 8.0 at all. Connect is something like WebEx, with some clever interactive tweaks. The polling, chat and status facilities are good, but it’s not a trivial affair, and it’s not really about managing or using documents. It might someday benefit from being pressed into service in the “Acrobat family of products”, but that day isn’t here yet.

In the present incarnation, Connect can’t actually use PDF files except in a desktop-sharing (i.e., bandwidth-intensive) mode. This simply serves to highlight the lack of any real connection between Acrobat and Connect, even though Connect is available “with” Acrobat, even allegedly “integrated” into it.

While I have certainly experienced a number of connection issues (especially with the VoIP), I understand this is not the norm. Regardless of my circumstances, in today’s world, one can’t really expect that all users have big or stable pipes, are optimized for VoIP, or have adequate speakers or microphones for their environment. Laptops suffering wireless interference are increasingly common. Adobe recommends that presenters and viewers shut down their chat, email and other applications to allow Connect to hog the bandwidth, and further, that you shut down the Presenter’s video uplink as well. At what point wouldn’t you rather make a YouTube movie or email a PowerPoint?

Even with the current generation of the software, a carefully planned meeting using the polling, chat and other features, and POTS (Ma Bell) for audio can work well, even for remote users. Connect has real potential to be a useful conferencing system for experienced users enjoying first-class connectivity. As presently constituted, new and infrequent users are going to stumble and fall to a degree that will deliver poor impressions when they count the most.

While Connect sessions may be easily recorded by the Presenter, disclosure of this fact should be made clear to the end-user, visually and otherwise. The Presenter should not be encumbered with the responsibility of reminding each and every attendee that “this session is being recorded”, especially if the session is interactive.

At least some who check out the Connect “offering” through Acrobat come away confused and/or a tad miffed. The general opinion seems to be that Adobe doesn’t make it clear that Connect is not actually a new feature of Acrobat, but a new service, with its own (again, non-trivial) fee structure.

Lastly, my accessibility creds force me to point out that there’s nothing remotely accessible about Connect - it’s a free-flowing Flash interface, and screen-readers aren’t welcome here. This will, in the long term, have to be addressed if this technology is to have a big future in government.

All that aside, I like Connect - right down to the nervous anticipation that comes from wondering if it will all fall apart midflight, or that I’ll lose the thread, or go blind from squinting at the non-resizable copy in the UI. I just wonder if it’s properly co-located with Acrobat.