PDF to HTML Conversions
Friday, March 30th, 2007
Adobe Acrobat has never been good at converting PDF to HTML, and this is still the case in version 8 of Acrobat. It’s as if PDF-to-HTML conversion was something the engineering teams stopped working on in Acrobat 4, and we haven’t seen any improvements in this area since.
I’ve been curious about how conversion to HTML is handled by some of the CS3 applications, and what kinds of workarounds I might find using the newest releases of the Creative Suite applications and Acrobat 8. Perhaps we can use the new CS3 applications to convert layouts to HTML before creating a PDF file, or maybe there are some workarounds using Acrobat 8.
After all, Adobe acquired the king of screen display developers when it bought Macromedia, so it’s reasonable to think that the integration of Adobe Acrobat, Adobe InDesign, Adobe Illustrator, and Adobe Dreamweaver would dazzle us with file format conversions.
To satisfy my curiosity, I first took a look at an export direct from Adobe InDesign to HTML. InDesign has had this feature built into the program for several generations. Now, in InDesign CS3 (version 5 of InDesign), you select File > Cross-media Export > XHTML / Dreamweaver to convert your InDesign document to an HTML file. Wow! I thought. This has got to be a great feature. Look at these fancy new menu commands!
To test conversions to HTML directly from InDesign CS3 I used a one-page layout shown in Figure 1. I knew the vertical type would be a problem, but I was interested in seeing how InDesign would handle this and the transparent background images and columnar text.
Figure 1: The one-page InDesign layout used for the conversion tests.
After exporting the InDesign file to XHTML, I opened the document in Dreamweaver. As you can see in Figure 2, when I previewed the Dreamweaver file in a Web browser, the InDesign CS3 conversion was a big disappointment.
Figure 2: The InDesign CS3 XHTML export previewed in a Web browser.
Export to HTML from Acrobat
Acrobat 8 has a tool on the Tasks toolbar specifically to take advantage of the improved export features in Acrobat. Word exports received a lot of attention during the development of Acrobat 8 and Adobe claims Acrobat has a much better means for exporting text to Word. From the Export Task button pull-down menu you can also choose to export PDF documents to HTML.
I decided to take my InDesign file and convert it to a PDF document, then make my conversion from within Acrobat. From the Export Task button menu I exported the PDF file using the HTML Web Page menu command. Oddly enough, Acrobat exported the entire page as an image. The image integrity was preserved as far as the layout appearance goes, but the text appeared bitmapped (see Figure 3) and wasn’t editable. The entire page was one single image.
Figure 3: The Acrobat 8 HTML export, with the page converted to a single image and the text bitmapped.
That vertical text along the left side was preserved since the conversion made a bitmap image out of the page. This conversion is really no better than hosting the original PDF on a Web site. As a matter of fact, the original PDF looks much better and you can search the text —something you can’t do with the poor HTML conversion from Acrobat.
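If you want a quick way to tell whether a converter gave you real text or just a picture of your page, a few lines of script can do the check. What follows is only a rough sketch in Python using the standard library’s HTML parser. The names `TextAndImageCounter` and `is_image_only`, and the 20-character threshold, are my own assumptions for illustration and have nothing to do with Acrobat or any of the products discussed here.

```python
from html.parser import HTMLParser


class TextAndImageCounter(HTMLParser):
    """Counts visible text characters and <img> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.image_count = 0
        self._skip_depth = 0  # > 0 while inside <script>, <style>, or <title>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.image_count += 1
        elif tag in ("script", "style", "title"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text a reader would see in the page body.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def is_image_only(html, min_text_chars=20):
    """Return True if the page has images but almost no searchable text.

    min_text_chars is an arbitrary threshold chosen for this sketch.
    """
    counter = TextAndImageCounter()
    counter.feed(html)
    counter.close()
    return counter.image_count > 0 and counter.text_chars < min_text_chars


# Hypothetical usage: read an exported file and report what was found.
# with open("exported_page.html", encoding="utf-8") as f:
#     print(is_image_only(f.read()))
```

A page that comes back as image-only is the situation described above: it may look right, but nobody can search it or edit the text.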
Third party applications and plug-ins
Acrobat wasn’t going to do the job, so I thought I’d take a look at one of the many third-party products that support PDF to HTML conversions. My first stop was to download the newest version of Gemini from Iceni Technology (www.iceni.com/gemini.htm) for the Mac. This product has been a longtime third-party application for converting PDFs to HTML, and it’s available for both Mac and Windows users.
I installed Gemini on my Mac and opened the same PDF document I used for the other conversions. The results were about as impressive as my conversion from InDesign, as you can see in Figure 4. This may not be a fair assessment of the Gemini product, since it does much more than convert PDF to HTML. But I was looking for a quick, no-nonsense HTML conversion tool, and for that particular task, Gemini wasn’t my answer.
Figure 4: The result of the Gemini conversion.
Still in search of something fast and efficient for converting my PDF documents to HTML, I stumbled onto Smart PDF Converter from SmartSoft (www.smartpdfconverter.com). I ran the same file through this conversion utility and had much better success than with all the other options I explored. My file was converted to HTML with the text and graphics intact and editable, as you can see in Figure 5.
Figure 5: The result of the Smart PDF Converter conversion, with text and graphics intact.
Note that Smart PDF Converter is currently a Windows-only application.
The Bottom Line
In my tests I found conversion to HTML Web Pages very poor from InDesign and Acrobat. I tested several files and they all were much less usable than copying and pasting text in Dreamweaver.
After converting to PDF, I tried several things in Acrobat. I optimized the PDFs, exported in both supported HTML formats (HTML v3.2 and HTML v4.01), and exported several PDF documents. The results were all the same: very poor conversion, and when text is recognized, the document integrity is lost.
As has been the case with earlier versions of Acrobat, the best conversions are made by third-party products, either Acrobat plug-ins or standalone applications. I tested just the two products mentioned in this blog post.
My tests demonstrated that the Smart PDF Converter from SmartSoft produced the best results for PDF to HTML conversions. You can open the converted files directly in Dreamweaver and make edits to the HTML. Previewing the converted documents in Web browsers displays an HTML page very similar to the look you see in Acrobat when viewing the PDF.
There are a number of different tools for converting PDFs to HTML. I spent a short time looking at very quick conversion methods using two different third-party applications, and I know there are more tools around. I’d be interested in hearing from others who have found workarounds or tools that work well with PDF conversion to HTML.
Post your comments here if you’ve found a workflow that produces good results when you want your PDFs to be displayed as Web pages.
ted
PS
For some enlightenment on one-button conversions to MS Word and the problems associated with simple exports from Acrobat, see Duff Johnson’s blog.





