Archive for March, 2007

PDF to HTML Conversions

Friday, March 30th, 2007

Adobe Acrobat has never been good at converting PDF to HTML, and this is still the case in Acrobat 8. It's as if PDF-to-HTML conversion was something the engineering teams stopped working on in Acrobat 4; we haven't seen any improvements in this area since.

I've been curious about how conversion to HTML is handled in some of the CS3 applications, and what kinds of workarounds I might find using the newest releases of the Creative Suite applications and Acrobat 8. Perhaps we can use the new CS3 applications to convert layouts to HTML before creating a PDF file, or perhaps Acrobat 8 offers some workarounds.

After all, when Adobe acquired Macromedia it acquired the king of screen display developers, so it's reasonable to expect that the integration of Adobe Acrobat, Adobe InDesign, Adobe Illustrator, and Adobe Dreamweaver would amaze us with its file format conversions.

To satisfy my curiosity, I first looked at exporting directly from Adobe InDesign to HTML. InDesign has had this feature built into the program for several generations. Now, in InDesign CS3 (version 5 of InDesign), you choose File > Cross-media Export > XHTML / Dreamweaver to convert your InDesign document to an HTML file. Wow, I thought. This has got to be a great feature; look at these fancy new menu commands!

To test conversions to HTML directly from InDesign CS3, I used the one-page layout shown in Figure 1. I knew the vertical type would be a problem, but I was interested in seeing how InDesign would handle it, along with the transparent background images and the columnar text.

Figure 1

After exporting the InDesign file to XHTML, I opened the document in Dreamweaver. As you can see in Figure 2, when I previewed the Dreamweaver file in a Web browser, the InDesign CS3 conversion was a big disappointment.

Figure 2

Export to HTML from Acrobat

Acrobat 8 has a button on the Tasks toolbar dedicated to its improved export features. Word exports received a lot of attention during the development of Acrobat 8, and Adobe claims Acrobat now does a much better job of exporting text to Word. From the Export task button pull-down menu, you can also choose to export PDF documents to HTML.

I decided to convert my InDesign file to a PDF document and then make the conversion from within Acrobat. From the Export task button menu, I exported the PDF file using the HTML Web Page menu command. Oddly enough, Acrobat exported the entire page as an image. The appearance of the layout was preserved, but the text appeared bitmapped (see Figure 3) and wasn't editable. The entire page was a single image.

Figure 3

The vertical text along the left side was preserved because the conversion turned the page into a bitmap image. This conversion is really no better than hosting the original PDF on a Web site. As a matter of fact, the original PDF looks much better and its text is searchable, something you can't get from Acrobat's poor HTML conversion.

Third-party applications and plug-ins

Acrobat wasn't going to do the job, so I thought I'd look at one of the many third-party products that support PDF to HTML conversions. My first stop was to download the newest version of Gemini from Iceni Technology (www.iceni.com/gemini.htm) for the Mac. Gemini has long been a third-party application for converting PDFs to HTML, and it's available for both Mac and Windows users.

I installed Gemini on my Mac and opened the same PDF document I used for the other conversions. As you can see in Figure 4, the results were about as impressive as my conversion from InDesign. This may not be a fair assessment of Gemini, since the product does much more than convert PDF to HTML. But I was looking for a quick, no-nonsense HTML conversion tool, and for that particular task, Gemini wasn't my answer.

Figure 4

Still in search of something fast and efficient for converting my PDF documents to HTML, I stumbled onto Smart PDF Converter from SmartSoft (www.smartpdfconverter.com). I ran the same file through this conversion utility and had much better success than with any of the other options I explored. As you can see in Figure 5, my file was converted to HTML with the text and graphics intact and editable.

Figure 5

Note that Smart PDF Converter is currently a Windows-only application.

The Bottom Line

In my tests, I found the HTML Web Page conversions from both InDesign and Acrobat very poor. I tested several files, and every result was much less usable than simply copying and pasting text into Dreamweaver.

After converting to PDF, I tried several things in Acrobat. I optimized the PDFs, exported to both supported HTML formats (HTML 3.2 and HTML 4.01), and exported several different PDF documents. The results were all the same: very poor conversions, and where text was recognized, the integrity of the document was lost.

As has been the case with earlier versions of Acrobat, the best conversions come from third-party products, either Acrobat plug-ins or standalone applications. I tested just the two products mentioned in this blog post.

My tests demonstrated that Smart PDF Converter from SmartSoft produced the best results for PDF to HTML conversions. You can open the converted files directly in Dreamweaver and edit the HTML. When you preview the converted documents in a Web browser, the HTML page looks very similar to the PDF as viewed in Acrobat.

There are a number of tools for converting PDFs to HTML. I spent only a short time looking at quick conversion methods using two third-party applications, and I know there are more tools around. I'd be interested in hearing from others who have found workarounds and/or tools that work well for converting PDFs to HTML.

Post your comments here if you’ve found a workflow that produces good results when you want your PDFs to be displayed as Web pages.

ted

PS
For some enlightenment on one-button conversions to MS Word and the problems associated with simple exports from Acrobat, see Duff Johnson's Blog.

Changing Font Colors in Comment Text Boxes

Thursday, March 15th, 2007

After hanging around the Ask the Expert forum for the past few months, I've noticed one question that keeps popping up among the other common recurring questions: how do I change the font, font color, font size, etc. in a text box? Although the questions aren't always clear about which text box a user means, we can generally narrow it down to a comment tool.

In Figure 1 you can see a Text Box comment added to a PDF page. Unfortunately, a fresh installation of Acrobat 8 gives Text Box comments a similar default appearance: the box has a screened background and the text color is red. "How do I change that red color to black?" is the cry of many Acrobat 8 users.

Figure 1

Logic immediately points us to a Properties dialog box, like the ones we use for many other Acrobat settings. However, when you open the Properties dialog box for just about any type of comment or markup, you see a dialog box similar to Figure 2.

Figure 2

Notice in Figure 2 that we can take care of the background fill by clicking the Fill Color swatch to open the color swatches. Click No Color for a transparent background or White for a white background. To the left you also see a color swatch for the border. Likewise, you can select No Color for no border, or choose another color if you want a keyline border around the text box.

What you don't have, however, are options for changing font attributes. This is the problem that's driving a good many users crazy: the settings aren't in the Properties dialog box. So where are they?

Dismiss the Properties dialog box and open the Properties Bar. To access the Properties Bar, open a context menu on the Toolbar Well and select Properties Bar. When you click inside a Text Box comment so that a blinking cursor appears, or highlight text as shown in Figure 3, all the type attributes appear in the Properties Bar.

Figure 3

In Figure 3, I have the Font Color menu open to change the font color. You also have options for changing alignment, font sizes, font styles, superscripts and subscripts, and paragraph formatting.

Whenever you take part in a commenting and review session, or simply use some comment tools to mark up a document, remember to open the Properties Bar. Many Acrobat features are redundant and can be found in several different places among the tools, palettes, and menu commands, but font attributes in comments can be changed in only one place: the Properties Bar.

ted