Archive for the 'PDF' Category

ISO 32000 is published

Thursday, July 3rd, 2008

As I’ve mentioned before, I’ve had two major projects during 2007 and 2008 - Acrobat 9, which is now shipping, and my work on the transitioning of PDF from Adobe to ISO, which I’ve spoken about as well.

However, while ISO 32000 (part 1) was ratified in January, the standard itself wasn’t actually published until yesterday.  But now you can run out, and for just a few Swiss Francs, get your own personalized copy of ISO 32000-1.

AIIM secretary, Betsy Fanning, writes in her blog:

While the committee’s initial work has been completed in getting the standard through the approval and publication process, the committee’s work is far from over. The focus for the committee will now be to identify new features and functions that may be added to the PDF file and included in the standard. To follow the activities of the US committee for this standard, please visit http://www.aiim.org/Standards/article.aspx?ID=33223.

The committee is already at work on part 2 (or PDF 2.0, if you will), with our next meeting taking place in Beijing in October.  I look forward to seeing the first (of potentially many) submissions from others on what they’ve always wanted to see in PDF but have not had the chance to suggest before.   But now it really is "Everyone’s PDF".

Reference XObjects, PDF/X-5 and Acrobat 9

Thursday, June 12th, 2008

PDF 1.4 introduced Reference XObjects, but it has not been implemented in Acrobat in the intervening years. With the growing popularity of PDF/X-5g documents and interest in variable data printing (VDP), the Acrobat team decided to add support for Reference XObjects in 9.0.

Read more about this from Shradha, one of the Acrobat engineers who made it happen!

OOXML & ODF vs. PDF/A

Tuesday, February 27th, 2007

A colleague sent me the following statement today and asked me to respond to it. I first wrote only to him, but then thought that it was worth posting here.

He wrote:

Policymakers seem to cherish the ‘perception’ that the advent of ODF and/or OOXML will make PDF/A-1 a ‘redundant’ standard for long-term document preservation. Their ‘case’: it’s based on XML and XML-related open standards, so long-term accessibility is granted. They may reason as follows: just wait, implement a few Office plug-ins for conversion of the useful Office legacy files to ODF and/ or OOXML and ‘we’ have taken care for the long-term accessibility of these legacy files, adequately. This will keep ‘us’ away from ‘obscure’, PDF-derived, open standards. If it’s not XML, the format has no right to exist, at all. Even Adobe admits this by developing MARS. Please, let ‘us’ avoid the costs involved in developing and maintenance of PDF-infrastructure, PDF-training and PDF-knowledge.

Here is what I have to say.

Before one can argue over file formats, one needs to determine what it is that is being archived and why. For example, if I were interested in archiving an address book - I would probably focus on the data and not on a single presentation of that data. However, if I were archiving the Declaration of Independence, then I would focus on the presentation of the content in addition to the actual content. I also want to ensure that any content is maintained in its original “format” - so that vector diagrams from a CAD-generated floor plan would remain as rich vectors and not be converted to something like raster data. Finally, in all cases, I would want to ensure that any relevant marginalia (metadata, comments/markup, etc.) could be incorporated.

This is why the archival community approached Adobe about the use of PDF for long term archival storage of content containing text, images and raster data. PDF is the only format that encompasses ALL of the above needs - content, presentation and metadata for all standard content elements (text, vector, raster). Combined with that is a technical design that enables easy creation of a “reference implementation” at some point in the future without any ambiguities - thus ensuring that the content and its presentation will survive.

Neither OOXML nor ODF addresses all these needs. In fact, they are focused primarily on the textual content and (limited) metadata - and in no way help preserve the presentation of that content. As such, they aren’t even acceptable for the archiving of simple Office documents. They also do nothing to address the needs of those wishing to archive scans, CAD drawings, print publications and many other types of documents. Combine those limitations with the fact that neither was designed for ease of creating a reference implementation (it’s IMPOSSIBLE to write a fully compliant OOXML viewer), and both fall short as archival standards.

Becoming an Evangelist

Wednesday, November 1st, 2006

Well, I’ve returned to the “mothership”…

Starting last week at the Adobe Max conference, I am now a Technical Standards Evangelist for Adobe Systems - focusing on PDF-related standards such as PDF/X, PDF/A, PDF/E, etc. as well as the new Mars project that I mentioned in my last blog entry.

This means that I can continue to do my work to move the standardization of PDF technologies forward with all the resources of Adobe behind me. In addition, I get to help the engineers here at Adobe create the most standards-compliant PDF on the planet! It’s a lot of work - but I am already in the thick of things and loving every minute of it.

It should also mean that I’ve got more time to write about the various standards, since it’s now part of my job description ;) .

Oh, and if you’re in the Omaha area next week - come see me speak about PDF Standards and Mars at the new PDF Central Conference.

MARS Attacks!

Tuesday, October 24th, 2006

This post is for all those people who have avoided PDF because “if it’s not in XML, then it’s not good”, for those that think that Microsoft’s XPS is the “future of electronic paper”, and for those who won’t use anything not based on 100% OPEN STANDARDS…

Adobe announced today MARS - the Portable XML-based Document Format. The format that will give techno-geeks what they’ve been looking for in an electronic document format - based entirely on open standards.
Living inside of a ZIP archive is a collection of both custom and standard XML grammars (including a slightly extended SVG for page contents), standard image formats (JPEG, JP2K & PNG), fonts (Type 1, TrueType & OpenType), and other binary formats (ICC profiles, etc.).

BUT unlike some of the other options out there, it’s not just “electronic paper”! It has all of the interactive features that users of PDF have come to expect - forms, hyperlinks, annotations/markup, multimedia, etc.

Oh, and every copy of Acrobat AND Reader 8 will support it natively - just like they do with PDF today. No explicit conversions necessary (unless that’s what you want to do).

Why would you use anything else?!?!

Hopefully this whets your appetite for MARS…and I plan to write more about it in the coming weeks.

NOTE: MARS is NOT a replacement for PDF - it is simply an alternative representation/serialization of the features and capabilities of PDF based on XML.

Why “refrying” a PDF is evil!

Friday, October 6th, 2006

I got into a discussion/argument today in an online forum about the process of conversion of PDF->PS->PDF to help “clean up” PDF files. This process, called refrying, is one that used to be quite popular - but since Acrobat 5.0 it has been frowned upon by Adobe and others. The reason is the wide variety of PDF features that can NOT be represented in PostScript. Remember that the last update to PostScript (Level 3) was in 1997, while PDF has undergone 3 revisions (1.4, 1.5 & 1.6) since then - with 1.7 coming shortly with the release of Acrobat 8.
Here is a list of things that you might use in the content of a PDF that don’t translate well to PostScript:

  • Transparency
  • ICC-based colors
  • 16-bit color
  • JBIG2 compression
  • JPEG2000 compression
  • Layers (Optional Content Groups)
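
As a rough illustration of the list above, here is a minimal Python sketch that scans a PDF's raw bytes for name tokens usually associated with those features. The function name `refry_risks` and the marker table are hypothetical, chosen for this example only. It is a naive byte scan rather than a real PDF parser, so it will miss anything stored inside compressed object streams and can report false positives.

```python
import re

# Heuristic markers: PDF name tokens commonly associated with features
# that PostScript cannot represent. This table is an illustrative
# assumption, not an exhaustive or authoritative list.
REFRY_RISKS = {
    "transparency": rb"/SMask\b|/S\s*/Transparency\b",
    "ICC-based color": rb"/ICCBased\b",
    "16-bit color": rb"/BitsPerComponent\s+16\b",
    "JBIG2 compression": rb"/JBIG2Decode\b",
    "JPEG2000 compression": rb"/JPXDecode\b",
    "layers (optional content)": rb"/OCProperties\b",
}


def refry_risks(pdf_bytes: bytes) -> list[str]:
    """Return the names of features whose markers appear in the raw bytes."""
    return [
        name
        for name, pattern in REFRY_RISKS.items()
        if re.search(pattern, pdf_bytes)
    ]


if __name__ == "__main__":
    sample = (
        b"%PDF-1.6\n1 0 obj\n<< /Type /XObject /Subtype /Image "
        b"/BitsPerComponent 16 /Filter /JPXDecode /SMask 2 0 R >>\nendobj\n"
    )
    print(refry_risks(sample))
```

If the list comes back non-empty, that is a hint (not proof) that a PDF->PS->PDF round trip would flatten or drop something.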

Also consider that any additional information added to PDFs during a PDF-based workflow (such as from Creo, Agfa, Quite, etc.) will be removed during the PDF->PS downgrading…

Of course, there is also the myriad of non-content elements that can be found in a PDF that don’t translate to PostScript/print, such as

  • Hyperlinks
  • Annotations, Commenting and Markup
  • Forms
  • Multimedia (movies, sounds, etc.)
  • Bookmarks
  • Metadata
  • and more….

There are also concerns regarding fonts & text “searchability” that can be introduced into the refrying process DEPENDING on how the operation proceeds.  Different PDF->PS conversion tools, different OS platforms and even simply ‘printing to PS’ will produce wildly different PostScript output for the same PDF - thus producing wildly different output PDFs.

So in conclusion…

JUST SAY NO TO REFRYING

What’s new in PDF 1.7

Monday, September 18th, 2006

So Adobe has announced Acrobat 8!

With a new Acrobat, of course, always come the latest revisions to PDF itself. For the first time in a while, Adobe hasn’t really made too many changes to the file format. Let’s take a look at the changes…

  • MAJOR improvements to 3D!
    • Support for 3D (via a new 3D Annot) was added in PDF 1.6, and since then Adobe has gotten lots of real-world feedback about what was still missing - so PDF 1.7 addresses many of those limitations.
    • Ability to annotate the 3D model
    • Control of visual appearance w/o resorting to JavaScript
    • Control over animated playback
  • Printer Controls!
    • Users have been begging Adobe for this feature for as long as I can remember…
    • A PDF can now include default print characteristics including paper selection and handling, page range, copies, and scaling
  • Portable Collections
    • Known in the Acrobat UI as “Packages” and detailed by my colleagues.
    • It expands on the existing embedded file mechanism (/Names/EmbeddedFiles) to support a variety of interesting new solutions - while maintaining backwards compatibility with Acrobat 6 & 7.
  • Improvement to dimensioning of annotations
    • Polyline & Polygon annotations can now have scale & measurement-aware dimensions attached to them
  • More Tags for Tagging
    • Interactive elements
    • Table improvements
    • Pagination objects such as headers & footers
  • Document Constraints
    • These enable a document author to specify certain criteria that must be met in order for the document to be usable in parts of a workflow.
      • Signature Constraints - is the signature valid, does it contain certain DN keys, etc.
      • Viewer Constraints - does the PDF viewer support and/or have enabled certain features?
        • this will help authors of complex documents prevent them from being loaded by older (or non-compliant) viewers!
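
To make the Printer Controls item above a little more concrete, here is a minimal Python sketch that writes out a viewer-preferences dictionary in PDF syntax. It assumes the print defaults are stored as `/PrintScaling`, `/Duplex`, `/PickTrayByPDFSize`, `/PrintPageRange` and `/NumCopies` entries of the catalog's `/ViewerPreferences` dictionary; check those key names against the PDF Reference before relying on them. The function `viewer_preferences` and its parameters are hypothetical.

```python
def viewer_preferences(copies=1, page_range=None, duplex=None,
                       scale=True, pick_tray=False) -> str:
    """Serialize assumed print-related viewer preferences as a PDF dictionary.

    The key names are assumptions based on the PDF 1.7 viewer preferences;
    values are passed through unchanged and not validated.
    """
    entries = []
    if not scale:
        # Ask the viewer not to scale pages to fit the paper.
        entries.append("/PrintScaling /None")
    if duplex is not None:
        # e.g. "Simplex", "DuplexFlipShortEdge" or "DuplexFlipLongEdge"
        entries.append(f"/Duplex /{duplex}")
    if pick_tray:
        # Let the PDF page size drive paper tray selection.
        entries.append("/PickTrayByPDFSize true")
    if page_range is not None:
        first, last = page_range
        entries.append(f"/PrintPageRange [{first} {last}]")
    if copies != 1:
        entries.append(f"/NumCopies {copies}")
    return "<< " + " ".join(entries) + " >>"


if __name__ == "__main__":
    print(viewer_preferences(copies=2, page_range=(1, 4),
                             duplex="DuplexFlipLongEdge",
                             scale=False, pick_tray=True))
```

These are defaults that a viewer may offer in its print dialog; nothing in the sketch forces a printer to honor them.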

And that’s it for PDF 1.7….for now…

A look at Adobe Illustrator & PDF editing

Wednesday, August 16th, 2006

There is a long standing bubbe meise among publishers & printers that Adobe Illustrator can be used to edit PDF documents. Guess what folks - that is simply NOT TRUE! And I’d like to look at two aspects of this.

Aspect 1 - PDF as AI’s native format
The bubbe meise may have come about due to a common misunderstanding about the “native file format” for Illustrator (since version 9). Even though the file extension is .ai, the file is, in reality, a 100% valid PDF document. Just change the extension and open in Reader - no problems!

However, even though this is true, Illustrator doesn’t actually use the “PDF parts” - it just uses PDF as a very nice envelope for its own private data. This is accomplished through the use of the /PieceInfo key on the /Page dictionary as documented in Section 10.4 of the PDF Reference. The actual Illustrator internal data is organized into the /Private key of /PieceInfo. Illustrator just reads this - ignoring the rest of the PDF. Photoshop also does the same thing with its “Photoshop PDF” format. This is why programs like PitStop, when you attempt to edit Illustrator or Photoshop documents, present a warning.
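
A quick way to see both halves of that claim is to look at the raw bytes of a file. The sketch below is only a heuristic, with a hypothetical function name (`inspect_envelope`): it checks for the `%PDF-` header near the start of the data and for a `/PieceInfo` name anywhere in it. It does not parse the PDF, so a `/PieceInfo` entry tucked inside a compressed object stream would go unnoticed.

```python
def inspect_envelope(data: bytes) -> tuple[bool, bool]:
    """Return (looks_like_pdf, mentions_piece_info) for raw file bytes.

    Naive byte scan: the PDF header is expected near the start of the
    file, and /PieceInfo is searched for as plain text.
    """
    looks_like_pdf = b"%PDF-" in data[:1024]
    mentions_piece_info = b"/PieceInfo" in data
    return looks_like_pdf, mentions_piece_info


if __name__ == "__main__":
    # Hypothetical stand-in for the start of an .ai file saved as PDF.
    sample = (
        b"%PDF-1.5\n3 0 obj\n<< /Type /Page "
        b"/PieceInfo << /Illustrator 4 0 R >> >>\nendobj\n"
    )
    print(inspect_envelope(sample))
```

On a real file you would pass in the result of `open(path, "rb").read()`; when both values are true, an editor that only touches the “PDF parts” is likely to leave the private data out of sync.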

Aspect 2 - Illustrator’s ability to read/process & write PDFs
For many years, Illustrator has had the ability to open up PDFs and let you work with each “object” using the native AI toolbox - thus propagating the bubbe meise.

Although Illustrator CS2 supports most features of PDF, there are a variety of things that it is simply unable to handle correctly when opening. Fortunately, Illustrator will warn you about them - but most folks tend to ignore such warnings. Some (but not all) of the features not supported include:

  • Multiple colorspaces (AI only supports a single colorspace on its canvas)
  • All features of PDF transparency (groups & blending spaces, esp.)
  • Certain complex smooth shadings
  • Subset fonts using custom encodings
  • Embedded fonts not installed on the editing computer (including Type 3 & TeX fonts)

In addition, any non-content elements such as bookmarks, hyperlinks, metadata, annotations, etc. will all be “thrown on the floor” by Illustrator. So keep that in mind as well.

BUT WAIT - there is some light at the end of the tunnel…
Adobe Acrobat (both Standard and Professional) includes a tool called the Touchup Object Tool, which enables you to take an entire “object” and have it edited with an external editor. By default, the editors are Photoshop for raster data and Illustrator for vector & text. To use, just select the Touchup Object Tool (it’s connected to the Touchup Text Tools on the Advanced Editing Palette), highlight the object you wish to edit, right/control-click your mouse and choose “Edit Object…”. Off you go to your editor, make your corrections, then save - and the updates will appear back in Acrobat. Cool huh??
[This tip courtesy of Ted Padova]

PDF Standards from ISO

Wednesday, July 5th, 2006

Back from the LONG weekend holidays here in the US (and in Canada, which I was visiting)…I thought I’d write a bit about some of the standards related to PDF from ISO (the International Organization for Standardization).

PDF/X - this is the first of the standards that ISO built around PDF. The X is for “eXchange”, specifically blind exchange among prepress providers (such as advertisers to magazines). There are currently PDF/X-1a (for CMYK & Spot colors), PDF/X-3 (for color managed data) and PDF/X-2 (which no one has ever actually implemented!). PDF/X-4 and PDF/X-5, which introduce newer PDF features such as transparency and layers/optional content, are on the way!

PDF/A - this is the recently finalized (Oct 2005) standard for “long term archival storage of electronic documents as PDF”. Where it took PDF/X a number of years to gain traction, PDF/A is being adopted (or is in the process of adoption) by both vendors and users VERY quickly! Currently, there are two conformance levels: PDF/A-1a (for tagged & highly metadata-aware documents) and PDF/A-1b (for the average document). PDF/A-2 is currently in discussion to match the advances of PDF/X-4.

PDF/E - focusing on the needs of the Engineering community, PDF/E is currently in late stages of standardization and coming together nicely. It leverages the latest and greatest features of PDF that are targeted for engineering, such as 3D and object-level metadata.

PDF/UA - still very early in discussion, this standard is focused on providing Universal Accessibility to PDF documents by building on the work already present in PDF for Section 508 compliance.
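
For the PDF/A levels mentioned above, a file declares which one it claims to follow in its XMP metadata. The Python sketch below assumes the `pdfaid:part` and `pdfaid:conformance` properties of the PDF/A identification schema, written either as XML elements or as attributes. The function name `pdfa_level` is hypothetical, and the regular expression is a shortcut for illustration, not a proper XMP parser. It only reports what the file claims; it validates nothing.

```python
import re


def pdfa_level(xmp: str):
    """Return a label such as 'PDF/A-1b' from an XMP packet, or None.

    Assumes the pdfaid:part / pdfaid:conformance properties, in either
    element form (<pdfaid:part>1</pdfaid:part>) or attribute form
    (pdfaid:part="1").
    """
    def field(name):
        match = re.search(
            rf'pdfaid:{name}\s*(?:=\s*["\']|>)\s*([^"\'<\s]+)', xmp
        )
        return match.group(1) if match else None

    part, conformance = field("part"), field("conformance")
    if part is None or conformance is None:
        return None
    return f"PDF/A-{part}{conformance.lower()}"


if __name__ == "__main__":
    packet = (
        '<rdf:Description rdf:about="">'
        "<pdfaid:part>1</pdfaid:part>"
        "<pdfaid:conformance>B</pdfaid:conformance>"
        "</rdf:Description>"
    )
    print(pdfa_level(packet))
```

A claim of conformance in the metadata is just that, a claim; checking whether the file actually meets the standard needs a real preflight tool.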

And there are even more that are just starting up…So keep your eyes peeled for more versions of PDF focused on specific market segments and needs.

A few of my favorite tools

Thursday, June 29th, 2006

I thought I’d start things off with something useful to all comers…

Using the “Pages” feature of this software, I’ve added a new permanent place where I’ll be keeping a list of my favorite PDF programs/tools.

The first six tools to make the list are:

  • PDF CanOpener
  • PitStop Professional
  • FTMaster
  • Redax
  • Enfocus Browser
  • PDFlib Font Reporter

Go here for more details about each of the listed programs.