OOXML & ODF vs. PDF/A

A colleague sent me the following statement today and asked me to respond to it. I first wrote only to him, but then thought that it was worth posting here.

He wrote:

Policymakers seem to cherish the ‘perception’ that the advent of ODF and/or OOXML will make PDF/A-1 a ‘redundant’ standard for long-term document preservation. Their ‘case’: it’s based on XML and XML-related open standards, so long-term accessibility is guaranteed. They may reason as follows: just wait, implement a few Office plug-ins for conversion of the useful Office legacy files to ODF and/or OOXML and ‘we’ have adequately taken care of the long-term accessibility of these legacy files. This will keep ‘us’ away from ‘obscure’, PDF-derived, open standards. If it’s not XML, the format has no right to exist at all. Even Adobe admits this by developing MARS. Please, let ‘us’ avoid the costs involved in developing and maintaining PDF infrastructure, PDF training and PDF knowledge.

Here is what I have to say.

Before one can argue over file formats, one needs to determine what is being archived and why. For example, if I were interested in archiving an address book, I would probably focus on the data and not on a single presentation of that data. However, if I were archiving the Declaration of Independence, then I would focus on the presentation of the content in addition to the actual content. I would also want to ensure that any content is maintained in its original “format”, so that vector diagrams from a CAD-generated floor plan remain rich vectors and are not converted to something like raster data. Finally, in all cases, I would want to ensure that any relevant marginalia (metadata, comments/markup, etc.) could be incorporated.

This is why the archival community approached Adobe about the use of PDF for long-term archival storage of content containing text, vector graphics and raster images. PDF is the only format that encompasses ALL of the above needs: content, presentation and metadata for all standard content elements (text, vector, raster). Combined with that is a technical design that enables the easy creation of a “reference implementation” at some point in the future without any ambiguities, thus ensuring that the content and its presentation will survive.

Neither OOXML nor ODF addresses all these needs. In fact, both are focused primarily on textual content and (limited) metadata, and in no way help preserve the presentation of that content. As such, they aren’t even acceptable for archiving simple Office documents. They also do nothing to address the needs of those wishing to archive scans, CAD drawings, print publications and many other types of documents. Combine those limitations with the fact that neither was designed with easy creation of a reference implementation in mind (it’s IMPOSSIBLE to write a fully compliant OOXML viewer), and their use as archival standards is clearly insufficient.

7 Responses to “OOXML & ODF vs. PDF/A”

  1. Article Feed » OOXML & ODF vs. PDF/A Says:

    […] Original post by leonardr and powered by Img Fly […]

  2. michaelejahn Says:

    I concur! The archived book analogy plays well: if the original source document exists, and if that source document is to be converted to both an eBook readable on a Palm Pilot and a “ready to print” version, PDF/A can handily be used for either instance. The challenge is in the TOOLS made available for these conversions, and Adobe has been a reliable resource for such tools. As manufacturers, suppliers and end users have historically shown, to make a standard ‘stick’ you need tools to generate, view and verify a file format, both to see whether it meets the requirement and whether it fills it. Without tools, standards are useless.

    Michael Jahn
    ELAN GMK

  3. Frank Spangenberg Says:

    Some questions to think about… How many tools are able to display a PDF/A with all possible features? How close is the converted PDF/A file to the “original”? Is it really important to save the visible appearance instead of saving the content structure and well-known formatting instructions? Neither format is perfect at doing the archiving job…

  4. leonardr Says:

    Excellent questions, Frank. Let’s look at each one.

    >Tools to display PDF/A

    Well, since PDF/A is a subset of PDF 1.4, the answer is A LOT! I would say that pretty much every PDF viewer out there, from open source (Xpdf, Ghostscript, Kpdf, etc.) to commercial (Foxit, Jaws, etc.), can display it. HOWEVER, it should be noted that ONLY Acrobat/Reader 8 actually adheres to all the requirements of a PDF/A-compliant viewer.

    But let’s turn that question to the other formats.

    >Tools to display ODF or OOXML

    Other than OpenOffice and Office 2007, I am not aware of any :(

    >How close is PDF/A to the “original”

    If it’s not a 100% faithful visual representation, then whatever created the PDF/A document should be publicly flogged. Seriously, the whole point of PDF (and by extension, PDF/A) is reliable visual reproduction. And again, you have a better chance of the PDF being 100% faithful than of the same ODF/OOXML file rendering identically on a completely different computer.

    >Is it important to save visual appearance vs. content & structure

    As I noted above, to archivists the answer is a resounding YES! Why? Because what the HUMAN saw when authoring the document was its visual appearance and NOT the “computer representation of content & structure”. Therefore, future humans MUST see the same thing the author saw.

    But again, perhaps what you are trying to preserve is data and not visual appearance, and that’s fine. PDF/A is NOT a solution for everything.

  5. prolling Says:

    Regarding standards, Leonard gave a good presentation at the PDF conference. It gives a clear overview of what is out there, at least for me, as I am in learning mode. He mentioned that links to his presentations would be posted on this site. Where can I find the links to his presentations (including the PDF internals one)?

  6. adriand Says:

    Dear Leonard;

    What archiving capabilities exist right now in the latest version of Adobe Acrobat for companies that produce CAD drawings?

    To my way of thinking, the ability to search multiple PDF files for textual information contained within the drawing title block would be a major benefit. If you have a CD archive of several thousand drawings in PDF format, generally you can only search them by filename.

    With Google you can search for PDFs and find text information within the files. I understand that these PDFs were not created from CAD files and that CAD files would probably have to be OCR’d to get this kind of functionality.

    But perhaps this functionality exists now. Can you set me straight as to what CAD archiving functionality already exists in the latest versions of Acrobat?

    Best Regards

    Adrian Dunevein

    www.aaadrafting.com

  7. praca Says:

    Yes, great article :) Thanks :)
