JUNE 2006

Redacting PDF files: A survey of tools
by Duff Johnson, CEO, Document Solutions, Inc.

Introduction

The ability to conceal text or images using correction fluid is an essential feature of paper, the original portable document format. The process of information removal is formally known as redaction.

In a sense, redaction is a horizontal process, since almost everyone removes private or sensitive content from a document at one time or another. Most common where national security, privacy and liability concerns are paramount, redaction is often a mandated step in releasing sensitive government documents, legal papers, financial and scientific documents, medical records, human-resources files and many other materials. The grease-pen, razor-blade, document tape and magic-marker industries still rely on redactors for some significant portion of their sales volume.

Adobe Systems launched the Portable Document Format in 1993, and the success of the format derives in large part from the inherent reliability of its “what you see is what you get” nature. One might be forgiven for thinking that after 13 years and close to a billion dollars in annual revenue, no-brainer PDF redaction would be a basic feature of the company’s Acrobat software. Such is not the case. “Adobe predicted significant potential liability and limited market opportunity in a professional-strength, PDF-redaction tool,” says Appligent founder Mark Gavin. To date, Adobe has opted not to include redaction tools directly in Acrobat. Instead, Gavin says, they asked Appligent to consider developing a plug-in that would not only redact PDF, but also provide powerful automation to assist in speeding the redaction process. The first version of Appligent’s Redax was released in 1996.

Even now, the only way to redact using Acrobat Standard or Professional is a clunky (if effective) workaround, described in exacting detail by Adobe’s Rick Borstein.

The inability to properly redact in Acrobat 7 seems strange because Adobe has paid so much attention (not all of it successfully) to Acrobat’s Commenting features, which in many users’ minds can easily “look like” proper redaction. The Highlight feature, for example, may be used to cover text with a black box, but nothing actually gets redacted unless the page is saved as an image and the PDF re-created from that image. Indeed, the word “redact” does not appear in the Acrobat Help file at all.

Confusion about what constitutes real redaction has led to some significant misunderstandings. Blacked-out text looks redacted on screen and in print – so is it? No! The text under the box is still in the file, where it can be selected, searched or copied. See this example [PDF: 13 KB].
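The difference between covering and removing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-line page content below is hand-written rather than taken from the linked example, and `extract_strings`, `cover_with_box` and `remove_text` are hypothetical helper names. The sketch handles only simple literal strings shown with the PDF `Tj` operator in an uncompressed content stream; real files also use escapes, hex strings, `TJ` arrays and compressed streams.

```python
import re

# Hand-written, uncompressed page content (illustrative only): one line of text.
PAGE = "BT /F1 12 Tf 72 700 Td (Witness: Jane Roe) Tj ET\n"

# Matches a simple literal string followed by the text-showing operator Tj.
SHOW_TEXT = re.compile(r"\(([^()]*)\)\s*Tj")

def extract_strings(stream):
    """Return every literal string the stream shows with Tj."""
    return SHOW_TEXT.findall(stream)

def cover_with_box(stream, x, y, w, h):
    """Paint a filled black rectangle on top; the text operators are untouched."""
    return stream + f"0 0 0 rg\n{x} {y} {w} {h} re f\n"

def remove_text(stream, target):
    """Delete any text-showing operation whose string contains the target."""
    return SHOW_TEXT.sub(
        lambda m: "" if target in m.group(1) else m.group(0), stream
    )

# "Redaction" by overlay: the box hides the text visually, nothing more.
covered = cover_with_box(PAGE, 70, 696, 120, 14)

# Removal first, then the box as a visual marker of where content was.
redacted = cover_with_box(remove_text(PAGE, "Jane Roe"), 70, 696, 120, 14)

print(extract_strings(covered))   # the covered text is still recoverable
print(extract_strings(redacted))  # nothing left to recover
```

Both results look identical on screen and in print; only the second actually lacks the sensitive string.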

Responding to these concerns, the National Security Agency (NSA) recently released an advisory document entitled ‘Redacting with Confidence’ [PDF: 665 KB], but the advice simply ignores the commercial availability of fully capable redaction software. Following publication of NSA’s document, Appligent released Correcting the Record [PDF: 297 KB], a white paper that authoritatively details misconceptions about PDF redaction, and how to avoid them. In March 2006, Adobe released Redaction of Confidential Information in Electronic Documents [PDF: 628 KB] on the subject as well.

Thus far, the way has been left open for third-party developers to try their hands at meeting the needs of would-be PDF redactors. Appligent is one of the few companies to have delivered true PDF redaction. The technical issues in properly redacting text, raster images, vector images, tags and text-under-the-page-image are not trivial. PDF files can be very complex, which makes the mechanics of PDF redaction a challenge to develop.

As a result, most software developed for redaction purposes doesn’t deal in PDF per se at all, but must convert PDFs to images prior to the redaction process. This approach, while relatively easy to implement, strips all the non-redacted text from the PDF, usually damages the appearance of photographs and other images and often results in increased file size.
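A rough calculation shows where the extra file size comes from. The figures below are assumptions chosen for illustration (a US Letter page rasterized at 300 dpi), and `raster_bytes` is a hypothetical helper. It reports uncompressed pixel data only and ignores row padding; compression in the output file will reduce these numbers, often substantially.

```python
def raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one rasterized page (row padding ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed page: US Letter (8.5 x 11 inches) at 300 dpi.
bilevel = raster_bytes(8.5, 11, 300, 1)    # black-and-white, 1 bit per pixel
color = raster_bytes(8.5, 11, 300, 24)     # RGB, 24 bits per pixel

print(f"{bilevel:,} bytes")  # 1,051,875 bytes
print(f"{color:,} bytes")    # 25,245,000 bytes
```

Every page pays this cost regardless of how little text it holds, which is why a text-only electronic-source PDF tends to grow once it is burned to an image.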

For government applications, this method is also especially problematic because the output does not and cannot (absent an OCR-and-tagging process) comply with Section 508, which requires that all non-redacted text be available to screen-reader software, and further, requires that redactions themselves be accessible rather than simply treated as a blank spot in the document. So far, only Appligent delivers serious, stand-alone redaction software that keeps PDFs “as they were” in all respects – minus the redacted information, of course.

PDF redaction tools

There are three basic types of tools for redacting PDFs:

  1. Those that redact PDFs natively, without requiring conversion to an image: True PDF Redaction
  2. Those that supply purpose-built tools for performing redaction on image files: Advanced Redaction
  3. Those that leverage Acrobat’s Commenting features by burning comments or other overlays onto a rasterized PDF: Redaction via Comments

Let’s survey the current offerings:

True PDF Redaction:
Redax 4.0, by Appligent (US)
Windows and Mac

Redax is the standard against which all PDF redaction software must be measured. The application rewards either a structured or ad hoc approach with the most efficient, powerful and flexible PDF redaction option available. Now in version 4.0, Appligent offers two Acrobat plug-ins, Redax and Redax Lite, and the stand-alone Redax Enterprise Server. The software is also licensed for integration into several high-end document-management and litigation-support packages.

Intended (and priced) for professional use, Redax offers a crisp, no-nonsense method fully integrated into Acrobat. Redax permits complete control over the appearance of redacted sections, workgroup management, the inclusion of Freedom of Information Act (FOIA) and other exemption codes, multi-page and annotation redaction, text and pattern matching and complete reporting. The plug-in works equally well on both electronic-source and scanned-source PDF files with OCRed text, redacting only what’s needed without burning the entire file’s text to an image. Redax Enterprise Server deploys the power of Redax using command-line, watched folders and web-services models for automated redaction processing.
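The general idea behind word-list and pattern matching with exemption codes can be sketched in Python. This is a simplified illustration, not Redax’s implementation: it works on a plain string rather than on PDF content, and the word list, the pattern, the sample sentence and the helper names `build_matcher` and `redact` are all hypothetical. The label `(b)(6)` refers to the FOIA personal-privacy exemption.

```python
import re

# Hypothetical inputs: literal terms to remove and one SSN-shaped pattern.
WORD_LIST = ["Jane Roe", "Project Kestrel"]
PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]
EXEMPTION = "(b)(6)"

def build_matcher(words, patterns):
    """Combine escaped literal words and raw patterns into one regex."""
    parts = [re.escape(word) for word in words] + list(patterns)
    return re.compile("|".join(parts))

def redact(text, matcher, code):
    """Replace each match with the exemption code; return new text and match spans."""
    hits = []

    def replace(match):
        hits.append(match.span())  # position in the original text, for reporting
        return f"[{code}]"

    return matcher.sub(replace, text), hits

sample = "Jane Roe (SSN 123-45-6789) joined Project Kestrel in 2004."
clean, hits = redact(sample, build_matcher(WORD_LIST, PATTERNS), EXEMPTION)

print(clean)
print(len(hits), "redactions")
```

The list of spans stands in for a redaction report: it records where content was removed without repeating the removed content itself.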

Redax is unique in that it redacts PDF files properly, without converting the entire page to a raster image. Thus, only with Redax can you redact a document and retain searchable text, scalable fonts and all the other advantages of electronic documents on the unredacted content. Because Redax works natively within Acrobat, PDF documents may be redacted over time by different users, with the redactions only committed to the file after a comprehensive review.

Since Redax copies the open document to a new file before actual redaction occurs, the tool protects the original PDF, and also eliminates all traces of metadata from the redacted document.

True PDF Redaction:
ISIToolBox Professional 5.5, by Image Solutions, Inc. (US)
Windows

ISIToolBox Professional is a multi-function Acrobat plug-in with many PDF management functions. One tool in the ToolBox is a modest redaction capability called iRedact.

Compared to Redax, the iRedact feature in ISIToolBox is simplistic – there’s no facility for loading lists of words, pattern-matching, FOIA codes or any of Redax’s many other features. With iRedact, users draw boxes on the screen to redact content with white or black. They may also perform a find-and-replace, converting individual words to a standard text string or displaying the redaction via a white or black box.

Unlike Redax, iRedact doesn’t use annotations to preview redaction work – redactions are applied directly, and therefore there’s no undo facility.

iRedact performs redactions in the currently open document, so document metadata goes unredacted. Use of iRedact should always occur on copies of the original PDF to minimize the risk of damage to the original document from thoughtless saving.

Advanced Redaction:
Rapidredact, by OnStream Systems (NZ)
Windows

Unlike Redax and ISIToolBox, Rapidredact is not an Acrobat plug-in; it converts Office documents, images and PDF files to TIFF or JPEG images, displaying the results in an image-viewer intended for a manual-redaction process.

As such, the program is quite limited, since it rasterizes (converts to an image) any PDF loaded into it. Words or phrases to be automatically redacted must be input before the file loads – any redaction after loading into the viewer must occur manually.

Rapidredact’s output is a TIFF-based PDF, no longer scalable or searchable. The interface includes a variety of useful redaction tools and it’s easy to use. Because it converts every document into an image file before beginning manual redaction, Rapidredact works on both electronic-source PDFs and scanned-source PDFs.

The main advantage Rapidredact has over the redaction-via-Comments method in Acrobat is the text-search-and-redact feature. This, along with the nifty Scribble tool (which works just like a magic marker) and the ability to add exemption codes to redactions, makes Rapidredact the only desktop application to even approach Redax’s capabilities.

Redaction via Comments:
Acrobat Standard or Professional 7.0x, by Adobe Systems (US)
Windows and Mac

Adobe Acrobat, while not actually including a redaction tool per se, does make it possible to redact a PDF without sniffing grease-pencil or magic-marker fumes.

If occasional manual redaction is all you need, then Acrobat’s Commenting tools may be used to add black highlighter or boxes, followed by a “Save As” to TIFF, followed by converting the resulting TIFF back into a PDF. If that sounds tedious, your instinct is correct. It does work, however, and all that’s required is the full version of Acrobat. See Rick Borstein’s article on the subject.

Redaction via Comments:
PdfCompressor 3.1, by CVISION Technologies (US)
Windows

PdfCompressor isn’t a redaction tool per se, but focuses instead on image compression and OCR. Used in conjunction with blackout Acrobat comments, the application can open marked-up PDFs, convert them to image files, and optionally, perform high-quality OCR to revive the unredacted text before saving to PDF. On output, PdfCompressor allows the user to control all manner of PDF settings, and the engine may be run in batch, through watched folders, or via the command-line.

Redaction via Comments:
PDF Enhancer, by Apago (US)
Windows and Mac
Server versions available for Solaris and AIX

Like PdfCompressor, PDF Enhancer isn’t a redaction tool as such, and it doesn’t include a method for selecting text or images. Nonetheless, PDF Enhancer can be useful for redaction purposes because, like PdfCompressor, it can batch process marked-up PDF files, converting them (via rasterizing each page) directly to fully redacted PDFs while skipping the save-to-TIFF-and-convert-back-to-PDF step. Like PdfCompressor, PDF Enhancer facilitates control over the output image resolution and compression, so post-redaction file sizes are easily managed within the same application. Unlike PdfCompressor, PDF Enhancer does not include the ability to OCR your newly redacted, image-based PDFs.


Conclusion
For those who need to remove information from electronic documents, it’s a shame that Adobe has yet to offer even a basic redaction solution that doesn’t depend on burning a perfectly good electronic-source PDF file to a crude TIFF or JPEG image. Even without a feature-rich solution like Redax, an elementary “select and black out” redaction option in Acrobat would be most welcome. Failing that, users requiring redaction solutions for PDF are limited to two basic options: true PDF redaction via a tool such as Redax, or converting PDF to TIFF to “burn in” Comments.

Either way, you can get the job done without printing to paper and breaking out the grease pens.

Article Feedback

JUNE 26, 2006
to just follow up on the recommendation of pdf enhancer in this space...it should be noted that enhancer offers the most comprehensive removal of metadata & application private data - to ensure that there is nothing left. using pdf enhancer on a pdf that has been processed with redax is the best way to go! leonard
— leonardr

JULY 26, 2006
would very much like to have all of these adobe acrobat user community articles available for download as pdfs!! rick gill rlagill@yahoo.com
— rlagill
