PDF is an increasingly popular format for publishers to use for web versions of printed documents. What is it? How can people with reading difficulties read PDF files?

Accessibility is increasingly important not only for fairness but also because the Disability Discrimination Act often makes it a legal obligation. The British Dyslexia Association, for whom this article was originally written, is particularly keen to set an example in being dyslexia and disability friendly and so to make its materials as accessible as reasonably possible.

Reading PDFs:

What is a PDF?

Portable Document Format (PDF) is a file format developed by Adobe Systems. Publishers and designers like PDF because they can make an electronic document that people can read on a computer but that looks exactly as it did on paper. They have full and detailed control of what can be quite complex formatting.

They also like it because it is very convenient: the document will usually have been sent to the printer in PDF format from a publishing program like QuarkXPress. BDA Contact Magazine goes through this PDF stage. So it is very easy just to put this file on the web, or to adjust it slightly by, for example, making another version with the pictures at lower resolution to make a smaller file that is easier to download.

How do you read a PDF?

To read PDF files you need at least the free Adobe Acrobat Reader, which you can download from the Adobe website. If you have reading difficulties and want to listen to the text, you can use a screen reader like the ‘Read Out Loud’ option in Acrobat Reader itself, which reads whole pages. More conveniently, you can use Texthelp PDFAloud, which comes with Read & Write Gold or as a standalone product. It offers text highlighting and greater text-to-speech versatility, with a choice of voices. Ironically, at the time of writing, even in Adobe’s own flagship document shown in the illustration, Read Out Loud doesn’t read absolutely all the text (a little is image, and part of the table of contents seems to be hidden), and PDFAloud doesn’t read any because of the document’s security settings.

Types of PDF:

There are three types of PDF file, which at first sight all look the same:

  • Image;
  • Searchable image;
  • Formatted text and graphics.

Image PDFs are rare on the web, fortunately so, because they are a pain. They are usually created by scanning pages of text and contain just an image of the text, not the separate characters and words of the text itself. So screen readers cannot read them as they are. Some of the advertisements in Contact Magazine are just images.

In principle you can convert image PDFs into text-based PDFs by feeding the file into an OCR (Optical Character Recognition) package like ABBYY FineReader. But the process can be frustrating, and you may do better just to print the document out and scan it back in again using an OCR package!

Searchable image files are better because they have a copy of the actual text behind the image. So screen readers can read the text and search engines like Google can index these files.

But it is formatted text and graphics that you need if the file is to be truly accessible.

Some PDFs can be in old formats. Authors can also set security settings, e.g. to stop copying or printing. These, too, can stop screen readers working properly. If all else fails and you can’t read the PDF satisfactorily, you may be able to convert it into plain text or HTML. Adobe have programs to do this at www.adobe.com/accessibility. Easiest of all, though without the best layout options and with pictures removed, you may find that the search engine Google offers the document with a ‘View as HTML’ option.
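The security settings mentioned above are stored, in an encrypted PDF, as an integer of permission flags (the /P entry of the encryption dictionary in the PDF specification). As a rough illustration, the Python sketch below decodes three of those flags: printing, copying, and text extraction for accessibility. The function name decode_permissions is made up for this example, reading the /P value out of a real file needs a PDF library and is not shown, and whether a given reader or text-to-speech program honours the flags is up to that program.

```python
# Permission bits from the PDF specification's standard security handler.
# Bit positions are counted from 1 at the low-order end of the /P integer.
PRINT_BIT = 1 << 2          # bit 3: print the document
COPY_BIT = 1 << 4           # bit 5: copy or extract text and graphics
ACCESSIBILITY_BIT = 1 << 9  # bit 10: extract text for accessibility tools


def decode_permissions(p_value: int) -> dict:
    """Decode a /P permissions integer (hypothetical helper for illustration)."""
    # /P is a signed 32-bit value, so mask it to its unsigned bit pattern first.
    bits = p_value & 0xFFFFFFFF
    return {
        "print": bool(bits & PRINT_BIT),
        "copy": bool(bits & COPY_BIT),
        "accessibility": bool(bits & ACCESSIBILITY_BIT),
    }


if __name__ == "__main__":
    # -3392 is an example value: printing and copying off, accessibility on.
    print(decode_permissions(-3392))
```

A document whose author has switched off copying but left the accessibility flag on should, in principle, still be readable by a screen reader. If the accessibility flag is off as well, that is one likely reason a text-to-speech tool reads nothing.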

What is an accessible PDF?

A formatted text and graphics PDF file is not necessarily accessible. Some aspects of accessibility are mainly relevant to visually impaired people (who may of course be dyslexic or interested in dyslexia). Others are also directly relevant to the needs of dyslexic people. Accessible PDFs should have:

  • structure: styled section headings and a table of contents linked to the headings for easy navigation;
  • logical reading order: unless the order is tagged, a screen reader may read the different pieces of text in the wrong order;
  • bookmarks and cross-references;
  • “alt” text for images and links, so that people who cannot see the images can know what they are about;
  • natural language identification, so that the screen reader knows what language to read in.
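
Several items in this list correspond to named entries inside the PDF file itself: /StructTreeRoot for tagged structure, /Outlines for bookmarks, /Lang for the document language and /Alt for alternative text. The Python sketch below is a crude first check that simply searches the raw bytes of a file for those entries. It is an assumption-laden heuristic rather than a real accessibility test: the function and dictionary names are invented for this example, newer PDFs may store these entries in compressed form where a plain search cannot see them, and finding an entry says nothing about whether the tagging is any good.

```python
import re

# Byte patterns for accessibility-related entries in an uncompressed PDF.
# These are heuristics only; they do not check that the entries are sensible.
MARKERS = {
    "tagged structure": rb"/StructTreeRoot\b",
    "bookmarks": rb"/Outlines\b",
    "document language": rb"/Lang\s*[(<]",
    "alt text": rb"/Alt\s*[(<]",
}


def accessibility_hints(pdf_bytes: bytes) -> dict:
    """Report which accessibility-related entries appear in the raw PDF bytes."""
    return {
        name: re.search(pattern, pdf_bytes) is not None
        for name, pattern in MARKERS.items()
    }
```

To try it on a file, read the file as bytes first, for example accessibility_hints(open("example.pdf", "rb").read()), where example.pdf stands for whatever document you want to check. A result with every value False suggests the document was not tagged, or that its structure is stored in compressed form and needs a proper PDF tool to inspect.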

Creating accessible PDFs:

Although it is possible to make existing PDFs accessible using Adobe’s Acrobat product, it is frustrating, difficult and time-consuming to do so.

It is better, and ultimately easier, to build accessibility into the document as you are creating it. With a little training and consistency for authors and designers, it is possible to build accessibility into the workflow, from writing the original content (usually in Microsoft Word), to assembling it in a desktop publishing package, to publishing it as PDF. Unfortunately QuarkXPress, the most popular desktop publishing package, cannot handle PDF accessibility.

Adobe’s own InDesign product is better at preserving and correcting accessibility features as part of the publishing workflow. The publishing and printing industry needs to gear itself up to handling and enhancing accessible documents. The pain of retrofitting accessibility to existing PDFs is not acceptable.

Interestingly, traditional accessibility experts don’t cover at all the issues that Jean Hutchins talks about in her ‘Hear Hear’ article in this issue of Contact. Text-to-speech programs ought, by now, to be clever enough to read things correctly much more of the time. If there is enough information for humans to work out how a piece of text sounds, then there should be enough for computers to work with. Until then, those of us who care about accessibility also need to adapt the way we write text so that it reads out loud better.

This is a longer version, with links, of an article originally written for the British Dyslexia Association Contact Magazine.


The resources often refer to things as they were a couple of years ago, so version numbers are obsolete. But maybe things haven’t changed enough since then!

By Ian Litterick and EA Draffan. Both authors are British Dyslexia Association New Technologies Committee members: Ian Litterick is chairman of iansyst Ltd/dyslexic.com and EA Draffan is a consultant and provider of the www.emptech.info assistive technology resource.

Article last updated: 8 December 2006

Author: Ian Litterick
Published: 08 Dec 2006