2. Related Work

Research on adapting documents to different output devices and allocating document information to different output channels has diverged into at least two directions. One direction concerns reformatting document content for small devices, transforming information represented in one visual channel into another visual channel. The other concerns making document information accessible to visually impaired users by transforming visual information into audio information.

In the visual-to-visual transformation category, some solutions focus mainly on text readability: displaying some text in larger fonts and allowing users to select page elements to zoom into or collapse [1], summarizing the text content [2], grouping content semantically [3], and reflowing content based on reading order [7]. Solutions that focus on both text and images include Enhanced Thumbnails [13] and SmartNails [14]. Enhanced Thumbnails contain keywords extracted from the source document and pasted onto a low-contrast, downsampled page image. SmartNail technology [14] creates an alternative image visualization for a single document page by scaling, cropping, and reflowing page elements, subject to display size constraints. Both techniques include images and text, but the output is a static visual representation of each page. In Multimedia Thumbnails, by contrast, the output consists of document information from multiple pages represented dynamically using animation and audio.

The prior art in the visual-to-visual category most relevant to our current work is described in [4][5], where a method for non-interactive picture browsing on mobile devices was proposed. The goal there was to automatically find salient, face, and text regions in a picture and then apply zoom and pan motions to the picture to provide informative close-ups to the user. That method concentrates on representing photos, whereas our method focuses on representing high-resolution multipage document content. Moreover, the automated picture browsing technique shows only visual information, whereas we employ both the visual and audio channels to communicate document information.

There has also been some work on retargeting audiovisual content produced for large displays to small displays. The method in [6] converts high-resolution video clips for playback on small displays by adding virtual zoom and pan operations, based on automatically extracted salient image regions and motion activity, in order to retain the recognizability of the content. Unlike our work, which transforms visual information into audiovisual information, the research in [6] concentrates on converting audiovisual content into another audiovisual representation.

Work that fits into the visual-to-audio transformation category includes document browsers that support text-to-speech synthesis, such as Adobe Acrobat PDF reader for visually impaired users [15]. Furthermore, some work has been done on developing Web browsers for blind and visually impaired users [16][17]. The focus in [17] is to map a graphical HTML document into a 3D virtual sound space, where non-speech auditory cues differentiate HTML documents. Their goal is to transform as much information as possible into the audio channel. In contrast, MMNails contain document information optimally selected for communication through both the visual and audio channels.

Multimedia Thumbnails can be seen as the first technique that transforms purely visual, multipage, formatted document information into an audiovisual representation that exploits both the visual and audio channels of a mobile device. Allocation of information to the visual and audio channels is performed optimally with respect to information content, display size, time duration, and the user's task.