1. Introduction

Devices with small displays, such as MFPs, PDAs, cellular phones, and digital cameras, are increasingly used to access documents, web pages, and images. Browsing and viewing documents on such devices, however, remains difficult, and existing solutions are limited. For example, web pages are often redesigned for viewing on small displays. In digital cameras, photo browsing is usually handled by showing a low-resolution version of each photo and expecting the user to zoom in for more detail. Document viewers on PDAs take a similar approach, allowing the user to zoom into the document and scroll to see the details. These solutions require interaction (zooming, panning, etc.) with a device that has limited navigation capability, e.g., a cellular phone. Some researchers have suggested automatically re-flowing the text of documents and web pages to fit them into small displays [1][2], and automatic navigation of photos is presented in [3]. However, these solutions either do not support multipage document images or require changing the layout and appearance of the document.

We introduce a new document representation called Multimedia Thumbnail (MMNail) that is suitable for viewing documents on small displays. The input to the MMNail generation algorithm is a 2D document image, and the output is a multimedia clip that can be seen as a guided tour through the document. We animate the document pages and automatically zoom into and pan over the most important visual elements, such as the title and figures. In this way, we use both the spatial and time dimensions to present the document. Moreover, the audio channel is used to communicate some of the textual information, the so-called audible information. While document contents are shown in the visual channel, the audio channel is used to speak important keywords, figure captions, etc. As a result, an MMNail uses both the visual and audio channels of the browsing device to present an overview of the document on a limited display and in a limited time frame, while keeping the interaction required of the user to a minimum. An example of an MMNail of a two-page document is shown in Figure 1. In this example, the MMNail representation shows the first page, then automatically zooms into the title, shows the second page, and then automatically zooms into the figure. The audio channel, on the other hand, first communicates the important keywords from the document and then reads out the figure captions that are too small to read on the screen.

Figure 1. Multimedia Thumbnail Example

Given that display devices have limited resolution and that the playback time for a multimedia clip is typically limited, it is often not possible to animate through an entire document or to communicate all of the available audible information via the audio channel. This leads to the following problem: given the time and display constraints, which parts of the document should be included in the multimedia representation? Our paper addresses this problem with a three-step algorithm for automatically generating MMNails, which consists of analysis of the document image, optimization of the document representation given a time constraint, and synthesis of the Multimedia Thumbnail. In the next sections, we describe each of these steps in detail.

2. Algorithm Overview

Multimedia Thumbnails are created from electronic or scanned documents with a three-step algorithm. In the analysis step, document contents are analyzed in order to identify important visual and audible document elements, and information and time attributes are computed for each of these elements. In the optimization step, a two-stage knapsack-based algorithm is employed to determine the navigation path of the MMNail, given a time constraint. In the last step, the selected visual and audible information is synthesized into the audiovisual representation of the document.
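To make the optimization step more concrete, the following is a minimal sketch of a single-stage 0/1 knapsack selection under a time budget. It is not the two-stage algorithm itself, whose details are given later. The names `Element`, `info`, `seconds`, and `select_elements`, the use of whole seconds, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """Hypothetical document element produced by an analysis step."""
    name: str      # illustrative label, e.g. "title"
    info: float    # information attribute (higher means more important)
    seconds: int   # time attribute: whole seconds needed to present it


def select_elements(elements: List[Element], budget: int) -> List[Element]:
    """Classic 0/1 knapsack: maximize total info within `budget` seconds."""
    n = len(elements)
    # best[i][t] = best total info using the first i elements within t seconds
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, el in enumerate(elements, start=1):
        for t in range(budget + 1):
            best[i][t] = best[i - 1][t]          # option 1: skip element i
            if el.seconds <= t:                  # option 2: include it if it fits
                cand = best[i - 1][t - el.seconds] + el.info
                if cand > best[i][t]:
                    best[i][t] = cand
    # Walk the table backwards to recover which elements were included.
    chosen: List[Element] = []
    t = budget
    for i in range(n, 0, -1):
        if best[i][t] != best[i - 1][t]:
            chosen.append(elements[i - 1])
            t -= elements[i - 1].seconds
    chosen.reverse()                             # restore document order
    return chosen


if __name__ == "__main__":
    # Made-up attributes, purely for illustration.
    candidates = [
        Element("title", 10.0, 2),
        Element("figure", 8.0, 4),
        Element("abstract", 6.0, 5),
        Element("caption", 3.0, 1),
    ]
    for el in select_elements(candidates, budget=7):
        print(el.name, el.info, el.seconds)
```

With these sample values and a 7-second budget, the sketch selects the title, the figure, and the caption. Discretizing time into whole seconds keeps the dynamic-programming table small; a finer time resolution would only require scaling the `seconds` values and the budget by the same factor.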