5. Synthesis
After the optimization step identifies the visual, audible, and audiovisual elements to be included, the visual and audiovisual elements are arranged in reading order. Audible elements are then added in the time intervals occupied only by visual elements. The visual information is rendered as animations, such as page flipping, panning, and zooming to specific locations on a page, and the audible information is synthesized into audio clips.
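The ordering and scheduling described above can be illustrated with a minimal sketch. It is not the implementation used here: the `Element` fields, the `build_timeline` name, and the greedy first-fit placement of audible elements are all assumptions made for illustration. Visual and audiovisual elements are laid out back to back in reading order, and each audible element is placed in the first visual-only interval with enough remaining time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    name: str
    kind: str        # "visual", "audiovisual", or "audible"
    order: int       # position in reading order (hypothetical field)
    duration: float  # seconds


def build_timeline(elements: List[Element]):
    """Return (visual_track, audio_track, unplaced) for the given elements."""
    # Step 1: place visual and audiovisual elements back to back in reading order.
    shown = sorted(
        (e for e in elements if e.kind in ("visual", "audiovisual")),
        key=lambda e: e.order,
    )
    visual_track: List[Tuple[str, str, float, float]] = []
    t = 0.0
    for e in shown:
        visual_track.append((e.name, e.kind, t, t + e.duration))
        t += e.duration

    # Step 2: only intervals occupied by visual-only elements are free for audio,
    # since audiovisual elements already carry their own audio.
    free = [[start, end] for (_, kind, start, end) in visual_track if kind == "visual"]

    # Step 3 (assumed policy): greedy first-fit placement of audible elements.
    audio_track: List[Tuple[str, float, float]] = []
    unplaced: List[str] = []
    for a in sorted((e for e in elements if e.kind == "audible"), key=lambda e: e.order):
        for slot in free:
            if slot[1] - slot[0] >= a.duration:
                audio_track.append((a.name, slot[0], slot[0] + a.duration))
                slot[0] += a.duration  # consume the used part of the slot
                break
        else:
            unplaced.append(a.name)  # no visual-only interval was long enough
    return visual_track, audio_track, unplaced
```

Under this policy, an audible element that fits in no visual-only interval is reported as unplaced rather than overlapping an audiovisual element; a real system might instead lengthen a visual interval to make room.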
6. Implementation and User Feedback
The optimization routine outputs an actions file that contains all the information needed by the synthesis step: the names of the document images to be included in the MMNail, the visual animations to be performed (type, coordinates, and duration), and the audible document elements to be synthesized. Visual animations are implemented in Flash using ActionScript 2.0. Speech synthesis is implemented using the AT&T Natural Voices Text-to-Speech SDK. Once the visual and audio streams are obtained, they are synchronized using ActionScript to produce a playable MMNail.
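To make the contents of the actions file concrete, the following sketch models the three kinds of information listed above as a small data structure. The actual file format is not specified here; the JSON encoding, the class names `Actions` and `Animation`, and all field names are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Animation:
    kind: str                           # e.g. "zoom", "pan", "page_flip" (assumed labels)
    coords: Tuple[int, int, int, int]   # x, y, width, height on the page (assumed layout)
    duration: float                     # seconds


@dataclass
class Actions:
    images: List[str]                                   # document image names
    animations: List[Animation] = field(default_factory=list)
    speech: List[str] = field(default_factory=list)     # text to be synthesized


def dump_actions(actions: Actions) -> str:
    """Serialize to JSON (an assumed encoding, not the original file format)."""
    return json.dumps(asdict(actions), indent=2)


def load_actions(text: str) -> Actions:
    """Parse the JSON produced by dump_actions back into an Actions object."""
    raw = json.loads(text)
    return Actions(
        images=raw["images"],
        animations=[
            Animation(a["kind"], tuple(a["coords"]), a["duration"])
            for a in raw["animations"]
        ],
        speech=raw["speech"],
    )
```

A synthesis step could then iterate over `animations` to drive rendering and pass each entry in `speech` to a text-to-speech engine.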

Figure 3. Interface for (a) document browsing and (b) document viewing.
A document browser interface that displays a thumbnail of each document is shown in Figure 3.a. The interface is implemented in Flash 6.0 and is compatible with Windows and Macintosh operating systems and with PDAs running the Pocket PC OS. When a user selects a document thumbnail to view its MMNail representation, automated navigation is activated in the interface shown in Figure 3.b. The user controls playback with the "control bar", which can be used to start, stop, and move backward and forward in the MMNail timeline.
We performed a preliminary user study with nine participants to assess the usefulness of Multimedia Thumbnails. The users first browsed some documents on a PDA-size display with limited manual navigation. They were then asked to watch the MMNail clips created for another set of documents. Afterwards, the users were interviewed on the usefulness of MMNails, particularly the communication of information through both the audio and visual channels. They were also asked to give each communication channel a score between 1 (not useful) and 10 (very useful).
Users gave average scores of 7.2 (σ=3.6) and 7.1 (σ=2.7) to the usefulness of the visual and audio channels, respectively. They generally liked the fact that zooming and page flipping operations are performed automatically. Nevertheless, they asked for more control over the playback speed of MMNails. Regarding the visual animations, some users reported that they sometimes felt lost in the document, and suggested that page flipping operations could be animated explicitly. Additional comments indicated that the usefulness of the audio depends on the quality of the synthesized speech. Very short audio segments, such as keywords, were considered particularly difficult to understand. Moreover, some text, such as authors' names, was incorrectly synthesized in most cases, which distracted the users. They suggested that such text could simply be displayed visually, without the audio channel. On the other hand, the users found the use of the audio channel for figure captions very useful (average score=8.9).






