3.3 Composition into a Playable Form and MMNail Representation
The output of the optimization is the list of visual, audible, and audiovisual elements included in the MMNail, together with the actions (such as hold, zoom, or pan) to be performed on each element. The visual and audiovisual document elements are sorted in reading order. Since synchronized audio and visual elements are combined and optimized as audiovisual elements, their synchronization is preserved. The remaining free portions of the audio channel are filled by first sorting the selected audio elements and the available audio segments by duration. Then, starting with the longest element, each audio element is placed into the smallest audio channel segment that can fit it.
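The audio filling step described above is a best-fit-decreasing packing. The Python sketch below illustrates the whole composition step under assumptions of our own: elements carry hypothetical `kind`, `duration`, and `reading_order` fields; visual and audiovisual elements are played back to back in reading order; the audio channel is free exactly during runs of consecutive visual-only elements; and each placement reduces the remaining capacity of its segment. Audio elements that fit into no segment are returned as unplaced.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Element:
    name: str
    duration: float         # playback time in seconds
    kind: str               # hypothetical encoding: "v", "a", or "av"
    reading_order: int = 0  # used only for "v" and "av" elements


@dataclass
class Segment:
    start: float            # start time of a free audio interval
    length: float           # total length of the interval in seconds
    placed: List[Element] = field(default_factory=list)

    @property
    def remaining(self) -> float:
        # Free audio time left after the elements placed so far.
        return self.length - sum(e.duration for e in self.placed)


def compose(elements: List[Element]):
    """Lay out visual/audiovisual elements, then best-fit the audio elements."""
    visual = sorted((e for e in elements if e.kind in ("v", "av")),
                    key=lambda e: e.reading_order)
    audio = [e for e in elements if e.kind == "a"]

    # Step 1 (assumption): play visual and audiovisual elements back to back
    # in reading order; runs of visual-only elements leave the audio free.
    timeline: List[Tuple[float, str]] = []
    segments: List[Segment] = []
    open_seg: Optional[Segment] = None
    t = 0.0
    for e in visual:
        timeline.append((t, e.name))
        if e.kind == "v":
            if open_seg is None:
                open_seg = Segment(start=t, length=0.0)
                segments.append(open_seg)
            open_seg.length += e.duration
        else:
            open_seg = None  # an audiovisual element occupies the audio channel
        t += e.duration

    # Step 2: best-fit decreasing. Longest audio element first, each into the
    # segment with the smallest remaining time that can still hold it.
    unplaced: List[Element] = []
    for a in sorted(audio, key=lambda e: e.duration, reverse=True):
        fits = [s for s in segments if s.remaining >= a.duration]
        if fits:
            min(fits, key=lambda s: s.remaining).placed.append(a)
        else:
            unplaced.append(a)
    return timeline, segments, unplaced


# Example with hypothetical elements and durations (seconds).
elements = [
    Element("title", 2.0, "v", 0),
    Element("figure1", 3.0, "av", 1),
    Element("abstract", 4.0, "v", 2),
    Element("body", 2.5, "v", 3),
    Element("keywords", 1.5, "a"),
    Element("authors", 4.0, "a"),
    Element("date", 1.0, "a"),
]
timeline, segments, unplaced = compose(elements)
```

In this example the free audio intervals are [0, 2) and [5, 11.5); the 4-second audio element goes into the longer interval, and the 1.5-second element then lands in the shorter interval because it leaves less unused time there.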
In our implementation, audible information is converted to MP3 audio clips using the AT&T Natural Voices Text-to-Speech SDK [21], high-resolution document pages are stored in Flash format, and visual animations are rendered in real time using ActionScript 2.0. This composition allows us to use a standard Flash representation, avoid video compression artifacts, and keep the file small by performing visual rendering only during playback. Alternatively, an MMNail can be represented as a stand-alone video clip, for example in MPEG-4 [25]. The advantage of this representation is that any standard video player can be used to play back an MMNail. MMNails can be stored either as user data within the source document or in a separate file.
Although the above representations are "user-friendly" in that they can be played with standard playback software, the results of our initial user study [12] show that more flexibility in handling MMNail content is needed to ease adaptation to user, task, and application parameters. Flexibility can be addressed along several dimensions, such as the duration of an MMNail, the use of the audio and visual channels, and content selection, all of which require a scalable representation. Scalability has been studied extensively in still-image and video compression [24][25].
In this paper we present a time-scalable representation of MMNails. It allows users to view MMNails ranging from a few seconds to several minutes in length without regenerating and storing a separate representation for each duration. Below, we present a modified optimization module that supports generating time-scalable MMNails. Other scalable MMNail features are discussed in Section 7.
3.3.1. Time Scalability
In this section we extend the MMNail optimization module presented in Section 3.2.3 to support time scalability, i.e., the creation of MMNail visualizations for a set of N time durations $T_1, T_2, \dots, T_N$ with $T_N > \dots > T_2 > T_1$. Our goal is to ensure that every element included in a shorter MMNail with duration $T_i$ is also included in any longer MMNail with duration $T_n > T_i$. This time scalability is achieved by iteratively applying the two-stage approach from Section 3.2.3 to the decreasing time durations $T_N > \dots > T_2 > T_1$ as follows:
For iteration $n = N, \dots, 1$:

For the first stage,
maximize …
subject to … (7)

where … for $q \in \{v, av\}$, and $x^{*}_{n+1}$ is the solution of (7) in iteration $n+1$.

For the second stage,
maximize …
subject to … (8)

where $\beta \in [0,1]$, $\hat{T}_n$ is the total time duration available at iteration $n$ after the first stage to be filled in the audio channel, … is the number of empty audio intervals after the first stage in iteration $n$, and $x^{**}_{n+1}$ is the solution of (8) in iteration $n+1$. A solution $\{x^{*}_n, x^{**}_n\}$, $n = 1, \dots, N$, describes the contents of a set of time-scalable MMNails for the durations $T_1, T_2, \dots, T_N$, with the property that if a document element $e$ is included in the MMNail with duration $T_i$, it is also included in the MMNail with duration $T_n > T_i$.
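To illustrate how iterating from the longest to the shortest duration yields nested selections, the Python sketch below mimics only the first stage. It rests on assumptions of our own, not on the exact form of (7): each visual or audiovisual element has a hypothetical importance score and an integer duration in seconds, the first stage is treated as a 0/1 knapsack (maximize total importance within $T_n$), and the nesting requirement is modeled as $x_n(e) \le x^{*}_{n+1}(e)$ with all elements allowed at $n = N$. The second stage (audio filling with $\beta$) is omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: element name -> (importance, integer duration in seconds).
Items = Dict[str, Tuple[float, int]]


def knapsack(items: Items, budget: int) -> Set[str]:
    """0/1 knapsack via dynamic programming over integer durations.

    best[c] holds the highest total importance (and the element set achieving
    it) whose total duration is at most c.
    """
    best = [(0.0, frozenset())] * (budget + 1)
    for name, (importance, dur) in items.items():
        # Iterate capacities downward so each element is used at most once.
        for c in range(budget, dur - 1, -1):
            value = best[c - dur][0] + importance
            if value > best[c][0]:
                best[c] = (value, best[c - dur][1] | {name})
    return set(best[budget][1])


def time_scalable_selection(items: Items, durations: List[int]) -> Dict[int, Set[str]]:
    """First-stage sketch: nested selections for T_N > ... > T_1."""
    selections: Dict[int, Set[str]] = {}
    allowed = dict(items)  # assumed convention: all elements allowed at n = N
    for T in sorted(durations, reverse=True):  # n = N, ..., 1
        chosen = knapsack(allowed, T)
        selections[T] = chosen
        # Assumed nesting constraint x_n(e) <= x*_{n+1}(e): shorter MMNails
        # may only reuse elements chosen for the next longer duration.
        allowed = {k: v for k, v in allowed.items() if k in chosen}
    return selections


# Example with hypothetical importance scores and durations.
items: Items = {
    "title": (5.0, 2),
    "figure1": (4.0, 5),
    "abstract": (3.0, 4),
    "section1": (2.0, 6),
    "caption": (1.0, 1),
}
selections = time_scalable_selection(items, [4, 8, 15])
```

Restricting each shorter MMNail to the elements chosen for the next longer one guarantees the nesting property by construction, but a shorter selection may carry less total importance than one optimized independently for its duration.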