Interactive Speech Skimming via Time-Stretched Audio Replay

Wolfgang Hürst (Albert-Ludwigs-Universität Freiburg, Germany) and Tobias Lauer (Albert-Ludwigs-Universität Freiburg, Germany)
Time stretching, sometimes also referred to as time scaling, describes techniques for replaying speech signals faster (i.e., time compressed) or slower (i.e., time expanded) while preserving characteristics such as pitch and timbre. One example of such an approach is the SOLA (synchronized overlap-add) algorithm (Roucos & Wilgus, 1985), which is often used to avoid cartoon-character-like voices during faster replay.

Many studies have evaluated the applicability and usefulness of time stretching for different tasks in which users deal with recorded speech. One of the most obvious applications of time compression is speech skimming: quickly going through a speech document in order to identify its overall topic or to locate some specific information. Since people can listen faster than they talk, time-compressed audio, within reasonable limits, can also make sense for normal listening. This is especially relevant in view of He and Gupta (2001), who suggest that the future bottleneck for consuming multimedia content will not be network bandwidth but people’s limited time. In their study, they found an upper bound for sustainable speedup during continuous listening of about 1.6 to 1.7 times normal speed. This is consistent with other studies, such as Galbraith, Ausman, Liu, and Kirby (2003) or Harrigan (2000), which indicate preferred speedup ratios between 1.3 and 1.8. Amir, Ponceleon, Blanchard, Petkovic, Srinivasan, and Cohen (2000) found that, depending on the text and speaker, the best speed for comprehension can also be slower than normal, especially for unfamiliar or difficult content.
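The core idea behind SOLA can be illustrated with a short Python sketch. This is a minimal, illustrative version, not the original implementation by Roucos and Wilgus. The function name `sola`, its parameters (`frame_len`, `syn_hop`, `max_shift`), their default values, and the linear cross-fade are all assumptions chosen for readability. Frames are read from the input every `speed * syn_hop` samples but written to the output every `syn_hop` samples. Before each frame is cross-faded in, its position is shifted by the lag (within plus or minus `max_shift` samples) that maximizes the normalized cross-correlation with the output already written. This aligns the waveform periods, so the duration changes while the pitch stays the same.

```python
import math


def sola(x, speed, frame_len=400, syn_hop=200, max_shift=80):
    """Time-scale x by `speed` (>1 faster, <1 slower) while keeping pitch.

    Hypothetical, simplified SOLA sketch: frames are read every `ana_hop`
    input samples, written every `syn_hop` output samples, and each frame is
    shifted by up to +/-max_shift samples to best match the existing output.
    Any input tail shorter than one frame is dropped.
    """
    ana_hop = round(syn_hop * speed)  # how far we advance in the input per frame
    if ana_hop < 1:
        raise ValueError("speed too small for this syn_hop")
    # These bounds guarantee that every candidate overlap lies in [1, frame_len].
    if frame_len - syn_hop - 2 * max_shift < 1 or 2 * max_shift > syn_hop:
        raise ValueError("need frame_len > syn_hop + 2*max_shift and 2*max_shift <= syn_hop")
    if len(x) < frame_len:
        return list(x)

    out = list(x[:frame_len])  # the first frame is copied unchanged
    m = 1
    while m * ana_hop + frame_len <= len(x):
        frame = x[m * ana_hop : m * ana_hop + frame_len]
        nominal = m * syn_hop  # where the frame would go without alignment
        best_k, best_score = 0, -math.inf
        for k in range(-max_shift, max_shift + 1):
            start = nominal + k
            overlap = len(out) - start
            a = out[start:]      # tail of the output written so far
            b = frame[:overlap]  # head of the new frame
            energy = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
            score = sum(p * q for p, q in zip(a, b)) / energy if energy > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        overlap = len(out) - start
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            out[start + i] = (1 - w) * out[start + i] + w * frame[i]
        out.extend(frame[overlap:])
        m += 1
    return out


def zero_crossing_rate(s):
    """Sign changes per sample; a crude proxy for the pitch of a pure tone."""
    return sum((s[i - 1] < 0) != (s[i] < 0) for i in range(1, len(s))) / len(s)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(fs)]  # 1 s at 200 Hz
    fast = sola(tone, 2.0)  # roughly half as many samples
    naive = tone[::2]       # plain decimation: also half as long, but pitch doubles
    print(len(tone), len(fast), len(naive))
    print(zero_crossing_rate(tone), zero_crossing_rate(fast), zero_crossing_rate(naive))
```

In the demo, dropping every second sample also halves the duration, but it doubles every frequency, which is what produces the cartoon-like voice mentioned above. The SOLA output keeps the 200 Hz tone at 200 Hz. Real speech is not periodic like this test tone, and a practical implementation would compute the correlation search far more efficiently than this brute-force loop.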

