The Automated Generation and Further Application of Tree-Structure Outline for Lecture Videos with Synchronized Slides


Xiaoyin Che (Hasso Plattner Institute, Potsdam, Germany), Haojin Yang (Hasso Plattner Institute, Potsdam, Germany) and Christoph Meinel (Hasso Plattner Institute, Potsdam, Germany)
Copyright: © 2014 | Pages: 17
DOI: 10.4018/ijtem.2014010103


In this paper, the authors describe their motivation and method for automatically generating a tree-structure outline for lecture videos accompanied by synchronized slides. They then propose a further application built on the generated outline: lecture video segmentation at slide-group-change events. Starting from OCR (Optical Character Recognition) results with an accuracy of approximately 90%, the authors reconstruct the text of each slide into a content tree of up to three levels, and then explore the logical relations between slides in order to arrange them hierarchically. After all redundancy is removed, the final outline has up to six levels. The slide hierarchy stored in the outline greatly simplifies the subsequent segmentation process. Evaluation shows that the final outline generated from the test dataset retains only about one quarter of the original text from all slides while remaining well organized, with a high accuracy of 85% at the slide-title level. The majority of the resulting segments are logically reasonable, and their average length is about 5 to 15 minutes.

Solution Framework

Figure 1 depicts the framework of our solution. The first part is preprocessing, which consists of logo removal and text modification (OCR error correction). Outline generation is the most important part: both intra-slide reconstruction and inter-slide analysis involve many detailed procedures, and a subsequent independent step produces the final outline. The segmentation process is comparatively simple, with only two steps: logical segmentation and default time segmentation.

Figure 1. Diagram of the proposed solution framework

Generally, our research faces two main challenges: analyzing the layout and logic of the slides, and remaining robust against OCR errors. Every step has to take both challenges into account.


Preprocessing

The whole solution begins by preprocessing the raw data that comes directly from the OCR results, in order to exclude interference from useless slide content and to correct as many recognition errors as possible.

Logo and Footer

Some lecture or presentation slides, especially those built on a university or company template, carry a logo. When present, the logo appears in the same position on almost every slide, commonly in a corner. Because of its size and location, the logo may be recognized as a major element of a slide during outline generation, for example as the title, which can severely damage the real content structure of the slide.

To solve this problem, we employ a position-based detection scheme that finds the logo, if one exists. Text-lines that occupy exactly the same position and have the same or very similar text content, but lie on different slides, are treated as logo candidates, and their frequency of appearance is counted. When detection is finished, logo candidates whose frequency exceeds a threshold determined by the total number of slides are removed permanently. To avoid removing non-logo text that merely looks like a logo, for example a title shared by several consecutive slides, the detection scheme is applied only to the edge areas of the slides.
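The detection scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: text-lines are modeled by a hypothetical `TextLine` record with an exact bounding box, "very similar" text is approximated with `difflib.SequenceMatcher` at an assumed ratio of 0.9, the edge area is an assumed outer band of 15% of the slide width and height, and the frequency threshold is an assumed fraction (0.5) of the total number of slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TextLine:
    """Hypothetical record for one OCR text-line."""
    slide: int                      # index of the slide the line was found on
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    text: str


def in_edge_area(box, width, height, margin=0.15):
    """True if the box lies entirely inside the outer band of the slide.

    The 15% margin is an assumption, not a value from the paper.
    """
    x0, y0, x1, y1 = box
    mx, my = width * margin, height * margin
    return x1 <= mx or x0 >= width - mx or y1 <= my or y0 >= height - my


def similar(a, b, min_ratio=0.9):
    """Assumed similarity test that tolerates small OCR differences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def remove_logos(lines, num_slides, width, height, freq_ratio=0.5):
    """Drop edge-area text-lines that repeat at the same position on many slides."""
    # Only edge-area lines are considered; group them by exact position.
    by_box = defaultdict(list)
    for ln in lines:
        if in_edge_area(ln.box, width, height):
            by_box[ln.box].append(ln)

    threshold = freq_ratio * num_slides  # assumed threshold rule
    to_remove = set()
    for group in by_box.values():
        # Cluster lines at this position by text similarity to a representative.
        clusters = []
        for ln in group:
            for cluster in clusters:
                if similar(cluster[0].text, ln.text):
                    cluster.append(ln)
                    break
            else:
                clusters.append([ln])
        # A cluster is a logo if it appears on more slides than the threshold.
        for cluster in clusters:
            if len({ln.slide for ln in cluster}) > threshold:
                to_remove.update(cluster)

    return [ln for ln in lines if ln not in to_remove]


if __name__ == "__main__":
    lines = [TextLine(i, (10, 10, 60, 30), "HPI") for i in range(4)]
    lines.append(TextLine(0, (300, 200, 600, 240), "Introduction"))
    lines.append(TextLine(1, (300, 200, 600, 240), "Introduction"))
    for ln in remove_logos(lines, num_slides=4, width=1024, height=768):
        print(ln.slide, ln.text)
```

In this example the corner text "HPI" appears at the same position on all four slides and is removed, while the title "Introduction", repeated on two consecutive slides, survives because it lies outside the edge area. Restricting detection to the edge band is what protects such shared titles.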

Besides the logo, other template-based slide content, such as footers, is also eliminated in this step. In practice such content is probably less harmful than the logo, but it contributes nothing useful to the outline.

OCR Error Modification

The OCR program we use has an accuracy of approximately 90%, and it would be irresponsible to present the remaining 10% of misrecognized text to users of the e-learning portal. In our approach, each text-line is checked by splitting it into words: if the average word length is shorter than 2 characters, the entire text-line is discarded.
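The average-word-length rule can be expressed as a small filter. This is a sketch, not the authors' code: it assumes that words are whitespace-separated tokens, while the 2-character threshold comes from the text above. The function names are hypothetical.

```python
def keep_text_line(line: str, min_avg_len: float = 2.0) -> bool:
    """Return False if the text-line should be discarded as OCR noise.

    Words are assumed to be whitespace-separated tokens; a line whose
    average word length is below `min_avg_len` characters is dropped.
    """
    words = line.split()
    if not words:
        return False  # nothing recognizable on this line
    avg_len = sum(len(w) for w in words) / len(words)
    return avg_len >= min_avg_len


def filter_text_lines(lines: list[str]) -> list[str]:
    """Keep only the text-lines that pass the average-word-length check."""
    return [ln for ln in lines if keep_text_line(ln)]


if __name__ == "__main__":
    ocr_lines = ["Process scheduling", "i , l . :", "a b c"]
    print(filter_text_lines(ocr_lines))  # ['Process scheduling']
```

Note that the rule is deliberately coarse: a line made only of one-letter tokens is dropped even if some tokens are real words, which matches the text's statement that such a line is discarded entirely.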

Otherwise, a text-line can still be shortened by eliminating misrecognized words, which include runs of consecutive short words, words of abnormal length, and words containing too many symbols. In addition, a dictionary of frequently used short words and professional abbreviations such as ‘a’, ‘is’ or ‘OS’ keeps these meaningful short words from being deleted.
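The word-level cleanup can be sketched as follows. The three elimination rules and the role of the short-word dictionary come from the text; every numeric limit (maximum word length 20, symbol ratio above 0.5, "short" meaning at most 2 characters, and a run of at least 3 short words) and the dictionary contents beyond ‘a’, ‘is’ and ‘OS’ are assumptions chosen only for illustration.

```python
# Hypothetical whitelist; the paper names 'a', 'is' and 'OS' as examples.
SHORT_WORD_DICT = {"a", "an", "is", "in", "of", "to", "on", "or", "it", "OS", "IP", "UI"}

MAX_WORD_LEN = 20       # assumed cut-off for an "abnormally long" word
MAX_SYMBOL_RATIO = 0.5  # assumed maximum share of non-alphanumeric characters
SHORT_LEN = 2           # assumed: words of at most this length count as "short"
MIN_SHORT_RUN = 3       # assumed: this many consecutive unknown short words is noise


def is_whitelisted(word):
    """Meaningful short words and abbreviations are never deleted."""
    return word in SHORT_WORD_DICT or word.lower() in SHORT_WORD_DICT


def is_noisy_word(word):
    """Flag words that are abnormally long or consist mostly of symbols."""
    if len(word) > MAX_WORD_LEN:
        return True
    symbols = sum(1 for c in word if not c.isalnum())
    return symbols / len(word) > MAX_SYMBOL_RATIO


def clean_text_line(line):
    """Remove ill-recognized words from a text-line and return the rest."""
    words = line.split()
    keep = [not is_noisy_word(w) for w in words]

    # Drop runs of consecutive short words that are not in the dictionary.
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and len(words[j]) <= SHORT_LEN and not is_whitelisted(words[j]):
            j += 1
        if j - i >= MIN_SHORT_RUN:
            for k in range(i, j):
                keep[k] = False
        i = max(j, i + 1)

    return " ".join(w for w, k in zip(words, keep) if k)


if __name__ == "__main__":
    print(clean_text_line("Memory |l :; ,i management"))  # Memory management
    print(clean_text_line("The OS is a program"))         # The OS is a program
```

The second example shows why the dictionary matters: "OS", "is" and "a" are all short, but because they are whitelisted they never form a removable run of noise.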

Tree-Structure Outline Generation

As mentioned in the Solution Framework section, the tree-structure outline generation process contains two major parts, intra-slide reconstruction and inter-slide analysis, followed by an independent step that generates the final outline. In this section, we explain the two major parts in detail.
