Reference Hub

This research has been cited in:

Article
"An Ontological Framework for Information Extraction From Diverse Scientific Sources." IEEE Access. DOI: 10.1109/ACCESS.2021.3063181
Chapter
"TEKNO: Preparing Legacy Technical Documents for Semantic Information Systems." Natural Language Processing and Information Systems. DOI: 10.1007/978-3-319-59569-6_51
Conference
"PDFFigures 2.0." Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries. DOI: 10.1145/2910896.2910904
Conference
"An Unsupervised Approach for Automatic Discovery of Metadata in Document Images." 2016 Fifteenth Mexican International Conference on Artificial Intelligence (MICAI). DOI: 10.1109/MICAI-2016.2016.00009
Conference
"WiSe — Slide Segmentation in the Wild." 2019 International Conference on Document Analysis and Recognition (ICDAR). DOI: 10.1109/ICDAR.2019.00062
Article
"PDF text classification to leverage information extraction from publication reports." Journal of Biomedical Informatics. DOI: 10.1016/j.jbi.2016.03.026
Article
"On The Current State of Scholarly Retrieval Systems." Engineering, Technology & Applied Science Research. DOI: 10.48084/etasr.2448
Chapter
"Inventory and Content Separation in Grammatical Descriptions of Languages of the World." Linking Theory and Practice of Digital Libraries. DOI: 10.1007/978-3-030-86324-1_3
Article
"A hybrid strategy to extract metadata from scholarly articles by utilizing support vector machine and heuristics." Scientometrics. DOI: 10.1007/s11192-023-04774-7
Article
"A Page Object Detection Method Based on Mask R-CNN." IEEE Access. DOI: 10.1109/ACCESS.2021.3121152
Chapter
"Robust Argumentative Zoning for Sensemaking in Scholarly Documents." Advanced Language Technologies for Digital Libraries. DOI: 10.1007/978-3-642-23160-5_10
Chapter
"Information Extraction from PDF Sources Based on Rule-Based System Using Integrated Formats." Semantic Web Challenges. DOI: 10.1007/978-3-319-46565-4_23
Chapter
"Knowledge Extraction and Modeling from Scientific Publications." Semantics, Analytics, Visualization. Enhancing Scholarly Data. DOI: 10.1007/978-3-319-53637-8_2
Conference
"SPaSe - Multi-Label Page Segmentation for Presentation Slides." 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). DOI: 10.1109/WACV.2019.00082
Article
"End-to-end dilated convolution network for document image semantic segmentation." Journal of Central South University. DOI: 10.1007/s11771-021-4731-9
Article
"Functional structure identification of scientific documents in computer science." Scientometrics. DOI: 10.1007/s11192-018-2640-y
Conference
"SemAcSearch: A semantically modeled academic search engine." 2017 Conference on Information and Communication Technology (CICT). DOI: 10.1109/INFOCOMTECH.2017.8340633
Article
"A New Citation Recommendation Strategy Based on Term Functions in Related Studies Section." Journal of Data and Information Science. DOI: 10.2478/jdis-2021-0022
Conference
"Scholarly Data Mining: Making Sense of Scientific Literature." 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). DOI: 10.1109/JCDL.2017.7991622
Conference
"Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR.2017.462

Logical Structure Recovery in Scholarly Articles with Rich Document Features

Minh-Thang Luong, Thuy Dung Nguyen, Min-Yen Kan

Source Title: Multimedia Storage and Retrieval Innovations for Digital Library Systems

ISBN13: 9781466609006 | ISBN10: 1466609001 | EISBN13: 9781466609013

DOI: 10.4018/978-1-4666-0900-6.ch014

Cite Chapter

MLA

Luong, Minh-Thang, et al. "Logical Structure Recovery in Scholarly Articles with Rich Document Features." Multimedia Storage and Retrieval Innovations for Digital Library Systems, edited by Chia-Hung Wei, et al., IGI Global, 2012, pp. 270-292. https://doi.org/10.4018/978-1-4666-0900-6.ch014

APA

Luong, M., Nguyen, T. D., & Kan, M. (2012). Logical Structure Recovery in Scholarly Articles with Rich Document Features. In C. Wei, Y. Li, & C. Gwo (Eds.), Multimedia Storage and Retrieval Innovations for Digital Library Systems (pp. 270-292). IGI Global. https://doi.org/10.4018/978-1-4666-0900-6.ch014

Chicago

Luong, Minh-Thang, Thuy Dung Nguyen, and Min-Yen Kan. "Logical Structure Recovery in Scholarly Articles with Rich Document Features." In Multimedia Storage and Retrieval Innovations for Digital Library Systems, edited by Chia-Hung Wei, Yue Li, and Chih-Ying Gwo, 270-292. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-4666-0900-6.ch014

Abstract

Scholarly digital libraries increasingly provide analytics over the information within the documents themselves. This includes information about a document's logical structure, which is of use to downstream components such as search, navigation, and summarization. In this paper, the authors describe SectLabel, a module that further develops existing software to detect the logical structure of a document from existing PDF files, using the formalism of conditional random fields. While previous work has assumed access only to the raw text representation of the document, a key aspect of this work is its use of a richer representation that includes features from optical character recognition (OCR), such as font size and text position. Experiments reveal that using such rich features improves logical structure detection by a significant 9 F1 points over a suitable baseline, motivating the use of richer document representations in other digital library applications.
