A Hierarchical Online Classifier for Patent Categorization

Domonkos Tikk (Budapest University of Technology and Economics, Hungary), György Biro (TextMiner Ltd., Hungary) and Attila Törcsvári (Arcanum Development Ltd., Hungary)
Copyright: © 2008 | Pages: 24
DOI: 10.4018/978-1-59904-373-9.ch012

Abstract: Patent categorization (PC) is a typical application area of text categorization (TC). In patent offices, TC can be applied in different scenarios, depending on the stage at which categorization is needed. This is a challenging field for TC algorithms: they must simultaneously handle a large number of categories (on the order of 1000–10000) organized in a hierarchy and a large number of long documents with huge vocabularies at training time, and they must be fast and accurate in on-the-fly categorization. In this paper we present a hierarchical online classifier, called HITEC, which meets these requirements. The novelty of the method lies in the taxonomy-dependent architecture of the classifier, the applied weight-updating scheme, and the relaxed category selection method. We evaluate the method on two large English patent application databases, the WIPO-alpha and the Espace A/B corpora. We also compare it to other TC algorithms on these collections and show that it outperforms them significantly.

Complete Chapter List

Table of Contents
Cláudio Chauke Nehme
Hercules Antonio do Prado, Edilson Ferneda
Hercules Antonio do Prado, Edilson Ferneda
Chapter 1
Jie Tang, Mingcai Hong, Duo Liang Zhang, Juanzi Li
This chapter is concerned with the methodologies and applications of information extraction. Information is hidden in the large volume of web pages...
Information Extraction: Methodologies and Applications
Chapter 2
Roberto Penteado, Eric Boutin
The information overload demands that organizations set up new capabilities concerning the analysis of data and texts to create the necessary...
Creating Strategic Information for Organizations with Structured Text
Chapter 3
Christian Aranha, Emmanuel Passos
This chapter integrates elements from Natural Language Processing, Information Retrieval, Data Mining and Text Mining to support competitive...
Automatic NLP for Competitive Intelligence
Chapter 4
Horacio Saggion
Free text is a main repository of human knowledge, therefore methods and techniques to access this unstructured source of knowledge are of paramount...
Mining Profiles and Definitions with Natural Language Processing
Chapter 5
Ying Liu, Han Tong Loh, Wen Feng Lu
This chapter introduces an approach of deriving taxonomy from documents using a novel document profile model that enables document representations...
Deriving Taxonomy from Documents at Sentence Level
Chapter 6
Shigeaki Sakurai
This chapter introduces knowledge discovery methods based on a fuzzy decision tree from textual data. It argues that the methods extract features of...
Rule Discovery from Textual Data
Chapter 7
Edson Takashi Matsubara, Maria Carolina Monard, Ronaldo Cristiano Prati
This chapter presents semi-supervised multi-view learning in the context of text mining. Semi-supervised learning uses both labelled and unlabelled...
Exploring Unclassified Texts Using Multiview Semisupervised Learning
Chapter 8
Lean Yu, Shouyang Wang, Kin Keung Lai
With the rapid increase of the huge amount of online information, there is a strong demand for Web text mining which helps people discover some...
A Multi-Agent Neural Network System for Web Text Mining
Chapter 9
Jon Atle Gulla, Hans Olaf Borch, Jon Espen Ingvaldsen
Due to the large amount of information on the web and the difficulties of relating user’s expressed information needs to document content...
Contextualized Clustering in Exploratory Web Search
Chapter 10
Li Weigang, Wu Man Qi
This chapter presents a study of Ant Colony Optimization (ACO) applied to the Interlegis Web portal, a Brazilian legislation Website. The approach of AntWeb is...
AntWeb—Web Search Based on Ant Behavior: Approach and Implementation in Case of Interlegis
Chapter 11
Leandro Krug Wives, José Palazzo Moreira de Oliveira, Stanley Loh
This chapter introduces a technique to cluster textual documents using concepts. Document clustering is a technique capable of organizing large...
Conceptual Clustering of Textual Documents and Some Insights for Knowledge Discovery
Chapter 12
Domonkos Tikk, György Biro, Attila Törcsvári
Abstract: Patent categorization (PC) is a typical application area of text categorization (TC). TC can be applied in different scenarios at the work...
A Hierarchical Online Classifier for Patent Categorization
Chapter 13
Patricia Bintzler Cerrito
The purpose of this chapter is to demonstrate how text mining can be used to reduce the number of levels in a categorical variable to then use the...
Text Mining to Define a Validated Model of Hospital Rankings
Chapter 14
Wagner Francisco Castilho, Gentil José de Lucena Filho, Hércules Antonio do Prado, Edilson Ferneda
Clustering analysis (CA) techniques consist in, given a set of objects, estimating dense regions of points separated by sparse regions, according to...
An Interpretation Process for Clustering Analysis Based on the Ontology of Language
About the Contributors