A Semi-Automatic Annotation Method of Effect Clue Words for Chinese Patents Based on Co-Training

Na Deng (Hubei University of Technology, Wuhan, China), Chunzhi Wang (Hubei University of Technology, Wuhan, China), Mingwu Zhang (Hubei University of Technology, Wuhan, China), Zhiwei Ye (Hubei University of Technology, Wuhan, China), Liang Xiao (Hubei University of Technology, Wuhan, China), Jingbai Tian (Hubei University of Technology, Wuhan, China), Desheng Li (Hubei University of Technology, Wuhan, China) and Xu Chen (Zhongnan University of Economics and Law, Wuhan, China)
Copyright: © 2018 | Pages: 19
DOI: 10.4018/IJDWM.2018100101

Abstract

In the era of big data, the latest and most advanced technologies are usually revealed to the world in the form of patents. Patents contain abundant technical, economic and legal information, and deep analysis and mining of patents can provide important support for enterprises. Patent effect annotation is an important step in patent analysis and mining, and extracting patent effect clue words can greatly improve the accuracy and recall rate of annotation. This article summarizes the classification and characteristics of effect clue words, and proposes a co-training-based method for extracting effect clue words from Chinese patents that is suitable for various fields. Through a strategy called self-filtering, the method gradually enriches the thesaurus of effect clue words over successive iterations, without relying on any third-party filter. The experiments present the detailed steps of the method, together with comparisons and boosting.

Introduction

In the era of big data, all walks of life carry out business over the network, resulting in the accumulation of large amounts of data (Bouramoul, 2016; Mary & Malarvizhi, 2014; Pereira & Pereira, 2015; Qumsiyeh & Ng, 2016; Shen, Liu, Shen, Liu, & Sun, 2017; Shen, Shen, Chen, Huang, & Susilo, 2016; Tsai, 2011; Tsou, 2010). In the healthcare field, large amounts of patient, drug, diagnosis and treatment information are stored (Barbantan, Porumb, Lemnaru, & Potolea, 2016; Wang X, 2015). In education, there is a great deal of information about students, teachers and specialties. In the telecommunications industry, massive traffic and communication data are generated every day (Trasarti, Giannotti, Nanni, Pedreschi, & Renso, 2011). Analyzing and effectively using the data in these areas can help each industry allocate resources reasonably, increase productivity and discover opportunities. Mining the information hidden in these data can help managers make decisions that improve the quality and efficiency of production and life (Daly & Taniar, 2004; Silvestri, Corazza, Benerecetti, & Alicante, 2016; Taniar, Rahayu, Vincent, & Daly, 2008). However, regardless of the field, the latest and most advanced technologies and methods are usually revealed to the world in the form of patents, in order to seize the technological high ground as early as possible.

A patent is a special kind of text on the Internet, with strict format requirements and writing conventions that pull in contradictory directions. On the one hand, it must express the techniques and inventions clearly; on the other hand, its wording should be as obscure as possible to prevent the invention from being imitated or infringed. As carriers of human wisdom and innovation, patents contain rich technical, economic and legal information. In recent years, patents have become a keenly pursued target of analysis and mining. The effective use of patent information can provide important support for enterprises in technological innovation, risk avoidance, patent purchasing, the safeguarding of their interests and so on (Mandl, 2017; Tseng, Lin, & Lin, 2007; Zhang, Li, & Li, 2015).

Patent annotation (Agatonovic et al., 2008; Carvalho, Franca, & Lima, 2014) is a key step in online patent retrieval, analysis and mining. It extracts important information from patents, such as techniques, functions and keywords, which makes retrieval, analysis and mining possible at the semantic level and lends them a certain degree of intelligence. In online patent retrieval, the recall rate can be improved by extending the search terms with similar techniques or functions. In patent analysis, a patent technology effect matrix can be constructed by enumerating the technologies and effects of multiple patents in tabular form, helping patent applicants discover patent minefields and patent blank areas (Chen, 2011; Zhang, 2017). In patent mining, patent annotation is the key step of patent classification, clustering and recommendation.
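
As a rough illustration of the technology effect matrix mentioned above, the following Python sketch counts patents per (technology, effect) pair and lists the pairs that no patent covers. It is only a minimal sketch under stated assumptions: the patent records, labels, and the function names `build_effect_matrix` and `blank_cells` are hypothetical and do not come from the article, and reading zero-count cells as candidate blank areas is a simplification.

```python
from collections import Counter

# Hypothetical annotation results: the IDs and labels below are invented
# for illustration and are not taken from the article.
annotated_patents = [
    {"id": "P1", "technologies": ["laser welding"], "effects": ["lower cost"]},
    {"id": "P2", "technologies": ["laser welding"], "effects": ["higher strength"]},
    {"id": "P3", "technologies": ["adhesive bonding"], "effects": ["lower cost"]},
    {"id": "P4", "technologies": ["laser welding"], "effects": ["lower cost"]},
]


def build_effect_matrix(patents):
    """Count how many patents mention each (technology, effect) pair."""
    counts = Counter()
    for patent in patents:
        # Use sets so a label repeated inside one patent is counted once.
        for tech in set(patent["technologies"]):
            for effect in set(patent["effects"]):
                counts[(tech, effect)] += 1
    technologies = sorted({tech for tech, _ in counts})
    effects = sorted({effect for _, effect in counts})
    return technologies, effects, counts


def blank_cells(technologies, effects, counts):
    """Return the (technology, effect) pairs that no patent covers."""
    return [(t, e) for t in technologies for e in effects if counts[(t, e)] == 0]


technologies, effects, counts = build_effect_matrix(annotated_patents)

# Print the matrix in tabular form: one row per technology.
print("technology".ljust(20) + "".join(e.ljust(18) for e in effects))
for tech in technologies:
    row = "".join(str(counts[(tech, e)]).ljust(18) for e in effects)
    print(tech.ljust(20) + row)

print("Uncovered pairs:", blank_cells(technologies, effects, counts))
```

In this toy data, cells with high counts would correspond to crowded combinations and zero-count cells to combinations that no listed patent addresses. A real analysis would depend on how the technologies and effects are annotated and normalized in the first place.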

The patent abstract is an important component of the patent text. It briefly describes and summarizes the background, purpose, method and function of the patent. It usually omits specialized and complicated legal information while retaining most of the important information in the patent. Thus, the patent abstract is a very good data source for patent annotation, analysis and mining.
