
Web Text Categorization Based on Statistical Merging Algorithm in Big Data Environment

Rujuan Wang, Gang Wang

Source Title: International Journal of Ambient Computing and Intelligence (IJACI) 10(3)

DOI: 10.4018/IJACI.2019070102


Abstract

In modern information technology, finding the information that users actually need quickly, accurately, and comprehensively has become a central research problem. In this article, a feature selection method based on a complex network is proposed that exploits the structural and content characteristics of large-scale web text. The preprocessed web text is converted into a complex network in which the nodes correspond to the terms in the text and the edges correspond to the links between those terms; the node degree and the clustering coefficient of the network are then used to select features. Second, text classification is studied from the perspective of data sampling, and a classification method based on density statistics is proposed. During classification, this method uses not only the density information of the text feature set but also statistical merging criteria to obtain the difference information of each feature, which gives a better classification effect on large text collections.
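The feature selection step described in the abstract can be illustrated with a minimal, generic sketch; it is not the authors' algorithm. It assumes that two terms are linked when they co-occur within a sliding window of `window` consecutive tokens (the abstract does not say how links are defined), and it scores each term by a hypothetical linear mix of normalized degree and local clustering coefficient, weighted by `alpha`. The function names `build_term_network`, `clustering_coefficient`, and `select_features` are illustrative only.

```python
from itertools import combinations


def build_term_network(docs, window=2):
    """Build an undirected term network from tokenized documents.

    Assumption (hypothetical): two distinct terms are linked when they occur
    within a sliding window of `window` consecutive tokens in one document.
    """
    graph = {}
    for tokens in docs:
        for term in tokens:
            graph.setdefault(term, set())  # every term becomes a node
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    graph[term].add(other)
                    graph[other].add(term)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def select_features(docs, n_features, window=2, alpha=0.5):
    """Rank terms by a hypothetical mix of normalized degree and clustering."""
    graph = build_term_network(docs, window)
    max_degree = max((len(nb) for nb in graph.values()), default=0) or 1
    scores = {
        term: alpha * len(nb) / max_degree
        + (1 - alpha) * clustering_coefficient(graph, term)
        for term, nb in graph.items()
    }
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n_features], scores


docs = [
    ["big", "data", "text", "mining"],
    ["text", "mining", "web", "data"],
    ["web", "text", "classification"],
]
selected, scores = select_features(docs, n_features=3)
print(selected)  # ['mining', 'web', 'text']
```

Terms that have many neighbours and also sit in tightly interconnected neighbourhoods score highest; setting `alpha=1.0` ranks terms by degree alone, and `alpha=0.0` by clustering coefficient alone.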


1. Introduction

The use of Information Technology (IT) has increased day by day, to the point where almost everything we do can be done online on the spot. Information technology is any kind of software or tool for storing, retrieving, and sending information using some type of technology, such as computers, mobile phones, and computer networks. With IT, people can now upload, retrieve, and store their information and contribute it to Big Data. Because Big Data holds massive amounts of information delivered through IT such as the internet, students are now able to study online, which is called e-Learning. As the tools provided by IT have increased continuously, they have affected all aspects of our lives, especially academia. Because of their multifunctional nature, Big Data and e-Learning bring their users both benefits and disadvantages: they can affect our social skills, mental growth, and physical well-being, and they carry the risk of invading our personal information (Internet of Things, n.d.). Web Semantics for Textual and Visual Information Retrieval is a pivotal reference source for the latest academic research on embedding and associating semantics with multimedia information to improve data retrieval techniques (Singh et al., 2017).

Data is the concrete form in which information is presented, and text data is the main source of the knowledge we acquire. Therefore, to meet users' need for fast and accurate information acquisition, massive text data must be classified and managed effectively. Traditional text categorization and clustering techniques have many problems in handling this volume of information, such as poor scalability, a lack of corpora, and inadequate classification accuracy.

In recent years, many text classification methods have been proposed. Liu Lu et al. (2013) proposed a clustering-based PU active text classification method that combines SVM active learning with an improved Rocchio classifier; it improves the weight evaluation function and, to some extent, the classification accuracy. Xu Li et al. (2012) introduced a genetic algorithm into an SVM text classifier, which reduced the number of misclassified texts to some extent. Dhar et al. (2018) proposed categorizing Bangla web text documents with a tf-idf-icf text analysis scheme, arguing that adding an Inverse Class Frequency (ICF) measure to the Term Frequency (TF) and Inverse Document Frequency (IDF) measures yields better feature extraction for a language such as Bangla. Ranjan (2017) proposed automatic text classification using a BPLion neural network and semantic word processing; this technique categorizes text using semantic keywords instead of treating the keywords in the documents as independent features. Zhang Xiaofei et al. (2009) fused a clustering operation into a KNN-based text classification method to improve classification accuracy. Zhang Zhilin (2013) improved semi-supervised text classification by using Wikipedia knowledge, proposing a new similarity measure based on the semantic relevance between Wikipedia features and applying it to clustering-based classification. Zhu Jun et al. (2014) proposed SVM-based gene/protein name extraction, with classification accuracy reaching 71.9%. This method performs well on long text, but it cannot handle short text classification, where feature words are sparse and samples are highly imbalanced, so it clearly cannot meet the data classification needs of current network platforms.

There are also algorithms aimed specifically at short text, such as the dynamic combination classification method for short text proposed by Yan Rui (2009). Liu Kang et al. (2014) used a deep learning network to map the high-dimensional, sparse vector space of short text into a new low-dimensional space of essential features, and then classified short text by constructing a tree-structured combination classifier.
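The tf-idf-icf weighting attributed above to Dhar et al. (2018) can be sketched as follows. This is an illustration under stated assumptions, not their implementation: it uses the smoothed form w(t, d) = tf(t, d) · (1 + log(N / df(t))) · (1 + log(C / cf(t))), where N is the number of documents, df(t) the number of documents containing t, C the number of classes, and cf(t) the number of classes in which t occurs. That is one published variant of the scheme; the exact formula used by Dhar et al. may differ. The function name `tf_idf_icf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   : list of token lists
    labels : class label of each document (same length as docs)
    Assumed weighting: tf * (1 + log(N / df)) * (1 + log(C / cf)).
    """
    n_docs = len(docs)
    n_classes = len(set(labels))
    # df[t]: number of documents that contain term t
    df = Counter(term for tokens in docs for term in set(tokens))
    # cf[t]: number of distinct classes with at least one document containing t
    classes_with_term = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            classes_with_term.setdefault(term, set()).add(label)
    cf = {term: len(classes) for term, classes in classes_with_term.items()}

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts in this document
        weights.append({
            term: count
            * (1 + math.log(n_docs / df[term]))
            * (1 + math.log(n_classes / cf[term]))
            for term, count in tf.items()
        })
    return weights


docs = [["ball", "goal", "team"], ["goal", "ball"], ["vote", "party", "team"]]
labels = ["sport", "sport", "politics"]
w = tf_idf_icf(docs, labels)
print(round(w[0]["ball"], 3), round(w[0]["team"], 3))
```

A term that occurs in every class gets an ICF factor of 1, because log(C / C) = 0, so class-specific terms such as "ball" in the first sports document receive a higher weight than "team", which is shared by both classes.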
