Research on Digital Forensics Based on Uyghur Web Text Classification

Yasen Aizezi (Xinjiang Police College, Urumqi, China), Anwar Jamal (Xinjiang Police College, Urumqi, China), Ruxianguli Abudurexiti (Xinjiang Police College, Urumqi, China) and Mutalipu Muming (Xinjiang Police College, Urumqi, China)
DOI: 10.4018/978-1-7998-3025-2.ch032

This paper discusses the use of mutual information (MI) and support vector machines (SVMs) for Uyghur Web text classification, and the digital forensics process built on that classification: automatic classification and identification of documents, and conversion to plain text and pre-processing based on the encoding features of the various existing Uyghur Web documents. It also introduces the preparatory work for handling Uyghur Web text encoding. Focusing on filtering non-Uyghur characters and stop words from the Web texts, we put forward a Multi-feature Space Normalized Mutual Information (M-FNMI) algorithm, which replaces the MI between a single feature and a category with the MI between a combination of input features and the category, so as to extract more accurate feature words. Finally, we classify the texts on these features with an SVM. The experimental results show that the scheme achieves high classification precision and can provide a criterion for digital forensics with a specific purpose.
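
The filtering step mentioned above (removing non-Uyghur characters and stop words) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the text is written in the Arabic-based Uyghur script, whose letters fall in the Unicode Arabic block (U+0600 to U+06FF), and it uses a tiny placeholder stop-word list; the names `preprocess`, `is_uyghur_letter` and `STOP_WORDS` are hypothetical.

```python
# Assumption: a three-word placeholder list ("and", "with", "this");
# a real system would load a curated Uyghur stop-word list.
STOP_WORDS = {"ۋە", "بىلەن", "بۇ"}


def is_uyghur_letter(ch):
    """Assumption: treat any letter in the Unicode Arabic block as Uyghur script."""
    return "\u0600" <= ch <= "\u06FF" and ch.isalpha()


def preprocess(text, stop_words=STOP_WORDS):
    """Drop non-Uyghur characters, split on whitespace, remove stop words."""
    # Replace every character that is not an Arabic-block letter with a space,
    # which removes markup, Latin text, digits and punctuation in one pass.
    cleaned = "".join(ch if is_uyghur_letter(ch) else " " for ch in text)
    return [token for token in cleaned.split() if token not in stop_words]
```

Calling `preprocess` on a fragment of HTML returns only the Uyghur-script tokens that are not in the stop-word list. Encoding detection and conversion, which the abstract also mentions, would have to happen before this step and is not shown.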

Scheme In This Paper

This paper puts forward a Uyghur digital forensics scheme based on text classification, consisting of three parts: (1) Uyghur text pre-processing; (2) feature extraction; (3) text classification. In the feature extraction stage, the paper improves on traditional MI feature extraction, which considers only the MI between a single feature and a category and ignores the relevance between contextual features. The improved method replaces the MI between a single feature and a category with the MI between a combination of input features and the category.
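
To make the difference between single-feature MI and combination MI concrete, the following Python sketch computes both on a toy corpus. It is an illustration under stated assumptions, not the M-FNMI algorithm itself: features are binary term-presence indicators, probabilities are plain relative frequencies, and the normalization used by M-FNMI is omitted because this excerpt does not specify it. The function names and the placeholder terms `"a"` and `"b"` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical MI, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def presence(docs, term):
    """Binary indicator per document: 1 if the term occurs, else 0."""
    return [int(term in doc) for doc in docs]


def single_mi(docs, labels, term):
    """MI between one term's presence and the category label."""
    return mutual_information(presence(docs, term), labels)


def combo_mi(docs, labels, terms):
    """MI between the joint presence pattern of several terms and the label."""
    pattern = list(zip(*(presence(docs, t) for t in terms)))
    return mutual_information(pattern, labels)


# Toy corpus: the label is 1 only when exactly one of the two terms occurs,
# so neither term is informative alone but the pair determines the label.
docs = [{"a"}, {"b"}, {"a", "b"}, set()]
labels = [1, 1, 0, 0]
```

On this toy corpus, `single_mi(docs, labels, "a")` and `single_mi(docs, labels, "b")` are both 0 bits, while `combo_mi(docs, labels, ["a", "b"])` is 1 bit. A selector that ranks terms by single-feature MI would discard both terms, which is the kind of contextual dependence the combination-based score is meant to capture.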
