This chapter provides insight into pattern recognition by illustrating various approaches and frameworks that aid the prognostic reasoning facilitated by feature selection and feature extraction. The chapter focuses on analyzing syntactical and statistical approaches to pattern recognition. Typically, a large set of features has an impact on the performance of the predictive model. Hence, redundant and noisy pieces of data need to be eliminated before any predictive model is developed. Feature selection is performed independently of any particular machine learning algorithm. The content-rich information obtained after eliminating noisy patterns, such as stop words and missing values, is then used for further prediction. The refinement and extraction of relevant features yields performance improvements in subsequent prediction and analysis.
Introduction
A pattern is a sequence of tokens defined over a set of characters. The characters comprise letters, digits, or any other ASCII-defined values. Pattern recognition is the process of identifying a predefined sequence of characters over a given alphabet.
Owing to the rapid growth of digital data, identifying the relevant pieces of information within a large quantity of data is a challenging task. Hence there is a need to extract the useful information, which can be accomplished with the aid of pattern recognition frameworks (Zhong, Li & Grance, 2012).
Pattern recognition serves as a preliminary phase in prognostic reasoning for machine learning applications by constructing a pre-processed training data set. The refinement of the training data set relies on the attributes selected according to the objectives of the chosen application. Pattern recognition can also be described as identifying sequences of characters, termed strings, in a preferred order, which is referred to as string mining (Dhaliwal, Puglisi & Turpin, 2012).
The pattern recognition process allows for the elimination of redundant values and noisy content, the identification of missing values based on the attributes of the domain, and comparison with known patterns to find a match or mismatch. The study of pattern recognition problems involves identifying structures in real time and studying the theory and techniques required to represent those arrangements in a computer-recognizable format.
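The cleaning steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record layout (dictionaries with `text` and `label` keys), the stop-word list, and the function name `clean_records` are hypothetical and do not come from the chapter.

```python
import re

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "to", "in"}


def clean_records(records):
    """Drop records with missing values, strip stop words, and remove duplicates.

    Each record is assumed to be a dict with "text" and "label" keys.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Missing values: skip records whose text or label is absent or empty.
        if record.get("text") in (None, "") or record.get("label") in (None, ""):
            continue
        # Noisy content: lowercase, tokenize, and discard stop words.
        tokens = re.findall(r"[a-z0-9]+", record["text"].lower())
        content = tuple(t for t in tokens if t not in STOP_WORDS)
        # Redundant values: keep only the first record with this label and content.
        key = (record["label"], content)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"label": record["label"], "tokens": list(content)})
    return cleaned
```

Two records are treated as redundant here when they share a label and the same content tokens after stop-word removal; a real application would choose its own notion of redundancy based on the attributes of its domain.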
Formal languages and automata theory offer one such computer-recognizable representation for analyzing the different structures described by the central concepts of automata theory. Formal languages enable interaction with the machine, and the user can define the different structures with them. Formal approaches enable the identification of syntactical structures. The finite state machine is one of the tools used to describe the patterns, or strings, defined over a set of input symbols. Context-free grammars and regular expressions are also used to define syntactical structures (Brauer, Rieger, Mocan & Barczynski, 2011).
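As a small illustration of how a finite state machine and a regular expression can describe the same pattern, the sketch below recognizes strings over the input symbols {a, b} that end in "ab". The language, the state names, and the function names are assumptions chosen for the example, not taken from the chapter.

```python
import re

# Transition table of a deterministic finite automaton over {a, b}.
# q0: no progress, q1: last symbol was "a", q2: last two symbols were "ab".
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
START = "q0"
ACCEPTING = {"q2"}


def dfa_accepts(string):
    """Run the automaton over the string; reject symbols outside the alphabet."""
    state = START
    for symbol in string:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPTING


# The same language written as a regular expression.
PATTERN = re.compile(r"[ab]*ab")


def regex_accepts(string):
    """Accept only if the whole string matches the pattern."""
    return PATTERN.fullmatch(string) is not None
```

For any input string, `dfa_accepts` and `regex_accepts` should agree, since both describe the same regular language.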
Several features make formal grammars an attractive tool. Formal grammars can provide a structural and statistical description of the data in a condensed manner. They can also be used as a syntactic source to generate all patterns belonging to a specific class, whether finite or infinite.
Formal methods are being used successfully in a variety of fields, such as natural language processing, bioinformatics, and applied behavior analysis. They have proved effective in describing the syntax of a language or the structural relations in patterns or data. Formal grammars consist of syntax rules that describe the structure of the sentences in the domain. A grammatical parser attempts to parse each input using the inferred grammar. If the parse succeeds, the input is accepted as part of the domain language. Grammatical parsing is usually represented graphically as a parse tree.
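The accept-or-reject behavior of a grammatical parser, and the parse tree it builds, can be sketched with a small recursive-descent parser. The toy grammar below (`expr -> term ('+' term)*`, `term -> NUMBER | '(' expr ')'`) is written by hand for illustration rather than inferred from data, and all names are hypothetical.

```python
import re

# A token is a run of digits or one of the symbols + ( ), optionally preceded by spaces.
TOKEN = re.compile(r"\s*(\d+|[+()])")


def tokenize(text):
    """Split the input into tokens; raise ValueError on any other character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, i):
    """expr -> term ('+' term)*; returns (parse tree node, next index)."""
    node, i = parse_term(tokens, i)
    children = [node]
    while i < len(tokens) and tokens[i] == "+":
        right, i = parse_term(tokens, i + 1)
        children += ["+", right]
    return ("expr", *children), i


def parse_term(tokens, i):
    """term -> NUMBER | '(' expr ')'; returns (parse tree node, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[i]
    if token.isdigit():
        return ("term", token), i + 1
    if token == "(":
        node, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("expected ')'")
        return ("term", "(", node, ")"), i + 1
    raise ValueError(f"unexpected token {token!r}")


def parse(text):
    """Return the parse tree as nested tuples, or raise ValueError."""
    tokens = tokenize(text)
    tree, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise ValueError("trailing input after a complete expression")
    return tree


def accepts(text):
    """An input belongs to the language exactly when parsing succeeds."""
    try:
        parse(text)
        return True
    except ValueError:
        return False
```

Each nested tuple is one node of the parse tree: the first element names the grammar rule and the remaining elements are its children, so `parse("1 + 2")` yields an `expr` node with two `term` children separated by the `+` token.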
Formal grammars are an effective and advanced tool for data association, extraction, and modeling. Formal methods have various qualities that make them an attractive research topic. Formal grammars can deliver a statistical and structural description of the data in a condensed manner. They also support highly integrated data mining, with capabilities ranging from data processing to macro-level data analysis, using a common programming language. When structured data are presented as sequences, formal grammars can overcome location-specific structural characteristics. Formal grammars can also assist in predicting and associating additional data that belong to the same class. This makes them suitable for a wide range of structured data classes (Habrard, Bernard & Sebban, 2003).
Statistical approaches also play a major role in defining finite sequences of symbols. To improve the performance of the predictive model, the randomness of the attributes must be tested before any data mining technique is applied. Two properties, uniformity and independence, determine the randomness of the attributes. Various statistical tests are used to assess randomness and to choose the relevant attributes from the full set, for example the chi-square test, the Kolmogorov-Smirnov test, and the autocorrelation test (Banks, Carson II, Nelson & Nicol, 2010).
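The two properties can be checked with a short sketch. The code below computes a chi-square statistic for uniformity over equal-width intervals, and a plain sample lag autocorrelation as a rough indicator of independence. It assumes the attribute values have been scaled to the interval [0, 1); the function names are hypothetical, and the autocorrelation estimate is a simplified stand-in rather than the exact test procedure of the cited reference.

```python
# Critical value of the chi-square distribution for 9 degrees of freedom
# at significance level 0.05 (10 intervals give 10 - 1 = 9 degrees of freedom).
CHI2_CRITICAL_9DF_05 = 16.919


def chi_square_uniformity(values, bins=10):
    """Chi-square statistic comparing observed interval counts with a uniform expectation.

    Values are assumed to lie in [0, 1).
    """
    counts = [0] * bins
    for value in values:
        index = min(int(value * bins), bins - 1)
        counts[index] += 1
    expected = len(values) / bins
    return sum((count - expected) ** 2 / expected for count in counts)


def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation at the given lag; values near 0 suggest independence."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for constant data")
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


def looks_uniform(values):
    """Do not reject uniformity when the statistic stays below the critical value."""
    return chi_square_uniformity(values, bins=10) < CHI2_CRITICAL_9DF_05
```

With 10 intervals, uniformity is rejected at the 5% level when the statistic exceeds 16.919. The autocorrelation value is only a descriptive measure here; a formal test would compare it against its sampling distribution.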