Pattern recognition is the process of efficiently
detecting patterns or regularities in given data. Clustering is an
example of unsupervised machine learning, whereas classification is supervised
learning. Either process can be parametric, in which case the data are summarised
by a set of parameters, or non-parametric. Linear discriminant analysis, a parametric
classification algorithm, is used in testing the significance of gene pathway
and gene network models.
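The parametric versus non-parametric distinction above can be illustrated
with a minimal sketch. The toy data, the function names and the choice of a
nearest-class-mean rule and a 1-nearest-neighbour rule are all assumptions made
for illustration; they are not taken from the source. For a single feature with
equal class variances and equal priors, linear discriminant analysis reduces to
the nearest-class-mean rule used here.

```python
from statistics import mean

# Toy 1-D training data (made-up values, for illustration only).
train = [(1.0, "a"), (1.2, "a"), (0.8, "a"),
         (3.0, "b"), (3.3, "b"), (2.9, "b")]

def fit_parametric(samples):
    """Summarise each class by a single parameter: its mean."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_class.items()}

def predict_parametric(params, x):
    """Assign x to the class whose mean is closest."""
    return min(params, key=lambda label: abs(x - params[label]))

def predict_nonparametric(samples, x):
    """1-nearest-neighbour: keeps all training data, no summary parameters."""
    nearest = min(samples, key=lambda s: abs(x - s[0]))
    return nearest[1]

params = fit_parametric(train)
print(predict_parametric(params, 1.5))
print(predict_nonparametric(train, 1.5))
```

The parametric model needs only the stored class means at prediction time,
whereas the non-parametric one must keep every training sample.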
Classification assigns instances to predefined classes based on their features.
It analyses the training data and learns associations between the features
in order to classify unknown instances. A common classification technique,
the decision tree, divides the search space into subsets using a
divide-and-conquer technique. Giving grades to students is a simple
classification problem. In practice, classification means teaching a computer
to classify using knowledge derived from the data. Linear regression is a
simple related supervised method in which the relationship between observed
variables is modeled [2].
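The divide-and-conquer idea behind a decision tree can be sketched on the
grading example. This is a minimal illustration, not a production algorithm:
the scores, the grade labels, the misclassification-count splitting criterion
and the names `build_tree` and `classify` are all assumptions introduced here.

```python
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(samples, depth=0, max_depth=3):
    """Recursively split (score, grade) pairs on a score threshold."""
    labels = [g for _, g in samples]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    scores = sorted({s for s, _ in samples})
    best = None
    # Candidate thresholds lie midway between consecutive distinct scores.
    for lo, hi in zip(scores, scores[1:]):
        t = (lo + hi) / 2
        left = [(s, g) for s, g in samples if s <= t]
        right = [(s, g) for s, g in samples if s > t]
        # Errors made if each side predicted its own majority label.
        errors = sum(
            len(part) - Counter(g for _, g in part).most_common(1)[0][1]
            for part in (left, right)
        )
        if best is None or errors < best[0]:
            best = (errors, t, left, right)
    if best is None:  # no split possible: all scores identical
        return majority(labels)
    _, t, left, right = best
    return {
        "threshold": t,
        "low": build_tree(left, depth + 1, max_depth),
        "high": build_tree(right, depth + 1, max_depth),
    }

def classify(tree, score):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        tree = tree["low"] if score <= tree["threshold"] else tree["high"]
    return tree

# Made-up training data: (exam score, grade).
training = [(95, "A"), (88, "A"), (79, "B"), (72, "B"), (65, "C"), (58, "C")]
tree = build_tree(training)
print(classify(tree, 90), classify(tree, 75), classify(tree, 60))
```

Each recursive call works on a smaller subset of the data, which is the
divide-and-conquer step; real decision-tree learners usually use criteria such
as information gain or Gini impurity rather than a raw error count.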
The input data are divided into training data and test data. Training
data comprise representative data from known categories, and the test data
are data whose category is unknown. A feature extractor is used to extract
features from the input data. Features are the parameters or explanatory
variables, extracted from observations, that are most relevant to the problem.
Each feature can be categorical, ordinal, integer or real valued, and the
features together are represented as a vector. When applied in
bioinformatics, the vector consists of the frequencies of the nucleotides A, T,
G and C, or of their 2-mers, 3-mers, etc. Dimensionality reduction techniques
are applied to reduce the number of features.
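The nucleotide k-mer feature vector described above can be sketched as
follows. The function name `kmer_frequencies`, the fixed A, T, G, C ordering
and the decision to skip windows containing other characters (such as N) are
assumptions made for this illustration.

```python
from itertools import product

def kmer_frequencies(sequence, k=2, alphabet="ATGC"):
    """Return relative k-mer frequencies as a fixed-order feature vector."""
    # Enumerate all possible k-mers so every sequence maps to the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Example sequence is made up for illustration.
features = kmer_frequencies("ATGCGATACG", k=2)
print(len(features))
```

The vector length is 4^k (4 for single nucleotides, 16 for 2-mers, 64 for
3-mers), which is one reason dimensionality reduction becomes useful as k grows.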
Feature selection is another preprocessing method, used to filter
features so that unwanted and redundant data are removed and the most relevant,
high-quality data are retained to produce reliable output. A trainer/classifier
implements any of the clustering or classification algorithms and maps each input to the
corresponding class. The whole process is represented in the diagram given