Article: Classification of Alzheimer’s Disease Using Random Forest
by
Gakiza Canisius and Irivuzimana Aimé Muyombano Ph.D
Scientific Institute of research «SDRInstitute» Kigali-Rwanda
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disease that leads to loss of memory and cognitive function. Using neuroimaging and biological data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, previous researchers presented classification frameworks. They applied a Random Forest (RF) classifier to distinguish between multiple patient groups, and they recommended bagging, without changing the technique, for handling imbalanced data. They used the RF algorithm for diagnosis based on the combination of all available information, comparing AD with Normal Control (NC) patients and Mild Cognitive Impairment (MCI) with NC patients.
In this work, we built our own algorithm, named New Random Forest (NRF), to improve classification accuracy. NRF runs efficiently on large datasets, as it can handle thousands of input variables without variable deletion. It is also able to classify large amounts of data with high accuracy and gives estimates of which variables are important in the diagnostic classification. Our algorithm made it easy to combine different types of data without additional processing. Its binary and multi-class classification accuracies are higher than those of many other, more complex algorithms, including those achieved in previous comparisons.
Our purpose is to build algorithms that diagnose the disease status of patients. Using the same ADNI data, NRF achieved classification accuracies of 90.47% between AD and NC and 86.69% between MCI and NC. This is an improvement over the 89% (AD vs. NC) and 75% (MCI vs. NC) that prior research had achieved.
Index Terms—
AD Diagnosis, Random Forest Classifier, Normal Control, MCI, Bagging.
I. INTRODUCTION
Alzheimer’s Disease (AD) is a type of dementia that causes problems with memory and cognitive function. Although the risk of AD increases with age [1], age itself is not the cause of AD. In general, AD affects people aged around 64 years [2]. Patients with MCI have a higher risk of progressing to AD [1, 2]. AD is now recognized as the most common form of dementia and has become an increasing public health problem [1, 3].
Computer-Aided Diagnosis (CAD) using machine learning techniques has been used to diagnose AD [2, 4, 5]. CAD has been used to improve classification accuracy for AD and MCI [2, 6] and to differentiate the disease from normal aging. It has become a powerful clinical tool for identifying patients for early treatment, which may slow the progression of the disease [2, 7].
Gray et al. presented a framework for multi-class classification based on pairwise similarity measures derived from Breiman Random Forest (BRF) [7]. They used the similarities to construct a manifold representation from labeled training data and then inferred the clinical labels of test data mapped into this space; the BRF algorithm provided a unified model of random decision forests for classification, regression, estimation, manifold learning, and semi-supervised learning.
Other researchers used an RF filter (RFF) to identify a subset of features that provides the highest binary classification accuracy; it also measures the importance of features to obtain a good classification outcome. RFF is applied after removing highly correlated features, and the features with greater importance are selected, but it does not provide better accuracy [8].
NRF classifies three classes, AD, MCI, and NC, and identifies the distribution between the classes [8, 10]. To improve classification accuracy, we build decision trees, which are very intuitive [11]. Without proper limits on tree growth, decision trees tend to overfit the training data, reaching high accuracy on the training set that does not carry over to new data. To counter this, we used random vectors to build the trees.
Bagging builds each tree from a bootstrap sample of the examples in the training set [7]. The resulting forest is an ensemble classifier consisting of many decision trees [7, 8], in which the final class of a test example is obtained by combining the outputs of all individual trees; bagging thus constructs a collection of decision trees with controlled variation [7].
Previous researchers presented BRF and RFF, both based on bagging of decision trees, for AD classification [1, 2, 7]. They also presented a framework for multi-class classification based on pairwise similarity measures derived from their algorithms [7, 12, 13]. In this research, these similarity measures were used to construct a manifold representation from the labeled training dataset and then to infer the clinical labels of test data mapped into this space [7, 14, 15].
During this research, a high-performance algorithm was built on decision trees [7]. The intuition is that similar observations end up in the same terminal nodes more often than dissimilar ones. Motivated by the success of this approach, we first applied Random Trees to the 48 features and used the result as new features for the current RF [2, 6, 7].
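The text does not say which Random Trees implementation produced the new features. As a minimal sketch, assuming scikit-learn's `RandomTreesEmbedding` (totally random trees whose leaf memberships become a sparse binary encoding) stands in for that step, the two-stage idea could look like this on synthetic data with 48 features. It is an illustration, not the authors' NRF code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomTreesEmbedding
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a 48-feature subject matrix (not ADNI data).
X, y = make_classification(n_samples=400, n_features=48, n_informative=20,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

model = make_pipeline(
    # Stage 1 (assumed): each sample is re-encoded by the leaf it reaches
    # in each of 100 random trees, giving a sparse one-hot feature vector.
    RandomTreesEmbedding(n_estimators=100, max_depth=5, random_state=0),
    # Stage 2: an ordinary random forest trained on the new features.
    RandomForestClassifier(n_estimators=300, random_state=0),
)
model.fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.3f}")
```

Each row of the embedding contains exactly one active leaf per random tree, so the second-stage forest sees which regions of feature space a subject falls into rather than the raw values.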
We then applied binary classification to identify and diagnose disease status, and found that our NRF is the most effective for AD classification. Previous researchers built the training set for each individual tree in a BRF [7] by sampling N examples at random with replacement from the N cases in the dataset. This is known as bootstrap sampling, the basis of bagging [2]. Our classifiers are likely the best of those compared, as NRF's accuracy is much higher than what other researchers had achieved.
The rest of the article is organized as follows: Section II describes the background of our methodology. Section III presents our methodology. Section IV presents the experimental tests and results. The last section covers the discussion and conclusions.
II. BACKGROUND
A. Decision trees
A decision tree is one of the most successful models for prediction and classification tasks [7]. In supervised learning, it maps observations about an item to conclusions about the item's target value. In machine learning, a decision tree is a simple representation for classifying examples, in which each internal node is labeled with an input feature [7, 15].

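The NRF classifier uses a number of decision trees to improve the classification accuracy rate [15, 16, 17]. Using the Classification and Regression Trees (CART) algorithm [18], we can compute the Gini impurity of a set of items, which measures how often a randomly chosen item would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset [12]. It is computed by summing the probability p_i of an item with label i being chosen times the probability 1 - p_i of a mistake in categorizing that item, where i ∈ {1, 2, …, J} and p_i is the fraction of items labeled with class i in the set:

$$I_G(p) = \sum_{i=1}^{J} p_i \,(1 - p_i) = 1 - \sum_{i=1}^{J} p_i^{2}$$

$$H(T) = -\sum_{i=1}^{J} p_i \log_2 p_i$$

where H(T) is the information entropy of the set T.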
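Both impurity measures can be checked numerically. The helper names below are illustrative and the labels are a toy node, not ADNI data:

```python
from collections import Counter
import math


def class_fractions(labels):
    """Return p_i, the fraction of items carrying each label."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def gini_impurity(labels):
    """I_G = sum_i p_i * (1 - p_i) = 1 - sum_i p_i**2."""
    return 1.0 - sum(p * p for p in class_fractions(labels))


def entropy(labels):
    """H = -sum_i p_i * log2(p_i), in bits (only labels present, so p_i > 0)."""
    return -sum(p * math.log2(p) for p in class_fractions(labels))


# Toy node with hypothetical diagnostic labels.
node = ["AD", "AD", "NC", "NC"]
print(gini_impurity(node))  # 0.5
print(entropy(node))        # 1.0
```

For two classes, an evenly mixed node is maximally impure under both measures, while a pure node scores 0, which is why a split that separates AD from NC subjects lowers the impurity.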

We apply all binary classifiers f_k, where f_k is the classifier trained for class k, to an unseen sample x and predict the label k whose classifier responds with the highest confidence:

$$\hat{y} = \underset{k \in \{1, \ldots, K\}}{\arg\max}\; f_k(x)$$
The NRF algorithm is used for complex classification tasks. The main advantage of NRF is that the resulting model can be easily interpreted. However, the scale of the confidence values may differ between the binary classifiers, and even when the class distribution is balanced in the training set, each binary classifier sees more negative than positive examples. The limitation of the NRF algorithm is that a large number of trees can make real-time prediction and classification slow. The high-performance algorithm in this research was built on decision trees [8].
Therefore, many techniques, such as bagging and majority voting, have been developed to improve NRF performance [7, 8, 9]. We favor bagging in our approach. The intuition is that similar observations end up in the same terminal nodes more often than dissimilar ones. Motivated by this success, we first applied Random Trees to the 48 features and used the result as new features for the current random forest.
Indeed, NRF is a forest of trees that split on thresholds of single input variables, and it exhibits all of the desirable properties of RF. Our algorithm has been shown to perform as well as well-known classifiers on a variety of datasets [9]. Properties that explain its widespread use include good accuracy, invariance to the scale of each feature, robustness to outliers, low time and space complexity, and interpretability. While NRF has many desirable properties, a disadvantage is that it is sensitive to rotations and other operations that mix variables.
B. Random Forest Classifier
NRF is useful for constructing classification trees for different models [7]. It grows many classification trees; to classify a new object from an input vector, we put the input vector down each of the trees in the forest [7]. Each tree gives a classification (a vote), and the forest chooses the class with the most votes.
NRF runs efficiently on large datasets, as it can handle thousands of input variables without variable deletion, and it gives estimates of which variables are important in the classification [20]. In our algorithm, bagging combines many low-bias learning models into a single model with low variance [7, 20, 21]. The NRF algorithm works as a large collection of decision trees, since many decision trees are used to make each classification; that is why the technique is based on bagging [7, 21].
The cases of the AD dataset left out of each tree's bootstrap sample (the out-of-bag cases) form an internal test set for which predictions can be made. By aggregating these out-of-bag predictions over all trees [7], an internal estimate of the generalization error of the NRF is obtained.
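The internal error estimate and the variable-importance scores described above are exposed by scikit-learn's `RandomForestClassifier` as `oob_score_` and `feature_importances_`. The sketch below assumes that library and a synthetic 48-feature matrix in place of the ADNI data; it illustrates a generic Breiman-style forest, not the authors' NRF.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 300 subjects x 48 features (random data, not ADNI).
X, y = make_classification(n_samples=300, n_features=48, n_informative=8,
                           random_state=0)

rf = RandomForestClassifier(
    n_estimators=300,     # number of trees in the forest
    max_features="sqrt",  # random subset of features tried at each split
    bootstrap=True,       # each tree sees a bootstrap sample of the subjects
    oob_score=True,       # score each subject with the trees that did not see it
    random_state=0,
)
rf.fit(X, y)

print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
# Impurity-based importances: one value per input feature, normalized to sum to 1.
top5 = np.argsort(rf.feature_importances_)[::-1][:5]
print("Most important feature indices:", top5)
```

Because every subject is left out of roughly a third of the trees, the out-of-bag accuracy acts as a built-in validation estimate without a separate held-out set.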
III. METHODOLOGY
We tackle issues related to AD diagnosis using machine learning algorithms. The NRF classifier, a machine learning algorithm, was applied to the data features to derive the similarities required for AD classification [22]. In our experiments, we used three types of data: AD, MCI, and NC. We evaluated the effectiveness of the proposed NRF method on two classification problems against normal elderly controls (NC): AD vs. NC and early/late MCI vs. NC.
Decision trees are sensitive to variations in the training dataset, so we designed NRF accordingly. We used bagging as suggested by Breiman [23, 24]. Bagging works by reducing the variance component of the error; note that the number of distinct training examples seen by each learner is effectively reduced by bootstrap sampling [7].
The bagging pseudo-code is as follows:
Algorithm 1 BAGGING
Input 1: Learning algorithm A
Input 2: Training dataset D; number of bootstrap samples T
Output: Prediction for a given test instance x
1. For i = 1 to T:
   a. Draw a bootstrap sample D(i) from D (random sampling with replacement)
   b. Let M(i) be the result of training A on D(i)
2. For i = 1 to T: let C(i) = output of M(i) on x
3. Return the class that appears most often among C(1), ..., C(T)
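Algorithm 1 can be turned into runnable Python. This sketch assumes scikit-learn's `DecisionTreeClassifier` as the learner A and uses synthetic data; the function names `bagging_fit` and `bagging_predict` are illustrative, not part of any library.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(make_learner, X, y, T, rng):
    """Step 1: train T models M(i), each on a bootstrap sample D(i) of D."""
    n = len(X)
    models = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        models.append(make_learner().fit(X[idx], y[idx]))
    return models


def bagging_predict(models, x):
    """Steps 2-3: collect C(i) = M(i)(x) and return the most frequent class."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]


rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
models = bagging_fit(DecisionTreeClassifier, X, y, T=25, rng=rng)
print(bagging_predict(models, X[0]), "true:", y[0])
```

Fully grown trees are a typical choice for A here: each one has low bias but high variance, and the majority vote over bootstrap replicates is what reduces that variance.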
Given a finite training dataset, part of the error rate comes from the variability of the learned model across possible training sets; Breiman's bagging reduces this variance term.
Bagging is best suited to low-bias, high-variance models. Neural networks, for example, are considered to have low bias, but their initial conditions can lead to high variance in predictions. Given a training set D of size n, bagging generates T new training sets D(i), each of size n', by sampling from D uniformly with replacement, so some observations may be repeated in each D(i). If n' = n, then for large n the set D(i) is expected to contain a fraction (1 - 1/e) ≈ 63.2% of the unique examples of D, the rest being duplicates. The T models are fitted using these T bootstrap samples and combined by averaging their outputs (for regression) or by voting (for classification) [7]. Bagging improves unstable procedures, which include, for example, artificial neural networks, classification and regression trees, and subset selection in linear regression. An interesting application of bagging has shown improvement in preimage learning. On the other hand, bagging can mildly degrade the performance of stable methods [7, 8].
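The 1 - 1/e figure for the share of unique examples in a bootstrap sample can be checked by simulation: draw n indices with replacement and count how many distinct ones appear. A short sketch (the function name is illustrative):

```python
import math
import random


def unique_fraction(n, trials, seed=0):
    """Average fraction of distinct items in a bootstrap sample of size n' = n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        total += len(set(sample)) / n
    return total / trials


print(round(1 - 1 / math.e, 4))              # 0.6321, the large-n limit
print(round(unique_fraction(1000, 200), 3))  # simulated value for n = 1000
```

For n = 1000 the simulated fraction lands close to the limiting value of about 0.632, which is why roughly a third of the subjects are out-of-bag for any given tree.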
The bootstrap resampling that underlies bagging allows the sampling distribution of almost any statistic, including a classifier's accuracy, to be estimated using random sampling methods. Its advantage is the straightforwardness of deriving estimates of standard errors and confidence intervals for complex estimators of complex parameters. It is also an appropriate way to control and check the stability of the results.
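As an illustration of the confidence-interval use mentioned above, a percentile bootstrap interval for a classifier's accuracy can be computed by resampling test subjects with replacement. This is a minimal sketch with hypothetical counts (171 of 190 subjects correct), not a result from this study; `bootstrap_accuracy_ci` is an illustrative name.

```python
import numpy as np


def bootstrap_accuracy_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy from per-subject 0/1 correctness."""
    correct = np.asarray(correct, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(correct)
    # Resample subjects with replacement and recompute accuracy each time.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_acc = correct[idx].mean(axis=1)
    lo, hi = np.quantile(boot_acc, [alpha / 2, 1 - alpha / 2])
    return correct.mean(), lo, hi


# Hypothetical example: 171 of 190 test subjects classified correctly.
correct = np.array([1] * 171 + [0] * 19)
acc, lo, hi = bootstrap_accuracy_ci(correct)
print(f"accuracy {acc:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting such an interval alongside each accuracy makes it easier to judge whether two methods' scores genuinely differ.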
However, a decision tree can fit a very complex tree to the data, leading to overfitting, and its accuracy depends heavily on the data presented. For example, a tree can become biased towards a specific class if that class occurs often. Trees can also become difficult to interpret when one tries to extract reliable rules from the data.
On the other hand, bagging does not provide general finite-sample guarantees, and its apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bagging analysis.
Algorithm 2 RANDOM FOREST
Input 1: Algorithm L for training binary classifiers
Input 2: Samples X
Input 3: Labels y, where y_i ∈ {1, …, K} is the class label of sample X_i
Output: List of classifiers f_k for k ∈ {1, …, K}
- For each k in {1, …, K}:
  - Build a new label vector y' with y'_i = 1 where y_i = k, and y'_i = 0 elsewhere
  - Apply L to X, y' to obtain f_k
We apply all classifiers to an unseen sample x and predict the label k for which the corresponding classifier responds with the highest confidence:

$$\hat{y} = \underset{k \in \{1, \ldots, K\}}{\arg\max}\; f_k(x)$$
The NRF algorithm is used for complex classification tasks. The main advantage of NRF is that the resulting model can easily be interpreted. However, the scale of the confidence values may differ between the binary classifiers, and even if the class distribution is balanced in the training set, each binary classifier sees an unbalanced distribution, because its negative examples outnumber its positive ones. The limitation of the NRF algorithm is that a large number of trees can make the algorithm slow for real-time prediction and classification [7].
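Algorithm 2 and the arg-max rule can be sketched in Python. The sketch assumes scikit-learn's `RandomForestClassifier` as the binary learner L and uses its `predict_proba` output for the positive class as the confidence f_k(x); the helper names are hypothetical, and the three-class data are synthetic stand-ins for the AD, MCI, and NC labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_one_vs_rest(X, y, classes, seed=0):
    """Algorithm 2: one binary forest f_k per class k, trained on y' = [y == k]."""
    models = {}
    for k in classes:
        y_bin = (y == k).astype(int)  # 1 where y_i = k, 0 elsewhere
        models[k] = RandomForestClassifier(n_estimators=200,
                                           random_state=seed).fit(X, y_bin)
    return models


def predict_one_vs_rest(models, X):
    """Pick the class whose binary forest gives the highest confidence f_k(x)."""
    classes = list(models)
    # predict_proba(...)[:, 1] is the forest's estimate of P(y' = 1 | x).
    scores = np.column_stack([models[k].predict_proba(X)[:, 1] for k in classes])
    return np.array(classes)[scores.argmax(axis=1)]


# Synthetic three-class data standing in for AD / MCI / NC labels.
X, y_int = make_classification(n_samples=300, n_features=20, n_informative=6,
                               n_classes=3, random_state=0)
y = np.array(["AD", "MCI", "NC"])[y_int]
models = fit_one_vs_rest(X, y, classes=["AD", "MCI", "NC"])
print(predict_one_vs_rest(models, X[:5]), y[:5])
```

Each binary forest here is trained on roughly one positive subject for every two negatives, which is the imbalance noted above; a random forest can also handle the K classes directly, so this decomposition is a design choice rather than a requirement.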
IV. EXPERIMENTS AND RESULTS
In our experiments, we used data available from the ADNI database [6, 22]. ADNI is an ongoing, multicenter study designed to develop clinical, imaging, and biochemical biomarkers for the early detection and tracking of AD [2, 7, 10]. The primary goal of ADNI has been to test whether serial imaging and biological measures and clinical and neuropsychological assessments can be combined to measure the progression of MCI and AD.
In this paper, ADNI subjects with Magnetic Resonance Imaging (MRI) data are included [7]. This yields a total of 819 subjects: 94 AD patients, 429 MCI patients (309 early MCI and 120 late MCI), and 296 normal controls (NC). We considered binary and multi-class classification problems, with the binary problems being AD vs. NC and E/LMCI vs. NC.
TABLE I. DEMOGRAPHIC AND CLINICAL INFORMATION OF THE SUBJECTS
Group | AD | EMCI | LMCI | NC
Number of subjects | 94 | 309 | 120 | 296
In the multi-class classification, we considered all three groups, AD, MCI, and NC, at once. For binary classification, our proposed method performed best in diagnosing AD/NC and MCI/NC patients, with accuracies of 90.47% and 86.69% respectively [2, 22]. It outperformed the accuracies of 89% and 75% reported by previous researchers; see Table II, which shows that our implementation achieves the maximum accuracy on the dataset. The difference from the second-best method is statistically significant.
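The text reports a statistically significant difference without naming the test. One common choice for two classifiers evaluated on the same subjects is McNemar's test on the discordant pairs; the sketch below is an exact version of that test, assuming per-subject predictions from both methods on the same subjects are available. The counts and the function name are hypothetical.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    b: subjects method A classified correctly and method B incorrectly
    c: subjects method A classified incorrectly and method B correctly
    Under H0 (equal accuracy) the b + c discordant subjects split 50/50.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts on a shared test set (not ADNI results).
print(mcnemar_exact_p(b=15, c=5))
```

Only the subjects on which the two methods disagree enter the test, so two methods with similar overall accuracy can still differ significantly, or not, depending on how their errors overlap.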
TABLE II. BINARY CLASSIFICATION ACCURACY AND COMPARISON (%)
Classification | NRF | BRF [7] | RFF [8]
AD vs. NC | 90.47 | 89 | 82.1
(E/LMCI) vs. NC | 86.69 | 75 | 65.7
a. BRF, RFF: Breiman Random Forest [7] and Random Forest Filter [8], the comparison reference papers.
b. NRF: our New Random Forest algorithm.
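The size of the Table II gains is easier to read as error rates. This small worked calculation uses only the NRF and BRF [7] accuracies printed in the table; `relative_error_reduction` is an illustrative helper.

```python
def relative_error_reduction(acc_new, acc_old):
    """Percentage of the old error rate removed by the new method."""
    err_new, err_old = 100.0 - acc_new, 100.0 - acc_old
    return (err_old - err_new) / err_old * 100.0


# Accuracies (%) taken from Table II: NRF vs. BRF [7].
for task, nrf, brf in [("AD vs. NC", 90.47, 89.0), ("(E/LMCI) vs. NC", 86.69, 75.0)]:
    print(f"{task}: error {100 - brf:.2f}% -> {100 - nrf:.2f}%, "
          f"relative reduction {relative_error_reduction(nrf, brf):.1f}%")
```

With these values the error falls from 11.00% to 9.53% for AD vs. NC (about 13.4% relative) and from 25.00% to 13.31% for E/LMCI vs. NC (about 46.8% relative).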
We then compared our binary classification results with other researchers' results [8]. Our binary classification achieved accuracies of 90.47% and 86.69%, and the multi-class classification method achieved accuracies of 96.64%, 94.18%, 93.55%, and 97.73% for AD, MCI, and NC respectively. The optimal structures of the NRF classification algorithm and their respective performance are represented in Fig. 1.

Fig. 1. Histogram of multi-class RF classification accuracy (%)
Multi-class classification considerably improved the results, with accuracies of 97.73%, 96.64%, 94.18%, and 93.55% for the diagnosis (Fig. 1). The multi-class accuracies range from 93.55% to 97.73%, while the binary classification accuracies range from 86.69% (E/LMCI vs. NC) to 90.47% (AD vs. NC).
Among these components, binary classification appears to have the most impact on accuracy [7, 8].
V. DISCUSSIONS AND CONCLUSION
In our methodology, we used NRF classification to determine the optimal structure of the classification tasks. Different datasets within the same class were defined, for example for AD vs. NC [2, 12, 22]. We observe that this reveals the diverse high-level relations inherent in NRF for different classification problems [7, 22].
We performed several experiments, whose results are described in Table II. Compared with the competing methods, the proposed NRF method greatly improved the diagnostic accuracy for AD and MCI over all the classification problems considered in this work [25]. The proposed method consistently outperformed the competing methods of multiple classification with supervised learning [7, 16].
For NRF, the size of the dataset is paramount to good classification performance. Although only a limited number of samples is available in the ADNI dataset [3, 26], there is evidence that supervised training helps machine learning methods find better parameters and increase accuracy [2].
Likewise, we obtained the best performance in the two binary classification problems [2, 7, 22]. We considered the most important characteristics for machine learning [26] when comparing combinations of classifications [2, 7]. We can regard the trained model as a set of filters that capture different types of relations among the inputs [1, 28]. There is no standard way to visualize [29] or interpret the meaning of the trained model in an intuitive way [30], which remains a challenging issue in the machine learning field [31].
We also note that it is not straightforward to interpret the meaning of the learned representations [32]; nevertheless, our experiments clearly show that latent information is very important in AD and MCI diagnosis [33, 34]. We conclude that the multi-class variant of the current RF method focuses on the classification target [1, 7]. Moreover, we used a relatively small dataset (94 AD, 429 MCI, and 296 NC).
However, we note that in our experiments, for binary classification, AD vs. NC performed better than the other problem (90.47%). Indicators of AD progression remain unspecified, and the reasons for the better performance need further analysis with a larger training dataset [16, 22]. This finding is not definitive, but it is interesting and suggests that the subjects show a clear separation in the ADNI data [19].
In conclusion, we have applied machine learning to the ADNI dataset. Furthermore, we have applied the current RF to subspace-ensemble AD classification [35]. We then combined binary and multi-class classification to improve classification accuracy. In our experiments, the results from the ADNI database show that sparse representation classification performs well. Using the same datasets, we achieved better classification performance than previous researchers' classification methods. The current RF, which is also based on trees, can further increase classification accuracy by combining the classifications of multiple trees.
In future work, we will apply our method to other datasets, such as fludeoxyglucose positron emission tomography (FDG-PET), and extend the multi-class classification method to additional biomarkers to further improve the accuracy of AD classification.
References
[1]
Suk,
Heung-Il, et al. Supervised Discriminative Group Sparse Representation for Mild
Cognitive Impairment Diagnosis. Neuroinformatics (2014): 1-19.
[2]
Suk,
Heung-Il, et al. Latent feature representation with stacked auto-encoder for
AD/MCI diagnosis. Brain Structure and
Function 220.2 (2013): 841-859.
[3]
Van der
Flier, Wiesje M., and Philip Scheltens. Epidemiology and risk factors of
dementia. Journal of Neurology, Neurosurgery & Psychiatry 76.suppl 5
(2005): v2-v7.
[4]
Ramírez,
Javier, et al. Computer-aided diagnosis of Alzheimer’s type dementia combining
support vector machines and discriminant set of features. Information Sciences
237 (2013): 59-72.
[5]
Ramírez,
Javier, J. M. Górriz, Diego Salas-Gonzalez, A. Romero, Míriam López, Ignacio
Álvarez, and Manuel Gómez-Río. Computer-aided diagnosis of Alzheimer’s type
dementia combining support vector machines and discriminant set of features.
Information Sciences 237 (2013): 59-72.
[6]
Khazaee,
Ali, Ata Ebrahimzadeh, and Abbas Babajani-Feremi. Application of advanced
machine learning methods on resting-state fMRI network for identification of
mild cognitive impairment and Alzheimer’s disease. Brain
Imaging and Behavior (2015): 1-19.
[7]
Gray,
Katherine R., et al. Random forest-based similarity measures for multi-modal
classification of Alzheimer's disease. Neuroimage 65 (2013): 167-175.
[8]
Sarica, A.,
et al. Advanced feature selection in multinominal dementia classification from
structural MRI data. Proc MICCAI Workshop Challenge on Computer-Aided Diagnosis
of Dementia Based on Structural MRI Data. 2014.
[9]
Diniz, Breno
SO, Jony A. Pinto Jr, and Orestes Vicente Forlenza. Do CSF total tau,
phosphorylated tau, and β-amyloid 42 help to predict progression of mild
cognitive impairment to Alzheimer's disease? A systematic review and
meta-analysis of the literature. The World Journal of Biological Psychiatry 9.3
(2008): 172-182.
[10]
Li, Feng, et
al. A Robust Deep Model for Improved Classification of AD/MCI Patients. (2015).
[11]
Gironi,
Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive
impairment disclosed by a novel data mining process. Journal of Alzheimer's
disease: JAD 43.4 (2015): 1199-1213.
[12]
Barrett, K., A. D. McGuire, E. E. Hoy, and E. S. Kasischke. Potential shifts in dominant forest cover in interior Alaska driven by variations in fire severity. Ecological Applications 21.7 (2011): 2380-2396.
[13]
Pang,
Herbert, et al. Pathway analysis using random forests classification and
regression. Bioinformatics 22.16 (2006): 2028-2036.
[14]
Pang, Herbert, et al. Pathway analysis using random forests classification and regression. Bioinformatics 22.16 (2006): 2028-2036.
[15]
Klöppel,
Stefan, et al. Automatic classification of MR scans in Alzheimer's disease.
Brain 131.3 (2008): 681-689.
[16]
Li, Feng, et
al. A Robust Deep Model for Improved Classification of AD/MCI Patients. (2015).
[17]
Gironi,
Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive
impairment disclosed by a novel data mining process. Journal of Alzheimer's
disease: JAD 43.4 (2015): 1199-1213.
[18]
Razi,
Muhammad A., and Kuriakose Athappilly. A comparative predictive analysis of
neural networks (NNs), nonlinear regression and classification and regression
tree (CART) models. Expert Systems with Applications 29, no. 1 (2005): 65-74.
[19]
Diniz, Breno
SO, Jony A. Pinto Jr, and Orestes Vicente Forlenza. Do CSF total tau,
phosphorylated tau, and β-amyloid 42 help to predict progression of mild
cognitive impairment to Alzheimer's disease? A systematic review and
meta-analysis of the literature. The World Journal of Biological Psychiatry 9.3
(2008): 172-182.
[20]
Gironi,
Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive
impairment disclosed by a novel data mining process. Journal of Alzheimer's
disease: JAD 43.4 (2015): 1199-1213.
[21]
Tong, Tong, et al. Nonlinear Graph Fusion for Multi-modal Classification of Alzheimer’s Disease. Machine Learning in Medical Imaging. Springer International Publishing, 2015. 77-84.
[22]
Li, Feng,
Loc Tran, Kim-Han Thung, Shuiwang Ji, Dinggang Shen, and Jiang Li. A Robust
Deep Model for Improved Classification of AD/MCI Patients. (2015).
[23]
Fan, Yong,
et al. Structural and functional biomarkers of prodromal Alzheimer's disease: a
high-dimensional pattern classification study. Neuroimage 41.2 (2008): 277-285.
[24]
Ramírez,
Javier, et al. Computer-aided diagnosis of Alzheimer’s type dementia combining
support vector machines and discriminant set of features. Information Sciences
237 (2013): 59-72.
[25]
Fan, Yong,
et al. Structural and functional biomarkers of prodromal Alzheimer's disease: a
high-dimensional pattern classification study. Neuroimage 41.2 (2008): 277-285.
[26]
Payan, Adrien, and Giovanni Montana. Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks. arXiv preprint arXiv:1502.02506 (2015).
[27]
Tong, Tong, et al. Nonlinear Graph Fusion for Multi-modal Classification of Alzheimer’s Disease. Machine Learning in Medical Imaging. Springer International Publishing, 2015. 77-84.
[28]
Zhang, Yudong, et al. Magnetic
resonance brain image classification via stationary wavelet transform and
generalized eigenvalue proximal support vector machine. Journal of Medical
Imaging and Health Informatics 5.7 (2015): 1395-1403.
[29]
Gaonkar, Bilwaj, et al.
Interpreting support vector machine models for multivariate group wise analysis
in neuroimaging. Medical image analysis 24.1 (2015): 190-204.
[30]
Groot, Marius, M. Arfan Ikram,
Saloua Akoudad, Gabriel P. Krestin, Albert Hofman, Aad van der Lugt, Wiro J.
Niessen, and Meike W. Vernooij. Tract-specific white matter degeneration in
aging: The Rotterdam Study. Alzheimer's & Dementia 11, no. 3 (2015):
321-330.
[31]
Zhang, Yudong, et al. Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. Journal of Medical Imaging and Health Informatics 5.7 (2015): 1395-1403.
[32]
Papagno, Costanza, et al. Idiom
comprehension in Alzheimer’s disease: The role of the central executive. Brain
126.11 (2003): 2419-2430.
[33]
Li, Tie-Qiang, and Lars-Olof
Wahlund. The search for neuroimaging biomarkers of Alzheimer's disease with
advanced MRI techniques. Acta Radiologica 52.2 (2011): 211-222.
[34]
Gironi,
Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive
impairment disclosed by a novel data mining process. Journal of Alzheimer's
disease: JAD 43.4 (2015): 1199-1213.
[35]
Farhan, Saima, Muhammad Abuzar Fahiem, and Huma Tauseef. An Ensemble-of-Classifiers Based Approach for Early Diagnosis of Alzheimer’s Disease: Classification Using Structural Features of Brain Images. Computational and Mathematical Methods in Medicine 2014 (2014).