Classification of Alzheimer’s Disease Using Random Forest
by

Gakiza Canisius and Irivuzimana Aimé Muyombano, Ph.D.

Scientific Institute of Research «SDRInstitute», Kigali, Rwanda

amuyombano@gmail.com and gcanisius@gmail.com

Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disease that leads to the loss of memory and cognitive function. Using neuroimaging and clinical data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, previous researchers presented classification frameworks for this application. They applied a Random Forest (RF) classifier to distinguish between multiple patient groups, and they recommended bagging, without modification, as a technique for handling imbalanced data. They used the RF algorithm for diagnosis based on the combination of all available information, for comparisons between AD and Normal Control (NC) subjects as well as between Mild Cognitive Impairment (MCI) and NC patients.
In this work, we built our own algorithm, named New Random Forest (NRF), to improve classification accuracy by running efficiently on large datasets: it can handle thousands of input variables without deletion, classify large amounts of data with high accuracy, and estimate which variables are important in the diagnostic classification. Our algorithm also proved easier to combine with different types of data without additional processing. Its binary and multiple classification accuracies are higher than those of many other complex algorithms, including the comparison performances that previous researchers had achieved.
Our purpose is to build algorithms that diagnose the disease status of patients. Using the same data from ADNI with NRF, we achieved classification accuracies of 90.47% between AD and NC, and 86.69% between MCI and NC. This is an improvement over the 89% for AD vs. NC and 75% for MCI vs. NC that prior research had achieved.
Index Terms: AD Diagnosis, Random Forest Classifier, Normal Control, MCI, Bagging.

I. INTRODUCTION

Alzheimer’s Disease (AD) is a type of dementia that causes problems with memory and cognitive function. Though the prevalence of AD grows with age [1], age is not the cause of AD. In general, AD affects people aged around 64 years [2]. Patients with MCI have a higher risk of progressing to AD [1, 2]. AD is now recognized as the most common form of dementia and has become an increasing public health problem [1, 3].

Computer Aided Diagnosis (CAD) using machine learning techniques, such as NRF, can diagnose AD [2, 4, 5]. CAD is used to improve classification accuracy for AD and MCI [2, 6] and to differentiate the disease from normal aging; it has become a powerful clinical tool for identifying patients for early treatment, which may possibly slow progression of the disease [2, 7].
Previous researchers presented a framework for multiple classifications based on pairwise similarity measures derived from Breiman’s Random Forest (BRF) [7]. They used similarities to construct a manifold representation from labeled training data and then inferred the clinical labels of test data mapped into this space; the BRF algorithm presented a unified model of random decision forests for classification, regression, density estimation, manifold learning, and semi-supervised learning.

Other researchers used the RF filter (RFF) to identify a subset of features that provides the highest binary classification accuracy; it was also able to measure the importance of features for obtaining a good classification outcome. RFF is applied after removing highly correlated features, and the features with greater importance are selected, but it does not provide better accuracy [8].

NRF classifies three classes, AD, MCI, and NC, identifying the distribution of each class [8, 10]. To improve the classification accuracy, we build decision trees, which are very intuitive [11]. Without proper limits on tree growth, trees tend to overfit the training data, so we control their growth to keep classification accuracy high. In other words, we used random vectors to build the trees.
Bagging builds each tree from a bootstrap sample of the training set [7]. It yields an ensemble classifier consisting of various decision trees [7, 8], where the final classification of a test example is obtained by combining all the individual trees; bagging thus constructs a collection of decision trees exhibiting controlled variation [7].

Previous researchers presented BRF and RFF for AD, both based on bagging derived from decision tree classification [1, 2, 7]. They also presented a framework for multiple classifications based on pairwise similarity measures derived from their algorithms [7, 12, 13]. The similarities between these two previous lines of research were used in this work to construct a manifold representation from the labeled training dataset and then to interpret the clinical labels of test data mapped into this space [7, 14, 15].

During this research, a high-performance algorithm was created based on decision trees [7]. The intuition is that similar observations fall in the same terminal nodes more often than dissimilar ones. Motivated by the implementation successes of this research, we first applied random trees to the 48 features and used the result as new features for the current RF [2, 6, 7].
We then applied binary classification to identify and diagnose disease status, for which this research’s NRF is the most effective AD classifier. Previous researchers built the training set for each individual tree in a BRF [7] by sampling N examples with replacement from the N available cases in the dataset. This is known as bootstrap sampling, or bagging [2]. Our classifier is likely the strongest, as NRF’s accuracy is much higher than what other researchers had achieved.

The rest of the article is arranged as follows: Section 2 describes the background of our methodology. Section 3 presents our methodology. Section 4 presents the experimental tests and results. The last section covers the final discussion and conclusions.

II. BACKGROUND

A. Decision Trees

A decision tree is one of the most successful models for prediction and classification tasks [7]. Supervised learning maps observations of an item to a conclusion about the item’s target value. In machine learning, decision trees are a simple representation of classification rules, in which each internal node is labeled with an input feature [7, 15].
Training data come in records of the form $(\mathbf{x}, Y) = (x_1, x_2, \dots, x_k, Y)$, where $Y$ is the target variable that we are trying to classify and $x_1, \dots, x_k$ are the input features.
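As a concrete illustration, the sketch below fits a CART-style decision tree on synthetic records of this form. It assumes scikit-learn; the 48-feature width and the class coding are illustrative stand-ins for the ADNI features, not the actual pipeline.

# Minimal sketch: fit a CART-style decision tree on records (x, Y).
# Synthetic data stands in for the ADNI imaging features; the class
# coding (0 = NC, 1 = MCI, 2 = AD) is an illustrative assumption.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 48))      # 300 subjects, 48 features
y = rng.integers(0, 3, size=300)    # synthetic diagnostic labels Y

tree = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))          # predicted labels for five records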

The NRF classifier uses a number of decision trees in order to improve the classification accuracy rate [15, 16, 17]. Using the Classification and Regression Trees (CART) algorithm [18], splits are scored with the Gini impurity: the probability that a randomly chosen item from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset [12]. It is computed by summing the probability $p_i$ of each item being chosen times the probability $(1 - p_i)$ of a mistake in categorizing that item:

$$I_G(p) = \sum_{i=1}^{J} p_i (1 - p_i) = 1 - \sum_{i=1}^{J} p_i^2,$$

where $J$ is the number of classes and $p_i$ is the fraction of items labeled with class $i$ in the subset. Splits can equivalently be scored by the information gain of the entropy

$$H(T) = -\sum_{i=1}^{J} p_i \log_2 p_i,$$

where $H(T)$ is the information entropy of the node $T$.
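A short, self-contained sketch of these two impurity measures in plain Python; this is our illustration rather than the paper’s code.

# Gini impurity and entropy of a node's label distribution, as used
# by CART to choose splits. Self-contained; example labels are made up.
import math
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

node = ["AD", "AD", "NC", "MCI", "NC", "NC"]
print(f"Gini = {gini(node):.3f}, entropy = {entropy(node):.3f}")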

We apply all classifiers to an unseen sample $x$ and predict the label $\hat{k}$ for which the corresponding classifier reports the highest confidence:

$$\hat{k} = \arg\max_{k \in \{1, \dots, K\}} f_k(x).$$
The NRF algorithm is used for complex classification tasks. Its main advantage is that the resulting model can be easily interpreted. Confidence values may differ between the binary classifiers depending on the class distribution in the training set. The limitation of the NRF algorithm is that a large number of trees can slow down real-time prediction and classification. During this research, the high-performance algorithm was created based on decision trees [8].
Therefore, to improve NRF performance, techniques such as bagging and majority polling have been developed [7, 8, 9]. We give priority to bagging in our approach, again exploiting the intuition that similar observations fall in the same terminal nodes more often than dissimilar ones.
Indeed, NRF is a forest of linear threshold trees that exhibits all of the desirable RF properties. Our algorithm has been shown to perform as well as well-known classifiers on a variety of datasets [9]. Its widely valued properties include strong predictive performance, invariance to unit scaling, robustness to outliers, favorable time and space complexity, and interpretability. While NRF has many desirable properties, a disadvantage is that it is sensitive to rotations and other operations that mix variables.

B. Random Forest Classifier

NRF is useful for constructing classification trees for the different models [7]. It grows many classification trees; therefore, to classify a new object from an input vector, we put the input vector down each of the trees in the forest [7].
NRF runs efficiently on large datasets, as it can handle thousands of input variables without deletion and gives estimates of which variables are important in the classification [20]. In our algorithm, the combination of learning models improves classification: bagging aggregates unbiased models to create a model with low variance [7, 20, 21]. The NRF algorithm works as a large collection of decision trees; because many decision trees are used to make a classification, the technique is based on bagging [7, 21].
Each tree can make internal test predictions for the AD data left out of its bootstrap sample. By aggregating these predictions across all trees [7], an internal (out-of-bag) estimate of the generalization error of the NRF is obtained.
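A minimal sketch of this out-of-bag estimate, assuming scikit-learn’s RandomForestClassifier and synthetic data in place of the ADNI features.

# Out-of-bag (OOB) estimate: each tree is evaluated on the subjects
# excluded from its bootstrap sample, giving an internal estimate of
# generalization error without a held-out set. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 48))
y = rng.integers(0, 2, size=500)    # binary task, e.g. AD vs. NC

forest = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=1)
forest.fit(X, y)
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
print("Top 5 features by importance:",
      np.argsort(forest.feature_importances_)[-5:])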

III. METHODOLOGY

We tackle issues related to AD diagnosis using machine learning algorithms. The NRF classifier was applied as a machine learning algorithm to the data features to derive the similarity measures required for AD classification [22]. In our experiments, we used three types of datasets: AD, MCI, and NC. We evaluated the effectiveness of the proposed NRF method on a set of classification problems against normal elderly controls (NC): AD vs. NC and early/late MCI vs. NC.
Decision trees are sensitive to variations in training datasets, so we formed NRF accordingly. We used bagging as suggested by Breiman [23, 24]. Bagging works by reducing the variance component of the error; the size of the training dataset used by each learner is effectively reduced by bootstrap sampling [7].
The bagging pseudo-code is described as follows:
Algorithm 1 BAGGING
Input 1: Base learning algorithm A
Input 2: Dataset D, number of iterations T
Output: Prediction for a given test instance x
1. For i = 1 to T: draw a bootstrap sample D(i) from D
2. Let M(i) be the result of training A on D(i)
3. For i = 1 to T: let C(i) be the output of M(i) on x
4. Return the class that appears most often among C(1), ..., C(T)
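A direct, runnable translation of Algorithm 1, assuming scikit-learn decision trees as the base learner A and synthetic data for D; the name bagging_predict is ours.

# Bagging per Algorithm 1: T bootstrap samples, T base models,
# majority vote on a test instance. Base learner and data are
# illustrative assumptions, not the paper's implementation.
import numpy as np
from collections import Counter
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(base, X, y, x_test, T=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)          # step 1: bootstrap D(i)
        model = clone(base).fit(X[idx], y[idx])   # step 2: train A on D(i)
        votes.append(model.predict([x_test])[0])  # step 3: C(i) = M(i)(x)
    return Counter(votes).most_common(1)[0][0]    # step 4: majority vote

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)
print(bagging_predict(DecisionTreeClassifier(), X, y, X[0]))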

Given the error rate induced by the finite size of the training dataset, Breiman used the variability of the algorithm across bootstrap samples of this training dataset to reduce the variance term in question.
Bagging is typically applied to learners with low bias and high variance, such as neural networks: random initial conditions can lead to high variability in predictions, while the networks themselves are considered to have low bias. Bagging uses D to generate new training sets D(i), each of size n' = n, by sampling with replacement, so some observations are repeated in each D(i). For large n, each set D(i) is then expected to contain the fraction (1 - 1/e), about 63.2%, of the unique examples of D, since each example is omitted from a bootstrap sample with probability (1 - 1/n)^n, which tends to 1/e.

The rest being duplicates, the T models are fitted using the above bootstrap samples and combined by averaging the outputs (for regression) or by voting (for classification) [7]. Bagging leads to improvements for unstable procedures, which include, for example, artificial neural networks, classification and regression trees, and subset selection in linear regression. An interesting application of bagging shows improvements in pre-image learning; however, it can mildly degrade the performance of stable methods [7, 8].
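A quick empirical check of the (1 - 1/e) fraction mentioned above, assuming NumPy; this is a verification sketch, not part of the method.

# Empirical check: a bootstrap sample of size n contains roughly
# 1 - 1/e ~ 63.2% of the unique examples of D for large n.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
sample = rng.integers(0, n, size=n)   # sampling with replacement
print(len(np.unique(sample)) / n)     # empirical unique fraction, ~0.632
print(1 - 1 / np.e)                   # theoretical limit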

The bagging technique allows estimation of the sampling distribution of almost any classifier using random sampling methods. Its advantage is the straightforwardness of deriving estimates of standard errors and confidence intervals for complex parameters. It is also an appropriate way to control and check the stability of the results.
Individual decision trees, by contrast, tend to fit an overly complex tree to the data, leading to overfitting, and their accuracy depends heavily on the data presented. For example, a tree can become biased towards a specific class if that class occurs frequently, or it can become unstable when trying to fit overly specific rules inferred from the data.
On the other hand, bagging does not provide general finite-sample guarantees, and its apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bagging analysis.

Algorithm 2 RANDOM FOREST (one-vs-rest)
Input 1: Binary learning algorithm L
Input 2: Samples X
Input 3: Labels yi ∈ {1, …, K} for each sample Xi
Output: List of classifiers fk for k ∈ {1, …, K}
1. For each k in {1, …, K}:
2.   Build a new label vector z with zi = 1 where yi = k
3.   and zi = 0 elsewhere
4.   Apply L to (X, z) to obtain fk
We apply all classifiers to an unseen sample $x$ and predict the label $\hat{k}$ for which the corresponding classifier responds with the highest confidence:

$$\hat{k} = \arg\max_{k \in \{1, \dots, K\}} f_k(x).$$
The NRF algorithm is used for complex classification tasks. Its main advantage is that the resulting model can easily be interpreted. Confidence values may differ between the binary classifiers even when the class distribution is balanced in the training set. The limitation of the NRF algorithm is that a large number of trees can make the algorithm slow for real-time prediction and classification [7].
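A minimal sketch of Algorithm 2 together with this arg-max rule, assuming scikit-learn random forests as the binary learner L and synthetic stand-in data; the class coding is an illustrative assumption.

# One-vs-rest per Algorithm 2: one binary random forest f_k per class
# k, then predict the class whose forest reports the highest
# confidence. Data and class coding are synthetic illustrations.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 48))
y = rng.integers(0, 3, size=400)    # 0 = NC, 1 = MCI, 2 = AD (synthetic)

forests = {}
for k in np.unique(y):              # Algorithm 2: one classifier per class
    z = (y == k).astype(int)        # new label vector: 1 where y_i = k
    forests[k] = RandomForestClassifier(n_estimators=200,
                                        random_state=0).fit(X, z)

def predict(x):
    # arg-max over the per-class confidences f_k(x)
    conf = {k: f.predict_proba(x.reshape(1, -1))[0, 1]
            for k, f in forests.items()}
    return max(conf, key=conf.get)

print(predict(X[0]))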

IV. EXPERIMENTS AND RESULTS

In our experiments, we used datasets available as part of the ADNI database [6, 22]. ADNI is an ongoing, multi-site study designed to develop clinical, imaging, and biochemical biomarkers for the early detection and tracking of AD [2, 7, 10]. The primary goal of ADNI has been to test whether serial imaging measures and clinical and neuropsychological assessments can be combined to measure the progression of MCI and AD.

In this paper, ADNI subjects with corresponding Magnetic Resonance Imaging (MRI) data are included [7]. This yields a total of 819 subjects: 94 AD patients, 429 MCI (309 early MCI and 120 late MCI), and 296 normal controls (NC). We considered binary and multiple classification problems: AD vs. NC and E/LMCI vs. NC.

TABLE I.  DEMOGRAPHIC AND CLINICAL INFORMATION OF THE SUBJECTS

Group                 AD     EMCI    LMCI    NC
Number of subjects    94     309     120     296

In the multiple classification, we considered all three groups, AD, MCI, and NC, at once. Our proposed method performed best in diagnosing AD/NC and MCI/NC patients, with binary classification accuracies of 90.47% and 86.69% respectively [2, 22]. It performed better than the previous researchers’ accuracies of 89% and 75%. See Table II, which shows that our implementation achieved the best accuracy on the dataset. The difference from the second-best method is statistically significant.

TABLE II.  BINARY CLASSIFICATION ACCURACY AND COMPARISON ACCURACY (%)

Classification       NRF      BRF [7]    RFF [8]
AD vs. NC            90.47    89         82.1
(E/LMCI) vs. NC      86.69    75         65.7

a. NRF: our Random Forest algorithm.
b. BRF [7]: Breiman Random Forest; comparison accuracy from reference [7].
c. RFF [8]: Random Forest Filter; comparison accuracy from reference [8].

We then compared our binary classification results with other researchers’ results [8]. Our binary classification achieved high accuracies of 90.47% and 86.69%, and our multiple classification method achieved accuracies of 96.64%, 94.18%, 93.55%, and 97.73% across the diagnostic groups. The optimal structure of the NRF classification and the respective performance are represented in Fig. 1.
Fig. 1.  Histogram of multiple classification RF performance accuracy (%)
Accuracy was considerably improved by using the multiple classification method, which obtained accuracies of 97.73%, 96.64%, 94.18%, and 93.55% for the diagnosis (see Fig. 1). In the MCI diagnosis (E/LMCI vs. NC), the multiple classification accuracies ranged from 93.55% to 97.73%, while the binary classification accuracy increased from 86.69% to 90.47%. Among these components, it is evident that the binary classification has the most impact on accuracy [7, 8].

V. DISCUSSIONS AND CONCLUSION

In our methodology, we used NRF classification to determine the optimal structure for the classification tasks. Different datasets within the same class were considered, for example in the AD vs. NC dataset [2, 12, 22]. We observe that this reflects the importance of capturing the diverse high-level relations inherent in NRF across different classification problems [7, 22].
We performed different experiments, whose results are described in Table II. In comparison with the competing methods, the proposed NRF method greatly improved the diagnostic accuracy for AD and MCI over all the classification problems considered in this work [25]. The proposed method consistently outperformed the competing multiple classification methods with supervised learning [7, 16].
For NRF, the size of the dataset is paramount for good classification performance. While there is a limited number of samples available in the ADNI dataset [3, 26] and the sample size is small, the evidence shows that supervised training helps the machine learning methodology find better optimal parameters for increased accuracy [2].
Likewise, we also obtained the best performance on the two binary classification problems [2, 7, 22]. We identified the characteristics most important for machine learning [26] when comparing the combinations of classifications [2, 7]. We can regard the trained trees as filters that capture different types of relations among the inputs [1, 28]. There is no standard way to visualize [29] or interpret the meaning of the trained model in an intuitive [30] way, which remains a challenging issue in the machine learning field [31].
We would also like to mention that it is not straightforward to interpret the meaning of the learned representations [32]; nonetheless, our experiments showed clearly that latent information is very important in AD and MCI diagnosis [33, 34]. We conclude that the multiple classification approach for the current RF method focuses the regression targets on the classification [1, 7]. Moreover, we used a dataset of 94 AD, 429 MCI, and 296 NC subjects.

However, in our experiments, we cannot fail to mention that, for binary classification, AD vs. NC performed better than the others, at 90.47%. The information indicators for the progression of AD remain unspecified; the reasons for the better performance found with a larger training dataset require further analysis [16, 22]. While not definitive, this is highly interesting, and it suggests that the subject groups show a discernible separation in the ADNI data [19].

In conclusion, we have applied machine learning to the ADNI dataset. Furthermore, we have applied the current RF as a subspace ensemble for AD classification [35]. We then combined binary and multiple classifications to improve the classification accuracy. In our experiments, the results from the ADNI database show that the sparse representation classification performs well. We used the same datasets and achieved better classification performance than the previous researchers’ classification methods. The current tree-based RF can further increase the classification accuracy by combining multiple classifications of trees.
In future work, we will apply our method to other datasets, such as Fluorodeoxyglucose Positron Emission Tomography (FDG-PET), as well as extend the multiple classification method to further biomarkers to further improve the accuracy of AD classification.

References

              [1]            Suk, Heung-Il, et al. Supervised Discriminative Group Sparse Representation for Mild Cognitive Impairment Diagnosis. Neuroinformatics (2014): 1-19.
              [2]            Suk, Heung-Il, et al. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis.  Brain Structure and Function 220.2 (2013): 841-859.
              [3]            Van der Flier, Wiesje M., and Philip Scheltens. Epidemiology and risk factors of dementia. Journal of Neurology, Neurosurgery & Psychiatry 76.suppl 5 (2005): v2-v7.
              [4]            Ramírez, Javier, et al. Computer-aided diagnosis of Alzheimer’s type dementia combining support vector machines and discriminant set of features. Information Sciences 237 (2013): 59-72.
              [5]            Ramírez, Javier, J. M. Górriz, Diego Salas-Gonzalez, A. Romero, Míriam López, Ignacio Álvarez, and Manuel Gómez-Río. Computer-aided diagnosis of Alzheimer’s type dementia combining support vector machines and discriminant set of features. Information Sciences 237 (2013): 59-72.
              [6]            Khazaee, Ali, Ata Ebrahimzadeh, and Abbas Babajani-Feremi. Application of advanced machine learning methods on resting-state fMRI network for identification of mild cognitive impairment and Alzheimer’s disease. Brain Imaging and Behavior (2015): 1-19.
              [7]            Gray, Katherine R., et al. Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. Neuroimage 65 (2013): 167-175.
              [8]            Sarica, A., et al. Advanced feature selection in multinominal dementia classification from structural MRI data. Proc MICCAI Workshop Challenge on Computer-Aided Diagnosis of Dementia Based on Structural MRI Data. 2014.
              [9]            Diniz, Breno SO, Jony A. Pinto Jr, and Orestes Vicente Forlenza. Do CSF total tau, phosphorylated tau, and β-amyloid 42 help to predict progression of mild cognitive impairment to Alzheimer's disease? A systematic review and meta-analysis of the literature. The World Journal of Biological Psychiatry 9.3 (2008): 172-182.
         [10]            Li, Feng, et al. A Robust Deep Model for Improved Classification of AD/MCI Patients. (2015).
         [11]            Gironi, Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive impairment disclosed by a novel data mining process. Journal of Alzheimer's disease: JAD 43.4 (2015): 1199-1213.
         [12]            Barrett K, McGuire AD, Hoy EE, Kasischke ES. Potential shifts in dominant forest cover in interior Alaska driven by variations in fire severity. Ecological Applications. 2011 Oct; 21(7):2380-96.
         [13]            Pang, Herbert, et al. Pathway analysis using random forests classification and regression. Bioinformatics 22.16 (2006): 2028-2036.
         [14]            Pang H, Lin A, Holford M, Enerson BE, Lu B, Lawton MP, Floyd E, Zhao H. Pathway analysis using random forests classification and regression. Bioinformatics. 2006 Aug 15; 22(16):2028-36.

         [15]            Klöppel, Stefan, et al. Automatic classification of MR scans in Alzheimer's disease. Brain 131.3 (2008): 681-689.
         [16]            Li, Feng, et al. A Robust Deep Model for Improved Classification of AD/MCI Patients. (2015).
         [17]            Gironi, Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive impairment disclosed by a novel data mining process. Journal of Alzheimer's disease: JAD 43.4 (2015): 1199-1213.
         [18]            Razi, Muhammad A., and Kuriakose Athappilly. A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Systems with Applications 29, no. 1 (2005): 65-74.
         [19]            Diniz, Breno SO, Jony A. Pinto Jr, and Orestes Vicente Forlenza. Do CSF total tau, phosphorylated tau, and β-amyloid 42 help to predict progression of mild cognitive impairment to Alzheimer's disease? A systematic review and meta-analysis of the literature. The World Journal of Biological Psychiatry 9.3 (2008): 172-182.
         [20]            Gironi, Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive impairment disclosed by a novel data mining process. Journal of Alzheimer's disease: JAD 43.4 (2015): 1199-1213.
         [21]            Tong, Tong, et al. Nonlinear Graph Fusion for Multi-modal Classification of Alzheimer's Disease. Machine Learning in Medical Imaging. Springer International Publishing, 2015. 77-84.
         [22]            Li, Feng, Loc Tran, Kim-Han Thung, Shuiwang Ji, Dinggang Shen, and Jiang Li. A Robust Deep Model for Improved Classification of AD/MCI Patients. (2015).
         [23]            Fan, Yong, et al. Structural and functional biomarkers of prodromal Alzheimer's disease: a high-dimensional pattern classification study. Neuroimage 41.2 (2008): 277-285.
         [24]            Ramírez, Javier, et al. Computer-aided diagnosis of Alzheimer’s type dementia combining support vector machines and discriminant set of features. Information Sciences 237 (2013): 59-72.
         [25]            Fan, Yong, et al. Structural and functional biomarkers of prodromal Alzheimer's disease: a high-dimensional pattern classification study. Neuroimage 41.2 (2008): 277-285.
         [26]            Payan, Adrien, and Giovanni Montana. Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks. arXiv preprint arXiv:1502.02506 (2015).
         [27]            Tong, Tong, et al. Nonlinear Graph Fusion for Multi-modal Classification of Alzheimer's Disease. Machine Learning in Medical Imaging. Springer International Publishing, 2015. 77-84.
         [28]            Zhang, Yudong, et al. Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. Journal of Medical Imaging and Health Informatics 5.7 (2015): 1395-1403.
         [29]            Gaonkar, Bilwaj, et al. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging. Medical image analysis 24.1 (2015): 190-204.
         [30]            Groot, Marius, M. Arfan Ikram, Saloua Akoudad, Gabriel P. Krestin, Albert Hofman, Aad van der Lugt, Wiro J. Niessen, and Meike W. Vernooij. Tract-specific white matter degeneration in aging: The Rotterdam Study. Alzheimer's & Dementia 11, no. 3 (2015): 321-330.

         [31]            Zhang Y, Dong Z, Liu A, Wang S, Ji G, Zhang Z, Yang J. Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. Journal of Medical Imaging and Health Informatics. 2015 Nov 1;5(7):1395-403.

         [32]            Papagno, Costanza, et al. Idiom comprehension in Alzheimer’s disease: The role of the central executive. Brain 126.11 (2003): 2419-2430.
         [33]            Li, Tie-Qiang, and Lars-Olof Wahlund. The search for neuroimaging biomarkers of Alzheimer's disease with advanced MRI techniques. Acta Radiologica 52.2 (2011): 211-222.
         [34]            Gironi, Maira, et al. A global immune deficit in Alzheimer’s disease and mild cognitive impairment disclosed by a novel data mining process. Journal of Alzheimer's disease: JAD 43.4 (2015): 1199-1213.

         [35]            Farhan, Saima, Muhammad Abuzar Fahiem, and Huma Tauseef. An Ensemble-of-Classifiers Based Approach for Early Diagnosis of Alzheimer's Disease: Classification Using Structural Features of Brain Images. Computational and Mathematical Methods in Medicine 2014 (2014).