Objective
To explore the discriminatory ability of a decision tree model based on cognitive testing data for the differential diagnosis of schizophrenia.
Methods
This study enrolled 82 patients with schizophrenia and 82 patients with affective disorders. The cognitive function of the two groups was assessed with verbal learning, symbol coding, spatial span, trail making, and category fluency tests. The decision tree model in the sklearn package in Python was applied to classify the participants using all 10 variables from the MATRICS Consensus Cognitive Battery (MCCB).
Results
The recognition rate for schizophrenia and affective disorder using all 10 variables of the MCCB was 82%.
Conclusion
The decision tree model based on cognitive data distinguished patients with schizophrenia from those with affective disorder.
Schizophrenia is a mental disorder characterized by hallucinations, delusions, and cognitive dysfunction. This condition has a high disability rate and a lifetime prevalence rate of 0.6%.1 At present, the diagnosis of schizophrenia mainly depends on doctors' judgment, as objective and quantifiable indicators are lacking. Cognitive deficits are one of the main symptoms of schizophrenia. These are the primary symptoms of schizophrenia rather than a result of the disease and are also an important factor affecting the social function and prognosis of patients.2-5 Cognitive deficits in schizophrenia may be objective disease markers with clear auxiliary diagnostic value.6,7 Affective disorder is the most common type of mental illness, with a lifetime prevalence rate as high as 7.4%, with depression accounting for 6.8% and bipolar disorder accounting for 0.6%.1 Patients with affective disorders also have varying degrees of cognitive deficits.8,9
The MATRICS Consensus Cognitive Battery (MCCB) for schizophrenia is often used as an evaluation tool to assess cognitive function in mental illness.10,11 The MCCB was translated and revised by Chuan Shi and colleagues, standardized in the Chinese culture, and has good reliability and validity, with test-retest reliability ranging from 0.73 to 0.94.12,13 The MCCB contains seven dimensions, each of which includes one or more measurement indicators. For example, the spatial span test includes two indicators: the forward and backward spatial span test scores. Many studies have shown that patients with schizophrenia have lower scores than unaffected controls on almost every dimension and each subtest of the MCCB.14-16 Therefore, a cognitive index system based on cognitive measurement indicators, especially a cognitive index system containing multiple cognitive measurements, may be helpful for the auxiliary diagnosis of diseases.
Patients with schizophrenia have a certain degree of affective cognitive impairment17,18 that is often accompanied by depressive symptoms.19 It is difficult to distinguish between schizophrenia and affective disorders based on traditional experience. In recent years, an increasing number of studies have used machine learning for the auxiliary or differential diagnosis of diseases.20 The decision tree is a machine learning algorithm first described in "Experiments in Induction" by E. B. Hunt et al. in 1966; however, it was Ross Quinlan who made the decision tree a mainstream machine learning algorithm (in 2011, he won the most prestigious award in the field of data mining, the Knowledge Discovery and Data Mining [KDD] Innovation Award). Quinlan proposed the Iterative Dichotomiser 3 (ID3) algorithm in 1979, which stimulated decision tree research.
The present study applied machine learning methods to analyse cognitive test data to identify new methods for performing a differential diagnosis of schizophrenia and affective disorders. This study also used these methods to integrate cognitive test data and evaluate the effect of cognitive variables on the differential diagnosis of schizophrenia and affective disorders.
Methods
Participants
Schizophrenia group: 82 patients with schizophrenia were selected from the Peking University Sixth Hospital. The enrolment criteria were: 1) meeting the diagnostic criteria for schizophrenia according to the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and 2) age 18–60 years, with an education level above elementary school.
Affective disorder group: 82 patients with affective disorder were selected from the Peking University Sixth Hospital. The enrolment criteria were: 1) meeting the diagnostic criteria for affective disorder according to the DSM-IV and 2) age 18–60 years, with an education level above elementary school.
The exclusion criteria were: 1) associated mental retardation or cerebral organic diseases; 2) severe decline or impulsive excitement and uncooperativeness; 3) substance abuse; 4) hearing or visual impairment; and 5) severe physical illness.
The general clinical information is shown in Table 1. The study was approved by the Ethics Committee of Peking University Sixth Hospital (approval number 2009_1) and all participants signed an informed consent form before the test.
Table 1. Comparison of basic information and cognitive function between patients with schizophrenia and patients with affective disorders.
Hopkins Verbal Learning Test-Revised (HVLT-R)
This instrument examines the ability to learn and immediately recall, retain, and regenerate word information and to recognize words after a delay. A list of 12 words (three semantic categories with four words in each category) was presented to the participants three times. After each presentation, the participants were asked to recall as many words from the list as possible in any order. Twenty to 30 minutes after the three presentations, the participants were asked to recall the words on this list. Subsequently, a list of 24 words was read to the participants; the words were read one at a time, and the participants were asked whether each word had appeared on the original list. The words presented at the recognition stage included the original words, words with the same semantic classification, and unrelated words. The recorded results included the number of correctly named, repeated, and inserted words in trials 1, 2, and 3; delayed recall and re-recognition; and the number of true and false positive words identified in the re-recognition test. The present study used the total numbers of recalls and delayed recalls on the HVLT-R to reflect the patients' verbal learning and memory function.
Symbol coding test
This test is a subtest of the Chinese version of the Brief Assessment of Cognition in Schizophrenia (BACS). It is used to measure the speed of information processing and includes attention and writing exercises. The participants are asked to select the numbers that match different symbols and to fill in the blanks within 90 seconds using a standard template.
Trail making test
This test is a subtest of the Halstead–Reitan (H-R) Neuropsychological Battery. The participants are required to draw a line connecting circles numbered from 1 to 25, which are randomly distributed on a sheet of 16K paper, in ascending order. The completion time and number of errors are recorded. In addition to measuring the speed of information processing, the test also reflects the participants' attention, cognitive sequencing ability, and visual-spatial function.
Spatial span test
This test is similar to the digit span test in the Wechsler Adult Intelligence Scale III (WAIS-III); the only difference is that it is presented visually rather than auditorily. The spatial span test board has 10 squares, each with a number from 1 to 10 printed on the side facing the examiner. Two trials are given at each sequence length; the two sequences differ in arrangement, but the number of blocks that need to be tapped is the same. The present study used the total spatial span score to reflect the participants' visual-spatial working memory.
Category fluency test
This test requires participants to name as many words as possible within a specific category, such as "animals", "actions", "vegetables", and "vehicles", in 1 minute. This study used the widely used "animals" category. The task differs from tasks that involve naming words that start with the same sound or the same single word, because category fluency tests activate oral word associations through concepts and categories. The score was the total number of correctly named words. The numbers of repetitions (repeating a correct word) and insertions (words that do not meet the guidelines, such as words that do not fit the category) were also recorded. Many neurocognitive functions are involved in word fluency, which usually reflects information processing speed, vocabulary size, semantic memory, working memory, inhibition, and concept maintenance.
Statistical methods
General data processing
Statistical analysis of the demographic data was conducted in IBM SPSS Statistics for Windows, version 22.0. t-tests were used to compare the age of the groups, while chi-square tests were used to compare sex and education levels between the groups.
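For readers without SPSS, the two group comparisons can be sketched in plain Python. This is a minimal illustration, not the study's analysis: the function names `student_t` and `chi_square_2x2` and all data values are hypothetical, the t statistic assumes equal variances (pooled), and the chi-square is the Pearson statistic for a 2x2 table without continuity correction.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical values for illustration only (not the study data).
ages_scz = [25, 31, 28, 40, 35, 29]
ages_aff = [27, 33, 30, 38, 36, 31]
t, df = student_t(ages_scz, ages_aff)

# Rows: schizophrenia, affective disorder; columns: male, female.
sex_table = [[40, 42], [45, 37]]
chi2, p = chi_square_2x2(sex_table)
print(f"t = {t:.2f} (df = {df}), chi2 = {chi2:.2f}, p = {p:.2f}")
```

The p-value for the t statistic is omitted because it requires the t distribution, which the standard library does not provide.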
Analysis of decision tree structure
The entire data analysis was run in Python, and the source code is shown in Fig. 1. The diagnostic classifier was based on decision tree analysis using a machine learning approach. In this study, 80% and 20% of the data were used for training and testing, respectively.
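The general procedure (an 80/20 split followed by fitting a decision tree) can be illustrated with a small standard-library sketch. It is not the code in Fig. 1: the study used the sklearn package, whereas the helpers below (`gini`, `best_split`, `build_tree`, `predict`, `stratified_split`) are hypothetical, hand-rolled stand-ins. The data are synthetic, the split is assumed to be stratified by diagnosis, and the depth limit of 3 is an arbitrary choice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; a leaf stores the majority class (an int)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def stratified_split(labels, test_frac=0.2, seed=0):
    """Shuffle indices within each class and hold out test_frac of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test += idx[:cut]
        train += idx[cut:]
    return train, test


# Synthetic scores on two made-up indicators; 0 = affective disorder, 1 = schizophrenia.
rng = random.Random(42)
rows, labels = [], []
for cls, (m1, m2) in enumerate([(45.0, 20.0), (38.0, 17.0)]):
    for _ in range(82):
        rows.append([round(rng.gauss(m1, 10), 1), round(rng.gauss(m2, 5), 1)])
        labels.append(cls)

train_idx, test_idx = stratified_split(labels)
tree = build_tree([rows[i] for i in train_idx], [labels[i] for i in train_idx])
hits = sum(predict(tree, rows[i]) == labels[i] for i in test_idx)
accuracy = hits / len(test_idx)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice, sklearn's `train_test_split` and `DecisionTreeClassifier` cover the same two steps; the Gini criterion used here is that classifier's default.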
Results
Patient age, sex, and education level did not differ significantly between the schizophrenia and affective disorder groups. Immediate recall, symbol coding, trail making, and category fluency differed significantly between the groups. The performance of the schizophrenia group was worse than that of the affective disorder group, with the schizophrenia group showing fewer correct responses and longer response times. Delayed recall and spatial span did not differ significantly between the groups (Table 1).
To compare the individual scores of the two groups of patients, scatterplots of eight cognitive test indicators that showed significant differences between the two groups of patients were created. Although the two groups showed significant differences in their average scores, a high degree of data overlap at the individual level was observed (Fig. 2).
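The coexistence of a clear group-level difference and heavy individual-level overlap can be made concrete with a short sketch. The scores below are hypothetical (not the data behind Fig. 2), and `cohens_d` and `share_in_range` are illustrative helper names; the sketch reports a standardised mean difference alongside the share of one group's scores that fall inside the other group's range.

```python
import statistics


def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def share_in_range(a, b):
    """Fraction of values in a that lie between min(b) and max(b)."""
    lo, hi = min(b), max(b)
    return sum(lo <= x <= hi for x in a) / len(a)


# Hypothetical scores on a single indicator, for illustration only.
aff = [52, 47, 60, 44, 55, 49, 58, 41]
scz = [45, 39, 50, 36, 48, 42, 53, 34]

d = cohens_d(aff, scz)
overlap = share_in_range(scz, aff)
print(f"Cohen's d = {d:.2f}; share of scz scores within the aff range = {overlap:.3f}")
```

A sizeable effect size together with a large overlap share is the pattern that makes any single cutoff on one indicator unreliable for individual patients.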
The decision tree analysis incorporated all 10 variables of the cognitive test battery. The model was trained to classify patients with schizophrenia and patients with affective disorders from their cognitive performance. The recognition accuracy rate was 82%. The details are listed in Table 2 and Fig. 3.
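An accuracy figure of this kind is derived from a confusion matrix of true versus predicted diagnoses. The sketch below shows the arithmetic on hypothetical labels (not the values in Table 2); the function names and the choice of schizophrenia as the positive class (coded 1) are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly identified
    }


# Hypothetical test-set labels: 1 = schizophrenia, 0 = affective disorder.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts, metrics)
```

Reporting sensitivity and specificity next to accuracy shows whether errors are concentrated in one diagnostic group.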
Discussion
This study examined the cognitive function of patients with schizophrenia and affective disorder across multiple dimensions and observed significant differences between the two groups. The decision tree analysis achieved a correct recognition rate of 82% when cognitive tests were used to distinguish schizophrenia from affective disorders.
Previous studies have reported differences in cognitive function between schizophrenia and affective disorders.19,21 The cognitive impairment in schizophrenia is more serious, which may be related to the older age and lower education level of patients with schizophrenia.22 The present study strictly matched the two groups of patients by age and education level. Other than spatial span, all other cognitive abilities were worse in patients with schizophrenia than in those with affective disorders, a finding that may be related to the disease itself.
At present, few classification studies have applied a decision tree model based on cognitive indicators to the comparison of schizophrenia and affective disorder. In this study, patients with schizophrenia and patients with affective disorders showed significant differences in multiple cognitive indicators but had large overlaps in any single cognitive indicator (Table 1); thus, it is difficult to distinguish the two groups using a single indicator. Machine learning methods can address this problem by combining multiple indicators. The results of this study provide a reference for future research of the same type and for clinical auxiliary diagnosis based on cognitive deficits.
Author contributions
Xin Yu and Wentian Dong designed the study and critically reviewed the manuscript for important content. Wentian Dong wrote the raw manuscript and analysed the data. Xin Yu, Wentian Dong, Chuan Shi and Qihui Niu diagnosed the patients. Yong He, Chuan Shi, Haokui Yu and Jun Ji performed the data collection and data analyses. Wentian Dong and Jiuju Wang drafted the initial manuscript. All of the authors reviewed and revised the manuscript.
Ethical considerations
The study was approved by the Ethics Committee of Peking University Sixth Hospital, and all subjects signed an informed consent form before the test.
This work was supported by grants from the National Key R&D Program of China (2018YFC1314200) and the Beijing Science and Technology Plan (Z2011000055200092).