The US Preventive Services Task Force recommends screening adults for depression in primary care where adequate systems are established to ensure accurate diagnosis, effective treatment and follow-up. However, there is currently no consensus on which screening tool is most suitable for use in primary healthcare. We aim to systematically review the literature for operating characteristics of depression screening tools capable of self-administration in primary healthcare and meta-analyse the psychometric characteristics of these tools to determine their performance and accuracy.
Methods
An electronic literature search of EMBASE, Medline and CINAHL Complete was conducted from January 1982 to 15 September 2019 using the keywords: depression, screening, primary healthcare and adult. General and psychometric characteristics were extracted for screening tools studied in primary healthcare only when assessed against a ‘reference-standard’.
Results
Eighty-one studies from 22 countries were included in the review. Forty unique depression screening tools suitable for self-administration were identified in studies yielding 138 psychometric data sets. Based on ease of administration, 18 screening tools were suitable for use in primary healthcare. Of the tools meta-analysed, only the PHQ-9 and WHO-5 displayed superior accuracy and were easily administered.
Conclusion
Although numerous depression screening tools are suitable for use in primary care based on ease of administration, the PHQ-9 was the most widely assessed tool and displayed superior DOR, a-ROC, specificity and LR+. Our review supports the use of the PHQ-9 as a brief, easily administered depression screening tool with superior discriminatory performance and robust psychometric characteristics in primary care settings.
Depression is the most common mental health disorder in the world,1 with a reported prevalence in primary care patients between 5 and 10%.2 The prevalence of depression in the USA increased from 3.3% in 1991 to 8.2% in 2008,3,4 and in Australia from 6.8% to 10.3% in the ten years to 2008.5 If the current trend continues, depression will become the second most prevalent condition causing morbidity globally by the end of 2020.1 Research has found that 20% of American adults and 15% of Australians aged 16–85 years will experience depressive symptoms at least once in their lifetime.6,7
Depression exerts a significant financial burden on health care systems and society.4,8 The human costs are also substantial, with depression affecting relationships as well as physical and emotional health and mortality. The mean life span of depressed patients has been reported to be 5–25 years shorter than the general population.2,9 Individuals with depression also carry an increased risk of suicide,7,10 and alcohol and drug abuse.11 Despite this, depression continues to be under-detected and under-treated in primary care.12–14 Patients with depression are more likely to initially visit a primary healthcare practitioner than be treated by a mental health professional.15 However, primary healthcare practitioners find it challenging to identify patients with depression as it often presents with other physical symptoms.16 The use of a screening tool may assist primary healthcare practitioners in improving the detection of depression in this setting.
In an effort to improve detection rates and reduce the disease burden, the US Preventive Services Task Force (USPSTF) recommends screening to identify adults with depression in primary care settings where operational strategies exist to ensure appropriate diagnosis, referral and management of patients screening positive for depression.17 Currently, however, there is no consensus on which screening tool is most suitable for use in primary healthcare. Indeed, the USPSTF does not make any specific recommendation, but suggests any commonly used depression screening tool may be used.17 Hence, a degree of uncertainty exists around choosing an appropriately validated depression screening tool for use in primary healthcare settings.
Many self-administered screening tools for depression have been developed and validated in various settings in the past four decades, with numerous studies comparing the accuracy and performance of one or several depression screening tools in primary care.18–20 The most recent reviews on the characteristics of depression screening tools in primary care were undertaken in 2002 and 2018.21,22 The 2002 review evaluated the general characteristics of the tools identified, including administrative characteristics such as time to complete the tool, but focused primarily on the sensitivity and specificity of the screening tools and did not evaluate other useful psychometric parameters describing performance such as likelihood ratios and predictive values.23–25 While the review in 2018 provided a more comprehensive evaluation of the psychometric properties of a wide range of instruments, it did not examine general characteristics relating to the ease of administration such as time taken, or level of literacy required, to complete the tool. Ideally, a tool that screens for depression in primary care should be acceptable to both the individual being screened26 and the practitioner.27,28 To minimise the administration burden, it should be brief, easy to understand and easy to administer. However, the assumption that a brief, simple depression screening tool is more acceptable to both patient and practitioner has not been well researched.20 Nevertheless, ease of administration, reflected by characteristics such as time taken to complete the tool and level of literacy required, is an important consideration when choosing a depression screening tool in primary care.29,30 Consequently, longer tools (≥15 questions),31 tools with complex scoring methods, or tools considered more difficult for patients to understand may have limited utility in primary healthcare settings because they add to the burden of administration.
To date, no review has systematically evaluated both measurable psychometric properties and the administrative operating characteristics of depression screening tools capable of self-administration in primary care. For the purposes of this manuscript we have defined administrative operating characteristics as those related to the administration of the tool including literacy level required, complexity of scoring, and time to complete the tool. The aim of this systematic review and meta-analysis was to comprehensively evaluate all characteristics of depression screening tools capable of self-administration to expand the evidence around these characteristics. Adding to this body of evidence will assist primary healthcare practitioners in making a more informed choice on the most suitable depression screening tool.
Methods
Search strategy and selection criteria
An electronic literature search was conducted using EMBASE, Medline and CINAHL Complete from 1 January 1982 to 15 September 2019. The starting date was chosen because depression screening in primary care became more frequently evaluated around that time.32 The search used the terms ‘depression’ or ‘major depression’ or ‘depressive symptoms’ or ‘depressive disorder’ and ‘screening’ or ‘screening tool’ or ‘screening instrument’ or ‘screening test’ or ‘screening questionnaire’ or ‘risk assessment’ and ‘adult’ or ‘adults’ and ‘primary healthcare’ or ‘primary care’ or ‘general practice’. An example of the search strategy is provided as a supplementary file. References from included articles were also searched manually for additional articles satisfying the search criteria that were not captured by the electronic search. Inclusion and exclusion criteria are presented in Box 1. Studies evaluating tools requiring clinician expertise or standardized structured psychiatric interviews were excluded as they require considerable training and skill to administer,16 and are generally more time consuming and not suitable for primary care screening. Specific depression-related conditions such as pre-menstrual dysphoric disorder, post-partum or manic depression, isolated dysthymia, or depression with other psychiatric disorders were excluded as they were not the focus of this review. While geriatric-specific tools were also excluded, studies with geriatric patients were included.
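The structure of the search (synonyms combined with OR within each concept, concepts combined with AND) can be sketched as follows. This is an illustration only: the exact database syntax used by the authors is in their supplementary file and is not reproduced here, and the names `CONCEPTS` and `build_query` are hypothetical.

```python
# Search terms as listed in the text, grouped by concept.
CONCEPTS = [
    ["depression", "major depression", "depressive symptoms", "depressive disorder"],
    ["screening", "screening tool", "screening instrument", "screening test",
     "screening questionnaire", "risk assessment"],
    ["adult", "adults"],
    ["primary healthcare", "primary care", "general practice"],
]


def build_query(concepts):
    """Join synonyms with OR inside each group and join the groups with AND.

    The quoting and operators are a generic Boolean form, not the syntax of
    any particular database interface.
    """
    groups = []
    for terms in concepts:
        quoted = ['"' + term + '"' for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```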
Box 1. Inclusion and exclusion criteria.
Inclusion criteria: | Exclusion criteria: |
---|---|
Written in the English language | Editorials, letters, surveys, conference abstracts, case reports |
Adults aged 18 years or over | Reference (gold) standard not used |
Screening for depression/major depression/ depressive disorder/depressive symptoms | Screening tools requiring clinician expertise |
Screening tool compared to reference (gold) standard | Screening in hospital settings |
Screening tool capable of self-administration | Population-specific screening tools (e.g. geriatric, cardiology patients) |
Sensitivity and specificity were reported | Pre-menstrual dysphoric disorder, post-partum depression, manic depression, depression with other psychiatric disorders |
Screening conducted in primary health care settings | |
Two reviewers (EW and PM) independently screened and extracted data from articles obtained from databases. After duplicates were removed using EndNote X8, titles and abstracts were reviewed against the inclusion criteria. Full texts of remaining articles were independently reviewed and included in the review after any differences were resolved by consensus. Data extraction from included articles was independently undertaken on the screening tool(s) administered, the reference standard, sample size, prevalence of depression, administrative operating characteristics, cut-off values and psychometric characteristics.
Study quality was independently assessed by two reviewers (PM and EW or DN) using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.33 All QUADAS-2 signalling questions were used to assess the risk of bias (RoB) for patient selection, index test, reference standard, and flow and timing. In keeping with the reference standards used in studies, the appropriate interval was defined as ‘in the past month’ for the signalling question ‘Was there an appropriate interval between the index test and reference standard?’ Any discrepancies in risk of bias assessment were resolved through consensus.
Screening tool evaluation
Administrative operating characteristics, including the number of questions, administration time, ease of scoring and required level of literacy, were extracted to assess the ease of administration and suitability for use in primary care settings. Tools containing ≥15 questions were considered long,31 and tools requiring logistic regression or linear transformation were considered difficult to score. Tools requiring higher levels of literacy were considered more difficult for respondents to answer. The level of literacy required for each tool as reported in the manuscript was extracted from relevant studies or from descriptive reviews,34 as low respondent literacy levels may influence screening tool accuracy.31
Psychometric characteristics were extracted for each screening tool at the recommended cut-off point determined by developers of the tool wherever possible. If unavailable, the optimum cut-off point reported by the relevant study was extracted. The optimum cut-off point, in the opinion of the author of each study, represents the most suitable trade-off between sensitivity and specificity using the receiver operating characteristic curve to maximise the correct classification.35 Sensitivity and specificity (Se%, Sp%) were extracted to identify tools with values >80%, as they are considered more effective at identifying people screening positive with depression and people screening negative without depression, respectively.18,36 Positive and negative predictive values (PPV%, NPV%) were extracted to estimate the probability of obtaining correct screening results.29 Area under the receiver operating characteristic curve (a-ROC) values were extracted to assess overall screening tool accuracy. Tools with a-ROC values ≥0.90 are considered to be highly accurate, a-ROC values of 0.7−0.9 indicate ‘moderate to good’ accuracy, and a-ROC values of 0.5−0.7 indicate low accuracy.24 Positive and negative likelihood ratios (LR+, LR-) were extracted to estimate how much more (or less) likely patients with depression are to have the observed result than patients without depression.23 A tool with a higher LR+ suggests it is more likely someone with depression will have a positive screen than someone without depression, and a tool with a lower LR- suggests it is more likely someone without depression will have a negative screen than someone with depression.23 Missing predictive values and likelihood ratios were calculated using the sensitivity, specificity and prevalence of depression reported by the relevant study.
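The calculation of missing predictive values and likelihood ratios from sensitivity, specificity and prevalence can be sketched as below, using the standard definitions of these measures. The function name `derived_metrics` and the example inputs are hypothetical and are not taken from any included study; the review does not state which software was used for this step.

```python
import math


def derived_metrics(sensitivity, specificity, prevalence):
    """Return PPV, NPV, LR+ and LR- from proportions in the range 0 to 1."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")

    # Expected cell proportions of the 2 x 2 table for one screened person.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)

    # Undefined edge cases are returned as nan or inf for simplicity.
    ppv = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    npv = tn / (tn + fn) if (tn + fn) > 0 else math.nan
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {"PPV": ppv, "NPV": npv, "LR+": lr_pos, "LR-": lr_neg}


if __name__ == "__main__":
    # Hypothetical tool: Se 80%, Sp 90%, depression prevalence 10%.
    for key, value in derived_metrics(0.80, 0.90, 0.10).items():
        print(key, round(value, 3))
```

Note that the predictive values depend on prevalence while the likelihood ratios do not, which is one reason the two sets of measures are reported separately.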
Data analysis
Meta-analysis was performed on screening tools only when they were evaluated by five or more studies for which true positive and true negative (TP, TN) and false positive and false negative (FP, FN) counts were available, because sufficient observations were required to estimate the underlying random effects with adequate precision.37 Diagnostic odds ratios (DORs) were calculated by constructing 2 × 2 tables to determine TP, TN, FP and FN counts and using the formula described by Glas et al.38 A DOR is the ratio of the odds of obtaining a positive screen if the person has depression to the odds of obtaining a positive screen if the person does not have depression.38 Higher DOR values indicate better discriminatory performance.38 Meta-analyses of test accuracy used the Rutter and Gatsonis hierarchical summary ROC (HSROC) model to estimate the sensitivity, specificity, DOR, positive and negative likelihood ratios, and summary ROC curves.39 Heterogeneity was modelled by including covariates in the HSROC model and assessing whether the likelihood ratio test p-value indicated improved model fit. Statistical models were programmed using the METADAS macro in SAS software v9.4,40 in accordance with the Cochrane Handbook recommendations for meta-analyses of diagnostic test accuracy.41 Heterogeneity was investigated, also with the METADAS macro, only when ≥8 studies were available for a tool, because enough studies are needed for adequate precision. HSROC curves and coupled forest plots were generated using RevMan5.42
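A minimal sketch of the DOR calculation from a 2 × 2 table follows, using the definition given above, DOR = (TP/FN)/(FP/TN) = (TP × TN)/(FP × FN). Two parts are assumptions rather than the authors' documented procedure: rebuilding the table by rounding expected counts from sample size, prevalence, sensitivity and specificity, and adding 0.5 to every cell when a cell is zero (a common convention). All function names and example numbers are hypothetical.

```python
def two_by_two(n, prevalence, sensitivity, specificity):
    """Rebuild approximate TP, FP, FN, TN counts (assumed rounding scheme)."""
    with_depression = round(n * prevalence)
    without_depression = n - with_depression
    tp = round(sensitivity * with_depression)
    fn = with_depression - tp
    tn = round(specificity * without_depression)
    fp = without_depression - tn
    return tp, fp, fn, tn


def diagnostic_odds_ratio(tp, fp, fn, tn, continuity=0.5):
    """DOR = (TP * TN) / (FP * FN).

    If any cell is zero, `continuity` is added to all four cells so the
    ratio stays finite (an assumed convention, not stated in the review).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + continuity for cell in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


if __name__ == "__main__":
    # Hypothetical study: 500 patients, 10% prevalence, Se 80%, Sp 90%.
    cells = two_by_two(500, 0.10, 0.80, 0.90)
    print(cells, diagnostic_odds_ratio(*cells))
```

The same quantity equals LR+ divided by LR-, which gives a quick consistency check against likelihood ratios reported by a study.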
Results
The electronic database search identified 1154 articles. Eleven additional articles were identified after examining references from full-text articles. Of these 1165 articles, 59 were duplicates and excluded, 940 were excluded after title and abstract review and 85 were excluded as psychometric data were unreported. Eighty-one studies satisfied the inclusion criteria and were included in the review. Forty-seven studies evaluated tools that were included in the meta-analysis. Fig. 1 shows the study selection process.
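The selection counts reported above can be checked with simple arithmetic. The variable names in this sketch are hypothetical; the numbers are those stated in the text.

```python
# Records identified, as reported in the text.
from_databases = 1154
from_reference_lists = 11
total_identified = from_databases + from_reference_lists

# Records excluded at each stage, as reported in the text.
excluded = {
    "duplicates": 59,
    "title_and_abstract_review": 940,
    "psychometric_data_unreported": 85,
}

included = total_identified - sum(excluded.values())

if __name__ == "__main__":
    print(total_identified, included)
```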
Twelve of the eighty-one studies evaluated three or more screening tools. Studies were conducted in 22 countries, including the USA (n = 32), UK (n = 11) and Germany (n = 6). Sample size ranged from 31 to 5438 people (median 265). Major depressive disorder (MDD) was the outcome measured in 79% of articles (n = 64), ‘depression’ in 20% (n = 16) and minor or major depression in 1% (n = 1) of studies. The prevalence of depression across studies in primary care varied from 2% to 64% (median = 12.8%). Table 1 summarises studies evaluating depression screening tools capable of self-administration in primary care.
Table 1. Studies of depression screening tools in primary healthcare.
Author. Year | Country | Tool | Cut-off (c/o) | Comparator & Criteria | Sample size | Outcome measured | Prevalence (%) |
---|---|---|---|---|---|---|---|
Ayalon, L. 2009 | Israel | MDI; PHQ-9; PHQ-1; MDI-1 | ≥21; ≥10; 1; ≥1 | SCID-I DSM-IV | 153 | MDD | 3.9 |
Baer, L. 2000 | USA | HANDS; BDI-II; Zung-SDS | ≥9; ≥11; ≥50 | SCID DSM-IV | 45 | Depression | 64.0 |
Beekman, A.T. 1997 | Holland | CESD-20 | ≥16 | DIS DSM-III | 487 | MDD | 2.0 |
Blank, K. 2004 | USA | CESD-20; CESD-10; PHQ-2 yes/no; Yale SQ | ≥16; ≥4; ≥1; ≥1 | DIS DSM-IV | 125 | MDD | 11.0 |
Cameron, I.M. 2008 | UK | BDI-II; HADS-D; PHQ-9; QIDS-SR16 | ≥23; ≥9; ≥12; ≥13 | HRSD 17 item (c/o ≥14) | 282 | Depression severity | 47.5 |
Dutton, G.R. 2004 | USA | BDI-II | ≥14 | DSM-IV interview | 220 | MDD | 29.5 |
Evans, S. 1993 | UK | GHQ-30 | ≥4 | GMS-AGECAT | 145 | Depression | 35.8 |
Fechner-Bates, S. 1994 | USA | CESD-20 | ≥16 | SCID DSM-IIIR | 425 | MDD | 12.5 |
Gaynes, B.N. 2010 | USA | M-3 | ≥5 | MINI DSM-IV | 647 | MDD | 16.0 |
Hanlon, C. 2015 | UK | PHQ-2; PHQ-9; Kessler K10; Kessler K6 | ≥3; ≥6; ≥18; ≥9 | MINI DSM-IV | 306 | MDD | 5.9 |
Jirapramukpitak, T. 2009 | Thailand | EURO-D | ≥4 | MINI DSM-IV | 150 | MDD | 34.0 |
Klinkman, M.S. 1997 | USA | CESD-20 | ≥4 | DSM-IIIR | 425 | MDD | 13.4 |
Lam, C.L. 1995 | Hong Kong | HADS; HADS-D | ≥9; ≥6 | CIS DSM-III | 100 | Depression | 9.0 |
Lamoureux, B.E. 2010 | USA | QIDS-SR16 | ≥13 | SCID DSM-IV | 155 | MDD | 21.9 |
Lowe, B. 2004 | Germany | HADS-D; PHQ-9; WHO-5 | ≥8; ≥10; ≤8 | SCID DSM-IV | 501 | MDD | 13.2 |
Lowe, B. 2004 | Germany | HADS-D; PHQ-9; WHO-5 | ≥8; ≥10; ≤9 | IDCL ICD-10 | 528 | MDD | 15.8 |
Lustman, P.J. 1997 | USA | BDI-I | ≥10 | DIS-R DSM-III | 172 | MDD | 36.6 |
Lyness, J.M. 1997 | USA | CESD-20 | ≥21 | SCID DSM-IIIR | 130 | MDD | 9.2 |
McManus, D. 2005 | USA | CESD-10; PHQ-9; PHQ-2 yes/no; PHQ-2 | ≥10; ≥10; ≥1; ≥3 | DIS DSM-IV | 1024 | MDD | 21.9 |
Okimoto, J.T. 1982 | USA | ID (Popoff); Zung-SDS | ≥11; ≥60 | DSM-III | 55 | MDD | 30.9 |
Olsson, I. 2005 | Norway | HADS-D | ≥8 | DSQ DSM-IV | 1385 | MDD | 9.0 |
Rait, G. 1999 | UK | BASDEC | ≥7 | GMS-AGECAT | 130 | Depression | 10.0 |
Roberge, P. 2013 | Canada | HADS; HADS-D | ≥16; ≥8 | CIDI DSM-IV | 1010*; 660* | MDD | 57.9* |
Robison, J. 2002 | USA | CESD-20; CESD-10; PHQ-2 yes/no; Yale SQ | ≥21; ≥4; ≥1; ≥1 | CIDI DSM-IV | 303 | MDD | 12.0 |
Schulberg, H.C. 1985 | USA | CESD-20 | ≥16 | DIS DSM-III | 294 | Depression | 9.2 |
Sung, S.C. 2013 | Singapore | QIDS-SR16; PHQ-9 | ≥9; ≥6 | MINI DSM-IV | 400 | MDD | 3.0 |
Thomas, J.L. 2001 | USA | CESD-20 | ≥16 | DIS DSM-IV | 179 | MDD | 11.0 |
Upadhyaya, A.K. 1997 | UK | HADS-D; SelfCARE(D) | ≥8; ≥5 | GMS-AGECAT | 72 | Depression | 27.8 |
Whooley, M.A. 1997 | USA | BDI-I; BDI-SF; CESD-20; CESD-10; MOS-D; SDDS-PC; PHQ-2 yes/no | ≥10; ≥5; ≥16; ≥10; ≥0.06; 2; ≥1 | DIS DSM-III | 536 | MDD | 18.1 |
Wilhelm, K. 2004 | Australia | BDI-FS | ≥4 | DSM-IV interview | 212 | MDD | 11.8 |
Wilkinson, M.J. 1988 | UK | HADS | ≥8 | CIS DSM-III | 100 | MDD | 14.0 |
Williams, J.W. Jnr 1999 | USA | CESD-20 | ≥16 | DIS-R DSM-III | 296 | MDD | 7.4 |
Williams, J.W. Jnr 1995 | USA | SDS | 0.043 | SCID-IP DSM-III | 221 | MDD | 4.0 |
Yang, Y. 2014 | China | HADS-D; HADS | ≥9; ≥16 | MINI (v5) DSM-IV | 100 | MDD | 38.0 |
Yeung, A. 2002 | USA | BDI-II | ≥16 | SCID-IP DSM-IIIR | 180 | MDD | 29.4 |
Zich, J.M. 1990 | USA | BDI-I; CESD-20 | ≥10; ≥16 | DIS DSM-III | 31; 34 | MDD | 9.7; 5.9 |
Aragones-B, E. 2001 | Spain | Zung-SDS | ≥50 | SCID DSM-IV | 205 | MDD | 14.7 |
Arroll, B. 2003 | New Zealand | PHQ-2 yes/no; PHQ-1 | ≥1; 1 | CIDI DSM-IV | 421 | MDD | 6.9 |
Arroll, B. 2010 | New Zealand | PHQ-2; PHQ-9 | ≥3; ≥10 | CIDI DSM-IV | 2642 | MDD | 6.2 |
Awata, S. 2007 | Japan | WHO-5 | ≤13 | SCID-I DSM-IV | 129 | MDD | 10.7 |
N-Azah, M.N. 2005 | Malaysia | PHQ-9 | ≥10 | CIDI ICD-10 | 180 | Depression | 53.9 |
Azevedo-Marques, J. 2009 | Brazil | WHO-5; SRQ-20 | ≤11; ≥8 | SCID DSM-IV | 120 | Depression | 20.8 |
Banerjee, S. 1998 | UK | SelfCARE(D) | ≥7/8 | GMS-AGECAT | 218 | Depression | 37.8 |
Broadhead, W.E. 1995 | USA | SDDS-PC | any 2 | SCID-P DSM-IIIR | 388 | MDD | 15.7 |
Burnam, M.A. 1988 | USA | MOS-D | ≥0.06 | DIS DSM-III | 1416 (ECA); 501 (PCP) | MDD | 3.0; 3.0 |
Campo-Arias, A. 2006 | Colombia | Zung-SDS | ≥49 | SCID DSM-IV | 266 | MDD | 16.5 |
Chen, S. 2010 | China | PHQ-2; PHQ-9 | ≥3; ≥10 | SCID DSM-IV | 77 | MDD | N/A |
Chen, S. 2013 | China | PHQ-9 | ≥10 | SCID DSM-IV | 280 | MDD | N/A |
Cheng, C.M. 2007 | Hong Kong | PHQ-9; PHQ-2 yes/no | ≥9; ≥1 | Chinese HRSD 17 item (c/o ≥16) | 357 | Depression | 8.4 |
Corapcioglu, A. 2004 | Turkey | PRIME-MD; PHQ-9 | Algorithm; ≥10 | DSM-IV interview | 1387 | MDD | 6.6 |
Esler, D. 2008 | Australia | PHQ-9; PHQ-2 yes/no; PHQ-2 | ≥9; ≥1; ≥3 | SCID DSM-IV | 34; 34; 34 | MDD; Minor or MDD; Minor or MDD | 15.4; 28.2; 28.2 |
Gilbody, S. 2007 | UK | PHQ-9 | ≥10 | SCID DSM-IIIR | 96 | MDD | 37.5 |
Goldberg, D.P. 1997 | UK | GHQ-12 | ≥2 | CIDI ICD-10 | 5438 | Depression | 24.0 |
Henkel, V. 2004 | Germany | GHQ-12; WHO-5; PHQ-9 | ≥2; ≤13; ≥10 | CIDI DSM-IV | 448 | MDD | 10.2 |
Howe, A. 2000 | UK | MHI-1 | ≥2 | GMS-AGECAT | 100 | Depression | 30.0 |
Inagaki, M. 2013 | Japan | PHQ-9; PHQ-2 | ≥10; ≥3 | MINI DSM-IIIR | 104 | MDD | 7.4 |
Kroenke, K. 2003 | USA | PHQ-2 | ≥3 | SCID DSM-III | 580 | MDD | 7.1 |
Kroenke, K. 2014 | USA | PROMIS | ≥8 | PHQ DSM-IV | 244 | MDD | 24.1 |
Kroenke, K. 2001 | USA | PHQ-9 | ≥10 | SCID DSM-III | 580 | MDD | 7.1 |
Lamers, F. 2008 | Holland | PHQ-9 | ≥7 | MINI DSM-IV | 620 | MDD | N/A |
Leung, K.K. 1998 | Taiwan | Zung-SDS | ≥55 | DSM-IV interview | 50 | Depression | 18.0 |
Lino, V.T. 2014 | Brazil | PHQ-2 | ≥3 | SCID-I DSM-IV | 142 | MDD | 26.1 |
Liu, S.I. 2011 | Taiwan | PHQ-9; PHQ-2 | ≥10; ≥3 | SCAN DSM-IV | 1954 | MDD | 3.3 |
Loerch, B. 2000 | Germany | PRIME-MD | Yes | CIDI DSM-IV | 704 | MDD | 18.6 |
Lotrakul, M. 2008 | Thailand | PHQ-9 | ≥10 | MINI DSM-IV | 279 | MDD | 6.8 |
McQuaid, J.R. 2000 | USA | CESD-20 | ≥16 | CIDI DSM-IV | 213 | MDD | 23.9 |
Means-Christensen, A.J. 2005 | USA | MHI-5; MHI-1 | ≤23; ≤4 | PHQ | 246 | MDD | 6.9 |
Means-Christensen, A.J. 2006 | USA | MHI-1 | 1 | CIDI DSM-IV | 801 | MDD | 39.8 |
Mergl, R. 2007 | Germany | WHO-5; GHQ-12 | ≥13; ≥2 | CIDI DSM-IV | 394 | Depression | 22.8 |
Muhwezi, W.W. 2006 | Uganda | SWB | ≥10 | MINI DSM-IV | 234 | MDD | 31.6 |
Nagel, R. 1998 | USA | MOS-D | n/a | DIS DSM-IIIR | 147 | MDD | 2.8 |
Phelan, E. 2010 | USA | PHQ-2; PHQ-9 | ≥3; ≥10 | SCID DSM-IV | 69 | MDD | 12.0 |
Picardi, A. 2013 | Italy | PC-SAD | positive | SCID-I DSM-IV | 212 | MDD | N/A |
Saipanish, R. 2009 | Thailand | WHO-5 | ≤13 | SCID DSM-IV | 274 | MDD | 6.9 |
Schmitz, N. 1999 | Germany | GHQ-12 | ≥2 | SCID DSM-IIIR | 408 | Depression | 5.2 |
Spitzer, R.L. 1994 | USA | PRIME-MD PQ | Yes to 1 of 2 items | SCID DSM-III | 431 | MDD | 14.0 |
Spitzer, R.L. 1999 | USA | PHQ-9 | algorithm | SCID DSM-III | 585 | MDD | N/A |
Thapar, A. 2014 | UK | BDI-I; HADS-D; PHQ-2; PHQ-9; PHQ-4 | ≥10; ≥8; ≥3; ≥10; ≥3 | SCAN DSM-IV | 272 | MDD | 22.2 |
Yeung, A. 2008 | USA | PHQ-9 | ≥15 | SCID DSM-IV | 184 | MDD | 22.8 |
Zhang, Y. 2013 | Hong Kong | PHQ-9 | ≥10 | MINI DSM-IV | 99 | MDD | 23.2 |
Zuithoff, N. 2010 | Holland | PHQ-9; PHQ-2 | ≥10; ≥3 | CIDI DSM-IV | 1338 | MDD | 13.0 |
Legend: MDD: Major depressive disorder, N/A: not available, *: French speaking patients only.
CIDI: Composite International Diagnostic Interview, DIS: Diagnostic Interview Schedule, DSM-III/IV: Diagnostic and Statistical Manual of Mental Disorders 3rd/4th edition, GMS-AGECAT: Geriatric Mental State-Automated Geriatric Examination for Computer Assisted Taxonomy, HRSD: Hamilton Rating Scale for Depression, ICD-10: International Statistical Classification of Diseases, IDCL: International Diagnostic Checklist, MINI: Mini-International Neuropsychiatric Interview for DSM-IV, SCAN: Schedules for Clinical Assessment in Neuropsychiatry, SCID: The Structured Clinical Interview for DSM-IV Disorders.
BASDEC: Brief Assessment Schedule Depression Cards, BDI-II: Beck Depression Inventory, version 2, BDI-FS: Beck Depression Inventory-Fast Screen, BDI-SF: Beck Depression Inventory-Short Form, CES-D: Centre for Epidemiological Studies-Depression Scale (10 & 20 item), Euro-D: European Depression Scale, GHQ: General Health Questionnaire, HADS: Hospital Anxiety and Depression Scale, HADS-D: HADS depression subscale, HANDS: Harvard Department of Psychiatry/National Depression Screening Day Scale, ID: (Popoff) Index of Depression, MDI: Major Depression Inventory, MHI-5: Mental Health Inventory, MOS-D: Medical Outcomes Study Depression Scale, PC-SAD: Primary Care Screener for Affective Disorders, PHQ: Patient Health Questionnaire (1, 2 yes/no, 2 & 9 item), PRIME-MD: Primary Care Evaluation of Mental Disorders, PROMIS: Patient Reported Outcomes Measurement Information System, QIDS-SR: Quick Inventory of Depressive Symptomatology–Self-Report, SDDS-PC: Symptom Driven Diagnostic System-Primary Care, SDS: Short Depression Screen, SelfCARE(D), SQ: Single Question, SWB: Subjective wellbeing subscale, WHO-5: World Health Organization Wellbeing Index, Zung-SDS: Zung’s Self-Rating Depression Scale.
The review identified forty unique depression screening tools and provided 138 psychometric data sets. The 9-item Patient Health Questionnaire (PHQ-9) was the most frequently evaluated tool (n = 27). Eleven tools assessed the presence of depressive symptoms over a 2-week period in accordance with Diagnostic and Statistical Manual of Mental Disorders (DSM) guidelines.43 They included the four PHQ tools, Beck Depression Inventory version 2 (BDI-II), Beck Depression Inventory-Fast Screen (BDI-FS), Harvard Department of Psychiatry National Depression Screening Day Scale (HANDS), My Mood Monitor (M-3), Major Depression Inventory (MDI), Medical Outcomes Study Depression Scale (MOS-D) and World Health Organization Wellbeing Index (WHO-5). The most frequently used reference standard was the Structured Clinical Interview for DSM Disorders (SCID) (n = 26).
Quality assessment
Twenty-three studies (28%) were considered good quality with a low RoB across all domains. Patient selection, reference standard and index test domains rated low RoB in the majority of studies. However, the flow and timing domain rated unclear RoB in 42 (53%) studies, due largely to limited reporting of the interval between administering the screening tool and receiving the reference standard and because groups with and without depression were included in analysis in different proportions. Fig. 2 summarises the overall QUADAS-2 quality assessment results.
Administrative operating characteristics
Ten screening tools were classified as ultra-short (1–4 questions), nineteen were classified as short (5–14 questions), and eleven were classified as longer tools (≥15 questions). Ultra-short tools took one to three minutes to complete and were easily scored, except the Subjective wellbeing subscale (SWB), for which ease of scoring was not reported. The level of literacy was considered ‘easy’ for the PHQ-1 and Yale single question tools and ‘average’ for the PHQ-2, PHQ-2 (yes/no), PHQ-4, PROMIS, MDI-1, MHI-1, MHI-1 (yes/no) and SWB. Of the ultra-short tools, only the PHQ tools met the DSM criteria. Short tools, on average, took five minutes to complete (range 2−10 min); scoring was described as ‘complex’ for the MHI-5, MOS-D, SDS and SDDS-PC and ‘average’ for the WHO-5, and the remaining fourteen tools were simple to score. The level of literacy was classified as ‘easy’ for nine short tools (BDI-SF, BDI-FS, CESD-10, GHQ-12, HANDS, Kessler K6 and K10, SDDS-PC and SelfCARE(D)), eight were rated ‘average’ (EURO-D, M-3, MDI, MHI-5, MOS-D, PHQ-9, SDS and WHO-5) and two were rated ‘difficult’ (HADS, HADS-D). Six of the short tools (BDI-FS, HANDS, MDI, MOS-D, PHQ-9 and WHO-5) met the DSM criteria. Longer screening tools took 5−15 min to complete, were simple to score with the exception of the 37-item PC-SAD (mathematical algorithm required),27 and required levels of literacy described as ‘easy or average’. Of the longer screening tools, only the BDI-II met the DSM criteria. Overall, only the BDI-FS, HANDS, M-3, MDI, MDI-1, four PHQ tools and WHO-5 took ≤5 min to administer, had an ‘easy or average’ level of literacy, had ‘easy or average’ scoring, and met the DSM criteria. Table 2 summarises the administrative operating characteristics of screening tools identified.
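The length classes and the overall ease-of-administration screen described above can be expressed as a short sketch. The `Tool` record, its field names and both function names are hypothetical. Treating a 2-week symptom review period as "meeting the DSM criteria" and treating ‘simple’ scoring as equivalent to ‘easy’ are this sketch's reading of the text. Example values are taken from Table 2, using the upper bound of each reported administration time.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    n_items: int
    max_minutes: float      # upper bound of the reported administration time
    literacy: str           # "easy", "average" or "difficult"
    scoring: str            # "simple", "average" or "complex"
    two_week_review: bool   # symptoms reviewed over the past 2 weeks


def length_class(n_items: int) -> str:
    """Ultra-short: 1-4 items; short: 5-14 items; long: 15 or more items."""
    if n_items < 1:
        raise ValueError("a tool must have at least one item")
    if n_items <= 4:
        return "ultra-short"
    if n_items <= 14:
        return "short"
    return "long"


def easily_administered(tool: Tool) -> bool:
    """Apply the four conditions listed in the paragraph above."""
    return (tool.max_minutes <= 5
            and tool.literacy in {"easy", "average"}
            and tool.scoring in {"simple", "average"}
            and tool.two_week_review)


if __name__ == "__main__":
    examples = [
        Tool("PHQ-9", 9, 5, "average", "simple", True),
        Tool("HADS-D", 7, 5, "difficult", "simple", False),
        Tool("CESD-20", 20, 10, "easy", "simple", False),
    ]
    for tool in examples:
        print(tool.name, length_class(tool.n_items), easily_administered(tool))
```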
Table 2. Administrative operating characteristics.
Screening Tool (Number of Studies) | Number of Items (class) | Symptom review period τ | Administration time (minutes) | Literacy level λ | Ease of scoring | Score range | Recommended cut-off ɸ |
---|---|---|---|---|---|---|---|
BASDEC (1) | 19 (long) | Undisclosed | 5−10 | Average | Simple | 0−19 | >7 |
BDI-II (4) | 21 (long) | Past 2 weeks | 5−10 | Easy | Simple | 0–63 ɣ | ≥14 |
BDI (4) | 21 (long) | Past 7 days | 5−10 | Easy | Simple | 0–63 ɣ | ≥10 |
BDI-SF (1) | 13 (short) | Past 7 days | 5 – 7 | Easy | Simple | 0–39 ɣ | ≥5 |
BDI-FS (PC) (1) | 7 (short) | Past 2 weeks | ≤ 5 | Easy | Simple | 0−21 ɣ | ≥4 |
CESD-20 item (12) | 20 (long) | Past 7 days | ≤ 10 | Easy | Simple | 0 – 60 | ≥16 |
CESD-10 item (4) | 10 (short) | Past 7 days | 2 – 5 | Easy | Simple | 0–30 | ≥10 |
EURO-D (1) | 12 (short) | Past 7 days | 5 | Average | Simple | 0–12 | ≥4 |
GHQ-12 (4) | 12 (short) | Past few weeks | 5−10 | Easy | Simple | 0–12 ≠ | ≥2 |
GHQ-30 (1) | 30 (long) | Past few weeks | 10 | Easy | Simple | 0–30 ≠ | ≥4 |
HADS (4) | 14 (short) | Past 7 days | 5 | Difficult | Simple | 0−42 | ≥16 |
HADS-D (9) | 7 (short) | Past 7 days | 2−5 | Difficult | Simple | 0−21 | ≥8 |
HANDS (1) | 10 (short) | Past 2 weeks | 5−10 | Easy | Simple | 0–30 | ≥9 |
ID (1) | 15 (long) | Recently | 2 – 5 | Easy | Simple | 0−15 | ≥10 |
Kessler K6 (1) | 6 (short) | Past 4 weeks | 2−3 | Easy | Simple | 1 - 30 | 12 -14 (mild) |
Kessler K10 (1) | 10 (short) | Past 4 weeks | 2−3 | Easy | Simple | 1 - 50 | 20−24 (mild) |
M-3 (1) | 7 (short) | Past 2 weeks | ≤ 5 | Average | Simple | 0−28 | ≥5 |
MDI (1) | 10 (short) | Past 2 weeks | ≤ 5 | Average | Simple | 0−50 | ≥21 |
MHI-5 (1) | 5 (short) | Past month | 2−3 | Average | Complex | 0–30 (raw score) | ≤23 Δ |
MOS-D (3) | 8 (short) | Past 2 weeks | 2−3 | Average | Complex | 0 – 1€ | ≥0.06 |
PC-SAD (1) | 37 (long) (includes 3 pre-screen items) | Past month | 10; 1–2 (pre-screen) | Easy | Complex | + or - | Positive score (from algorithm) |
PHQ-2 (12) | 2 (ultrashort) | Past 2 weeks | 1−2 | Average | Simple | 0−6 | ≥3 |
PHQ-2 Yes/No (7) | 2 (ultrashort) | Past 2 weeks | 1−2 | Average | Simple | 0−2 | ≥1 |
PHQ-4 (1) | 4 (ultrashort) | Past 2 weeks | 2−3 | Average | Simple | 0–12 | ≥3 |
PHQ-9 (27) | 9 (short) | Past 2 weeks | ≤ 5 | Average | Simple | 0–27 OR 1 of first 2 and ≥5 of 9 items | ≥10 OR algorithm |
PRIME-MD (3) | 26 (long) | Past month | 15 (part 1) | Average | Simple | positive responses | algorithm |
PROMIS (1) | 4 (ultrashort) | Past 7 days | 2 – 3 | Average | Simple (raw score) | 4−20 | ≥8 |
QIDS-SR16 (3) | 16 (long) | Past 7 days | 5−7 | Average | Simple | 0−27 | ≥ 7 |
SDDS-PC (2) | 5 (short) | Past month | 1−2 | Easy | Complex | 0−5 | any 2 |
SelfCARE(D) (2) | 12 (short) | Past month | 2 – 3 | Easy | Simple | 0–12 | 5−6 |
Single question | (ultrashort) | | | | | | |
- MDI-1 (1) | 1 | Past 2 weeks | < 1 | Average | Simple | 0−5 | ≥2 |
- MHI-1 (2) | 1 | Past year | < 1 | Average | Simple | 1 – 6 | ≥2 |
- MHI-1 (yes/no) (1) | 1 | Past year | < 1 | Average | Simple | 0−1 | 1 |
- Yale (2) | 1 | Past month | < 1 | Easy | Simple | 0−1 | 1 |
- PHQ-1 (2) | 1 | Past 2 weeks | < 1 | Easy | Simple | 0−1 | |
SDS (1) | 8 (short) | Past week | unclear | Average | Complex | 0−1€ | 0.06 |
SRQ-20 (1) | 20 (long) | Past 30 days | unclear | Average | Simple | 0−20 | ≥8 |
SWB (1) | 4 (ultrashort) | Past week | unclear | Average | unclear | unclear | 10 |
WHO-5 (7) | 5 (short) | Past 2 weeks | 2 – 5 | Average | Average | 0–25 raw score | ≤13 |
Zung-SDS (5) | 20 (long) | Past few days | 5−10 | Easy | Simple | 25−100 | ≥50 |
Legend: BASDEC: Brief Assessment Schedule Depression Cards, BDI-II: Beck Depression Inventory- version 2, BDI-FS: Beck Depression Inventory-Fast Screen, BDI-SF: Beck Depression Inventory-Short Form.
CES-D: Centre for Epidemiological Studies-Depression Scale (10 & 20 item), Euro-D: European Depression Scale, GHQ: General Health Questionnaire, HADS: Hospital Anxiety and Depression Scale.
HADS-D: HADS depression subscale, HANDS: Harvard Department of Psychiatry National Depression Screening Day Scale, ID: (Popoff) Index of Depression, MDI: Major Depression Inventory.
MHI-5: Mental Health Inventory, MOS-D: Medical Outcomes Study Depression Scale, PC-SAD: Primary Care Screener for Affective Disorders, PHQ: Patient Health Questionnaire (1,2 yes/no, 2 & 9 item).
PRIME-MD: Primary Care Evaluation of Mental Disorders, PROMIS: Patient Reported Outcomes Measurement Information System, QIDS-SR: Quick Inventory of Depressive Symptomatology–Self-Report.
SDDS-PC: Symptom Driven Diagnostic System-Primary Care, SDS: Short Depression Screen, SelfCARE(D), SQ: Single Question, SWB: Subjective Wellbeing scale, WHO-5: World Health Organization Wellbeing Index, Zung-SDS: Zung’s Self-Rating Depression Scale.
τ, Time-frame over which participants must reflect when answering each question; λ, Literacy level using Fog Index; ɸ, Cut-off point for depression recommended by instrument developers; ≠, Bimodal scoring method: dichotomous or Likert scoring; €, Calculated by logistic regression; Δ, 1–5 Likert scale raw score converted by linear transformation to 0–100; ɣ, BDI-II, BDI-SF & BDI-FS cut-offs for mild–moderate depression.
Psychometric characteristics varied between studies for the same tool and varied considerably from tool to tool. The recommended cut-off point was not applied in thirty-two screening tool evaluations. a-ROC values were available for 89 of 138 screening tool evaluations with values ranging from 0.68 to 0.97 (median = 0.88). The Zung SDS had the highest median a-ROC (0.92) across 2 studies screening 471 patients. The PHQ-9 had the highest a-ROC (0.97) in an individual study and the second highest median a-ROC (0.91) across 21 studies screening almost 6350 patients. Sensitivity values ranged from 30% to 100%, with 70 of 138 evaluations having a sensitivity >80%. Specificity values varied more widely, ranging from 12% to 98%, with 61 of 138 evaluations having a specificity >80%. PPV values (rule-in accuracy) ranged from 9% to 92%, with 5 of 138 evaluations having a PPV > 80%. NPV values (rule-out accuracy) ranged from 12% to 100%, with 121 of 138 evaluations having an NPV > 80%. Positive likelihood ratios (LR+) ranged from 1.14 to 36.50, with 41 of 138 evaluations having an LR+ ≥5, while negative likelihood ratios (LR-) ranged from <0.01 to 0.66, with 58 of 138 evaluations having an LR- ≤0.2. Table 3 summarises the psychometric characteristics of all screening tools included in the systematic review.
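The likelihood ratios reported above follow directly from sensitivity and specificity. The sketch below is illustrative only: the function name `likelihood_ratios` is our own, and the inputs are the sensitivity and specificity reported for the SRQ-20 evaluation (81.0% and 86.0%). Values recomputed this way from rounded percentages may differ slightly from published figures derived from raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as proportions.

    Illustrative helper (hypothetical name), not taken from the review itself.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    # LR+ = P(positive screen | depression) / P(positive screen | no depression)
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # LR- = P(negative screen | depression) / P(negative screen | no depression)
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return lr_pos, lr_neg


# SRQ-20 evaluation: sensitivity 81.0%, specificity 86.0%
lr_pos, lr_neg = likelihood_ratios(0.81, 0.86)
print(round(lr_pos, 2), round(lr_neg, 2))  # 5.79 0.22
```

An LR+ ≥5 or an LR- ≤0.2, the thresholds used in the counts above, can then be checked per evaluation.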
Table 3. Psychometrics of included studies.
Instrument | Author Year | Cut-Off | Sensitivity (%) (95%CI) | Specificity (%) (95%CI) | PPV (%) (95%CI) | NPV (%) (95%CI) | LR + | LR - | AUC (95%CI) |
---|---|---|---|---|---|---|---|---|---|
BASDEC | Rait, G 1999 | ≥7 | 71.0 | 88.0 | 39.7 | 96.5 | 5.92 | 0.32 | N/A |
BDI-II | Baer, L 2000 | ≥11 | 96.0 | 31.0 | 71.0 | 81.0 | 1.39 | 0.13 | N/A |
BDI-II | Cameron, IM 2008 | ≥20 | 84.0 | 68.0 | 72.0 | 82.0 | 2.65 | 0.24 | 0.85 |
BDI-II | Dutton, GR 2004 | ≥14 | 87.7 | 83.9 | 69.5 | 94.2 | 5.44 | 0.15 | 0.91 |
BDI-II | Yeung, A 2002 | ≥16 | 79.0 | 91.0 | 79.0 | 91.0 | 8.77 | 0.23 | 0.94 |
BDI | Thapar, A 2014 | ≥10 | 98.3 | 39.7 | 31.7 | 98.8 | 1.63 | 0.04 | 0.89 |
BDI | Lustman, PJ 1997 | ≥13 | 85.0 | 88.0 | 80.0 | 91.0 | 7.08 | 0.17 | 0.94 |
BDI | Whooley, MA 1997 | ≥10 | 89.0 | 64.0 | 63.9 | 96.4 | 2.50 | 0.17 | 0.87 |
BDI | Zich, JM 1990 | ≥10 | 100.0 | 75.0 | 29.8 | 100.0 | 4.00 | <0.01 | N/A |
BDI-SF | Whooley, MA 1997 | ≥ 5 | 92.0 | 61.0 | 34.1 | 97.2 | 2.40 | 0.13 | 0.86 |
BDI-FS(PC) | Wilhelm, K 2004 | ≥ 4 | 91.0 | 62.0 | 24.2 | 98.1 | 2.39 | 0.15 | 0.85 |
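CES-D 20 item | Beekman, AT 1998 | ≥16 | 100 | 87.6 | 14.1 | 100.0 | 2.23 | <0.01 | N/A |
CES-D 20 item | Blank, K 2004 | ≥16 | 79.0 | 75.0 | 28.1 | 96.6 | 3.10 | 0.29 | 0.86 |
CES-D 20 item | Fechner-Bates, S 1994 | ≥16 | 79.5 | 71.1 | 28.0 | 96.1 | 2.72 | 0.29 | N/A |
CES-D 20 item | Klinkman, MS 1997 | ≥16 | 80.7 | 71.7 | 30.7 | 96.0 | 2.85 | 0.27 | N/A |
CES-D 20 item | Lyness, JM 1997 | ≥21 | 92.0 | 87.0 | 41.8 | 99.1 | 7.08 | 0.09 | 0.94 |
CES-D 20 item | McQuaid, JR 2000 | ≥16 | 78.7 | 76.5 | 57.8 | 89.2 | 3.35 | 0.28 | N/A |
CES-D 20 item | Robison, J 2002 | ≥20 | 73.0 | 72.0 | 26.2 | 95.1 | 2.60 | 0.38 | 0.77 |
CES-D 20 item | Schulberg, HC 1985 | ≥16 | 96.3 | 38.6 | 13.7 | 98.9 | 1.57 | 0.10 | N/A |
CES-D 20 item | Thomas, JL 2001 | ≥16 | 95.0 | 70.0 | 28.4 | 99.1 | 3.17 | 0.07 | 0.88 |
CES-D 20 item | Whooley, MA 1997 | ≥16 | 93.0 | 69.0 | 39.7 | 97.8 | 3.00 | 0.10 | 0.89 |
CES-D 20 item | Williams, JW Jnr 1999 | ≥16 | 88.0 | 75.0 | 32.6 | 97.8 | 3.52 | 0.16 | N/A |
CES-D 20 item | Zich, JM 1990 | ≥16 | 100.0 | 53.0 | 12.0 | 100.0 | 2.12 | <0.01 | N/A |
CES-D 10 item | Blank, K 2004 | ≥4 | 79.0 | 81.0 | 33.9 | 96.9 | 4.10 | 0.26 | 0.89 |
CES-D 10 item | McManus, D 2005 | ≥10 | 76.0 | 79.0 | 50.5 | 92.1 | 3.60 | 0.30 | 0.87 |
CES-D 10 item | Robison, J 2002 | ≥4 | 76.0 | 70.0 | 25.7 | 95.5 | 2.50 | 0.35 | 0.77 |
CES-D 10 item | Whooley, MA 1997 | ≥10 | 90.0 | 72.0 | 41.5 | 97.0 | 3.20 | 0.14 | 0.87 |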
EURO-D | Jirapramukpitak, T 2009 | ≥ 5 | 84.3 | 58.6 | 55.9 | 80.2 | 2.00 | 0.27 | 0.78 |
GHQ-12 | Goldberg, DP 1997 | ≥2 | 76.3 | 83.4 | 52.9 | 91.8 | 4.60 | 0.28 | 0.88 |
GHQ-12 | Henkel, V 2004 | ≥2 | 85.0 | 63.0 | 34.0 | 95.0 | 2.30 | 0.23 | 0.87 |
GHQ-12 | Mergl, R 2007 | ≥2 | 87.0 | 63.0 | 14.2 | 98.6 | 2.35 | 0.21 | 0.84 |
GHQ-12 | Schmitz, N 1999 | ≥2/3 | 60.0 | 74.0 | 57.0 | 76.0 | 2.30 | 0.54 | 0.73 |
GHQ-30 | Evans, S 1993 | ≥4 | 77.0 | 67.0 | 59.1 | 82.5 | 3.33 | 0.34 | N/A |
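HADS | Lam, CL 1995 | ≥9 | 80.0 | 90.0 | 67.0 | 95.0 | 8.00 | 0.22 | N/A |
HADS | Roberge, P 2013 | ≥16 | 62.0 | 77.0 | 55.0 | 82.0 | 2.70 | 0.49 | 0.76 |
HADS | Wilkinson, MJ 1988 | ≥8 | 90.0 | 86.0 | 81.0 | 12.0 | 6.42 | 0.12 | 0.95 |
HADS | Yang, Y 2014 | ≥16 | 66.7 | 76.5 | 32.3 | 92.8 | 2.84 | 0.44 | 0.86 |
HADS-D | Cameron, IM 2008 | ≥9 | 73.0 | 76.0 | 72.0 | 76.0 | 3.04 | 0.36 | 0.83 |
HADS-D | Thapar, A 2014 | ≥8 | 85.2 | 68.2 | 43.3 | 94.2 | 2.68 | 0.22 | 0.86 |
HADS-D | Lam, CL 1995 | ≥6 | 78.0 | 91.0 | 47.0 | 98.0 | 8.66 | 0.24 | N/A |
HADS-D | Löwe, B 2004a (SCID) | ≥8 | 88.0 | 69.0 | 30.1 | 97.4 | 2.84 | 0.17 | 0.89 |
HADS-D | Löwe, B 2004b (IDCL) | ≥8 | 87.0 | 70.0 | 35.2 | 96.6 | 2.90 | 0.18 | 0.88 |
HADS-D | Olsson, L 2005 | ≥8 | 80.0 | 88.0 | 39.7 | 97.8 | 6.67 | 0.23 | 0.93 |
HADS-D | Roberge, P 2013 | ≥8 | 56.0 | 80.0 | 41.0 | 88.0 | 2.80 | 0.55 | 0.75 |
HADS-D | Upadhyaya, AK 1997 | ≥8/9 | 70.0 | 87.0 | 67.7 | 88.2 | 5.38 | 0.34 | N/A |
HADS-D | Yang, Y 2014 | ≥8 | 80.0 | 90.6 | 41.9 | 97.1 | 8.51 | 0.22 | 0.94 |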
HANDS | Baer, L 2000 | ≥ 9 | 96.0 | 60.0 | 81.0 | 89.0 | 2.40 | 0.07 | N/A |
ID (Popoff) | Okimoto, JT 1982 | ≥11 | 88.0 | 61.0 | 50.2 | 91.9 | 2.25 | 0.20 | N/A |
Kessler K10 | Hanlon, C 2015 | ≥18 | 77.8 | 76.7 | 17.3 | 98.2 | 3.34 | 0.29 | 0.83 |
Kessler K6 | Hanlon, C 2015 | ≥ 9 | 77.8 | 73.3 | 15.4 | 98.1 | 2.91 | 0.30 | 0.84 |
M-3 | Gaynes, BN 2010 | ≥ 5 | 84.0 | 80.0 | 54.0 | 95.0 | 4.19 | 0.20 | N/A |
MDI | Ayalon, L 2009 | ≥21 | 83.3 | 97.2 | 55.0 | 99.0 | 29.75 | 0.17 | N/A |
MHI-5 | Means-Christensen, A 2005 | ≤23 | 90.9 | 57.6 | 17.4 | 98.5 | 2.14 | 0.16 | 0.91 |
MOS-D | Burnam, MA 1988 (*PSP) | ≥0.06 | 86.0 | 90.0 | 20.0 | 99.5 | 8.60 | 0.16 | N/A |
MOS-D | Nagel, R 1998 | N/A | 100.0 | 77.0 | 11.0 | 100.0 | 4.34 | <0.01 | N/A |
MOS-D | Whooley, MA 1997 | ≥0.06 | 93.0 | 72.0 | 42.0 | 93.3 | 3.30 | 0.10 | 0.89 |
PC-SAD | Picardi, A 2013 | positive | 89.8 | 82.6 | 51.2 | 97.6 | 5.20 | 0.12 | N/A |
PHQ-4 | Thapar, A 2014 | ≥3 | 93.4 | 67.9 | 45.4 | 97.3 | 2.91 | 0.10 | 0.91 |
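PHQ-2 (scored) | Arroll, B 2010 | ≥3 | 61.0 | 92.0 | 33.5 | 97.3 | 7.70 | 0.42 | N/A |
PHQ-2 (scored) | Chen, S 2010 | ≥3 | 84.0 | 90.0 | N/A | N/A | 8.40 | 0.18 | 0.92 |
PHQ-2 (scored) | Esler, D 2008 | ≥3 | 30.0 | 70.8 | 30.0 | 70.8 | 1.02 | 0.98 | N/A |
PHQ-2 (scored) | Thapar, A 2014 | ≥3 | 72.1 | 82.1 | 53.5 | 91.2 | 4.02 | 0.34 | 0.87 |
PHQ-2 (scored) | Hanlon, C 2015 | ≥3 | 33.3 | 93.1 | 23.1 | 95.7 | 4.82 | 0.72 | 0.78 |
PHQ-2 (scored) | Inagaki, M 2013 | ≥3 | 61.0 | 98.0 | 66.0 | 97.0 | 24.75 | 0.40 | 0.95 |
PHQ-2 (scored) | Kroenke, K 2003 | ≥3 | 82.9 | 90.0 | 38.4 | 98.6 | 2.90 | 0.19 | 0.93 |
PHQ-2 (scored) | Lino, VT 2014 | ≥3 | 52.0 | 87.0 | 55.0 | 85.0 | 4.00 | 0.13 | 0.77 |
PHQ-2 (scored) | Liu, SI 2011 | ≥3 | 64.0 | 93.6 | 25.4 | 98.7 | 10.00 | 0.38 | 0.90 |
PHQ-2 (scored) | McManus, D 2005 | ≥3 | 39.0 | 92.0 | 57.7 | 84.3 | 4.90 | 0.66 | 0.84 |
PHQ-2 (scored) | Phelan, R 2010 | ≥3 | 63.0 | 85.0 | 36.4 | 94.4 | 4.20 | 0.44 | 0.81 |
PHQ-2 (scored) | Zuithoff, N 2010 | ≥3 | 42.0 | 94.0 | 53.0 | 91.0 | 7.00 | 0.62 | 0.83 |
PHQ-2 yes/no | Arroll, B 2003 | ≥1 | 97.0 | 67.0 | 17.9 | 95.7 | 2.90 | 0.05 | N/A |
PHQ-2 yes/no | Blank, K 2004 | ≥1 | 79.0 | 58.0 | 18.9 | 99.7 | 1.90 | 0.37 | 0.72 |
PHQ-2 yes/no | Cheng, CM 2007 | ≥1 | 96.7 | 73.4 | 25.0 | 99.6 | 3.63 | 0.04 | N/A |
PHQ-2 yes/no | Esler, D 2008 | ≥1 | 100.0 | 12.5 | 32.3 | 100.0 | 1.14 | <0.01 | N/A |
PHQ-2 yes/no | McManus, D 2005 | ≥1 | 90.0 | 69.0 | 44.9 | 95.7 | 2.90 | 0.14 | 0.84 |
PHQ-2 yes/no | Robison, J 2002 | ≥1 | 92.0 | 44.0 | 18.3 | 97.6 | 1.60 | 0.18 | 0.68 |
PHQ-2 yes/no | Whooley, MA 1997 | ≥1 | 96.0 | 57.0 | 33.0 | 98.5 | 2.20 | 0.07 | 0.82 |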
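PHQ-9 | Arroll, B 2010 | ≥10 | 74.0 | 91.0 | 35.2 | 98.1 | 8.40 | 0.28 | N/A |
PHQ-9 | Ayalon, L 2009 | ≥10 | 66.6 | 98.6 | 67.0 | 99.0 | 7.40 | 0.34 | N/A |
PHQ-9 | N-Azah, N 2005 | ≥10 | 60.9 | 80.7 | 38.6 | 78.1 | 3.15 | 0.48 | 0.74 |
PHQ-9 | Cameron, IM 2008 | ≥10 | 87.0 | 69.0 | 70.0 | 86.0 | 2.80 | 0.19 | 0.88 |
PHQ-9 | Chen, S 2010 | ≥10 | 75.0 | 89.0 | N/A | N/A | 6.81 | 0.28 | 0.92 |
PHQ-9 | Chen, S 2013 | ≥10 | 87.0 | 81.0 | N/A | N/A | 5.04 | 0.31 | 0.91 |
PHQ-9 | Cheng, CM 2007 | ≥9 | 80.0 | 92.0 | N/A | N/A | 10.00 | 0.22 | N/A |
PHQ-9 | Corapcioglu, A 2004 | ≥10 | 71.4 | 91.9 | 38.2 | 97.9 | 8.81 | 0.31 | N/A |
PHQ-9 | Esler, D 2008 | ≥9 | 80.0 | 71.4 | 33.3 | 95.2 | 2.80 | 0.28 | N/A |
PHQ-9 | Gilbody, S 2007 | ≥10 | 91.7 | 78.3 | 71.7 | 94.0 | 4.23 | 0.11 | 0.94 |
PHQ-9 | Thapar, A 2014 | ≥10 | 85.2 | 76.9 | 51.2 | 94.8 | 3.69 | 0.19 | 0.90 |
PHQ-9 | Hanlon, C 2015 | ≥6 | 77.8 | 80.6 | 20.0 | 98.3 | 4.01 | 0.28 | 0.85 |
PHQ-9 | Henkel, V 2004 | algorithm | 79.0 | 86.0 | 55.0 | 95.0 | 5.47 | 0.24 | 0.91 |
PHQ-9 | Inagaki, M 2013 | ≥10 | 45.0 | 99.0 | 72.0 | 96.0 | 32.84 | 0.55 | 0.93 |
PHQ-9 | Kroenke, K 2001 | ≥10 | 88.0 | 88.0 | 35.9 | 99.0 | 7.10 | 0.14 | 0.95 |
PHQ-9 | Lamers, F 2008 | ≥7 | 92.2 | 78.1 | 41.6 | 98.3 | 4.21 | 0.10 | 0.92 |
PHQ-9 | Liu, SI 2011 | ≥10 | 86.0 | 93.9 | 32.5 | 99.5 | 14.1 | 0.15 | 0.96 |
PHQ-9 | Lotrakul, M 2008 | ≥10 | 74.0 | 85.0 | 27.0 | 98.0 | 5.04 | 0.31 | 0.89 |
PHQ-9 | Löwe, B 2004a (SCID) | ≥11 | 98.0 | 80.0 | 42.7 | 99.6 | 4.90 | 0.03 | 0.95 |
PHQ-9 | Löwe, B 2004b (IDCL) | ≥10 | 90.0 | 77.0 | 42.3 | 97.6 | 3.91 | 0.13 | 0.92 |
PHQ-9 | McManus, D 2005 | ≥10 | 54.0 | 90.0 | 60.2 | 87.4 | 5.40 | 0.51 | 0.86 |
PHQ-9 | Phelan, E 2010 | ≥10 | 63.0 | 82.0 | 32.3 | 94.2 | 3.50 | 0.46 | 0.87 |
PHQ-9 | Spitzer, RL 1999 | algorithm | 73.0 | 98.0 | 80.2 | 97.0 | 36.5 | 0.27 | N/A |
PHQ-9 | Sung, SC 2013 | ≥6 | 91.7 | 72.2 | 9.2 | 99.6 | 3.30 | 0.11 | 0.82 |
PHQ-9 | Yeung, A 2008 | ≥15 | 81.0 | 98.0 | 92.0 | 95.0 | 40.5 | 0.19 | 0.97 |
PHQ-9 | Zhang, Y 2013 | ≥10 | 56.5 | 84.2 | 42.1 | 90.5 | 3.58 | 0.52 | 0.85 |
PHQ-9 | Zuithoff, N 2010 | ≥10 | 49.0 | 95.0 | 59.0 | 93.0 | 9.80 | 0.54 | 0.89 |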
PRIME-MD (part 1) | Corapcioglu, A 2004 | positive | 52.2 | 89.7 | 57.1 | 87.6 | 5.07 | 0.53 | N/A |
PRIME-MD (part 1) | Loerch, B 2000 | positive | 68.0 | 84.0 | 50.0 | 92.0 | 4.25 | 0.38 | N/A |
PRIME-MD (part 1) | Spitzer, RL 1994 | positive | 57.0 | 98.0 | 80.1 | 99.5 | 28.5 | 0.44 | N/A |
PROMIS | Kroenke, K 2014 | ≥ 8 | 83.0 | 84.0 | 62.5 | 93.9 | 5.29 | 0.20 | 0.89 |
QIDS-SR16 | Cameron, IM 2008 | ≥11 | 86.0 | 68.0 | 70.0 | 85.0 | 2.68 | 0.21 | 0.89 |
QIDS-SR16 | Lamoureux, BE 2010 | ≥13 | 76.5 | 81.8 | 54.2 | 92.5 | 4.20 | 0.29 | 0.82 |
QIDS-SR16 | Sung, SC 2013 | ≥9 | 83.3 | 84.7 | 14.5 | 99.4 | 5.40 | 0.20 | 0.84 |
SDDS-PC | Broadhead, WE 1995 | 2 | 90.4 | 77.2 | 39.7 | 98.0 | 3.96 | 0.12 | N/A |
SDDS-PC | Whooley, MA 1997 | 2 | 96.0 | 51.0 | 30.0 | 98.3 | 2.0 | 0.08 | 0.86 |
SelfCARE-D | Banerjee, S 1997 | ≥8/9 | 73.0 | 70.0 | 58.0 | 84.0 | 2.43 | 0.39 | 0.77 |
SelfCARE-D | Upadhyaya, AK 1997 | ≥5/6 | 95.0 | 86.0 | 48.0 | 97.8 | 2.38 | 0.06 | N/A |
Single - MDI-1 | Ayalon, L 2009 | ≥ 1 | 66.6 | 91.6 | 23.0 | 99.0 | 7.92 | 0.36 | N/A |
Single - MHI-1 | Howe, A 2000 | ≤2/3 | 67.0 | 60.0 | 41.7 | 81.0 | 1.68 | 0.55 | N/A |
Single - MHI-1 | Means-Christensen, A 2005 | ≤4 | 88.0 | 62.0 | 14.6 | 98.9 | 2.31 | 0.19 | 0.82 |
Single - MHI-1(y/n) | Means-Christensen, A 2006 | 1 | 85.0 | 73.0 | 63.6 | 91.1 | 3.15 | 0.21 | N/A |
Single - PHQ-1 | Arroll, B 2003 | ≥1 | 86.0 | 72.0 | 18.6 | 98.6 | 3.00 | 0.19 | N/A |
Single - PHQ-1 | Ayalon, L 2009 | ≥1 | 83.3 | 93.8 | 45.0 | 99.0 | 13.44 | 0.18 | N/A |
Single - Yale SQ | Blank, K 2004 | ≥1 | 64.0 | 64.0 | 11.6 | 93.5 | 1.80 | 0.56 | N/A |
Single - Yale SQ | Robison, J 2002 | ≥1 | 86.0 | 42.0 | 30.0 | 51.0 | 1.48 | 0.34 | N/A |
SDS | Williams, JW Jnr 1995 | 0.043 | 100.0 | 72.0 | 13.0 | 100.00 | 3.57 | <0.01 | N/A |
SRQ-20 | Azevedo-Marques, J 2009 | ≥ 8 | 81.0 | 86.0 | 81.0 | 87.0 | 5.79 | 0.22 | 0.86 |
SWB | Muhwezi, W 2007 | 10 | 75.7 | 86.3 | 76.7 | 85.6 | 5.52 | 0.28 | N/A |
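WHO-5 | Awata, S 2007 | ≤13 | 100.0 | 74.1 | 31.8 | 100.0 | 3.86 | <0.01 | 0.90 |
WHO-5 | Azevedo-Marques, J 2009 | ≤11 | 77.0 | 89.0 | 81.0 | 87.0 | 7.00 | 0.26 | 0.83 |
WHO-5 | Henkel, V 2004 | ≤13 | 94.0 | 65.0 | 37.0 | 98.0 | 2.69 | 0.09 | 0.90 |
WHO-5 | Löwe, B 2004a (SCID) | ≤8 | 95.0 | 73.0 | 34.9 | 98.9 | 3.52 | 0.07 | 0.91 |
WHO-5 | Löwe, B 2004b (IDCL) | ≤9 | 87.0 | 70.0 | 35.2 | 96.6 | 2.90 | 0.19 | 0.89 |
WHO-5 | Mergl, R 2007 | ≤13 | 90.0 | 63.0 | 14.7 | 98.1 | 2.43 | 0.16 | 0.86 |
WHO-5 | Saipanish, R 2009 | ≤13 | 89.0 | 65.0 | 16.0 | 99.0 | 2.56 | 0.16 | 0.86 |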
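Zung-SDS | Aragones, E 2001 | ≥50 | 94.0 | 70.0 | 35.0 | 99.0 | 3.09 | 0.09 | 0.93 |
Zung-SDS | Baer, L 2000 | ≥50 | 89.0 | 53.0 | 77.0 | 73.0 | 1.89 | 0.20 | N/A |
Zung-SDS | Campo-Arias, A 2006 | ≥49 | 50.0 | 94.6 | 64.7 | 90.5 | 9.26 | 0.53 | 0.90 |
Zung-SDS | Leung, KK 1998 | ≥55 | 66.7 | 90.2 | 30.2 | 97.7 | 6.80 | 0.37 | N/A |
Zung-SDS | Okimoto, JT 1982 | ≥60 | 76.0 | 82.0 | 65.5 | 88.4 | 4.22 | 0.29 | N/A |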
PPV: Positive Predictive Value, NPV: Negative Predictive Value, LR+: Positive Likelihood Ratio, LR-: Negative Likelihood Ratio, AUC: Area under the ROC curve, N/A: not available.
BASDEC: Brief Assessment Schedule Depression Cards; BDI-II: Beck Depression Inventory, version 2; BDI-FS: Beck Depression Inventory-Fast Screen; BDI-SF: Beck Depression Inventory-Short Form; CES-D: Centre for Epidemiological Studies-Depression Scale (10 & 20 item); Euro-D: European Depression Scale; GHQ: General Health Questionnaire; HADS: Hospital Anxiety and Depression Scale; HADS-D: HADS depression subscale; HANDS: Harvard Department of Psychiatry/National Depression Screening Day Scale; ID: (Popoff) Index of Depression; MDI: Major Depression Inventory; MHI-5: Mental Health Inventory; MOS-D: Medical Outcomes Study Depression Scale; PC-SAD: Primary Care Screener for Affective Disorders; PHQ: Patient Health Questionnaire (1,2 yes/no, 2 & 9 item); PRIME-MD: Primary Care Evaluation of Mental Disorders; PROMIS: Patient Reported Outcomes Measurement Information System; QIDS-SR: Quick Inventory of Depressive Symptomatology–Self-Report; SDDS-PC: Symptom Driven Diagnostic System-Primary Care; SDS: Short Depression Screen; SelfCARE(D); SQ: Single Question; WHO-5: World Health Organization Wellbeing Index; Zung-SDS: Zung’s Self-Rating Depression Scale.
*PSP: Primary care screening sample.
Author in bold print: QUADAS-2 RoB rated low in all domains.
Authors in italic print: Study included in meta-analysis.
Authors of studies with low RoB across all domains for the tool evaluated in Table 3 are shown in bold font. Of the tools included in the meta-analysis, only the PHQ-9 and WHO-5 had a low RoB across all domains in ≥5 studies.
Meta-analysis
Six screening tools (CESD-20, HADS-D, PHQ-9, PHQ-2(scored), WHO-5 and Zung-SDS) were evaluated by five or more studies and underwent meta-analysis. Authors of studies included in the meta-analysis are shown in italic font in Table 3. Psychometric characteristics of the screening tools analysed are summarised in Table 4. The PHQ-9 had the highest diagnostic odds ratio (DOR) (25.69), highest LR+ (6.79) and second-highest specificity (0.89). The WHO-5 had the second-highest DOR (19.70) and highest sensitivity (0.90), while the PHQ-2(scored) had the lowest DOR (12.78) but the highest specificity (0.91). Radar plots giving a visual representation of the meta-analysis results are provided as a supplemental file.
Table 4. Summary psychometrics for screening tool meta-analysis.
Tool | DOR | Sensitivity | Specificity | LR+ | LR- |
---|---|---|---|---|---|
CESD-20 | 14.06 | 0.86 | 0.69 | 2.77 | 0.20 |
HADS-D | 13.90 | 0.77 | 0.80 | 3.94 | 0.28 |
PHQ-9 | 25.69 | 0.77 | 0.89 | 6.79 | 0.26 |
PHQ-2(scored) | 12.78 | 0.55 | 0.91 | 6.26 | 0.49 |
WHO-5 | 19.70 | 0.90 | 0.68 | 2.78 | 0.14 |
Zung-SDS | 15.70 | 0.77 | 0.82 | 4.33 | 0.28 |
DOR: diagnostic odds ratio, LR+: positive likelihood ratio, LR-: negative likelihood ratio.
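For readers less familiar with the DOR, the sketch below shows how it relates to sensitivity and specificity: it is the odds of a positive screen in people with depression divided by the odds of a positive screen in people without, which equals LR+ divided by LR-. The function name is our own, and the inputs are the pooled sensitivity and specificity from Table 4. Because the pooled DORs in Table 4 were estimated within the meta-analytic model, values recomputed from the pooled sensitivity and specificity are only approximations and need not match the tabulated figures.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR from sensitivity and specificity (proportions strictly between 0 and 1).

    Illustrative helper (hypothetical name); algebraically equal to LR+ / LR-.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_if_depressed = sensitivity / (1.0 - sensitivity)
    odds_positive_if_not_depressed = (1.0 - specificity) / specificity
    return odds_positive_if_depressed / odds_positive_if_not_depressed


# Pooled (sensitivity, specificity) pairs as reported in Table 4
pooled = {
    "CESD-20": (0.86, 0.69),
    "HADS-D": (0.77, 0.80),
    "PHQ-9": (0.77, 0.89),
    "PHQ-2(scored)": (0.55, 0.91),
    "WHO-5": (0.90, 0.68),
    "Zung-SDS": (0.77, 0.82),
}

for tool, (sens, spec) in pooled.items():
    print(f"{tool}: approximate DOR {diagnostic_odds_ratio(sens, spec):.1f}")
```

For the PHQ-9 this approximation gives roughly 27, compared with the pooled estimate of 25.69; the ranking of the PHQ-9 above the PHQ-2(scored) is preserved.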
Considering the psychometric and administrative operating characteristics of tools meta-analysed together, only the PHQ-9 and WHO-5 could be administered in ≤5 min, had an ‘easy or average’ level of literacy, were ‘easy or average’ to score, assessed presence of symptoms over a 2 week period in accordance with DSM criteria and displayed better diagnostic accuracy than other tools meta-analysed.
Sampling method was not a source of heterogeneity in estimating pooled psychometric characteristics for the PHQ-9, but it was a source of heterogeneity in the CESD-20 analysis. Evaluation of heterogeneity lacked sufficient precision for the PHQ-2(scored), WHO-5 and Zung-SDS, as estimates of random effects were zero, and was underpowered for the HADS-D; hence, heterogeneity estimates for these four tools are of little use.
Discussion
This is the first systematic review in fifteen years of both the administrative operating characteristics and the psychometric characteristics of depression screening tools capable of self-administration in primary care settings. It is also the first review to meta-analyse several psychometric characteristics of these tools where possible. Unlike previous reviews, quality assessment of all included studies was performed. We found the majority of studies had low RoB across most domains of quality and applicability, with the exception of flow and timing, where a substantial number were rated unclear, primarily because of limited reporting. This provides greater confidence in the interpretation of our results.
This review identified forty unique depression screening tools capable of self-administration from eighty-one studies. In contrast, the 2002 review identified 16 screening tools from 38 studies,21 while the 2017 review identified 55 screening tools from 60 studies.22 We used less restrictive inclusion criteria regarding the aims of included studies; therefore, more studies were identified. However, we only included tools capable of self-administration where a reference standard was used to interpret screening results, and only where studies reported psychometric data for depression screening. Hence, the number of studies included and the number of tools identified differed from the 2002 and 2017 reviews.
When assessing psychometric characteristics, previous reviews focused on sensitivity and specificity. Despite being useful measures of accuracy, these alone are not ideal because they vary according to the cut-off point chosen.23 Unlike the previous reviews, our review systematically analysed DORs and likelihood ratios (LRs) and evaluated a-ROC values when reported, in addition to sensitivity and specificity, to provide a more thorough assessment of screening tool performance in primary care settings. Further, this is the first systematic review to meta-analyse the psychometric properties, including sensitivity, specificity, likelihood ratios and diagnostic odds ratios, providing more precise estimates of these parameters. Diagnostic odds ratios, likelihood ratios and a-ROC values are considered more reliable methods of describing screening tool performance than sensitivity and specificity alone, as they enable comparisons without selecting a particular cut-off point.23,24,35,38
Self-administered screening tools used in primary care should be brief, easily administered and easily scored.28 They should also be easily understood by respondents, as screening tool accuracy may be influenced by lower levels of literacy.26,31 Seventeen tools (42%) failed to satisfy one or more of these criteria, rendering them less suitable for use in primary care settings. Based solely on ease of administration, twenty-three tools were considered suitable for use in primary care. However, a meta-analysis of single-question screening tools found that, when used alone, they may only identify 3 in every 10 patients with depression.44 We also found the five single-question tools generally had higher LR- values than other tools, suggesting a negative result was more likely to occur in people with depression. Therefore, only eighteen tools were considered suitable for use in primary-care settings.
Analysis showed the PHQ-9 had the highest DOR and therefore better performance in primary care settings than other tools meta-analysed. The PHQ-9 was also one of the most accurate screening tools based on a-ROC.24 Although the Zung-SDS had slightly higher accuracy (median a-ROC = 0.92), it was considered less suitable in primary care settings as it contained more than twice as many questions, and took around twice as long to administer.
Likelihood ratios are increasingly preferred to sensitivity and specificity alone for describing the clinical usefulness of a screening tool because, unlike predictive values, they do not vary with the prevalence of depression.23,35 Of the tools meta-analysed, the PHQ-9 had not only the highest DOR but also the highest LR+, suggesting it was nearly seven times more likely to produce a positive screening result in someone with depression than in someone without depression. In contrast, although the WHO-5 had the second-highest DOR, its lower LR+ meant it was only about three times more likely to produce a positive screen in someone with depression than in someone without depression. Hence, after a positive result, the PHQ-9 greatly increases the chance of ‘ruling-in’ depression and, after a negative result, the WHO-5, with the lowest LR-, greatly increases the chance of ‘ruling-out’ depression.24
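To make the ‘ruling-in’ and ‘ruling-out’ interpretation concrete, a likelihood ratio can be combined with a pre-test probability through the odds form of Bayes' theorem. The sketch below is illustrative only: the function name is our own, and the 10% pre-test probability is an assumption taken from the upper end of the primary care prevalence range cited in the introduction, not a result of this review.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (odds form of Bayes' theorem).

    Illustrative helper (hypothetical name).
    """
    if not (0.0 <= pre_test_probability < 1.0):
        raise ValueError("pre_test_probability must lie in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


assumed_prevalence = 0.10  # assumption: upper end of the 5-10% primary care range

# Positive PHQ-9 screen, pooled LR+ = 6.79 (Table 4)
after_positive_phq9 = post_test_probability(assumed_prevalence, 6.79)
# Negative WHO-5 screen, pooled LR- = 0.14 (Table 4)
after_negative_who5 = post_test_probability(assumed_prevalence, 0.14)

print(f"{after_positive_phq9:.2f}")   # 0.43
print(f"{after_negative_who5:.3f}")   # 0.015
```

Under this assumed prevalence, a positive PHQ-9 raises the probability of depression from 10% to about 43%, while a negative WHO-5 lowers it to about 1.5%.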
For screening in primary care, a tool with a higher sensitivity (fewer false negatives) would be more useful than one with a higher specificity (fewer false positives).45 We found the WHO-5 had the highest sensitivity (90.0%) and would be suitable for use in primary care based on ease of administration (taking ≤5 min to administer). A meta-analysis and several validation studies suggested that another way to minimise false negatives, and also reduce the burden of administration, is to use the PHQ-2(yes/no), a highly sensitive pre-screening tool, followed by the PHQ-9 after a positive PHQ-2 result.46–48 A tool with a higher NPV is also useful for screening in primary care, as it indicates that a negative screen is more likely to be a true negative result.36,45 Of the tools meta-analysed, we found the WHO-5 had the highest median NPV (98.0%), followed by the PHQ-9 (96.5%). Despite the useful information provided by sensitivity, specificity and predictive values,25,49 these psychometric data should be supported by more meaningful measures of screening tool performance such as DORs, LRs and a-ROC values.
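The two-stage approach described above can be sketched as follows. This is a minimal illustration rather than a validated scoring implementation: the function names and return strings are hypothetical, the cut-offs (PHQ-2 yes/no ≥1, PHQ-9 ≥10) are the developer-recommended values listed in the table of tool characteristics, and the 0–3 scoring of each PHQ-9 item is assumed from the tool's 0–27 total range.

```python
def phq2_yes_no_positive(answers):
    """PHQ-2 (yes/no): positive if at least one of the two questions is answered yes."""
    if len(answers) != 2:
        raise ValueError("PHQ-2 (yes/no) expects exactly two answers")
    return sum(bool(a) for a in answers) >= 1


def phq9_total(item_scores):
    """Sum the nine PHQ-9 items, each assumed to be scored 0-3 (total 0-27)."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores, each 0, 1, 2 or 3")
    return sum(item_scores)


def two_stage_screen(phq2_answers, phq9_items=None, phq9_cutoff=10):
    """Pre-screen with PHQ-2 (yes/no); score the PHQ-9 only after a positive pre-screen."""
    if not phq2_yes_no_positive(phq2_answers):
        return "negative pre-screen"
    if phq9_items is None:
        return "administer PHQ-9"
    if phq9_total(phq9_items) >= phq9_cutoff:
        return "positive screen"
    return "negative screen"


print(two_stage_screen([False, False]))                              # negative pre-screen
print(two_stage_screen([True, False]))                               # administer PHQ-9
print(two_stage_screen([True, True], [2, 2, 1, 1, 1, 1, 1, 1, 0]))   # positive screen
```

A positive screen in this sketch indicates only that diagnostic assessment against a reference standard is warranted; it is not a diagnosis.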
Given the current lack of guidelines to assist the choice of an appropriate depression screening tool in primary healthcare, the choice of over 40 screening tools and the considerable variability in operating characteristics, this review assists in selecting an easily administered and psychometrically sound depression screening tool. Our review showed eighteen tools were ultimately suitable for use in primary care based on brevity, ease of scoring and level of literacy. Analysis showed the PHQ-9 had the highest performance and accuracy in primary care settings, suggesting it discriminates between patients with and without depression more effectively than any other screening tool analysed. Furthermore, the PHQ-9 was the most effective tool at ‘ruling-in’ depression after a positive screening result, based on a superior LR+. The PHQ-9 also achieved the best balance between sensitivity and specificity of the tools meta-analysed. The PHQ-9 was the most extensively evaluated tool in primary care and was validated in more than nine countries. It assesses depressive symptoms in accordance with DSM guidelines and can also assist practitioners in monitoring the severity of depression and an individual's response to treatment.19,21 Although other tools were found to be suitable for use in primary care based on their ease of administration, further studies with these tools are required to obtain better estimates of their psychometric properties in this setting.
While ease of administration and acceptable performance are important for selecting a self-administered screening tool, these alone do not ensure the effectiveness of a screening tool. Effectiveness depends on several factors including whether the tool is acceptable to the population and whether the screening provides benefits that outweigh the harms.50 There is evidence that screening for depression has features that may impact on the acceptability to the population, specifically the stigma associated with mental illness.51 Further work is also required to determine what the best setting for screening with self-administered tools is. A study in 2003 found that clinic-based screening identified the largest proportion of patients with depression.52 However, the study also found that screening at home identified an older patient population with chronic illnesses. This is important given the relationship between chronic illness and depression.53
There were several limitations to our systematic review. Only articles written in English were included; therefore, language bias is possible. To avoid a loss of precision, we could only meta-analyse psychometric characteristics for tools evaluated by ≥5 studies. Nonetheless, the most frequently evaluated tools were meta-analysed. Even though all studies were conducted in primary care settings, investigating heterogeneity for the majority of tools studied was not possible because they were evaluated by ≤8 studies. However, random effects modelling was used to adjust for possible heterogeneity between studies for accuracy and threshold of the tools meta-analysed.
Conclusion
Although we found numerous depression screening tools suitable for use in primary care based on ease of administration, the PHQ-9 was the most widely assessed tool and displayed superior DOR, LR+, a-ROC and specificity. The PHQ-2(yes/no) may be useful as a pre-screening tool, as it is ultra-brief, was frequently evaluated, showed good sensitivity and had a low LR-. Our review supports the use of the PHQ-9 as a brief, easily administered depression screening tool with the most robust psychometric characteristics in primary care settings.
Ethical considerations
No ethics approval has been sought for this work as the manuscript is a systematic review of published studies conducted by other researchers.
Conflict of interest
The authors declare that they have no conflicts of interest to disclose.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
No other acknowledgements.
PM contributed to the acquisition of data, the analysis and interpretation of data, interpretation of risk of bias, and drafting and revision of the manuscript.
DN contributed to the conception of the work, the analysis and interpretation of data, interpretation of risk of bias, and drafting and revision of the manuscript.
EW contributed to the acquisition of data, interpretation of risk of bias, and revision of the manuscript.