NINDS CDE Notice of Copyright
Please visit this website for more information about the instrument: Stroop Test
A commonly used version is the Delis-Kaplan Executive Function System (D-KEFS) Color-Word Interference Test (CWIT). The CWIT consists of the three traditional Stroop trials (color naming, color name reading, interference) as well as a fourth trial in which the subject switches back and forth between naming the dissonant ink colors and reading the conflicting color names. The stimulus booklet and forms are copyrighted and included as part of the D-KEFS test kit, but can be purchased separately from the test publisher (Stroop Color and Word Test).
Other versions include the Golden Stroop, Victoria Stroop, and Trenerry Stroop. The Golden and Trenerry Stroops are available from Psychological Assessment Resources, Inc. (PAR). The Victoria Stroop is non-proprietary.
NeuroRehab Supplemental - Highly Recommended
Recommendations for Use: Indicated for studies requiring a measure of executive functioning.
May not be recommended for use in individuals who are colorblind or have loss of color acuity.
Supplemental - Highly Recommended: Myalgic encephalomyelitis/Chronic fatigue syndrome (ME/CFS)
Supplemental: Huntington's Disease (HD), Multiple Sclerosis (MS), Sport-Related Concussion (SRC), and Stroke
|Short Description of Instrument||
Several different versions of the Stroop are available (e.g., Victoria, Golden, D-KEFS, and Trenerry versions; discussed in more detail below). The Stroop Test involves three trials. In the WORD trial, the subject reads words of color names (e.g., red, blue) printed in black ink. In the COLOR trial, the subject identifies colors (e.g., rectangles printed in red or blue). Finally, in the COLOR-WORD response inhibition trial, the subject must name the color in which a word is presented, while ignoring the printed word. Thus, incongruence between the word's color and identity (e.g., the word "blue" presented in red) requires inhibition and response selection.
Word reading and color naming are measures of processing speed, while color-word inhibition is a measure of executive functions.
Construct measured: Cognitive flexibility, response inhibition, and processing speed
Generic vs. disease specific: Generic
Means of administration (paper and pencil, computerized): Both paper-and-pencil and computerized.
Location of administration (clinic, home, telephone): Clinical and Research Settings
Intended respondent (patient, caregiver): Patient
# of items: N/A
# of subscales and names of sub-scales: N/A
Measurements: Type of scale used to describe individual items and total/subscale scores (nominal, ordinal, or [essentially] continuous): Continuous.
If ordinal or continuous, explain if ceiling or floor effects are to be expected if the measure is used in specific HD Subgroups. No floor effects. Individuals with advanced disease may struggle with the interference trial.
The UHDRS version of the Stroop task has been commonly used in HD research. To date, no one version of the Stroop Test has been shown to be clearly superior to the others.
Intended use of instrument/purpose of tool (cross-sectional, longitudinal, diagnostic): Assessment of cognitive function in HD cross-sectional and longitudinal studies.
Sensitivity to Change/Ability to Detect Change (over time or in response to an intervention): In published cross-sectional (Stout et al., 2011) and internal analyses (PREDICT-HD), the test is sensitive to changes in premanifest HD, especially in individuals who are closer to an expected diagnosis. Unpublished internal analyses of 7-year longitudinal data (PREDICT-HD) also show changes in rates of change over time in premanifest HD on all subtests, especially color and word naming.
In a cross-sectional analysis of the Stroop WORD trial, the TRACK-HD study found that healthy controls performed significantly better than both the early HD and the premanifest HD groups. Longitudinally, the TRACK-HD study found significant differences in rates of change for early HD compared to controls, but did not find significant differences in rates of change for premanifest HD compared to controls.
In Stroop WORD, the TRACK-HD premanifest participants may be less likely to show cognitive effects than the PREDICT-HD Premanifest participants because: (1) they are further from estimated onset based on CAG repeat length and age (Langbehn et al., 2004) and (2) they are potentially less progressed because the TRACK-HD study excluded premanifest subjects based on UHDRS motor scores >= 5. In general, cognitive tests will be more effective metrics in studies of premanifest HD when the focus is on subjects that are close to onset.
A meta-analysis of HD observational studies published 1993-2007 reveals both cross-sectional performance differences compared to healthy controls and longitudinal change within HD groups over time for Stroop Reading and Stroop Color, evident in both premanifest and early HD. The Stroop Interference findings are less impressive, with smaller cross-sectional effect sizes and no significant longitudinal effects (see below).
For research purposes, versions in which the outcome metric reflects the number of stimuli completed by the patient (e.g., Golden Stroop or Trenerry Stroop) may be preferred because they tend to be more normally distributed compared to versions that utilize reaction time as the outcome metric, which are often positively skewed (e.g., D-KEFS Color-Word Interference Test or Victoria Stroop).
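The distributional point above can be checked on a study's own data before choosing an analysis. The following Python sketch computes a simple moment-based sample skewness. The function name and the example completion times are hypothetical illustrations, not values from any Stroop dataset or manual.

```python
def sample_skewness(values):
    """Moment-based sample skewness (third central moment / variance**1.5).

    Positive values indicate a longer right tail, as is often seen
    with completion-time outcomes; values near 0 indicate symmetry.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all observations are identical")
    return m3 / m2 ** 1.5


# Hypothetical completion times in seconds (illustrative only).
hypothetical_times = [55, 58, 60, 62, 64, 70, 95, 130]
skew = sample_skewness(hypothetical_times)
```

A clearly positive result on a time-based outcome would be consistent with the skew described above. How to handle that skew in analysis is left to the study's statistical plan.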
|Scoring and Psychometric Properties||
Scoring: Scoring differs depending on Stroop version. For Golden and Trenerry Stroops, each trial is based on the number of correct responses in a fixed amount of time, typically within 45-60 seconds (Golden, 1975). Higher scores indicate better cognitive performance. For D-KEFS CWIT and Victoria Stroop, each trial outcome is the time required to complete the stimulus page. Lower scores indicate better cognitive performance.
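Because the direction of scoring differs by version, data handling code needs to record which direction indicates better performance. The Python sketch below is one way to do that. The class and function names are assumptions made for illustration and are not taken from any scoring manual.

```python
from dataclasses import dataclass


@dataclass
class TrialScore:
    value: float            # raw trial score
    higher_is_better: bool  # direction of the raw score


def score_fixed_time_trial(correct_responses: int) -> TrialScore:
    """Golden/Trenerry style: correct responses within the time limit."""
    if correct_responses < 0:
        raise ValueError("correct_responses must be non-negative")
    return TrialScore(float(correct_responses), higher_is_better=True)


def score_timed_page_trial(completion_seconds: float) -> TrialScore:
    """D-KEFS CWIT/Victoria style: seconds to complete the stimulus page."""
    if completion_seconds <= 0:
        raise ValueError("completion_seconds must be positive")
    return TrialScore(completion_seconds, higher_is_better=False)


def performed_better(a: TrialScore, b: TrialScore) -> bool:
    """True if score a reflects better performance than score b."""
    if a.higher_is_better != b.higher_is_better:
        # Hypothetical safeguard: raw scores from different scoring
        # conventions are not directly comparable.
        raise ValueError("scores use different scoring conventions")
    return a.value > b.value if a.higher_is_better else a.value < b.value
```

With this sketch, `performed_better(score_timed_page_trial(50.0), score_timed_page_trial(65.0))` is true because the faster completion time indicates better performance, while a comparison that mixes the two conventions raises an error.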
Standardization of scores to a reference population (z scores, T scores): Raw scores can be converted to T-scores for different ranges of age and years of education, depending on norms used. Studies reporting raw scores should control for age and education.
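As a generic illustration of standardization, a T-score is conventionally a z-score rescaled to a mean of 50 and a standard deviation of 10. The sketch below assumes a normative mean and standard deviation are available for the matching age and education band. In practice, published norms are usually applied through the publisher's tables, and the normative values used here are hypothetical. Flipping the sign for time-based scores so that higher always means better is a design choice of this sketch, not a requirement of any manual.

```python
def z_score(raw: float, norm_mean: float, norm_sd: float,
            higher_is_better: bool = True) -> float:
    """Standardize a raw score against a normative mean and SD.

    For time-based versions (lower raw score is better), the sign is
    flipped so that higher z always means better performance
    (an assumption of this sketch).
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    return z if higher_is_better else -z


def t_score(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


# Hypothetical normative values, for illustration only.
count_based_t = t_score(z_score(90, norm_mean=100, norm_sd=10))
time_based_t = t_score(z_score(70, norm_mean=60, norm_sd=10,
                               higher_is_better=False))
```

In both hypothetical cases the subject performs one standard deviation worse than the normative mean, so both calls yield a T-score of 40.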
If scores have been standardized to a reference population, it is important to indicate frame of reference for scoring (general population, HD subjects, other disease groups). General population (5-90 years of age; education levels of 2 to 20 years).
Reliability: High reliability across different versions.
Test-retest or intra-interview (within rater) reliability (as applicable): Test-retest reliabilities cover periods of 1 minute to 10 days. Reliabilities for Word, Color, and Color-Word are respectively .88, .79, and .71 (Jensen, 1965) and .89, .84, and .73 (Golden, 1975).
Inter-interview (between-rater) reliability (as applicable):
Internal consistency: Correlations among the subtests are moderate to high (.71 to .84) (Chafetz and Matthews, 2004).
Statistical methods used to assess reliability: See the HD-CAB studies regarding intraclass correlations and reliability data (Stout et al., 2014; Stout et al., 2017).
Construct validity: The interference score correlates well with measures of attention and prepotent response inhibition (May and Hasher, 1998).
Known Relationships to Other Variables (e.g., gender, education, age): Not valid in color-blind individuals. The color-word interference score is related to aging (Mitrushina et al., 2005). Age and education should be controlled if reporting test scores.
Diagnostic Sensitivity and Specificity, if applicable (in general population, HD population- premanifest/ manifest, other disease groups):
Each cell lists Group: Effect Size, P value, # of studies/total # of HD participants across studies.
|Cross-Sectional sensitivity in PreHD|Cross-Sectional sensitivity in HD|Longitudinal sensitivity within subjects|
|All Pre: -0.44, 0.001, 13/242; Near Pre: -0.65, 0.001, 4/152|Early: -1.29, <0.001, 10/220|Dx: -0.65, 0.022, 4/115; Near Pre: -0.61, <0.001, 2/160; All Pre: -0.47, <.003, 4/180|
|All Pre: -0.44, 0.002, 14/260; Near Pre: -0.87, 0.001, 4/152|Early: -1.35, <0.001, 9/207|Dx: -0.79, 0.008, 3/102; Near Pre: -0.44, 0.001, 2/160; All Pre: -0.34, 0.001, 4/180|
|All Pre: -0.24, 0.065, 18/332; Near Pre: -0.64, 0.004, 5/158|Early: -1.09, <0.001, 10/184|Dx: -0.15, 0.108, 4/115; Near Pre: -0.3, 0.215, 2/159; All Pre: 0, .999, 5/212|
Strengths: The color and word subtests are particularly sensitive in cross-sectional and longitudinal studies of premanifest and early manifest HD and may be a useful measure of inhibitory processes. Task has been tested at sites in the United States, Canada, United Kingdom, Australia, Germany, and Spain. Task is easy to administer. Stroop Interference is a well established neuropsychological test measure of inhibition. Substantial literature in mild TBI and sport concussion, Huntington's disease, as well as cognitive aging and neurodegenerative diseases.
Weaknesses: The Stroop requires multiple cognitive processes (e.g., processing speed, inhibition, verbal fluency) and may be less neuroanatomically specific.
Special Requirements for administration: A stopwatch is required.
Administration Time: Assessment takes approximately 2 minutes for each of the two to three trial types.
Translations available: Spanish (Golden Version), Cantonese (Victoria Version). The UHDRS version is available in several European languages including: Czech, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Polish, Portuguese, Spanish and Swedish.
Stroop JR. Studies of interference in serial verbal reactions. J Exp Psychol. 1935;18:643-662.
Golden CJ. Stroop Color and Word Test: A Manual for Clinical and Experimental Uses. Chicago, IL: Stoelting, 1978, pp. 1-32.
Golden CJ, Freshwater SM. The Stroop Color and Word Test: A Manual for Clinical and Experimental Uses. Wood Dale, IL: Stoelting Co, 2002.
Chafetz MD, Matthews LH. A new interference score for the Stroop test. Arch Clin Neuropsychol. 2004;19(4):555-567.
Golden CJ. The measurement of creativity by the Stroop Color and Word Test. J Pers Assess. 1975;39(5):502-506.
Jensen AR. Scoring the Stroop test. Acta Psychol (Amst). 1965;24(5):398-408.
Koga H, Takashima Y, Murakawa R, Uchino A, Yuzuriha T, Yao H. Cognitive consequences of multiple lacunes and leukoaraiosis as vascular cognitive impairment in community-dwelling elderly individuals. J Stroke Cerebrovasc Dis. 2009;18(1):32-37.
Matser JT, Kessels AG, Lezak MD, Troost J. A dose-response relation of headers and concussions with cognitive impairment in professional soccer players. J Clin Exp Neuropsychol. 2001;23(6):770-774.
May CP, Hasher L. Synchrony effects in inhibitory control over thought and action. J Exp Psychol Hum Percept Perform. 1998;24(2):363-379.
Mitrushina MM, Boone KB, Razani J, D'Elia LF. Handbook of Normative Data for Neuropsychological Assessment (2nd ed.). New York: Oxford University Press, 2005.
Murphy CF, Gunning-Dixon FM, Hoptman MJ, Lim KO, Ardekani B, Shields JK, Hrabe J, Kanellopoulos D, Shanmugham BR, Alexopoulos GS. White-matter integrity predicts stroop performance in patients with geriatric depression. Biol Psychiatry. 2007;61(8):1007-1010.
Stout JC, Paulsen JS, Queller S, Solomon AC, Whitlock KB, Campbell JC, Carlozzi N, Duff K, Beglinger LJ, Langbehn DR, Johnson SA, Biglan KM, Aylward EH. Neurocognitive signs in prodromal Huntington disease. Neuropsychology. 2011;25(1):1-14.
Stout JC, Queller S, Baker KN, Cowlishaw S, Sampaio C, Fitzer-Attas C, Borowsky B; HD-CAB Investigators. HD-CAB: a cognitive assessment battery for clinical trials in Huntington's disease 1,2,3. Mov Disord. 2014;29(10):1281-1288.
Stout JC, Andrews SC, Glikmann-Johnston Y. Cognitive assessment in Huntington disease clinical drug trials. Handb Clin Neurol. 2017;144:227-244.
Strauss E, Sherman EMS, Spreen O. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary, 3rd ed. New York: Oxford University Press, 2006.
Thomas M, Smith A. An Investigation into the Cognitive Deficits Associated with Chronic Fatigue Syndrome. Open Neurol J. 2009;3:13-23.
Wall SE, Williams WH, Cartwright-Hatton S, Kelly TP, Murray J, Murray M, Owen A, Turner M. Neuropsychological dysfunction following repeat concussions in jockeys. J Neurol Neurosurg Psychiatry. 2006;77(4):518-520.
Westerberg H, Jacobaeus T, Hirvikoski T, Clevberger P, Ostensson ML, Bartfai A, Klingberg T. Computerized working memory training after stroke - a pilot study. Brain Injury. 2007;21(1):21-29.
Document last updated January 2022