Idolizing Efficient Decision-Making: Simon, Paula, Randy and Signal Detection Theory
March 1, 2008
When I was asked to write a concise description of signal detection theory for the interested traumatologist, I immediately thought of American Idol. Perhaps I should explain. Fans of the show know that the three judges (Paula Abdul, Simon Cowell, and Randy Jackson) each have characteristic talent evaluation styles. Paula seems to like and appreciate everybody, Simon is highly critical and only recognizes the most obvious talent, while Randy is somewhere in between.
In signal detection terms, Paula is highly sensitive to the musical voice: she tends to identify anybody with a shred of talent as worthy of advancing to the next round. Simon is much more conservative, so in signal detection terms his judgments are highly specific; only the most talented contestants clear his hurdle. There are, of course, trade-offs to each approach. Paula’s style means that she will commit a higher proportion of false positive errors (false alarms) than Randy and Simon. She will identify a greater proportion of performers as talented than the other judges do, and so she is more likely to conclude erroneously that someone has talent when they do not. Conversely, Simon’s style is likely to yield the most false negatives, or misses: by setting such a high bar, he will overlook some truly talented performers. In hypothesis testing, false positives and false negatives are referred to as Type I and Type II errors, respectively.
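The judges’ trade-off can be made concrete with a minimal Python sketch built on invented data. Each hypothetical contestant has a talent score from 0 to 10 and a label saying whether he or she is "truly" talented. Each judge advances anyone who scores at or above a personal threshold. The thresholds (3 for Paula, 5 for Randy, 8 for Simon) are assumptions chosen only to mirror the styles described above. In the code, a false alarm corresponds to a Type I error and a miss to a Type II error.

```python
# Invented data: (talent score as heard by the judges, truly talented?)
CONTESTANTS = [
    (9, True), (8, True), (7, True), (6, True), (4, True),
    (6, False), (5, False), (4, False), (3, False), (2, False),
]

# Hypothetical thresholds: a contestant advances if score >= threshold.
THRESHOLDS = {"Paula": 3, "Randy": 5, "Simon": 8}


def tally(threshold, contestants=CONTESTANTS):
    """Count the four signal detection outcomes at one threshold."""
    hits = misses = false_alarms = correct_rejections = 0
    for score, talented in contestants:
        advances = score >= threshold
        if talented and advances:
            hits += 1
        elif talented:
            misses += 1  # talented but not advanced: Type II error
        elif advances:
            false_alarms += 1  # advanced without talent: Type I error
        else:
            correct_rejections += 1
    return hits, misses, false_alarms, correct_rejections


for judge, threshold in THRESHOLDS.items():
    h, m, fa, cr = tally(threshold)
    sensitivity = h / (h + m)  # hit rate
    specificity = cr / (cr + fa)  # 1 - false alarm rate
    print(f"{judge}: hits={h} misses={m} false alarms={fa} "
          f"sensitivity={sensitivity:.1f} specificity={specificity:.1f}")
```

With these made-up numbers, Paula gets sensitivity 1.0 but specificity 0.2, Randy gets 0.8 and 0.6, and Simon gets 0.4 and 1.0. Lowering the threshold trades misses for false alarms.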
Signal detection theory, however, permits researchers to take an additional step. Whenever efficient detection of a signal is required, signal detection theory helps researchers find the cut point that best balances the two kinds of decision-making error. This is often accomplished by plotting the hit rate (i.e., the proportion of signal-present cases correctly identified as positive) against the false alarm rate (i.e., the proportion of signal-absent cases incorrectly identified as positive) for all possible cut points. The result is a curve called the receiver operating characteristic or ROC curve (also called the relative operating characteristic or isosensitivity curve; Macmillan & Creelman, 2005).
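The ROC construction takes only a few lines of Python. In this sketch the scores and labels are invented. "Signal" cases are those the criterion calls positive, and a case is called positive when its score is at or above the cut point. Sweeping the cut point from just above the highest score down to the lowest score traces the curve from (0, 0) to (1, 1).

```python
def roc_points(scores, labels):
    """Return (cut, false_alarm_rate, hit_rate) for every possible cut point.

    scores: numeric screening scores; labels: True for signal (positive) cases.
    A case is called positive when score >= cut.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    # One cut above the maximum (nobody called positive), then each observed score.
    cuts = [max(scores) + 1] + sorted(set(scores), reverse=True)
    points = []
    for cut in cuts:
        hits = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
        false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s >= cut)
        points.append((cut, false_alarms / n_neg, hits / n_pos))
    return points


# Invented example: five signal cases followed by five noise cases.
scores = [9, 8, 7, 6, 4, 6, 5, 4, 3, 2]
labels = [True] * 5 + [False] * 5
for cut, far, hr in roc_points(scores, labels):
    print(f"cut >= {cut:2d}: false alarm rate = {far:.1f}, hit rate = {hr:.1f}")
```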
Figure 1. Identification of the point on an idealized ROC curve that optimally balances hit and false alarm rates, located at the intersection of the minor diagonal with the curve.
One point on the curve will best balance hits and false alarms, and that point becomes the cut point for decision-making. In Figure 1, that point is marked “X”; it is the point on the curve closest to the upper left corner of the graph. Functionally, you can think of this as finding the point on a radio dial that gives the best reception (in fact, signal detection theory grew out of work on radar and radio receivers). Although there might still be some static (i.e., decision-making errors), further fiddling with the dial fails to improve reception.
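One common way to locate the “X” is to choose the cut point whose ROC point lies closest, in Euclidean distance, to the perfect-discrimination corner (false alarm rate 0, hit rate 1). The Python sketch below applies that rule to a small, invented set of ROC points. The distance rule is one of several conventions in use, and it may not be the one behind Figure 1.

```python
import math

# Invented ROC points: (cut point, false alarm rate, hit rate).
ROC = [
    (10, 0.0, 0.0), (9, 0.0, 0.2), (8, 0.0, 0.4), (7, 0.0, 0.6),
    (6, 0.2, 0.8), (5, 0.4, 0.8), (4, 0.6, 1.0), (3, 0.8, 1.0), (2, 1.0, 1.0),
]


def closest_to_corner(roc):
    """Return the (cut, far, hr) triple nearest the upper left corner (0, 1)."""
    return min(roc, key=lambda p: math.hypot(p[1], 1.0 - p[2]))


cut, far, hr = closest_to_corner(ROC)
print(f"best cut >= {cut}: false alarm rate {far}, hit rate {hr}")
```

In this made-up example the chosen point is cut 6 (false alarm rate 0.2, hit rate 0.8). Because its hit rate plus false alarm rate equals 1, it also lies on the minor diagonal.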
On an idealized curve like this one, the optimum cut point is found by constructing a line at a right angle to the major diagonal. This line is the minor diagonal, which runs from the upper left corner to the lower right corner of the graph. Where the minor diagonal intersects the curve, the hit rate equals one minus the false alarm rate (sensitivity equals specificity). For a symmetric ROC such as the one in Figure 1, this is also the point at which the hit rate exceeds the false alarm rate by the largest margin. In contrast, points that fall on the major diagonal represent chance discrimination, where the hit and false alarm rates are equal. The overall accuracy of discrimination is summarized in the single value “A”. This value represents the proportion of the graph’s area that lies beneath the curve (see the shaded area in the figure) and ranges from .50 (chance) to 1.0 (perfect discrimination) (Swets, 1988). The greater the shaded area, the greater the detection accuracy. ROCs can also be transformed into z units (called a zROC) by marking the axes in z scores rather than in units of equal proportion. When the underlying signal and noise distributions are normal, this transformation straightens the curve into a line. That line has unit slope when the two distributions have equal variances. Transformed ROCs are useful when the researcher is interested in predicting how much the false positive rate will increase as a function of an increase in the true positive rate (Macmillan & Creelman, 2005).
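Two quantities from this paragraph can be computed directly, as the Python sketch below shows. Part (a) estimates the area under an empirical ROC with the trapezoidal rule, using invented ROC points. Part (b) assumes a normal (binormal) model, with noise distributed N(0, 1) and signal distributed N(d_prime, signal_sd). It shows that z-transforming both axes gives a straight line whose slope is 1 only when the standard deviations are equal; in general the slope is 1/signal_sd. NormalDist comes from Python’s standard statistics module. The d_prime value, standard deviations, and cut points are arbitrary illustrative choices.

```python
from statistics import NormalDist

# (a) Area under an empirical ROC via the trapezoidal rule.
# Invented ROC points (false alarm rate, hit rate), sorted by false alarm rate.
ROC = [(0.0, 0.0), (0.0, 0.2), (0.0, 0.4), (0.0, 0.6), (0.2, 0.8),
       (0.4, 0.8), (0.6, 1.0), (0.8, 1.0), (1.0, 1.0)]


def trapezoid_auc(points):
    """Sum trapezoid areas between consecutive (far, hr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(f"A = {trapezoid_auc(ROC):.2f}")  # 0.50 would be chance


# (b) zROC under a binormal model: noise ~ N(0, 1), signal ~ N(d_prime, signal_sd).
def binormal_roc(d_prime, signal_sd, cuts):
    noise, signal = NormalDist(0, 1), NormalDist(d_prime, signal_sd)
    # Each cut gives (false alarm rate, hit rate) = P(x >= cut) under noise / signal.
    return [(1 - noise.cdf(c), 1 - signal.cdf(c)) for c in cuts]


def zroc_slope(points):
    """Slope of the z-transformed ROC between its first and last points."""
    z = NormalDist().inv_cdf
    (zf0, zh0), (zf1, zh1) = [(z(f), z(h)) for f, h in (points[0], points[-1])]
    return (zh1 - zh0) / (zf1 - zf0)


cuts = [-1.0, 0.0, 1.0, 2.0]
print(f"equal variances:   slope = {zroc_slope(binormal_roc(1.5, 1.0, cuts)):.2f}")
print(f"signal SD of 1.25: slope = {zroc_slope(binormal_roc(1.5, 1.25, cuts)):.2f}")
```

With these assumptions the script reports A = 0.88 for the invented points, a zROC slope of 1.00 under equal variances, and 0.80 when the signal standard deviation is 1.25.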
In psychological assessment, signal detection principles are useful for judging whether a screening instrument can adequately detect clinical cases that would typically be identified using a more comprehensive, and more expensive, criterion measure (e.g., a clinician interview). For example, one might want to identify the cut point on the PTSD Checklist (PCL) that optimally discriminates PTSD-positive from PTSD-negative cases as determined by an established but time-consuming criterion measure, the Clinician-Administered PTSD Scale (CAPS). To this end, Weathers, Litz, Herman, Huska, and Keane (1993) found that a score of 50 on the PCL provided the most efficient cut score in a sample of male combat veterans.
Researchers must consider several issues when using signal detection theory. First, results do not always generalize across samples. In contrast to Weathers and colleagues, Walker, Newman, Dobie, Ciechanowski, and Katon (2002) found the optimum cut point for the PTSD Checklist to be 30 in a sample of female participants from a health maintenance organization. Thus, sample demographics can influence which cut point most efficiently balances hit and false positive rates. This, in turn, has implications for the ecological validity of a cut point and its generalizability to diverse populations within the same diagnostic class.
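The search for an “efficient” cut score can be sketched in Python. Here efficiency is assumed to mean the proportion of cases the screener classifies the same way as the criterion (overall correct classification). Both samples of (PCL total, CAPS diagnosis) pairs are invented for illustration; they are not data from Weathers et al. (1993) or Walker et al. (2002). The two hypothetical samples show how the same procedure can settle on different cut points when the score distributions differ.

```python
# Invented (PCL total score, CAPS-positive?) pairs; NOT data from any study.
SAMPLE_A = [(72, True), (61, True), (55, True), (49, True), (44, True),
            (52, False), (43, False), (36, False), (30, False), (24, False)]
SAMPLE_B = [(48, True), (40, True), (35, True), (31, True), (27, True),
            (33, False), (26, False), (22, False), (19, False), (18, False)]

PCL_RANGE = range(17, 86)  # PCL totals run from 17 to 85


def summary(cases, cut):
    """Sensitivity, specificity, and efficiency when score >= cut is called positive."""
    tp = sum(1 for s, dx in cases if dx and s >= cut)
    fn = sum(1 for s, dx in cases if dx and s < cut)
    fp = sum(1 for s, dx in cases if not dx and s >= cut)
    tn = sum(1 for s, dx in cases if not dx and s < cut)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(cases)


def most_efficient_cut(cases):
    """Cut score with the highest efficiency (the lowest cut wins ties)."""
    return max(PCL_RANGE, key=lambda cut: summary(cases, cut)[2])


for name, cases in [("A", SAMPLE_A), ("B", SAMPLE_B)]:
    cut = most_efficient_cut(cases)
    sens, spec, eff = summary(cases, cut)
    print(f"sample {name}: cut {cut}, sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, efficiency {eff:.2f}")
```

In these made-up samples the most efficient cut is 44 in sample A and 27 in sample B, even though the procedure is identical.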
Second, selected cutoff points should take into account the purpose of the instrument’s use. For example, if the goal is to identify all possible cases of PTSD or to screen for individuals who may benefit from treatment targeting traumatic stress symptoms, lower cut points will be more sensitive and identify a greater proportion of cases as being PTSD-positive (i.e., the Paula Abdul model). Conversely, if a researcher’s goal is to maximize internal validity by creating homogeneous groups of PTSD-positive participants, then greater specificity in screening is required and a higher cut point would be justified (i.e., the Simon Cowell model). Thus, what one considers the most “efficient” cut point must take into account information extrinsic to the data and is inextricably tied to the purpose for which the instrument is used and/or available organizational resources.
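One way to build the purpose of screening into the choice is to attach relative costs to misses and false alarms, then pick the cut point with the lowest total cost. The Python sketch below does this for an invented sample. The cost weights are hypothetical stand-ins for the information extrinsic to the data described above; they are not values from the literature.

```python
# Invented (screening score, criterion-positive?) pairs; NOT real data.
CASES = [(70, True), (58, True), (50, True), (42, True), (33, True),
         (55, False), (45, False), (38, False), (30, False), (25, False)]
CUTS = range(17, 86)


def total_cost(cut, miss_cost, false_alarm_cost, cases=CASES):
    """Weighted count of errors when score >= cut is called positive."""
    misses = sum(1 for s, dx in cases if dx and s < cut)
    false_alarms = sum(1 for s, dx in cases if not dx and s >= cut)
    return miss_cost * misses + false_alarm_cost * false_alarms


def cheapest_cut(miss_cost, false_alarm_cost):
    """Lowest-cost cut point (the lowest cut wins ties)."""
    return min(CUTS, key=lambda c: total_cost(c, miss_cost, false_alarm_cost))


# Hypothetical weights: screening treats a miss as 5x worse than a false alarm;
# building a homogeneous research group treats a false alarm as 5x worse.
print("screening (Paula Abdul model):", cheapest_cut(miss_cost=5, false_alarm_cost=1))
print("research (Simon Cowell model):", cheapest_cut(miss_cost=1, false_alarm_cost=5))
```

With these invented data and weights, the screening weights select a cut of 31 and the research weights select 56. The data are the same in both runs; only the purpose differs.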
Third, the ability of the screening instrument to detect a signal (e.g., a diagnosis of PTSD on the CAPS) is constrained by the reliability and validity of the criterion measure. This is analogous to the difficulty of tuning a radio when the station’s signal is weak or variable. Signal detection theory assumes perfect fidelity in the signal (i.e., no measurement error in the criterion), but this is a standard rarely, if ever, reached in the psychological assessment of human beings. Nonetheless, signal detection is useful for calibration purposes. Rather than insisting on a gold-standard criterion, most signal detection researchers accept that the criterion need only be the best available measure (Kraemer, 1992).
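The effect of an imperfect criterion can be illustrated in Python with a deliberately simple, invented example. Accuracy is computed as A by pairwise comparison: the proportion of positive/negative pairs in which the positive case has the higher score, with ties counted as one half. This is equivalent to the area under the empirical ROC. The screening scores stay fixed. The only change is a single criterion label recorded wrongly, which serves as a hypothetical stand-in for criterion measurement error.

```python
def auc(scores, labels):
    """Area under the empirical ROC via pairwise (Mann-Whitney) comparisons."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented screening scores with the "true" status of each case.
scores = [9, 8, 7, 6, 4, 6, 5, 4, 3, 2]
true_status = [True] * 5 + [False] * 5

# Hypothetical criterion error: the highest-scoring true case is recorded as negative.
criterion = list(true_status)
criterion[0] = False

print(f"A against the true status:      {auc(scores, true_status):.2f}")
print(f"A against the flawed criterion: {auc(scores, criterion):.2f}")
```

Here A falls from 0.88 to 0.71. Because the screening scores are identical in both runs, the entire drop comes from the criterion.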
Thus, signal detection permits researchers to do more than simply classify decision-making errors. In an assessment context, it can help clinicians identify the cut point that yields the most efficient diagnostic decisions for a particular purpose. In this sense, as Randy Jackson would say, signal detection helps clinicians in “keeping it real, dog.”
Kraemer, H. C. (1992). Evaluating medical tests: Objective and quantitative guidelines. Newbury Park, CA: Sage Publications.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.). Mahwah, NJ: Erlbaum.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1293.
Walker, E. A., Newman, E., Dobie, D. J., Ciechanowski, P., & Katon, W. (2002). Validation of the PTSD checklist in an HMO sample of women. General Hospital Psychiatry, 24, 375-380.
Weathers, F. W., Litz, B. T., Herman, D. S., Huska, J. A., & Keane, T. M. (1993, October). The PTSD Checklist (PCL): Reliability, validity, and diagnostic utility. Paper presented at the annual meeting of the International Society for Traumatic Stress Studies, San Antonio, TX.