INTRODUCTION

Physicians’ clinical decisions frequently deviate from evidence-based care as reflected, for example, in clinical practice guidelines.1,2 These deviations, which produce variations in clinical practice, may be appropriate for selected patients, but policymakers must understand their nature and extent to be reassured that these clinical decisions are not causing harm or increasing costs. The challenge has been to measure the variation in care, to account for contributing factors, some of which (for example, case mix or patients’ financial status) are not under clinicians’ control, and to detect those unwarranted variations that can be associated with inefficient resource use and, at times, unnecessary risk to patients.

In this literature review, we assess different methods of measuring variations in physician decisions. We focus in particular on techniques that may support research into what organizational features and payment policies promote evidence-based decisions in individual clinical scenarios that contribute substantially to health care use and costs. Although each method has strengths and weaknesses, we devote most of our attention to clinical vignettes as an approach worthy of further research, given that these tools are presently the most feasible method to measure variations in individual physician decisions about pertinent diagnostic and treatment options.

COMMON APPROACHES TO MEASURING POINT-OF-CARE CLINICAL DECISIONS

Although technology in the field is evolving, researchers have regularly used several methods to measure variations in physician point-of-care decisions (Appendix A). Two methods—medical record abstraction and claims data analysis—are based on readily available data on the care of actual patients, but using these approaches generally requires sophisticated statistical analysis to control for differences in patient case mix among providers or across settings. Two other options—standardized patients and clinical vignettes—require primary data collection from clinicians, thus increasing the clinician burden of the research. By using the latter approaches, however, one can directly measure physician decisions and control for case mix by soliciting a decision on a single case or a consistent set of cases from all sampled providers (Table 1).3,7
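
To make the need for case-mix adjustment concrete, the sketch below illustrates in Python the kind of regression-based adjustment a claims- or chart-based analysis might require before providers’ decisions can be compared. It is a minimal sketch under stated assumptions: the simulated data, the variable names (for example, ordered_imaging and comorbidity_count), and the model specification are hypothetical and are not drawn from the studies cited here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulate a small analytic file; in practice this would come from claims
    # or abstracted charts. All fields here are hypothetical.
    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "age": rng.integers(40, 85, n),
        "female": rng.integers(0, 2, n),
        "comorbidity_count": rng.poisson(1.5, n),
        "provider_id": rng.integers(1, 6, n),
    })
    # Hypothetical outcome: whether advanced imaging was ordered.
    logit_p = -4 + 0.03 * df["age"] + 0.4 * df["comorbidity_count"]
    df["ordered_imaging"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    # Adjust for observable patient characteristics (case mix); provider
    # fixed effects capture residual provider-level variation in ordering.
    model = smf.logit(
        "ordered_imaging ~ age + female + comorbidity_count + C(provider_id)",
        data=df,
    ).fit(disp=False)
    print(model.summary())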

Table 1. Traditional Methods of Measuring Variations in Physician Decisions

Medical Record Abstraction

Medical record abstraction relies on a trained chart abstractor to review clinical records and produce a data set of physician decisions as physicians themselves record them.3 The availability of chart records and the fairly low burden placed on physicians or medical practices to provide these data (which are generated in the course of routine patient care) are strong advantages of this method. However, medical information that cannot be extracted automatically from an electronic health record must be abstracted manually by trained researchers, and the time4,7 and expense3,4,5,6 involved may severely limit the sample size that can be included in an analysis. Both handwritten and electronic medical records also suffer from “recording bias,” in that not all relevant medical data or services may be recorded.3,5,6,8

Claims

As a record of physician point-of-care decisions, computerized administrative claims data share many of the advantages of medical record abstraction: they are widely available and require no provider time for data collection. These data are also fairly inexpensive to gather, avoiding the costs of hiring medical record abstractors, administering surveys, and using standardized patients (see below). Moreover, these advantages tend to increase the sample size of claims-based analyses, permitting more generalizable results. Yet, for many services (for example, advanced imaging), claims will only reliably identify the provider who is paid to perform the service, while the provider who decides to order the service (and the parameters of that decision) is of greater interest to policymakers. Claims also normally do not contain all the clinical data,4 such as patients’ symptoms or detailed elements of their medical history, that can shape physicians’ point-of-care decisions but do not affect reimbursement, and many clinical decisions (such as referrals) are not reflected in claims at all.5

Standardized Patients

Standardized patients, used in what is often considered the “gold standard” approach to measuring physician decisions, are trained actors who observe and report on physician performance. Actors are asked to portray a particular patient history or set of characteristics (for example, a propensity to demand tests) during a clinical visit and to document the services they receive during the encounter. Like medical record abstraction, the use of standardized patients is presently valuable on a small scale, but likely unrealistic in large-scale studies of variations in point-of-care decisions across diverse communities and practice settings.5,6 The major limitations are the high cost of training and compensating standardized patients,3,6,12 and the logistical challenges of organizing and coordinating their visits.6 Accordingly, studies using standardized patients will necessarily involve small samples. Importantly, too, providing care for standardized patients takes physician time away from caring for real patients, burdening physicians and their practices to a much greater extent than would other methodological approaches.3,4

INTRODUCTION TO CLINICAL VIGNETTES

Given the challenges posed by the use of medical record abstraction, claims data analysis, and standardized patients to affordably and reliably measure variations in clinical decisions across settings and specialties, the most feasible method may be a fourth option: physician surveys using clinical vignettes—that is, simulated patient cases. A vignette case generally specifies a hypothetical patient’s age, gender, medical complaint, and health history (see Appendix A). Based on the details provided in the case, the respondent is asked to answer one or more questions regarding diagnosis or treatment of the patient (Table 2).4

Table 2. Example of a Clinical Vignette Closed-Ended Question Format

Vignettes may be administered on paper, by telephone, or in person, or they may be computer administered, sometimes incorporating an audio or video recording13 of the patient’s responses. They have been used in a wide range of settings, including medical licensing and board certifications,4 the training of medical students,4,7 and continuing medical education courses.14 Researchers have used this tool to explore variations in physician decisions, both to characterize the extent of variation that exists15 and to assess factors that might contribute to it.16 Vignettes have also figured importantly in studies of the influences of patient race17 and gender18 on physicians’ evaluation and treatment decisions.

Clinical vignettes are likely less expensive to use than both standardized patients and manual medical record abstraction,3,4,7 perhaps even after the costs of instrument development and administration are taken into account. Just as importantly, they are free from the challenges posed by incomplete patient medical records or claims data.4 Logistically, clinical vignettes are more practical6 and less burdensome to physicians than using standardized patients,3 and data collection is easier and faster than with medical record abstraction.4,7 Importantly, given these advantages, sample size in a vignette study is likely to be substantially larger than is feasible with standardized patients or manual medical record abstraction.4,7 (As described in Table 1, the low cost of automated EHR abstraction is currently offset by its limitations in many clinical scenarios.)

Clinical vignettes do have a number of limitations, however. Since they inquire about the treatment of hypothetical patients outside of real-world contexts (for example, without the effects of practice-level influences or time constraints on physicians),4 physicians’ responses may not reflect what occurs in actual practice.3,4 Peabody et al. (2000), for instance, raise the notion of “social desirability bias,” which may cause physicians to respond to vignettes based on their knowledge of how they should practice rather than how they actually practice. For example, a physician who is far behind in seeing patients might not perform examinations that he or she recognizes would be recommended (but would likely report performing them in a vignette).3,4

Like any measurement instrument, the clinical vignette approach may also suffer from high costs of instrument development and validation, as well as from non-response bias, concerns avoided by claims data analysis and, perhaps, by medical record abstraction. Instrument development costs will increase if vignettes must be regularly updated as clinical guidelines change or if different vignettes are required for different specialized roles or practice settings. Lastly, unlike medical record abstraction or claims analysis, clinical vignettes impose a burden on physicians (albeit a fairly minimal one) by requiring them to submit a survey response.7

CONSIDERATIONS FOR DESIGNING AND ADMINISTERING CLINICAL VIGNETTES

Research by Peabody et al. (2000, 2004) provides important guidance on designing and administering clinical vignettes that accurately measure actual physician behavior. Important attributes of vignettes used in their studies include: (1) allowing open-ended responses, (2) presenting realistic time constraints, (3) offering patient cases with varied and realistic levels of clinical complexity, (4) providing real-time information in response to physicians’ answers, and (5) using a design that detects both necessary and unnecessary care.3,7

Designing clinical vignettes that yield valid results requires a thorough understanding of the study purpose, insight into the study population, and an appreciation of the need to balance cost and rigor. In this section, we describe different options for vignette design and conclude with tables of decision points (Table 3) and other considerations (Table 4) that should be weighed in designing a relevant, cost-effective vignette likely to generate responses that accurately reflect physician practice.

Table 3. Decision Points for Constructing a Clinical Vignette
Table 4. Other Design Considerations for Developing a Vignette Survey

Selecting Decisions to Study

Not all clinical decisions are appropriate for measurement with vignettes. Decisions are best suited for the vignette approach when they occupy a middle ground with respect to their evidence base; that is, there should be clear evidence indicating an appropriate choice for patients with certain characteristics, but the “right” answer should not be so obvious that it fails to elicit variation in responses or raises concerns regarding social desirability bias. For instance, vignettes regarding the decision to counsel on smoking cessation are more likely to face social desirability bias (and insufficient variation in responses) than vignettes about decisions for which the best practice is less universally known. The current state of practice among the practitioners being surveyed should also occupy a middle ground; assessing decisions already known to be made universally or not at all is likely to be of limited value. Finally, as the Choosing Wisely initiative23 has recognized, decisions will ideally be those for which variability has consequences of interest to policymakers, whether because of variation in cost or variation in risk to patients.

Open-Ended Versus Closed-Ended Questions

Clinical vignettes may use either open-ended or closed-ended questions. Open-ended questions rely on free response; respondents provide written (or typed) answers describing how they would care for the patient, without any prompts or limitations to guide them.9 Closed-ended questions can be variously structured, requiring respondents to make a selection from a checklist, mark “yes” or “no” for a series of items, select an option from a multiple choice list, rank items, or make a selection within a range on a Likert scale (Table 2).

The open-ended or closed-ended structure of a clinical vignette’s questions can affect data quality. Vignettes that offer open-ended responses allow the physician to report what he or she would do in a given situation, without guidance or cueing. Although closed-ended vignettes are used in many applications, including the U.S. Medical Licensing Examination (USMLE),24 the presentation of response options may cue a physician to respond in a certain way, especially if he or she views one or more options as “correct” or believes the researcher is seeking a specific answer (social desirability bias). Choices based on what is thought to be the correct response may be inconsistent with actual behavior or decisions in practice and can result in an overestimate of performance.9

Comparing open-ended and closed-ended vignettes directly, Pham et al. (2009) observed that closed-ended vignettes yielded a higher rating of quality of care than responses to identical vignettes presented in an open-ended format.9 Closed-ended responses may reflect both a “cueing” effect and test-taking ability, confounding assessment of clinical decisions. Overall, although closed-ended vignettes may generally accord with actual practice behavior, open-ended vignettes may result in stronger criterion validity and better distinguish among the decisions of physicians.9,25

Question Format

When constructing a closed-ended vignette, careful selection of question format is essential. One option, a dichotomous “yes/no” response (example 1 in Table 2), is easy to administer and interpret, but may result in bias if some respondents are undecided.19 Responses to a multiple choice question (example 2), although also simple, may likewise be biased unless the response options provided represent all the options a physician would consider for the given scenario in practice.

Likert scale questions ask how likely or how often a physician would make a given decision (examples 3a, 3b). A Likert scale format may be appealing because of its familiarity; however, two physicians may interpret the same term on the scale differently.20 Providing categories with ascending or descending numerical values (example 3b) avoids this weakness, but researchers should ensure the numerical range is distributed equally across categories. For example, if numeric categories are narrower toward one endpoint of a scale, respondents may draw conclusions about the average frequency and adjust their responses accordingly.20

A fill-in-the-blank question that solicits a numeric value along a range is, in a sense, a compromise between open-ended and closed-ended formats (example 4). By allowing respondents to provide a free-form numerical answer, the question avoids any bias imposed by providing closed-ended options and is also much easier to “score” than an open-ended question,19 especially if a numeric response is mandated (as permitted by computerized vignettes). In a mail survey, however, numeric fill-in-the-blank items may be more prone to errors in data entry than numeric Likert scale questions that are truly closed-ended.19
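
As an illustration of how a computerized instrument might mandate a numeric fill-in-the-blank response (example 4), the Python sketch below shows one possible validation routine. The prompt wording, allowed range, and function name are assumptions made for the example, not elements of any validated instrument.

    # A minimal sketch of mandating a numeric fill-in-the-blank response in a
    # computerized vignette. Prompt text and range are illustrative only.
    def collect_numeric_response(prompt: str, low: float, high: float) -> float:
        """Prompt until the respondent enters a number within [low, high]."""
        while True:
            raw = input(f"{prompt} ({low}-{high}): ").strip()
            try:
                value = float(raw)
            except ValueError:
                print("Please enter a numeric value.")
                continue
            if low <= value <= high:
                return value
            print(f"Please enter a value between {low} and {high}.")

    # Hypothetical use: of the next 10 such patients, how many would you refer?
    # answer = collect_numeric_response("Number of patients you would refer", 0, 10)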

Mode of Administration

Clinical vignettes can be administered in hard copy (paper and pencil), by telephone or in-person interviewing, and/or by computer or tablet. Factors to take into account when selecting a mode include location of respondents, respondent access to and comfort with computers, vignette design, social desirability bias (typically greater with telephone or in-person interviews), and budget constraints (see Table 3).

Realism

To simulate most closely the experience of providing patient care, vignettes should present patient cases that are similar in complexity to those seen in actual clinical practice.4,7 Incorporating audio or video to present the patient case may be one means of achieving this result.13 Another is to use a computerized vignette that imposes a sequential order on the physician’s responses; for example, a physician cannot revise his or her planned physical examination after selecting a treatment plan.3,7
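
The sketch below illustrates, under stated assumptions, one way a computerized vignette could enforce such sequential ordering. The stage names and the locking rule are hypothetical and are intended only to show the general mechanism, not to reproduce any instrument described in the literature.

    # A minimal sketch of stage locking in a computerized vignette: once a
    # later stage has been answered, earlier stages can no longer be revised.
    class SequentialVignette:
        STAGES = ["history", "physical_exam", "diagnostic_tests", "treatment_plan"]

        def __init__(self):
            self.responses = {}

        def record(self, stage: str, response: str) -> None:
            idx = self.STAGES.index(stage)
            # Lock a stage once any later stage has been completed.
            if any(later in self.responses for later in self.STAGES[idx + 1:]):
                raise ValueError(f"'{stage}' is locked: a later stage was already completed.")
            self.responses[stage] = response

    v = SequentialVignette()
    v.record("history", "asks about exertional chest pain")
    v.record("treatment_plan", "start beta-blocker")
    # v.record("physical_exam", "...")  # would raise: stage is locked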

Vignettes should also avoid prompting “satisficing,” the act of providing a nonoptimized response, which is common among survey respondents. Satisficing can result when a task is too difficult for the respondent or when the respondent’s motivation to participate is low. Offering interesting vignettes of appropriate complexity can help reduce satisficing.21 However, optimizing vignettes to minimize satisficing across a randomized sample of physicians who treat patients of varying complexity will inevitably limit the ability to tailor vignette content to the responding physician.

Establishing Validity

Ideally, the criterion validity of vignettes would be reported in each research article. Doing so is prohibitive in both time and cost, however, because standardized patients are the gold standard comparison group. At a minimum, content validity should be strengthened by having the vignettes reviewed by clinical experts to ensure that they accurately depict the situations under examination and that the question types, formats, and response options are appropriate.

Pre-Testing Vignettes

Vignettes should be pre-tested with physicians who have the same or similar characteristics as those in the target population.4 Pre-testing (followed by cognitive interviews) enables the researcher to ensure the instructions are easily understood and the vignette is easily interpretable. The questions and response options should be reviewed for clarity and correspondence to the vignette.4 Also, the vignette should yield results that have some degree of heterogeneity so differences can be detected. During pre-testing, burden on the respondent should be assessed by recording the time needed to complete the task.
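
By way of illustration, the following Python sketch summarizes hypothetical pilot data against the two pre-testing criteria noted above: respondent burden (completion time) and heterogeneity of responses. The field names and values are invented for the example.

    # A minimal sketch of summarizing pre-test (pilot) data. All records are
    # hypothetical.
    import statistics

    pilot = [
        {"respondent": "A", "minutes": 9.5, "answer": "order MRI"},
        {"respondent": "B", "minutes": 14.0, "answer": "watchful waiting"},
        {"respondent": "C", "minutes": 11.2, "answer": "order MRI"},
    ]

    times = [r["minutes"] for r in pilot]
    print(f"Median completion time: {statistics.median(times):.1f} min")

    # A vignette on which every pilot respondent gives the same answer cannot
    # detect variation and may need revision.
    answers = {r["answer"] for r in pilot}
    if len(answers) < 2:
        print("Warning: no heterogeneity in responses; consider revising the vignette.")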

Cognitive testing should assess vignette equivalence; that is, it should check that all respondents interpret a vignette identically and do not make additional assumptions about the case being presented.22 For instance, the description of the symptoms of a vignette patient with heart failure should convey the same level of severity to all respondents, and the name given to the hypothetical patient should not lead respondents to impose their own assumptions about the patient’s insurance status. Researchers should also take care that their vignettes do not include excessive extraneous information that might “trick” respondents into providing responses that differ from their actual practice.4

Administration

The instructions that present a vignette to respondents may affect the accuracy of their responses. Researchers should make clear that the purpose of a vignette survey is not to test “textbook” answers or even to assess the performance of individual physicians, but rather to obtain an understanding, in the aggregate, of the physician decisions that occur in practice. Offering anonymity to respondents may help elicit truthful responses, but it may also make the resulting data less useful (because they cannot be linked to other data sources) and make follow-up with nonrespondents more difficult.4 Promising confidentiality and limiting analyses to subgroups with reasonably large sample sizes may be an effective compromise.

Validation studies have imposed time constraints on vignette responses to partially replicate the demands of a real clinical setting. The evidence suggests this constraint is important; clinicians who use vignettes as an opportunity to demonstrate their proficiency may act differently when subject to the time pressures of seeing actual patients.26 Although time constraints cannot realistically be enforced in a mail survey, the time taken to complete the vignette can be monitored when surveys are administered online. Respondents given assurances of anonymity may also feel less pressure to perform and are thus less likely to spend an inordinate amount of time completing the vignette (although they might instead spend less time than they would in an actual clinical encounter).
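
A minimal sketch of how completion time might be monitored in an online administration appears below; the timestamps and the thresholds used to flag implausibly fast or slow responses are illustrative assumptions that would need to be calibrated against pre-test data.

    # A minimal sketch of flagging completion times in an online survey.
    # Thresholds are illustrative and should be set from pre-test results.
    from datetime import datetime, timezone

    def flag_completion_time(started_at: datetime, submitted_at: datetime,
                             min_minutes: float = 2.0, max_minutes: float = 30.0) -> str:
        elapsed = (submitted_at - started_at).total_seconds() / 60.0
        if elapsed < min_minutes:
            return "possible satisficing (too fast)"
        if elapsed > max_minutes:
            return "possible interruption or outside consultation (too slow)"
        return "within expected range"

    start = datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 15, 13, 12, tzinfo=timezone.utc)
    print(flag_completion_time(start, end))  # within expected range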

KNOWLEDGE GAPS

Although vignettes have been used extensively, a number of questions remain about how the methodology should best be applied. Research on general physician surveys likely offers a few valuable lessons, but additional research is warranted to provide further guidance on how a vignette survey should be constructed to measure physician decisions and the effects of system-level and practice-level factors.

Formal Validation Outside Primary Care Settings

Research has formally validated clinical vignette methodology using an open-ended instrument in primary care settings and for a few conditions (see Appendix B).3,5,7,27 Rigorous testing of vignettes in a number of different settings, across specialty types, and using a range of vignette designs would add much value to the body of evidence. Also valuable would be research on the ability of vignettes to capture the influence of particular contextual factors, including financial incentives, on physician decisions at the point of care.

Number of Vignettes

Of critical importance is the number of clinical vignettes needed within a single instrument to characterize physician behavior reliably over a given dimension of care, which may be as narrow as a single clinical scenario. Although it is clear that vignette responses of individual physicians should not be interpreted as representative of practices beyond those explicitly measured by the vignette,3 the evidence offers few insights on the relationship between the accuracy of a vignette survey and the number of vignettes that are used within the instrument. While multiple vignettes are likely to be needed,4 including too many vignettes in one instrument may reduce the quality of responses.28
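
One standard psychometric tool for reasoning about this trade-off is the Spearman-Brown prophecy formula, which projects how reliability changes as items are added. The Python sketch below applies it to a hypothetical single-vignette reliability of 0.30; whether the formula’s assumptions hold for vignette instruments is itself an open question, so the example is offered only as an illustration, not as guidance from the cited literature.

    # A minimal sketch: projected reliability of an instrument with n vignettes,
    # given a (hypothetical) single-vignette reliability, via Spearman-Brown.
    def spearman_brown(single_item_reliability: float, n_items: int) -> float:
        r = single_item_reliability
        return n_items * r / (1 + (n_items - 1) * r)

    # Hypothetical single-vignette reliability of 0.30:
    for n in (1, 3, 5, 8):
        print(n, round(spearman_brown(0.30, n), 2))
    # Prints roughly: 1 0.3, 3 0.56, 5 0.68, 8 0.77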

Use of Closed-Ended Vignettes

As previously discussed, closed-ended vignettes may bias responses for some clinical scenarios by limiting them to a list of suggested options from which respondents may choose. More research is needed to identify the clinical scenarios in which the problems of a closed-ended structure emerge or are most severe. Knowing more about approaches to counter this bias would also be valuable for constructing an effective vignette survey.

Question Format

Despite multiple options for vignette question format (for example, dichotomous, multiple choice, and Likert scale), research into the implications of each has generally been limited to opinion surveys. A direct comparison of the various approaches would provide insights for future vignette surveys. Understanding which question type is ideal for a given scenario or clinical decision, for instance, would inform the development of vignette survey instruments, enhancing their validity.

CONCLUSIONS

Researchers interested in better understanding the causes of variation in physicians’ clinical decisions at the point of care can choose from a range of approaches, including medical record abstraction, claims analysis, use of standardized patients, or clinical vignettes. Although clinical vignettes may not be appropriate for measuring variations in all clinical decisions, their use has a number of advantages over the currently feasible alternatives, including the ability to control for differences in case mix, avoid challenges posed by incomplete or inaccurate patient data, and feasibly generate a large sample size. Given these potential near-term advantages and the relatively limited research into the best practices for administering a paper-based, closed-ended vignette survey, further research into vignette methodology would be worthwhile. In addition to further validation tests across specialties and settings, analyses that examine vignette question design, psychometrics, and the ability to accurately capture the influence of contextual factors on physician decisions will be particularly valuable.