Evaluation of the HealthImpact Diabetes Risk Model in the Veterans Health Administration

BACKGROUND: HealthImpact is a novel algorithm using administrative health care data to stratify patients according to risk for incident diabetes. OBJECTIVES: To (a) independently assess the predictive validity of HealthImpact and (b) explore its utility in diabetes screening within a nationally integrated health care system. METHODS: National Veterans Health Administration data were used to create 2 cohorts. The replication cohort included patients without diagnosed diabetes as of October 1, 2012, to determine if HealthImpact scores were significantly associated with diabetes (type 1 or 2) incidence within the subsequent 3 years. The utility cohort included patients without diagnosed diabetes as of August 1, 2015, and assessed diabetes screening rates in the 2 years surrounding this index date, stratified by HealthImpact scores. RESULTS: The 3-year incidence of diabetes in the replication cohort (n = 3,287,240) was 9.1%. Of 100,617 (3.1%) patients with HealthImpact scores > 90, 30,028 developed diabetes, yielding a positive predictive value of 29.8%. These patients accounted for 9.9% of all incident diabetes cases (sensitivity). Sensitivity and negative predictive value improved with descending HealthImpact threshold scores (e.g., > 75, > 50), whereas specificity and positive predictive value declined. Of 3,499,406 patients in the utility cohort, 85.3% received either a blood glucose or hemoglobin A1c test during the 2-year observation period. Among 101,355 patients with a HealthImpact score > 90, nearly all (98.3%) were screened, and 86.3% had an A1c test. CONCLUSIONS: Our independent analysis corroborates the validity of HealthImpact in stratifying patients according to diabetes risk. However, its practical utility to enhance diabetes screening in a real-world clinical environment will be strongly dependent on the pattern and frequency of existing screening practices.

What is already known about this subject
• While early diagnosis and treatment reduce diabetes morbidity and mortality, approximately 1 of every 4 individuals with diabetes remains undiagnosed, and supplemental approaches to screening are needed.
• One such approach at a health care system level is to employ automated diabetes risk models using electronic health record data to identify unscreened patients at high risk and target these individuals for testing.

What this study adds
• In an independent replication study, we confirmed that a recently developed computerized model stratifies patients according to risk for incident diabetes.
• In our practice setting, however, > 98% of patients labeled as high risk for diabetes by the algorithm received diabetes screening within the context of current standard testing practices.
• Since our diabetes screening rate was > 85% overall, the use of computerized algorithms to complement diabetes screening practices may have greater utility in health systems with lower baseline screening rates.

An estimated 30 million Americans suffer from diabetes and are at risk for complications such as neuropathy, diabetic kidney disease, and atherosclerotic cardiovascular disease. 1 While early diagnosis and treatment reduce morbidity and mortality, approximately 1 of every 4 individuals with diabetes remains undiagnosed. 1,2 One approach to complementing existing screening practices is to employ diabetes risk models to identify unscreened patients at high risk and target these individuals for testing. Of particular focus are models that can be applied in an automated fashion to electronic health records or administrative claims data. While many diabetes risk models have been developed, there is limited evidence that their existence has had a major impact on improving screening rates. 3 Suggested explanations include ambiguity in targeting patients and difficulty obtaining essential information. 3 For example, models might require data elements that are difficult to extract using automated methods, such as family history of disease or socioeconomic status. 3 HealthImpact is a recently developed diabetes risk model that relies solely on electronic health data, which eliminates some of these inherent challenges. 4 The goal of HealthImpact is to stratify patients according to diabetes risk, where a potential application is to target high-risk patients for diabetes screening. 4 The algorithm includes variables such as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes and drug exposures that can be readily automated within many electronic health records and thus potentially implemented and maintained at relatively low cost. Our interest in applying this algorithm was not to replace existing clinical judgment or current screening practices but rather to supplement them by identifying and offering screening to high-risk patients who have not been tested. However, there have been no independently conducted replication studies to date confirming the validity of the HealthImpact model, nor any studies examining its utility as a supplement to existing diabetes screening practices within an integrated health care system.
Our first objective was to replicate the methodology used in the original developmental studies to confirm the predictive validity of HealthImpact in stratifying risk for incident diabetes using national administrative data from our clinical practice environment, the Veterans Health Administration (VHA). 4 While HealthImpact may successfully stratify patients according to risk, it is only useful in real-world environments if it successfully identifies patients who would not otherwise be screened according to existing practices. For example, the VHA endorses diabetes screening practices recommended by the American Diabetes Association, which considers risk factors such as age, body mass index, family history, and other clinical factors. 5 Therefore, our second objective was to evaluate the potential utility of implementing this algorithm by examining diabetes screening rates in the VHA among patients with high HealthImpact scores.

■■ Methods
Data Sources and HealthImpact Calculation
National administrative data from the Veterans Affairs Corporate Data Warehouse were accessed via the Veterans Affairs Informatics and Computing Infrastructure. HealthImpact scores were calculated according to published methodology using 3 types of electronic data: demographics, diagnostic codes, and dispensed medications. 4 Scoring is determined by a specified set of 47 regression-based coefficients, which are applied according to the presence or absence of model variables and summed at the patient level. A transform function is applied to create a standardized score ranging from 0-100. 4 Higher scores reflect higher risk for diabetes, where values of 50, 75, and 90 have been previously examined as potential thresholds to initiate laboratory testing for diabetes. 4 The developers have not endorsed any one threshold score but note that different values could be adopted based on weighing the balance of sensitivity and specificity for a given application.
Demographic characteristics include age, sex, and ZIP code, which HealthImpact uses as a proxy for race based on the distribution of non-white patients within ZIP code areas. 4,6 Diagnostic and medication exposure variables are considered present if observed within a 1-year time period. To calculate HealthImpact scores using VHA administrative data, we obtained ICD-9-CM diagnosis codes from inpatient admission and outpatient encounter datasets and medication exposure using text field searches of outpatient pharmacy dispensing records by drug name. We further examined the frequency of hemoglobin A1c and blood glucose testing. These tests were extracted from administrative laboratory data based on a combination of Logical Observation Identifiers Names and Codes values and text field searches. This study was approved by the University of Iowa Institutional Review Board and the Iowa City Veterans Administration Research and Development Committee.

Replication Methods
The first study objective was to replicate the original HealthImpact validation study. 4 The cohort for this objective was constructed based on an index date of October 1, 2012. 4 HealthImpact scores were calculated for the index date (October 1, 2012) and used to predict risk for incident diabetes during the subsequent 3 years of follow-up. Incident diabetes was defined as any diabetes medication dispensed or medical encounter coded for diabetes (ICD-9-CM codes 250.x, 357.2, 362.0, 366.41, or 648.0) observed during the follow-up period. While we had access to VHA electronic laboratory data, we did not use these test results in the definition of incident diabetes because lab data were not used in the original HealthImpact development methodology, which we attempted to replicate as closely as possible. 4 Measurement characteristics for predicting incident diabetes were reported at HealthImpact score thresholds of > 50, > 75, and > 90 and included sensitivity, specificity, positive predictive value, and negative predictive value. These thresholds were selected to be consistent with the results reported in the original study, where the developers did not advocate for establishing a fixed threshold score but stressed that different thresholds may be appropriate according to the needs of a particular application. 4

Utility Methods
A separate cohort was constructed to assess the potential utility of implementing HealthImpact to supplement current VHA diabetes screening practices. Inclusion and exclusion criteria were identical to the methods previously described for the replication cohort. An index date of August 1, 2015, was selected to provide the most recent data available at the time that the study was conducted and to observe laboratory tests occurring in the subsequent year. HealthImpact scores were calculated for this new index date according to the previously described methods. The objective was to determine the current frequency of diabetes screening in patients who would be targeted for screening if HealthImpact scores were implemented. Screening frequency was assessed during the 2-year period surrounding the index date and was defined as a laboratory test for either blood glucose or A1c. A1c tests were included because some clinicians may forgo an initial blood glucose test in a variety of clinical circumstances and we would not want these patients to be classified as unscreened in the analysis. In a separate analysis, we also determined the rate of positive test results among patients who received an A1c test. Patients were considered to screen positive for diabetes at A1c values > 6.4% and for prediabetes at values ranging from 5.7%-6.4%. We did not assess the results of blood glucose tests because we could not confidently determine whether samples were collected under true fasting conditions. Furthermore, typical clinical practice during the study time frame was to order an A1c test as follow-up to elevated blood glucose test results suggesting diabetes.
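The A1c categories used in the utility analysis can be illustrated with a short sketch. This is not the study's code: the function name `classify_a1c` and the label "normal" for values below 5.7% are assumptions made for illustration, while the cut points (> 6.4% for diabetes, 5.7%-6.4% for prediabetes) are the ones stated above.

```python
def classify_a1c(a1c_percent: float) -> str:
    """Classify one A1c result (in percent) using the cut points in the text.

    The "normal" label is an illustrative assumption; the text names only
    the diabetes and prediabetes categories.
    """
    if a1c_percent > 6.4:
        return "diabetes"
    if a1c_percent >= 5.7:
        return "prediabetes"
    return "normal"


# Example: boundary values fall into the prediabetes range.
for value in (5.6, 5.7, 6.4, 6.5):
    print(value, classify_a1c(value))
```

Note that a value of exactly 6.4% is treated as prediabetes because the diabetes criterion is a strict inequality.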
The observed frequency of diabetes screening was contrasted between patients above and below the selected HealthImpact score thresholds (> 50, > 75, and > 90). Comparisons were also made between these groups for the rate of positive A1c results among those patients tested. Inferential statistics were not used in presenting these comparisons because access to national administrative data allowed us to include the entire population of veterans receiving care within VHA rather than needing to make inferences about the population from collecting a representative sample. Therefore, interpretation of these comparisons was based solely on clinical significance.
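The stratified comparison described above can be sketched as follows. The `Patient` record, the function name `screening_rates`, and the toy data are hypothetical and serve only to show the calculation; they are not drawn from VHA data.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Patient(NamedTuple):
    score: float    # HealthImpact score on the 0-100 scale
    screened: bool  # any blood glucose or A1c test in the 2-year window


def _rate(flags: list) -> float:
    """Proportion of True values; NaN when the group is empty."""
    return sum(flags) / len(flags) if flags else math.nan


def screening_rates(patients: Iterable[Patient], threshold: float) -> Tuple[float, float]:
    """Return (rate above threshold, rate at or below threshold)."""
    patients = list(patients)
    above = [p.screened for p in patients if p.score > threshold]
    below = [p.screened for p in patients if p.score <= threshold]
    return _rate(above), _rate(below)


# Hypothetical toy cohort, for illustration only.
toy = [
    Patient(95, True), Patient(92, True), Patient(80, True),
    Patient(60, False), Patient(40, True), Patient(30, False),
]
for threshold in (50, 75, 90):
    print(threshold, screening_rates(toy, threshold))
```

Because the thresholds are defined with a strict inequality (> 50, > 75, > 90), a patient scoring exactly at a threshold is counted in the lower group.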

■■ Results
HealthImpact Replication
Initial selection included 4,419,977 patients aged 18-79 years with a VHA primary care visit in the 18 months before the index date of October 1, 2012. A total of 1,132,737 patients were excluded for preexisting diabetes, leaving a final analysis population of 3,287,240. The mean age was 56.8 years and the majority of patients were men (91.7%; Table 1). The most common medical conditions included in HealthImpact scoring were hypertension (40.9%) and hyperlipidemia (37.6%). The mean HealthImpact score was 50.8.
Measurement characteristics for predicting incident diabetes were determined at 3 HealthImpact score thresholds (Table 2). The overall incidence of diabetes within the 3-year follow-up period was 9.1%. A total of 100,617 (3.1%) patients exceeded a HealthImpact threshold score of > 90, of whom 30,028 (29.8%; positive predictive value) were diagnosed with or treated with medication for diabetes during the observation period. These cases accounted for 9.9% of all 299,291 incident diabetes cases in the population (sensitivity). In contrast, 51.6% of patients exceeded a HealthImpact score of 50. While sensitivity was higher at this threshold (72.5%), the positive predictive value of 12.9% was considerably lower than at the > 90 threshold and only marginally better than the overall diabetes incidence rate (9.1%), which is the expected value for positive predictive value due to chance. Overall, higher HealthImpact score thresholds produced higher estimates of specificity and positive predictive value but lower estimates of sensitivity and negative predictive value.
Overall, 85.3% of patients in the utility cohort received either a blood glucose or A1c test within the 2-year observation period, and 57.7% specifically had an A1c test (Table 3). Stratifying the study population based on HealthImpact score thresholds revealed 2 important findings. First, the vast majority of patients exceeding HealthImpact score thresholds were screened in standard VHA practice. For example, of 101,355 patients with a HealthImpact score > 90, nearly all (98.3%) received some diabetes test during the 2-year observation period, and 86.3% had an A1c test. Therefore, implementing the HealthImpact tool would only identify a small number of patients for screening who would not already be screened by standard practice. Even at the lower HealthImpact score of > 50, more than 90% of patients exceeding this threshold were screened.
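The measurement characteristics reported for the replication cohort follow from a 2 x 2 table, which can be rebuilt from four summary counts. The sketch below is illustrative only; the names `ConfusionCounts`, `counts_from_summary`, and `metrics` are assumptions, and the inputs are the counts quoted in the text for the > 90 threshold. Values recomputed this way may differ by a few tenths of a percentage point from the published estimates, since the counts in the text may not match those underlying the published table exactly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    true_pos: int   # above threshold and developed diabetes
    false_pos: int  # above threshold, no diabetes
    false_neg: int  # at or below threshold, developed diabetes
    true_neg: int   # at or below threshold, no diabetes


def counts_from_summary(n_total: int, n_flagged: int,
                        n_flagged_cases: int, n_cases: int) -> ConfusionCounts:
    """Derive the 2 x 2 cells from cohort-level summary counts."""
    true_pos = n_flagged_cases
    false_pos = n_flagged - true_pos
    false_neg = n_cases - true_pos
    true_neg = n_total - true_pos - false_pos - false_neg
    return ConfusionCounts(true_pos, false_pos, false_neg, true_neg)


def metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Counts quoted in the text for the > 90 threshold (replication cohort).
counts = counts_from_summary(
    n_total=3_287_240, n_flagged=100_617,
    n_flagged_cases=30_028, n_cases=299_291,
)
for name, value in metrics(counts).items():
    print(f"{name}: {value:.1%}")
```

The same functions can be applied at the > 50 and > 75 thresholds once the corresponding counts are available.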
A second observation was that HealthImpact thresholds were associated with observed screening rates. For example, the overall screening rate in patients with HealthImpact scores > 90 was 98.3% compared with 84.9% in patients below this threshold (Table 3). Screening rates for A1c followed this same pattern, with 86.3% of patients tested above a score of 90 versus 56.8% below this threshold. This pattern was consistently observed across all HealthImpact threshold values and measures of screening rates. This provides evidence of concurrent validity for HealthImpact scores as this tool apparently taps into some of the same factors VHA providers consider when determining which patients are at risk and therefore should be screened for diabetes.
As a final examination of validity, we contrasted the likelihood of having a positive A1c test (A1c > 6.4%), stratified by HealthImpact score thresholds (Table 4). Among patients tested, individuals with a HealthImpact score > 90 were almost 3 times as likely to have a positive result (14.8%) compared with patients below this threshold (5.6%). Similar results were observed at lower HealthImpact threshold values and also in positive test results for patients classified as having prediabetes (A1c = 5.7%-6.4%). While not the primary intent of this analysis, it was noteworthy that more than half of patients tested had prediabetes. Relating back to the underlying population of 3,499,406 veterans, the prevalence of prediabetes was at least 29.2% (n = 1,024,745). This is an underestimate because it assumes that all veterans who were not screened did not have prediabetes.
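The lower bound on prediabetes prevalence is a simple ratio in which every untested veteran is counted as not having prediabetes. A minimal sketch, using the counts quoted above and a hypothetical function name, is:

```python
def prevalence_lower_bound(n_positive_among_tested: int, n_population: int) -> float:
    """Lower bound on prevalence: untested patients are assumed negative.

    Any untested patient who truly has the condition would raise the
    numerator, so the true prevalence can only be equal or higher.
    """
    if n_population <= 0:
        raise ValueError("population size must be positive")
    return n_positive_among_tested / n_population


# Counts quoted in the text for the utility cohort.
lower = prevalence_lower_bound(1_024_745, 3_499_406)
print(f"prediabetes prevalence is at least {lower:.1%}")
```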

■■ Discussion
Our results demonstrate the predictive validity for HealthImpact in detecting incident diabetes and generally corroborate the original developmental findings. 4 We observed a higher positive predictive value (29.8% vs. 22.25%) but lower sensitivity (9.9% vs. 32.35%) for patients above a HealthImpact threshold of 90. Specificity (97.6% vs. 94.92%) and negative predictive value (91.4% vs. 96.9%) were similar between studies at this threshold. Supplemental evidence for validity was provided by examining diabetes screening rates, where patients above HealthImpact thresholds were more likely to be screened than patients below HealthImpact thresholds. This observation suggests HealthImpact taps into some of the same factors providers consider when deciding to screen patients for diabetes and is evidence of concurrent validity. Moreover, higher HealthImpact scores were associated with an increased likelihood of screening positive for diabetes among those patients selected for testing. Overall, we confirm the developers' conclusion that HealthImpact provides a valid method for risk stratification of incident diabetes.

Table 3. Proportion of Veterans Receiving a Diabetes Screening Test, Stratified by HealthImpact Score Threshold

While valid, the relatively low sensitivity of this tool clearly demonstrates that it is not suited as a standalone approach to diabetes screening and could not be recommended as a replacement for existing clinical screening practices. Rather, our specific interest was whether this simple and low-cost approach could complement existing screening practices by identifying patients who would not otherwise have been tested. Thus, the utility of HealthImpact implementation in a real-world practice environment will be strongly dependent on existing diabetes screening practices. For example, more than 98% of VHA patients with HealthImpact scores > 90 and more than 92% of patients with scores > 50 were screened for diabetes in the current practice environment. Implementing HealthImpact would thus be unlikely to meaningfully increase the number of new diabetes cases identified in VHA. However, the overall diabetes screening frequency of over 85% in VHA is substantially higher than previously observed in community settings, which may be < 50%. 7 The relative utility of implementing an automated risk prediction model, such as HealthImpact, may prove more robust in settings where existing diabetes screening rates are low. Conversely, high rates of blood glucose and A1c testing in the VHA may suggest the potential for over-screening, and automated risk models could be useful in identifying cases where screening is unnecessary. While outside the scope of our current work, this is a potential avenue for future inquiry regarding the utility of diabetes risk models.
While the focus of our analysis was the identification of diabetes, our findings prompted some important observations concerning prediabetes. While only 5.6% of patients who received an A1c test were positive for diabetes, more than half were classified as having prediabetes. Without intervention, 17%-29% of patients with prediabetes will progress to diabetes within 4 years. 8 In addition, lifestyle modifications, including physical activity and weight loss, can be cost-effective ways to decrease progression to diabetes by as much as 58%. 8,9 Therefore, implementation of an automated risk prediction model such as HealthImpact to target and expand diabetes screening may have added benefits in early recognition of the prediabetes state, which may increase the number of patients who are able to prevent or delay progression to diabetes. We estimated the prevalence of prediabetes in the veteran population to be at least 29%, which is consistent with prior estimates of approximately 37% in general U.S. adults. 1
valid generalizability to other clinical populations. The primary limitation to the utility analysis was that we could not observe diabetes screening that occurred outside the VHA, and thus our reported screening rates, while extremely high, might still be underestimates of true screening rates.

■■ Conclusions
Our independent analysis corroborates previous reports of HealthImpact as an approach to stratify patient risk for diabetes. However, the practical utility of implementing this model as a complement to diabetes screening in a real-world clinical environment will be strongly dependent on existing screening practices. Specifically, HealthImpact may prove beneficial in health care systems with low diabetes screening rates. Our findings also revealed the potential benefit that adopting a diabetes risk algorithm could have on identifying patients with prediabetes, where early intervention can substantially delay disease progression. Further investigation concerning the validity, utility, and implementation of automated diabetes risk models in clinical practice environments is warranted.