Incidence and prevalence of systemic sclerosis and systemic sclerosis with interstitial lung disease in the United States

BACKGROUND: Although systemic sclerosis (SSc) with interstitial lung disease (SSc-ILD) is a serious condition and incurs a substantial clinical burden, its epidemiology has not been well characterized. OBJECTIVE: To estimate the incidence and prevalence of SSc and SSc-ILD among commercially insured adults in the United States. METHODS: Adults with medical claims between 2011 and 2016 for SSc or SSc-ILD with and without high-resolution computed tomography scans were identified from the Optum Clinformatics Data Mart. Incidence and prevalence were calculated as rates per 100,000 person-years and 100,000 people, respectively. The crude and age- and sex-adjusted prevalence and incidence of SSc and SSc-ILD were estimated and stratified by year and geography. Sensitivity analyses were conducted based on different cohort identification algorithms. RESULTS: Overall, the crude incidence rates of SSc and SSc-ILD were 16.4 and 1.2 per 100,000 person-years, respectively, and the crude prevalence was 24.4 and 6.9 per 100,000 people, respectively. Patient characteristics were generally similar between the SSc and SSc-ILD groups. Mean age range was 59.2-59.9 years and 61.8-62.9 years in the SSc and SSc-ILD groups, respectively. SSc had an age- and sex-adjusted incidence rate of 15.1 per 100,000 person-years and an adjusted prevalence of 25.9 per 100,000 people. The adjusted incidence rate of SSc-ILD was 1.1 per 100,000 person-years and the adjusted prevalence was 7.3 per 100,000 people. CONCLUSIONS: This study provides current estimates of the national incidence and prevalence of SSc and SSc-ILD, which have not been previously well characterized. Further research may help to support health management strategies and resource allocation for adults with SSc and SSc-ILD in the United States.

Systemic sclerosis (SSc) is a complex autoimmune connective tissue disease characterized by vasculopathy or blood vessel damage, immunologic abnormalities, and extensive fibrosis of the skin and internal organs. 1,2 SSc may be triggered by environmental events in genetically susceptible individuals. 3 SSc affects the lungs, heart, kidneys, gastrointestinal tract, and musculoskeletal system and has a substantial impact on morbidity, mortality, and quality of life. [4][5][6] Although a rare disease, SSc poses a substantial economic burden on the health care system and society. 4,7,8 In the United States, the average annual medical cost of patients with SSc, driven by ambulatory services, inpatient services, and medications, is more than 3 times greater than in those without SSc ($17,365 vs. $5,508; P < 0.001). 9 Evidence of interstitial lung disease (ILD) has been observed in more than 90% of patients with SSc (SSc-ILD) on autopsy and in 40%-75% of patients with SSc based on lung function changes. 10,11 Risk factors associated with the development and progression of ILD in patients with SSc include older age at disease onset, presence of diffuse cutaneous SSc, presence of anti-Scl-70/anti-topoisomerase I antibodies and/or absence of anticentromere antibodies, and African American ethnicity. [12][13][14] ILD is characterized by chronic inflammation and fibrosis that progress to respiratory failure and death. 15,16 Despite the seriousness and substantial clinical burden of SSc-ILD, the economic burden has not been well characterized except for a recent publication, reporting significantly higher health care resource utilization, direct and indirect costs, and work loss among patients with SSc-ILD compared with those without SSc-ILD. 17 Clinical heterogeneity and multiple organ manifestations of SSc make diagnosis and management particularly challenging, and the epidemiology remains difficult to establish. 5,18,19
Worldwide estimates of the incidence and prevalence of SSc vary by region, [20][21][22][23][24][25][26] with higher rates observed in the United States, Europe, Australia, and Argentina and lower rates in Scandinavia, Japan, the United Kingdom, Taiwan, and India. 23 Notable variations within the United States have been reported using different methodologies. [24][25][26] Furthermore, most of these estimates were made before the current diagnostic guidelines were established. 27 The epidemiologic challenges in SSc are also present in SSc-ILD, for which the incidence and prevalence remain largely unknown.
Understanding the occurrence and patient distribution of SSc and SSc-ILD in the United States is important for appropriately allocating and managing clinical- and population-level resources to counteract these debilitating conditions. The objective of this study was to estimate the crude and adjusted incidence and prevalence rates of SSc and SSc-ILD in the United States and to stratify the estimates by year and geography.

STUDY DESIGN
This retrospective cohort study was conducted using the Optum Clinformatics Data Mart (Eden Prairie, MN), which includes administrative claims (medical/pharmacy claims and linked demographic information) for more than 180 million commercially insured individuals in the United States. 28 All data were aggregated and deidentified to maintain confidentiality. This retrospective study was exempt from institutional review board approval because the claims data used did not contain any identifiable patient information and prospective data collection was not planned or implemented. Diagnoses of SSc and SSc-ILD were based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnostic codes (Supplementary Table 1, available in online article).

BASE CASE DEFINITIONS OF SSC AND SSC-ILD
Prevalent cases were identified between January 1, 2011, and December 31, 2016. Incident cases were identified during the same period with a 1-year look-back period to January 1, 2010, to rule out any previous diagnosis. In the base case analysis, prevalent SSc cases were defined by the presence of ≥ 2 medical claims for SSc on different dates within a 1-year period between January 1, 2011, and December 31, 2016. The date of first SSc diagnosis was defined as the index date. Incident SSc cases were required to have 1 year of continuous health insurance plan enrollment before the index date and ≥ 1 claim for SSc with no previous SSc diagnosis. Prevalent SSc-ILD cases were required to have ≥ 2 medical claims for SSc on different dates within 1 year and ≥ 2 medical claims for ILD on different dates within the entire study period. The later date of the first SSc or the first ILD claim was used as the index date. Incident SSc-ILD cases had ≥ 1 medical claim for SSc diagnosis, [...] before the first SSc or ILD diagnosis (whichever came first) and no previous SSc or ILD diagnosis during that period.

OUTCOMES AND ASSESSMENTS
Demographic characteristics including age, sex, insurance type, and geographic location or region were reported. Crude and age- and sex-adjusted prevalence and incidence [...]

STATISTICAL ANALYSES
Descriptive statistics were used to summarize demographic characteristics. Median and range and mean and standard deviation were reported for continuous variables, and frequency and percentage were reported for categorical variables. Incidence and prevalence were calculated as rates per 100,000 person-years and 100,000 people, respectively. The rates were adjusted by age and sex using weights based on the 2014 U.S. Census Bureau survey with 95% CIs reported for each adjusted rate. 29 The adjusted rate is $r_{\mathrm{adj}} = \sum_i w_i r_i$, where $w_i$ is the adjusted weight based on the U.S. population for the $i$th age and sex stratum, and $r_i$ is the stratified crude rate.
Sensitivity analyses were conducted to assess differences in the prevalence and incidence with different SSc and SSc-ILD case definitions, including (a) SSc prevalent cases defined by ≥ 1 medical claim (vs. base: [...]

PATIENT DEMOGRAPHIC CHARACTERISTICS
The database contained approximately 48 million health plan enrollees eligible to be considered for the SSc and SSc-ILD prevalence cohorts and nearly 22 million eligible for the SSc and SSc-ILD incidence cohorts. Patient attrition for each cohort is provided in Figure 1.
The demographic characteristics of included patients are summarized in Table 1 [...]
The adjusted annual incidence of SSc appeared to have decreased slightly over the study period, whereas the adjusted annual incidence of SSc-ILD remained relatively stable (Figure 2). The change from ICD-9-CM to ICD-10-CM in 2015 should be considered when interpreting the results. State-level incidence and prevalence of SSc and SSc-ILD during the study period are presented in Supplementary Figures 1 and 2 (available in online article), respectively. Notably, in Vermont, Hawaii, Rhode Island, the Dakotas, Connecticut, Wyoming, Maine, and New Mexico, the incidence and prevalence estimates were high, but in Alaska and Puerto Rico, the corresponding number was zero (Supplementary Table 2, available in online article). Similar patterns were observed for both SSc and SSc-ILD.
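The age- and sex-adjusted rates reported here follow the weighted sum r_adj = Σ_i w_i r_i given in the statistical methods. The Python sketch below shows one way such a direct standardization could be computed alongside the crude rate. The strata, case counts, person-years, and weights are invented for illustration and are not study data, and the names `Stratum`, `crude_rate`, and `adjusted_rate` are hypothetical.

```python
from dataclasses import dataclass

PER = 100_000  # rates are expressed per 100,000 person-years


@dataclass
class Stratum:
    label: str           # age and sex stratum (hypothetical labels)
    cases: int           # incident cases observed in the stratum
    person_years: float  # person-time at risk in the stratum
    weight: float        # share of the standard population in the stratum


def crude_rate(strata):
    """Crude rate: all cases divided by all person-years, per 100,000."""
    total_cases = sum(s.cases for s in strata)
    total_py = sum(s.person_years for s in strata)
    return PER * total_cases / total_py


def adjusted_rate(strata):
    """Directly standardized rate: r_adj = sum_i w_i * r_i."""
    total_weight = sum(s.weight for s in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("standard-population weights must sum to 1")
    return sum(s.weight * PER * s.cases / s.person_years for s in strata)


# Illustrative numbers only; NOT taken from the study.
example = [
    Stratum("women 18-44", 10, 200_000, 0.25),
    Stratum("women 45+", 60, 300_000, 0.27),
    Stratum("men 18-44", 2, 200_000, 0.25),
    Stratum("men 45+", 8, 100_000, 0.23),
]

print(f"crude rate:    {crude_rate(example):.2f} per 100,000 person-years")
print(f"adjusted rate: {adjusted_rate(example):.2f} per 100,000 person-years")
```

In this made-up example the crude and adjusted rates differ because the observed person-time is distributed across strata differently from the standard population. The sketch omits the 95% CIs because the text does not state how they were derived.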
Sensitivity analyses of different case definitions showed expectedly higher estimates when only 1 diagnosis code was required. The SSc-ILD case definition was most sensitive to the exclusion of an HRCT or CT scan requirement (Table 3).
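To illustrate why a 1-claim definition is expected to yield higher estimates than the base case, the sketch below applies a simplified claims-based rule: a patient qualifies when at least `min_claims` claims fall on different dates within a fixed window. The function name, the data layout, and the reading of "within a 1-year period" as 365 days are assumptions made for illustration. The study's full algorithm, including enrollment and imaging requirements, is not reproduced.

```python
from datetime import date, timedelta


def meets_case_definition(claim_dates, min_claims=2, window_days=365):
    """Return True if >= min_claims distinct claim dates fall within window_days."""
    dates = sorted(set(claim_dates))  # claims on the same date count once
    window = timedelta(days=window_days)
    for i in range(len(dates) - min_claims + 1):
        # dates[i] .. dates[i + min_claims - 1] is the tightest run starting at i
        if dates[i + min_claims - 1] - dates[i] <= window:
            return True
    return False


# Hypothetical patients and SSc claim dates; NOT study data.
claims = {
    "patient_a": [date(2012, 3, 1), date(2012, 9, 15)],  # 2 claims within a year
    "patient_b": [date(2013, 5, 2)],                      # single claim
    "patient_c": [date(2011, 1, 10), date(2014, 6, 1)],   # 2 claims, years apart
}

base_case = {p for p, d in claims.items() if meets_case_definition(d, min_claims=2)}
one_claim = {p for p, d in claims.items() if meets_case_definition(d, min_claims=1)}

print(sorted(base_case))  # patients meeting the 2-claim rule
print(sorted(one_claim))  # patients meeting the 1-claim rule
```

Every patient who satisfies the 2-claim rule also satisfies the 1-claim rule, so relaxing the requirement can only add cases, never remove them.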

In total, 630 incident SSc-ILD cases were identified, with crude and adjusted incidence rates of 1.2 and 1.1 (95% CI = 1.0-1.2) per 100,000 [...]

FIGURE 2
Age- and Sex-Adjusted Annual Incidence (95% CI) of SSc and SSc-ILD

Discussion
This retrospective cohort study reports the incidence and prevalence of SSc and SSc-ILD among insured U.S. adults over a 6-year analysis period. The overall age- and sex-adjusted incidence and prevalence estimates of SSc in the United States were 15.1 per 100,000 person-years and 25.9 per 100,000 people, respectively. The overall adjusted incidence and prevalence of SSc-ILD were estimated to be 1.1 per 100,000 person-years and 7.3 per 100,000 people, respectively. Given the lack of validated disease cohort definitions based on claims data sources, we examined several [...]
[...] person-years, much lower than the 15.1 identified here. 24 The study defined cases of SSc as insured adults with ≥ 1 inpatient medical claim or ≥ 2 outpatient or emergency room medical claims at least 30 days apart, whereas no minimum gap between 2 medical claims of SSc was required in our study. The data source was much smaller (approximately 35 vs. 180 million covered lives) and may have had different [...]
An analysis of the Olmsted County (Minnesota) medical records database between 1980 and 2010 reported a range of SSc incidence rates from 1.4 (narrow criteria) to 2.4 (broad criteria) per 100,000 person-years. 30 These "narrow" and "broad" case definitions from manually extracted medical records were based on the 1980 American College of Rheumatology diagnostic criteria and another set of clinical criteria published in 1988, which were reviewed and adjudicated by a single rheumatologist. 30 This appears to be a more stringent algorithm. The data sources, sample sizes, and case definitions were very different from those used in our study. The Olmsted County study also reported an SSc point prevalence estimate of 39.9 per 100,000 people on December 31, 2010, 30 which was similar to the crude prevalence estimate of 27.7 per 100,000 people reported for Minnesota in our study.
A recent systematic review, including the studies discussed above, reported annual incidence and prevalence estimates of SSc in the United States ranging from 1.4 to 5.6 and 13.5 to 44.3 per 100,000 individuals, respectively. 31 The review also reported high variability in the incidence and prevalence of SSc-ILD. Given the lack of consensus on how SSc-ILD should be diagnosed, the variability could be attributed to differences in the methods for diagnosing ILD, such as pulmonary function tests and HRCT. 31 Differences in the methodologies (case definitions and statistical adjustments) and data sources (geographic region and study period) are likely responsible for the variability in epidemiological estimates over the years. [23][24][25][26][32][33][34]
The prevailing age range and sex of patients with SSc and SSc-ILD have not varied widely, as patients aged older than 45 years and women have been consistently prominent in the populations. 24,26,30,32,34 Consistent with the results of this study, results of previous studies have shown that more than 80% of patients with SSc are women and are aged 45 years or older. 24,26,30
The prevalence of other autoimmune diseases such as systemic lupus erythematosus (SLE) also varies due to small study populations, differences in case definitions, and study methods. 35 Estimates from the California Lupus Surveillance Project showed that the age-standardized annual incidence and prevalence of SLE were 4.6 per 100,000 person-years and 84.8 per 100,000 people, respectively, for 2007-2009. 36 Women had a higher incidence than men (8.6 versus 0.7 per 100,000 person-years). Results from the Manhattan Lupus Surveillance Program showed that the age-standardized incidence and prevalence of SLE were 4.6 per 100,000 person-years and 62.2 per 100,000 people, respectively, for 2007-2009. Similarly, women had an almost 8 times higher incidence than men (7.9 versus 1.0 per 100,000 person-years). 37 Results of a retrospective cohort analysis of U.S.-based administrative claims databases showed that the incidence and prevalence of ILD associated with rheumatoid arthritis, another autoimmune disease, ranged from 2.7 to 3.8 cases per 100,000 people and 3.2 to 6.0 cases per 100,000 people, respectively, from 2004 to 2013. 38 Furthermore, most patients with rheumatoid arthritis-related ILD were women. 38
When interpreting the results, the strengths and potential limitations of this study should be considered. This is one of the largest and most recent studies on the incidence and prevalence of SSc and the first for SSc-ILD in the United States. Our results provide insights into the epidemiology of SSc and SSc-ILD chronologically and by state-level geography across the United States. Considering the lack of clinical information and validated case definitions in administrative claims data, we applied several case definition scenarios to identify patients with SSc or SSc-ILD. Research exploring the sensitivity of estimates to imaging procedure criteria that we observed may help establish more consistent approaches in the future.

LIMITATIONS
Our findings may not be generalizable to populations outside the Optum Clinformatics Data Mart administrative claims database. Case definition variables were limited to those available in the data sources used for this study as we did not have access to the medical records. Not all demographic information, such as race or ethnicity, is captured in claims data. Administrative claims may also have missing or misclassified data due to variability in medical coding practices. Moreover, a diagnosis code may be a marker for a rule-out criterion instead of the actual disease. 24 Conversion from ICD-9-CM to ICD-10-CM may have introduced mapping issues and caused sample loss in patient identification. Incident cases may not have been true newly diagnosed cases because [...]

Conclusions
This study is the most recent and, to our knowledge, the most robust as-

ACKNOWLEDGMENTS
Writing, editorial support, and formatting assistance were provided by Suchita Nath-Sain, PhD, and Jeff Frimpter, MPH, of Cactus Life Sciences (part of Cactus Communications), which was contracted and compensated by BIPI for these services.