Stakeholders find that step therapy should be evidence-based, flexible, and transparent: assessing appropriateness using a consensus approach

BACKGROUND: Step therapy, one approach to utilization management, is used by health plans to ensure safe and clinically appropriate care while managing cost. Several patient and provider groups have each developed principles to guide the appropriate use of step therapy; however, no comprehensive, multistakeholder-informed set of criteria exists.
OBJECTIVE: To assess multistakeholder consensus on criteria for the development and implementation of step therapy for pharmaceutical therapies. Stakeholders were asked to (a) assess the appropriateness of step therapy as a utilization management tool; (b) rate specific criteria across 5 domains (development, implementation, communication, appeals, and evaluation) of step therapy; and (c) categorize these criteria as standards or best practices.
METHODS: We conducted a multiphase project culminating in a roundtable of experts representing patient, provider, plan, pharmacy, policy, and ethical perspectives. We first reviewed guiding principles, position statements, and legislative activity to draft criteria regarding step therapy protocol development, implementation, communication, and evaluation. To assess consensus across a convenience sample of experts, we employed an iterative 4-step modified Delphi method. Panelists were asked to (a) rate the overall appropriateness of step therapy, (b) rate the appropriateness of specific criteria, and (c) identify each as a standard or best practice. Appropriateness was rated on a scale from 1 to 9 and categorized in terciles (1-3: not appropriate, 4-6: neither, 7-9: appropriate) to assess quantitative agreement, disagreement, and indeterminate agreement.
RESULTS: After the second round of voting, roundtable panelists (n = 16) disagreed on the appropriateness of step therapy for utilization management (50% appropriate, 31.25% neither, and 18.75% inappropriate). Agreement was achieved on 21 criteria across 5 themes (clinical criteria as the foundation for protocol development, implementation of protocols, transparency and communication of processes, navigation of the appeals process, and evaluation of health and administrative impact). Fourteen and seven criteria were categorized as standards and best practices, respectively.
CONCLUSIONS: The stakeholders in this panel differed in their assessments of the appropriateness of step therapy but agreed regarding how these protocols should be developed, implemented, communicated, and evaluated. Most criteria were rated as standards that can be used by stakeholders when developing, implementing, and assessing step therapy processes today.

What is already known about this subject
• Step therapy is a common, and growing, utilization management tool.
• Evidence of the effect of step therapy on health care costs and outcomes is mixed: the short-term cost savings may be outweighed by long-term increases in other health care use.
• Independent provider and patient groups have outlined guiding principles for step therapy protocols; legislation has been enacted in 24 states to minimize provider administrative burdens and to ensure patient protections when implementing step therapy.

What this study adds
• Using an iterative approach, we achieved consensus on the appropriateness of 21 criteria focused on the development, implementation, communication, and evaluation of step therapy protocols among a multistakeholder panel of 16 experts.
• Criteria related to the use of robust clinical evidence when developing step protocols, transparent protocols and processes for patients and providers, and appeals flexibility were rated as standards (14/21) to which step therapy policies should adhere today.
• Criteria related to electronic tracking of protocols and review processes, as well as publicly sharing results of protocol evaluations, were rated as best practices (7/21) that require further research, policy solutions, or infrastructure changes.
Utilization management tools used by payers, including step therapy (ST), allow for clinically appropriate, safe, and cost-effective patient care. When multiple treatments exist, patients are often required to try a clinically recognized first-line therapy before payment approval of more complex or expensive treatment options. In other scenarios, ST is used to manage treatments with safety, efficacy, or cost concerns. Among employer-sponsored health plans, ST is common but varies in frequency of use and the number of steps in the protocol. 1 Recently, the Centers for Medicare & Medicaid Services (CMS) finalized a rule allowing Medicare Advantage plans to use ST in Part B drug coverage policies to increase competition and manage costs. 2 Despite growing use of ST, evidence of its effect on health care costs and patient outcomes remains limited and mixed. 3 ST encourages appropriate use of first-line drugs and has been shown to reduce drug spending in the short term for employer-sponsored and Medicare Part D plans and select therapeutic areas. 4,5 However, it has also been shown to increase treatment discontinuation and medical resource use, such as emergency department visits. 5-8 The available data are primarily observational and survey based, which limits assessment of this effect.
Physician and patient organizations have outlined ST principles that address physician administrative burden and potential unintended consequences. 9-13 These groups emphasize flexibility to account for unique patient circumstances and timely and appropriate communication of protocols to patients. 9,10 Several states have legislated patient protections for ST. 13,14 Stakeholder concerns and the growing use of ST, along with the uncertainty of its effect, underscore the need for standards when implementing ST protocols.

Methods
We reviewed the grey literature to identify guiding principles and position statements using the following search terms: ST principles, prior authorization guidelines, ST legislation, and utilization management guidelines. We searched legislative tracking tools for legislation regarding ST. 13,14 Using these results, we drafted a checklist of criteria that encompassed common themes. We then engaged a subgroup of 6 experts representing patients, providers, plans, pharmacists, policy, and ethics perspectives to provide clarity to the criteria.
We used a 4-step modified Delphi method to assess consensus on the checklist criteria. 15,16 This iterative method, often used in health care research, uses a systematic approach to collect and aggregate informed judgments from a group of well-informed experts on subjects where there is a lack of agreement, uncertainty, or lack of evidence. A roundtable was convened with 16 panelists, including the 6 experts previously mentioned, from the patient (n = 4), physician (n = 5), payer (n = 4), pharmacist (n = 1), health policy (n = 1), and ethics (n = 1) communities (see Acknowledgments). This convenience sample of experts was identified from the relevant peer-reviewed and grey literature or from their roles at relevant stakeholder organizations involved in ST processes.
The Delphi method included a premeeting round of voting, telephone interviews, and a group discussion followed by a second round of voting. During the roundtable, we provided panelists with their premeeting votes compared with the anonymized distribution of votes from the panel.
Assessing consensus was a 2-step process. We first asked panelists to rate the appropriateness of ST overall and of each individual criterion on a scale from 1 (very inappropriate) to 9 (very appropriate). We then calculated median ratings and a quantitative estimate of agreement or disagreement across 3 terciles (1-3: inappropriate, 4-6: neither, 7-9: appropriate).
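The rating and tallying steps can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, the individual ratings in the usage section are invented so that they match tercile counts reported in this article, and the threshold handling (at least 4 ratings in both extreme terciles for disagreement, fewer than 4 ratings outside the most common tercile for agreement) is one assumed reading of the modified Delphi rule described in the Methods. The last function sketches the 75% cut-off that the Methods use to label a criterion a standard or a best practice.

```python
from statistics import median

TERCILES = ("inappropriate", "neither", "appropriate")


def tercile_counts(ratings):
    """Tally 1-9 ratings into the three terciles (1-3, 4-6, 7-9)."""
    counts = dict.fromkeys(TERCILES, 0)
    for r in ratings:
        if r not in range(1, 10):
            raise ValueError(f"rating out of range: {r}")
        counts[TERCILES[(r - 1) // 3]] += 1
    return counts


def percentages(ratings):
    """Share of the panel (in percent) falling in each tercile."""
    counts = tercile_counts(ratings)
    return {name: 100 * n / len(ratings) for name, n in counts.items()}


def classify_agreement(ratings, threshold=4):
    """Assumed reading of the agreement rule for a 16-member panel."""
    counts = tercile_counts(ratings)
    low, high = counts["inappropriate"], counts["appropriate"]
    # ASSUMPTION: disagreement needs `threshold` or more ratings at BOTH ends.
    if low >= threshold and high >= threshold:
        return "disagreement"
    # ASSUMPTION: agreement needs fewer than `threshold` ratings outside
    # the most common tercile.
    outside = len(ratings) - max(counts.values())
    if outside < threshold:
        return "agreement"
    return "indeterminate"


def categorize(labels, cutoff=0.75):
    """Label a criterion from panelists' votes using the 75% cut-off."""
    n = len(labels)
    n_standard = labels.count("standard")
    n_best = labels.count("best practice")
    if n_standard / n >= cutoff:
        return "standard"
    if (n_standard + n_best) / n >= cutoff:
        return "best practice"
    return "neither"


if __name__ == "__main__":
    # Hypothetical ratings chosen to reproduce the reported overall ST vote:
    # 8 appropriate, 5 neither, 3 inappropriate.
    overall_st = [8] * 8 + [5] * 5 + [2] * 3
    print("median:", median(overall_st))
    print("shares:", percentages(overall_st))
    print("overall ST:", classify_agreement(overall_st))
    # The two worked examples from the Methods.
    print("15 vs 1:", classify_agreement([8] * 15 + [2]))
    print("5 / 7 / 4:", classify_agreement([2] * 5 + [5] * 7 + [8] * 4))
```

Under these assumptions, the three cases discussed in the article (15 versus 1, the 5/7/4 split, and the overall ST vote of 8/5/3) come out as agreement, disagreement, and indeterminate, respectively. A different convention at the threshold would change the result for votes that sit exactly on the boundary, so the `threshold` parameter is left adjustable.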
Quantitative agreement (or disagreement), as determined by the modified Delphi calculation for a panel of 16, was defined as fewer (or more) than 4 responses at either end of the appropriateness terciles (Supplementary Figure 1, available in online article). For example, if 15 panelists rated a criterion as appropriate (7-9) and the remaining panelist rated it as inappropriate (1-3), we concluded agreement. If 5 panelists rated a criterion as inappropriate, 4 rated it as appropriate, and the remaining 7 rated it as neither (4-6), we would conclude disagreement. We concluded indeterminate agreement if neither criterion was met.
Criteria with a median appropriateness score of 7 or above, or with indeterminate agreement, remained in the final checklist. By keeping indeterminate criteria in the checklist, the second level of differentiation, standards versus best practices, highlighted where the uncertainty remained. Our study aimed to capture those differences.
Panelists identified each criterion as a standard, a best practice, or neither. A criterion was a standard, a goal achievable today, if at least 75% of panelists rated it as a standard. If a criterion was rated by fewer than 75% of the panelists as a standard, but at least 75% of panelists rated it either a standard or a best practice, it was categorized as a best practice, or an aspirational goal that could be achieved through policy or infrastructure changes. We reported the final voting after round 2.

Results
Overall, 50% of panelists rated ST as appropriate, while the other half were split between neither (31%) and inappropriate (19%; Figure 1). We concluded indeterminate agreement.
Several plan representatives highlighted the role of ST to ensure clinically appropriate medication use and improve patient health. Patient and provider representatives cautioned that ST is used to manage costs with few concerns regarding patient outcomes.
However, we found that panelists agreed on how ST should be implemented (Figure 1). At the conclusion of the roundtable, 21 criteria of the original 23 remained. Median scores across criteria ranged from 7 to 9; we found quantitative agreement or concluded indeterminate agreement among all checklist criteria.
Based on the 75% standard cut-off, the majority of the criteria (14/21) were rated as standards, and 7 were best practices (Figure 1, also Supplementary Figure 2 and Supplementary Table 1, available in online article).
We summarize the final roundtable ratings, categorizations, and discussion to provide context.

CLINICAL CRITERIA FOUNDATION FOR ST PROTOCOL DEVELOPMENT
We found agreement on the appropriateness of 3 criteria related to ST protocol development. First, most panelists (15/16) indicated that ST should be based on high-quality, up-to-date clinical evidence and that clinical evidence should be prioritized before economic evidence. One plan representative noted their fiduciary responsibility and rated this as neither appropriate nor inappropriate.
Second, all panelists (16/16) agreed that ST protocols should be developed by an objective multidisciplinary review committee, free from potential conflicts of interest. However, panelists highlighted potential conflicts in integrated delivery systems where members of the committee are employed by the health plan.
Third, most panelists (15/16) agreed that treatment failure, or risk of failure, should be defined using condition-specific parameters. Patient and ethics representatives emphasized the importance of broad markers of failure such as loss of productivity. The group agreed that failure should include clinical metrics such as lack of efficacy and potential for adverse events, as well as additional concerns such as quality of life and provider judgment.
All 3 criteria were categorized as standards (Figure 1).

Theme 1: Clinical criteria foundation for step therapy protocol development
A. Clinical evidence is considered before cost when reviewing medical products for coverage and potential for step therapy policy
B. Coverage policies, including step therapy, for medical products will be reviewed by an objective, external multidisciplinary committee and include the following inputs and processes:
  • Member perspectives are considered in decision-making process
  • All available clinical evidence, including studies on therapeutic need, efficacy, safety, and effectiveness
  • Established clinical practice guidelines or compendia are incorporated into the review of clinical evidence
  • Policies are updated regularly to reflect updated evidence for existing therapies
C. Failure, or risk of failure, is defined using condition-specific parameters (for lack of efficacy and adverse events) for any medical product when included in a step therapy protocol based on any of the following:

IMPLEMENTATION OF ST PROTOCOLS AND PROCESSES
We found quantitative agreement on the appropriateness of 5 of the 6 criteria associated with ST implementation and indeterminate agreement on 1 criterion. First, all panelists (16/16) agreed that patients should face no more steps than clinically reasonable. However, determining what is "clinically reasonable" requires granular, timely, and unbiased clinical guidelines, which are not always available.
Second, all panelists (16/16) concurred that trial duration should be specified for each treatment in the protocol, and time to "failure" should minimize patient harm. One panelist noted harm can extend beyond clinical measures if treatment affects a patient's quality of life or work status. Administrative burden should also be minimized.
Third, while many panelists (11/16) noted that responses to exception requests should occur rapidly, not all panelists rated this criterion as appropriate. Plan representatives explained that the response "clock" should begin once the plan has all documentation. In contrast, provider representatives emphasized the barriers to the seamless exchange of information across stakeholders due to lack of interoperability across systems.
Fourth, most panelists (13/16) agreed that once a plan authorizes a therapy, approval should remain as long as the patient remains a beneficiary. However, plan representatives commented that changes might be warranted if (a) there was new evidence for an existing drug or (b) a newly approved therapy might be more appropriate.
Fifth, most panelists (15/16) agreed that plans should prevent interruptions in care by allowing exceptions to ST protocols if the patient is stable on treatment.
Finally, several panelists (11/16) felt that a patient's previously completed steps should be acceptable to subsequent plans. Improvements to electronic health information exchange should minimize administrative burden associated with switching plans. We concluded indeterminate agreement on this criterion.
One criterion, adherence to a time frame for response to exception requests, was identified as a best practice (Figure 1).

TRANSPARENCY AND COMMUNICATION OF ST PROCESSES
We found quantitative agreement on all 6 criteria regarding ST processes and protocol communications.
First, all panelists (16/16) agreed that formulary details should be easily accessible on the plan website.
Second, most panelists (15/16) agreed that details should be communicated in an easily accessible manner to providers and patients.
Third, most panelists (15/16) agreed it is appropriate to track exception or appeal requests electronically. Panelists identified the need for pharmacists to access this information given their role at the point of service; challenges persist because of e-prior authorization software limitations.
Most panelists (15/16) rated access to electronic tracking approval processes as an appropriate component of ST protocols. However, the group rated this a standard for doctors and a best practice for patients and for pharmacists. Panelists believed online portals could ensure patient privacy.
Finally, most panelists (15/16) agreed that communicating any formulary changes to patients in accessible language is necessary to keep them informed and ensure continuity of care.
Electronic tracking of ST protocol and review processes was the only criterion categorized as a best practice in this theme (Figure 1).

NAVIGATION OF THE APPEALS PROCESS
Criteria in this theme describe the steps to appeal a decision. We concluded agreement on 2 criteria and indeterminate agreement for the other 2.
First, all panelists (16/16) agreed that the option to appeal a plan decision should be clear to the patient and the appeal submission process should be electronic and accessible to patients and providers. Panelists rated this a standard.
Second, some panelists (9/16) recognized that peer-to-peer conversations during appeals review should include providers of similar specialties. Patient and provider representatives commented that straightforward requests could be addressed by general practitioners, but more complex situations may require providers of the same specialty. Further, 1 plan representative explained that the chief medical officer has final authorization regarding internal appeals; therefore, designating internal appeal requests to specialists may be difficult. This criterion was rated a best practice.
Third, all panelists (16/16) agreed, and rated as a standard, that if an appeal is denied, the plan should provide the decision rationale and the option for an external appeal and review to the provider and patient.
Fourth, panelists were split (11/16) on whether it is appropriate for plans to make appeal statistics publicly available. Provider representatives advocated for appeals statistics to be available during plan enrollment. A plan representative noted this information is reported to CMS or state insurance review boards. However, plan representatives questioned who might access this information, what data should be available, whether measures are comparable, and the potential for misinterpretation. This criterion was rated a best practice.

Theme 4: Navigation of the exceptions and appeals processes
A. Benefits administrator provides a clear, readily accessible, electronic process for health care provider and patient to submit requests for appeals and exceptions, including the appeal request itself and supporting clinical documentation (i.e., peer-reviewed medical literature, clinical guidelines, patient charts/history/test results)
B. Benefits administrator facilitates communication between prescribing health care provider and a provider of the same training and specialty/subspecialty for discussion of medical necessity issues
C. If the exception or appeal is denied, health plan will provide relevant supporting documentation to applicant for next steps including clinical justification for the decision, alternative options covered by the health plan and process for requesting external review
D. Benefits administrator provides appeals approval statistics on health plan website for public review

EVALUATION OF HEALTH AND ADMINISTRATIVE IMPACT
We found agreement on the appropriateness of 2 criteria related to the effect of ST. First, all panelists (16/16) agreed that plans should track ST processes and outcomes (Figure 1). Plan representatives indicated that they track this information internally and, sometimes, report these data to accreditation bodies. While panelists agreed that documenting the results of a policy is needed, they voiced concerns with external reporting. Panelists rated this a best practice.
Second, most panelists (12/16) believed it is appropriate for plans to weigh the administrative costs of implementing ST. One provider representative explained that any policy that affects a patient's access to care should be regularly evaluated. Others emphasized the administrative burden on physicians and pharmacists. However, as a plan representative noted, some ST protocols are implemented to prevent potential safety or abuse issues, so any benefits to patient safety should outweigh the administrative costs. The group agreed that these efforts to track and evaluate ST protocols were appropriate. However, differences in the need for public reporting likely led to the best practice categorization.

Discussion
A diverse group of stakeholders varied in their perspectives regarding the appropriateness of ST. This is not unexpected. Stakeholder benefits or concerns vary because of economics, care delivery, efficiency, and patient outcomes. ST protocols also vary in how they are implemented and then communicated to patients and providers across treatments and plans. Given these variations, the more important finding from our study was that stakeholders agreed on 21 criteria that plans should incorporate when developing and implementing ST programs. Most criteria were identified as standards for ST programs, while others were identified as best practices.
Previous work has focused on individual conditions, has been driven by 1 stakeholder perspective, or did not use a formal consensus process. The multistakeholder roundtable that we convened addressed issues related to ST that apply across therapeutic areas or patient populations. Further, the results from our approach to evaluate consensus are not dissimilar from those recommended by other professional bodies. 17-19 Our panelists highlighted the importance of defining nuanced terms such as treatment "failure" and evaluating ST processes.

Conclusions
Step therapy is a common and growing utilization man-