How to handle missing data in the intention-to-treat analysis of pragmatic randomized controlled clinical trials

Negida Academy
Feb 26, 2021
Attrition bias is a systematic error caused by unequal loss of participants from a randomized controlled trial (RCT). In clinical trials, participants might withdraw due to unsatisfactory treatment efficacy, intolerable adverse events, or even death. Additionally, patients who do not comply with the treatment schedule or who seek additional interventions outside of the study protocol are more likely to be excluded from the study due to violation of the study protocol. These dropouts can influence (1) the statistical power of the study and (2) the balance of confounders between the groups.

(1) The statistical power of a study refers to the ability to detect an effect if one exists. Say you compare two treatments and you find there is no significant difference between them. If you do not have sufficient statistical power, you do not know whether you failed to find a difference because:

a) there is truly no difference between the groups.

or

b) you were just not able to detect the difference.

To have sufficient statistical power, you need to make sure your sample size is large enough. (This should be calculated before the study begins.)
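To make the idea of a sample size calculation concrete, here is a minimal sketch for comparing two group means. It uses the common normal-approximation formula and is only an illustration: the function name, the standardized effect size of 0.5, the 5% two-sided significance level, and the 80% power are all assumptions chosen for the example, not values from any particular trial.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two-sided test, two means).

    effect_size is the standardized mean difference (difference / SD).
    Uses the normal approximation: n = 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = standard_normal.inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical scenario: a moderate standardized effect of 0.5
print(sample_size_per_group(0.5))  # 63
```

A smaller expected effect requires a larger sample. An exact calculation based on the t-distribution gives a slightly higher number, so dedicated software should be used for a real protocol.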

(2) Confounders are variables that the researcher failed to control or eliminate that could affect the outcome of the study. For example, if a treatment group and a control group differ in terms of variables such as gender or socioeconomic status, these could be potential confounders. So if, for example, a disproportionate number of women drop out from one group, this could affect the balance of confounders between the study groups. This is because, ideally, we want the study groups to be as similar as possible, differing only in terms of the intervention they receive.

How to overcome attrition bias

To avoid a substantial decrease in study power, it is recommended that investigators enroll more participants than the minimum required sample size. This allows researchers to compensate for expected withdrawals. However, although this step is important, it is not sufficient to totally avoid bias, even if the number of remaining patients (after withdrawals) is enough to give the required statistical power.
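As a rough illustration of how enrollment can be inflated to allow for withdrawals, the sketch below divides the required sample size by the proportion of participants expected to remain. The function name and the figures (63 participants per group, 20% expected dropout) are hypothetical.

```python
from math import ceil


def inflate_for_dropout(required_n, expected_dropout_rate):
    """Return the number to enroll so that about required_n remain after dropout."""
    if not 0 <= expected_dropout_rate < 1:
        raise ValueError("expected_dropout_rate must be in the range [0, 1)")
    return ceil(required_n / (1 - expected_dropout_rate))


# Hypothetical: 63 participants needed per group, 20% expected to withdraw
print(inflate_for_dropout(63, 0.20))  # 79
```

This protects statistical power only. It does nothing to restore the balance of confounders if the withdrawals are unequal between the groups.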

This is because, as mentioned, random allocation of participants to the study groups ensures balancing of known (and unknown) confounders. This is vital to ensure the validity of randomized controlled trials. Therefore, the withdrawal of a disproportionate number of participants from one group can disturb the distribution of confounding variables among the study groups.

To avoid the bias that can arise from this, it is necessary to include those patients who drop out in the analysis. Accordingly, the intention-to-treat (ITT) analysis has been introduced as a statistical solution. This is where all randomized patients are included in the final analysis, in the group to which they were randomized (irrespective of any noncompliance or withdrawal from the study). The challenging step in the ITT analysis is to estimate the end-point values of non-compliant or lost patients because these data usually are not available. However, there are multiple approaches to estimate these data:

(1) Last observation carried forward (LOCF) analysis

In this approach, investigators use the last observed value as the end-point value for the lost patients. For example, imagine a participant was meant to be followed up at 6 weeks, 10 weeks, and 16 weeks. However, at 16 weeks of follow-up they could not be contacted. In this case, the data they gave at 10 weeks is ‘carried forward’ and assumed to be his or her score at 16 weeks. However, this method should be used cautiously, because it assumes that a participant's outcome stays unchanged after the last observation. The problem is that this assumption can give biased results when the underlying disease has a progressive nature, meaning that the patient's condition deteriorates over time.

For example, imagine an RCT about neuroprotection against Parkinson’s disease. (Neuroprotective strategies are strategies that can be applied early in the disease, with the intention of delaying disease progression.) When the investigators use the last observed value, it is likely that the lost patients will have values that indicate less disease progression than their actual end-point. That is, whether they receive the active intervention or control, their earlier values are likely to be better. If dropouts are more frequent, or occur earlier, in the intervention group, this could lead to an overestimate of the intervention's efficacy.
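The carrying-forward step itself is simple, as the minimal sketch below shows. The function name and the symptom scores are hypothetical. Missing visits are represented by None, and higher scores are assumed to mean worse disease.

```python
def locf(visits):
    """Fill missing visits (None) with the most recent observed value.

    A missing value with no earlier observation stays None.
    """
    filled = []
    last_seen = None
    for value in visits:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical scores at weeks 6, 10 and 16; the week-16 visit was missed
patient_scores = [20, 24, None]
print(locf(patient_scores))  # [20, 24, 24]
```

In a progressive disease, the true week-16 score would probably be higher than 24, which is exactly how this method can make lost patients look better than they really are.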

(2) Multiple imputation method

The aim of this approach is to predict the end-point values of lost patients using regression models fitted to the data that were recorded. Random errors are then added to the predicted values through a random number generator, so that the imputed values reflect the uncertainty of the prediction. This is repeated to create several completed datasets. Each dataset is analyzed separately, and the results are then pooled into a single (estimated) result.

(3) Analysis of the worst-case scenario

In an RCT, because participants are randomly allocated to the treatment and control groups, any systematic differences between them are most likely attributed to the treatment. So one approach is for researchers to assume the worst-case scenario to fill the data for the participants lost to attrition.

If the outcome is dichotomous (e.g. mortality), then we can assume the worst event (i.e. death) for drop-outs from the experimental group and the best event (i.e. survival) for any participants who dropped out of the control group.

If the outcome is continuous, we can assign the best baseline value and the worst endpoint value to the drop-outs.

This approach yields a conservative estimation of the treatment effect. So if the treatment was found superior to the control, we can be confident the treatment really was more effective. This is because, if anything, this type of analysis would underestimate the magnitude of the treatment effect. However, a key problem with this approach is that poor compliance may not necessarily mean the treatment was ineffective. If this analysis shows the treatment is not superior to the control, we cannot be sure why this is. It could be because: 1) the treatment is truly ineffective or 2) it is a consequence of the drop-outs. Therefore, to avoid misinterpretation, it is advisable to analyze RCTs using multiple approaches, including per-protocol analysis and multiple ITT scenarios whenever possible.
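For a dichotomous outcome, the worst-case rule can be sketched in a few lines. The function names and the example data are hypothetical. True means treatment success, False means failure, and None marks a participant lost to follow-up.

```python
def success_rate(outcomes):
    """Proportion of True values in a list of booleans."""
    return sum(outcomes) / len(outcomes)


def worst_case_success_rates(treatment, control):
    """Apply the worst-case scenario and return (treatment_rate, control_rate).

    Missing outcomes count as failures in the treatment group and as
    successes in the control group.
    """
    treatment_filled = [False if x is None else x for x in treatment]
    control_filled = [True if x is None else x for x in control]
    return success_rate(treatment_filled), success_rate(control_filled)


# Hypothetical trial arms with one drop-out each
treatment_arm = [True, True, True, None, False]
control_arm = [True, False, None, False, False]
print(worst_case_success_rates(treatment_arm, control_arm))  # (0.6, 0.4)
```

Among completers only, the success rates would be 0.75 and 0.25, so the worst-case assumption narrows the apparent difference between the groups, which is what makes it conservative.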

Example


Altman and colleagues performed an RCT comparing two treatments for pelvic organ prolapse, a condition in which one or more of the pelvic organs bulge into the vagina. The two treatments compared were anterior colporrhaphy and transvaginal mesh repair.

In this study, they performed:

1) per-protocol analysis (i.e. only including those patients who completed both the treatment and follow-up).

and

2) an ITT analysis assuming the worst-case scenario.

In the statistical analysis section, they wrote these two sentences:

“analyses included both a per-protocol analysis and a conservative sensitivity analysis of the binary primary outcome. For purposes of the sensitivity analysis, we assumed a worst-case scenario for the mesh-repair group. (i.e. for all patients with missing data in the mesh-repair group, the study treatment was considered to be unsuccessful, whereas, for patients with missing data in the colporrhaphy group, the study treatment was considered to be successful)”.

Additionally, when reporting the primary outcome, the authors mentioned: “The result of the per-protocol analysis was similar to that of the intention-to-treat analysis (adjusted odds ratio, 4.3; 95% CI, 2.6 to 7.2)”. Therefore, we can be confident that the treatment effect reported in this study is not likely to be influenced by the differential loss of some participants during the follow-up.

Conclusion

Differential loss of participants from RCTs results in attrition bias. The ITT analysis is recommended to minimize attrition bias. It is recommended that researchers:

(1) try to obtain, where possible, data about drop-outs from other sources (e.g. death registry).

(2) try to impute the missing data using multiple approaches.

(3) perform multiple types of analyses, including per-protocol analysis and ITT scenarios.

That way, when the different analyses lead to the same conclusions, we can be more confident the conclusions are robust.