Causal survival analysis

Estimation of the Average Treatment Effect (ATE): Practical Recommendations

Creative Commons BY License ISSN 2824-7795

Authors
Affiliations
Imke Mayer
Bernard Sebastien
Published

February 20, 2025

Modified

February 20, 2025

Keywords

Restricted Mean Survival Time, Randomized Control Trial, Observational Study, Censoring

Status
build status
Abstract
Causal survival analysis combines survival analysis and causal inference to evaluate the effect of a treatment or intervention on a time-to-event outcome, such as survival time. It offers an alternative to relying solely on Cox models for assessing these effects. In this paper, we present a comprehensive review of estimators for the average treatment effect measured with the restricted mean survival time, including regression-based methods, weighting approaches, and hybrid techniques. We investigate their theoretical properties and compare their performance through extensive numerical experiments. Our analysis focuses on the finite-sample behavior of these estimators, the influence of nuisance parameter selection, and their robustness and stability under model misspecification. By bridging theoretical insights with practical evaluation, we aim to equip practitioners with both state-of-the-art implementations of these methods and practical guidelines for selecting appropriate estimators for treatment effect estimation. Among the approaches considered, G-formula two-learners, AIPCW-AIPTW, Buckley-James estimators, and causal survival forests emerge as particularly promising.

1 Introduction

1.1 Context and motivations

Causal survival analysis is a growing field that integrates causal inference (; ) with survival analysis () to evaluate the impact of treatments on time-to-event outcomes, while accounting for censoring situations where only partial information about an event’s occurrence is available.

Being a relatively new domain, the existing literature, though vast, remains fragmented. As a result, a clear understanding of the theoretical properties of various estimators is challenging to obtain. Moreover, the implementation of proposed methods is limited, leaving researchers confronted with a range of available estimators and the need to make numerous methodological decisions. There is a pressing need for a comprehensive survey that organizes the available methods, outlines the underlying assumptions, and provides an evaluation of estimator performance — particularly in finite sample settings. Such a survey also has the potential to help identify remaining methodological challenges that need to be addressed. This need becomes increasingly urgent as causal survival analysis gains traction in both theoretical and applied domains. For instance, its applications to external control arm analyses are particularly relevant in the context of single-arm clinical trials, where traditional comparator arms are unavailable. Regulatory guidelines have begun to acknowledge and support such semi-experimental approaches, reflecting the broader evolution of trial design and therapeutic innovation in precision medicine, see for instance ().

By synthesizing the theoretical foundations, assumptions, and performance of various estimators, a survey on existing causal survival analysis methods would provide researchers and practitioners with the necessary tools to make informed methodological choices. This is crucial for fostering robust and reliable applications of causal survival analysis in both academic research and practical settings, where precise and valid results are paramount.

In this paper, we focus our attention to the estimation of the Restricted Mean Survival Time (RMST), a popular causal measure in survival analysis which offers an intuitive interpretation of the average survival time over a specified period. In particular, we decided to not cover the estimation of Hazard Ratio (HR), which has been prominently used but often questioned due to its potential non-causal nature (). Additionally, unlike the Hazard Ratio, the RMST has the desirable property of being a collapsible measure, meaning that the population effect can be expressed as a weighted average of subgroup effects, with positive weights that sum to one ().

1.2 Definition of the estimand: the RMST

We set the analysis in the potential outcome framework, where a patient, described by a vector of covariates XRpX \in \mathbb{R}^p, either receives a treatment (A=1A=1) or is in the control group (A=0A=0). The patient will then survive up to a certain time T(0)R+T(0) \in \mathbb{R}^+ in the control group, or up to a time T(1)R+T(1)\in \mathbb{R}^+ in the treatment group. In practice, we cannot simultaneously have access to T(0)T(0) and T(1)T(1), as one patient is either treated or control, but only to TT defined as follows:

Assumption. (Stable Unit Treatment Value Assumption: SUTVA) T=AT(1)+(1A)T(0).(1) T = AT(1) + (1-A)T(0). \tag{1}

Due to potential censoring, the outcome TT is not completely observed. The most common form of censoring is right-censoring (also known as type II censoring), which occurs when the event of interest has not taken place by the end of the observation period, indicating that it may have occurred later if the observation had continued (). We focus in this study on this type of censoring only and we assume that we observe T~=TC=min(T,C)\tilde T= T \wedge C = \min(T,C) for some censoring time CR+C \in \mathbb{R}^+. When an observation is censored, the observed time is equal to the censoring time.

We also assume that we know whether an outcome is censored or not. In other words, we observe the censoring status variable Δ=I{TC}\Delta = \mathbb{I}\{T \leqslant C\}, where I{}\mathbb{I}\{\cdot\} is the indicator function. Δ\Delta is 11 if the true outcome is observed, and 00 if it is censored.

We assume observing a nn-sample of variables (X,A,T~,Δ)(X,A,\widetilde T,\Delta) stemming from an nn-sample of the partially unobservable variables (X,A,T(0),T(1),C)(X,A,T(0),T(1),C). A toy exemple of such data is given in .

Table 1: Example of a survival dataset. In practice, only X,A,T~X,A,\widetilde T and Δ\Delta are observed.
ID Covariates Treatment Censoring Status Potential outcomes True outcome Observed outcome
i X1X_{1} X2X_{2} X3X_{3} AA CC Δ\Delta T(0)T(0) T(1)T(1) TT T~\tilde{T}
1 1 1.5 4 1 ? 1 ? 200 200 200
2 5 1 2 0 ? 1 100 ? 100 100
3 9 0.5 3 1 200 0 ? ? ? 200

Our aim is to estimate the Average Treatment Effect (ATE) defined as the difference between the Restricted Mean Survival Time of the treated and controls ().

Definition 1 (Causal effect: Difference between Restricted Mean Survival Time)

θRMST=E[T(1)τT(0)τ], \theta_{\mathrm{RMST}} = \mathbb{E}\left[T(1) \wedge \tau - T(0) \wedge \tau\right], where ab:=min(a,b)a \wedge b := \min(a,b) for a,bRa,b \in \mathbb{R}.

We define the survival functions S(a)(t):=P(T(a)>t)S^{(a)}(t):=\mathbb{P}(T(a) > t) for a{0,1}a \in \{0,1\}, i.e., the probability that a treated or non-treated individual will survive beyond a given time tt. Likewise, we let S(t):=P(T>t)S(t) := \mathbb{P}(T >t), and SC(t):=P(C>t)S_C(t) := \mathbb{P}(C > t). We also let G(t):=P(Ct)G(t) := \mathbb{P}(C \geqslant t) be the left-limit of the survival function SCS_C. Because T(a)τT(a) \wedge \tau are non-negative random variables, one can easily express the restricted mean survival time using the survival functions:

E(T(a)τ)=0P(T(a)τ>t)dt=0τS(a)(t)dt.(2) \mathbb{E}(T(a) \wedge \tau) = \int_{0}^{\infty} \mathbb{P}(T(a)\wedge \tau > t)dt = \int_{0}^{\tau}S^{(a)}(t)dt. \tag{2}

Consequently, θRMST\theta_{\mathrm{RMST}} can be interpreted as the mean difference between the survival function of treated and control until a fixed time horizon τ\tau. A difference in RMST θRMST=10\theta_{\mathrm{RMST}} = 10 days with τ=200\tau=200 means that, on average, the treatment increases the survival time by 10 days at 200 days. We give a visual interpretation of RMST in .

Figure 1: Plot of the estimated survival curves on synthetic toy-data. The θRMST\theta_{\mathrm{RMST}} at τ=50\tau=50 corresponds to the yellow shaded area between the two survival curves. The curves have been estimated using Kaplan-Meier estimator, see .

Although the present work focuses on the estimation of the difference in RMST, we would like to stress that the causal effect can be assessed through other measures, such as for instance the difference of the survival functions θSP:=S(1)(τ)S(0)(τ) \theta_{\mathrm{SP}} := S^{(1)}(\tau) - S^{(0)}(\tau) for some time τ\tau, see for instance (). As mentionned in , another widely used measure (though not necessarily causal) is the hazards ratio, defined as θHR:=λ(1)(τ)λ(0)(τ), \theta_{\mathrm{HR}} := \frac{\lambda^{(1)}(\tau)}{\lambda^{(0)}(\tau)}, where the hazard function λ(a)\lambda^{(a)} is defined as λ(a)(t):=limh0+P(T(a)[t,t+j)T(a)t)h. \lambda^{(a)}(t) := \lim_{h \to 0^+}\frac{\mathbb{P}(T(a) \in [t,t+j)|T(a) \geqslant t)}{h}. in a continuous setting, or as λ(a)(t):=P(T(a)=tT(a)t)\lambda^{(a)}(t) := \mathbb{P}(T(a) = t|T(a) \geqslant t) when the survival times are discrete. The hazard functions and the survival functions are linked through the identities S(a)(t)=exp(Λ(a)(t))whereΛ(a)(t):=0tλ(a)(s) ⁣ds,(3) S^{(a)}(t) = \exp\left(-\Lambda^{(a)}(t)\right) \quad \text{where} \quad \Lambda^{(a)}(t) := \int_0^t \lambda^{(a)}(s)\mathop{}\!\mathrm{d}s, \tag{3} in the continuous case. The functions Λ(a)\Lambda^{(a)} are call the cumulative hazard functions. In the discrete case, we have in turn S(a)(t)=tkt(1λ(a)(tk)),(4) S^{(a)}(t) = \prod_{t_k \leqslant t} \left(1-\lambda^{(a)}(t_k)\right), \tag{4} where {t1,,tK}\{t_1,\dots,t_K\} are the atoms of T(a)T^{(a)}. These hazard functions are classically used to model the survival times and the censoring times, see .

1.3 Organisation of the paper

In this paper, we detail the minimal theoretical framework that allows the analysis of established RMST estimators in the context of both Randomized Controlled Trials () and observational data (). We give their statistical properties (consistency, asymptotic normality) along with proofs when possible. We then conduct in a numerical study of these estimators through simulations under various experimental conditions, including independent and conditionally independent censoring and correct and incorrect model specifications. We conclude in with practical recommendations on estimator selection, based on criteria such as asymptotic behavior, computational complexity, and efficiency.

1.4 Notations

We provide in a summary of the notation used throughout the paper.

Table 2: Summary of the notations.
Symbol Description
XX Covariates
AA Treatment indicator (A=1(A=1 for treatment, A=0A=0 for control))
TT Survival time
T(a),a{0,1}T(a), a \in \{0,1\} Potential survival time respectively with and without treatment
S(a),a{0,1}S^{(a)},a \in \{0,1\} Survival function S(a)(t)=P(T(a)>t)S^{(a)}(t) =\mathbb{P}(T(a) > t) of the potential survival times
λ(a),a{0,1}\lambda^{(a)},a \in \{0,1\} Hazard function λ(a)(t)=limh0+P(T(a)[t,t+h)T(a)t)/h\lambda^{(a)}(t) =\lim_{h \to 0^+}\mathbb{P}(T(a) \in [t,t+h) |T(a)\geqslant t)/h of the potential survival times
Λ(a),a{0,1}\Lambda^{(a)},a \in \{0,1\} Cumulative hazard function of the potential survival times
CC Censoring time
SCS_C Survival function SC(t)=P(C>t)S_C(t) =\mathbb{P}(C > t) of the censoring time
GG Left-limit of the survival function G(t)=P(Ct)G(t) =\mathbb{P}(C \geqslant t) of the censoring time
T~\widetilde{T} Observed time (TCT \wedge C)
Δ\Delta Censoring status I{TC}\mathbb{I}\{T \leqslant C \}
Δτ\Delta^{\tau} Censoring status of the restricted time Δτ=max{Δ,I{T~τ}}\Delta^{\tau} = \max\{\Delta, \mathbb{I}\{\widetilde{T} \geqslant\tau\}\}
{t1,t2,,tK}\{t_{1},t_{2},\dots,t_{K}\} Discrete times
e(x)e(x) Propensity score E[AX=x]\mathbb{E} [A| X = x]
μ(x,a),a{0,1}\mu(x,a), a \in \{0,1\} E[TτX=x,A=a]\mathbb{E}[T \wedge \tau \mid X=x,A=a ]
S(tx,a),a{0,1}S(t|x,a), a \in \{0,1\} Conditional survival function, P(T>tX=x,A=a)\mathbb{P}(T > t | X=x, A =a).
λ(a)(tx),a{0,1}\lambda^{(a)}(t|x), a \in \{0,1\} Conditional hazard functions of the potential survival times
G(tx,a),a{0,1}G(t|x,a), a \in \{0,1\} left-limit of the conditional survival function of the censoring P(CtX=x,A=a)\mathbb{P}(C\geqslant t|X=x,A=a)
QS(tx,a),a{0,1}Q_{S}(t|x,a), a \in \{0,1\} E[TτX=x,A=a,Tτ>t]\mathbb{E}[T \wedge \tau \mid X=x,A=a, T \wedge \tau>t]

2 Causal survival analysis in Randomized Controlled Trials

Randomized Controlled Trials (RCTs) are the gold standard for establishing the effect of a treatment on an outcome, because treatment allocation is controlled through randomization, which ensures (asymptotically) the balance of covariates between treated and controls, and thus avoids problems of confounding between treatment groups. The core assumption in a classical RCT is the random assignment of the treatment ().

Assumption. (Random treatment assignment) There holds: AT(0),T(1),X(5) A \perp\mkern-9.5mu\perp T(0),T(1),X \tag{5}

We also assume that there is a positive probability of receiving the treatment, which we rephrase under the following assumption.

Assumption. (Trial positivity) 0<P(A=1)<1(6) 0 < \mathbb{P}(A=1) < 1 \tag{6}

Under Assumptions and , classical causal identifiability equations can be written to express θRMST\theta_{\mathrm{RMST}} without potential outcomes.

θRMST=E[T(1)τT(0)τ]=E[T(1)τA=1]E[T(0)τA=0](Random treatment assignment)=E[TτA=1]E[TτA=0].(SUTVA)(7) \begin{aligned} \theta_{\mathrm{RMST}} &= \mathbb{E}[T(1) \wedge \tau - T(0) \wedge \tau]\\ &= \mathbb{E}[T(1) \wedge \tau | A = 1] - \mathbb{E}[ T(0) \wedge \tau| A= 0] && \tiny\text{(Random treatment assignment)} \\ &= \mathbb{E}[T \wedge \tau | A = 1] - \mathbb{E}[ T \wedge \tau| A= 0]. && \tiny\text{(SUTVA)} \end{aligned} \tag{7}

However, still depends on TT, which remains only partially observed due to censoring. To ensure that censoring does not compromise the identifiability of treatment effects, we must impose certain assumptions on the censoring process, standards in survival analysis, namely, independent censoring and conditionally independent censoring. These assumptions lead to different estimation approaches. We focus on two strategies: those that aim to directly estimate E[TτA=a]\mathbb{E}[T \wedge \tau | A = a] directly (e.g., through censoring-unbiased transformations, see ), and those that first estimate the survival curves to derive RMST via (such as the Kaplan-Meier estimator and all its variants, see the next Section).

2.1 Independent censoring: the Kaplan-Meier estimator

In a first approach, one might assume that the censoring times are independent from the rest of the variables.

Assumption. (Independent censoring) CT(0),T(1),X,A.(8) C \perp\mkern-9.5mu\perp T(0),T(1),X,A. \tag{8}

Under , subjects censored at time tt are representative of all subjects who remain at risk at time tt. represents the causal graph when the study is randomized and outcomes are observed under independent censoring.

Figure 2: Causal graph in RCT survival data with independent censoring.

We also assume that there is no almost-sure upper bound on the censoring time before τ\tau, which we rephrase under the following assumption.

Assumption. (Positivity of the censoring process) There exists ε>0\varepsilon> 0 such that G(t)εfor allt[0,τ).(9) G(t) \geqslant\varepsilon\quad \text{for all} \quad t \in [0,\tau). \tag{9}

If indeed it was the case that P(C<t)=1\mathbb{P}(C < t) = 1 for some t<τt < \tau, then we would not be able to infer anything on the survival function on the interval [t,τ][t,\tau] as all observation times T~i\widetilde T_i would be in [0,t][0,t] almost surely. In practice, adjusting the threshold time τ\tau can help satisfy the positivity assumption. For instance, in a clinical study, if a subgroup of patients has zero probability of remaining uncensored at a given time, τ\tau can be modified to ensure that participants have a feasible chance of remaining uncensored up to the revised threshold.

The two Assumptions and together allow the distributions of T(a)T(a) to be identifiable, in the sense that there exists an identity that expresses S(a)S^{(a)} as a function of the joint distribution of (T~,Δ,A=a)(\widetilde T,\Delta,A=a), see for instance Ebrahimi, Molefe, and Ying () for such a formula in a non-causal framework. This enables several estimation strategies, the most well-known being the Kaplan-Meier product-limit estimator.

To motivate the definition of the latter and explicit the identifiability identity, we set the analysis in the discrete case. We let {tk}k1\{t_k\}_{k \geqslant 1} be a set of positive and increasing times and assume that T{tk}k1T \in \{t_k\}_{k \geqslant 1} almost surely. Then for any t[0,τ]t \in [0,\tau], it holds, letting t0=0t_0 = 0 by convention, thanks to ,

S(tA=a)=P(T>tA=a)=tkt(1P(T=tkT>tk1,A=a))=tkt(1P(T=tk,A=a)P(Ttk,A=a)).\begin{align*} S(t| A=a) &= \mathbb{P}(T > t|A=a) = \prod_{t_k \leqslant t} \left(1 - \mathbb{P}(T = t_k | T > t_{k-1}, A=a)\right) \\ &= \prod_{t_k \leqslant t} \left(1 - \frac{\mathbb{P}(T = t_{k}, A=a)}{\mathbb{P}(T \geqslant t_{k},A=a)}\right). \end{align*}

Using Assumptions and , we find that P(T=tk,A=a)P(Ttk,A=a)=P(T=tk,Ctk,A=a)P(Ttk,Ctk,A=a)=P(T~=tk,Δ=1,A=a)P(T~tk,A=a),(10) \frac{\mathbb{P}(T = t_{k}, A=a)}{\mathbb{P}(T \geqslant t_{k},A=a)} = \frac{\mathbb{P}(T = t_{k},C \geqslant t_k,A=a)}{\mathbb{P}(T \geqslant t_{k}, C\geqslant t_k,A=a)} = \frac{\mathbb{P}(\widetilde T = t_{k}, \Delta = 1,A=a)}{\mathbb{P}( \widetilde T \geqslant t_{k},A=a)}, \tag{10} yielding the final identity S(tA=a)=tkt(1P(T~=tk,Δ=1,A=a)P(T~tk,A=a)).(11) S(t|A=a) = \prod_{t_k \leqslant t} \left(1-\frac{\mathbb{P}(\widetilde T = t_{k}, \Delta = 1,A=a)}{\mathbb{P}( \widetilde T \geqslant t_{k},A=a)}\right). \tag{11} Notice that the right hand side only depends on the distribution of the observed tuple (A,T~,Δ)(A,\widetilde T,\Delta). This last equation suggests in turn to introduce the quantities Dk(a):=i=1nI(T~i=tk,Δi=1,A=a)andNk(a):=i=1nI(T~itk,A=a),(12) D_k(a) := \sum_{i=1}^n \mathbb{I}(\widetilde T_i = t_k, \Delta_i = 1, A=a) \quad\text{and}\quad N_k(a) := \sum_{i=1}^n \mathbb{I}(\widetilde T_i \geqslant t_k, A=a), \tag{12} which correspond respectively to the number of deaths Dk(a)D_k(a) and of individuals at risk Nk(a)N_k(a) at time tkt_k in the treated group (a=1) or in the control group (a=0).

Definition 2 (Kaplan-Meier estimator, Kaplan and Meier ()) With Dk(a)D_k(a) and Nk(a)N_k(a) defined in , we let

S^KM(tA=a):=tkt(1Dk(a)Nk(a)).(13) \widehat{S}_{\mathrm{KM}}(t|A=a) := \prod_{t_k \leqslant t}\left(1-\frac{D_k(a)}{N_k(a)}\right). \tag{13}

The assiociated RMST estimator is then simply defined as θ^KM=0τS^KM(tA=1)S^KM(tA=0)dt.(14) \widehat{\theta}_{\mathrm{KM}} = \int_{0}^{\tau}\widehat{S}_{KM}(t|A=1)-\widehat{S}_{KM}(t|A=0)dt. \tag{14} The Kaplan-Meier estimator is the Maximum Likelihood Estimator (MLE) of the survival functions, see for instance Kaplan and Meier (). Furthermore, because Dk(a)D_k(a) and Nk(a)N_k(a) are sums of i.i.d. random variables, the Kaplan-Meier estimator inherits some convenient statistical properties.

Proposition 1 Under Assumptions , , , and , and for all t[0,τ]t \in [0,\tau], the estimator S^KM(tA=a)\widehat S_{\mathrm{KM}}(t|A=a) of S(a)(t)S^{(a)}(t) is strongly consistent and admits the following bounds for its bias: 0S(a)(t)E[S^KM(tA=a)]O(P(Nk(a)=0)), 0 \leqslant S^{(a)}(t) - \mathbb{E}[\widehat S_\mathrm{KM}(t|A=a)] \leqslant O(\mathbb{P}(N_k(a) = 0)), where kk is the greatest time tkt_k such that ttkt \geqslant t_k.

Gill () gives a more precise lower-bound on the bias in the case of continuous distributions, which was subsequently refined by Zhou (). The bound we give, although slightly looser, still exhibits the same asymptotic regime. In particular, as soon as S(a)(t)>0S^{(a)}(t) > 0 (and Assumption holds), then the bias decays exponentially fast towards 00. We give in a simple proof of our bound is our context.

Proposition 2 Under Assumptions , , , and , and for all t[0,τ]t \in [0,\tau], S^KM(tA=a)\widehat S_{\mathrm{KM}}(t|A=a) is asymptotically normal and n(S^KM(tA=a)S(a)(t))\sqrt{n}\left(\widehat S_{\mathrm{KM}}(t|A=a) - S^{(a)}(t)\right) converges in distribution towards a centered Gaussian of variance VKM(tA=a):=S(a)(t)2tkt1sk(a)sk(a)rk(a), V_{\mathrm{KM}}(t|A=a) := S^{(a)}(t)^2 \sum_{t_k \leqslant t} \frac{1-s_k(a)}{s_k(a) r_k(a)}, where sk(a)=S(a)(tk)/S(a)(tk1)s_k(a) = S^{(a)}(t_k)/S^{(a)}(t_{k-1}) and rk(a)=P(T~tk,A=a)r_k(a) = \mathbb{P}(\widetilde T \geqslant t_k, A=a).

The proof of can be found in . Because Dk(a)/Nk(a)D_k(a)/N_k(a) is a natural estimator of 1sk(a)1-s_k(a) and, 1nNk(a)\frac{1}{n} N_k(a) a natural estimator for rk(a)r_k(a), the asymptotic variance of the Kaplan-Meier estimator can be estimated with the so-called Greenwood formula, as already derived heuristically in Kaplan and Meier ():

Var^(S^KM(tA=a)):=S^KM(tA=a)2tktDk(a)Nk(a)(Nk(a)Dk(a)).(15) \widehat{\mathrm{Var}} \left(\widehat{S}_{\mathrm{KM}}(t|A=a)\right) := \widehat{S}_{\mathrm{KM}}(t|A=a)^2 \sum_{t_k \leqslant t} \frac{D_k(a)}{N_k(a)(N_k(a)-D_k(a))}. \tag{15}

We finally mention that the KM estimator as defined in still makes sense in a non-discrete setting, and one only needs to replace the fixed grid {tk}\{t_k\} by the values at which we observed an event (T~i=tk,Δi=1)(\widetilde T_i = t_k, \Delta_i =1). We refer to Breslow and Crowley () for a study of this estimator in the continuous case and to Aalen, Borgan, and Gjessing (), Sec 3.2 for a general study of the KM estimator through the prism of point processes.

2.2 Conditionally independent censoring

An alternative hypothesis in survival analysis that relaxes the assumption of independent censoring is conditionally independent censoring, also refered sometimes as informative censoring. It allows to model more realistic censoring processes, in particular in situations where there are reasons to believe that CC may be dependent from AA and XX (for instance, if patient is more like to leave the study when treated because of side-effects of the treatment).

Assumption. (Conditionally independent censoring) CT(0),T(1)  X,A(16) C \perp\mkern-9.5mu\perp T(0),T(1) ~ | ~X,A \tag{16}

Under , subjects within a same stratum defined by X=xX=x and A=aA=a have equal probability of censoring at time tt, for all tt. In case of conditionally independent censoring, we also need to assume that all subjects have a positive probability to remain uncensored at their time-to-event.

Assumption. (Positivity / Overlap for censoring) There exists ε>0\varepsilon> 0 such that for all t[0,τ)t\in [0,\tau), it almost surely holds G(tA,X)ε.(17) G(t|A,X) \geqslant\varepsilon. \tag{17}

represents the causal graph when the study is randomized with conditionally independent censoring.

Figure 3: Causal graph in RCT survival data with dependent censoring.

Under dependent censoring, the Kaplan-Meier estimator as defined in can fail to estimate survival, in particular because does not hold anymore. Alternatives include plug-in G-formula estimators () and unbiased transformations ().

2.2.1 The G-formula and the Cox Model

Because the censoring is now independent from the potential outcome conditionally to the covariates, it would seem natural to model the response of the survival time conditionally to these covariates too: μ(x,a):=E[TτX=x,A=a]. \mu(x,a) := \mathbb{E}[T \wedge \tau |X = x, A = a].

Building on , one can express the RMST as a function of μ\mu: θRMST=E[E[TτX,A=1]]E[TτX,A=0]]=E[μ(X,1)μ(X,0)].\begin{align*} \theta_{\mathrm{RMST}} = \mathbb{E}\left[\mathbb{E}[T \wedge \tau |X, A = 1]\right] - \mathbb{E}\left[ T \wedge \tau|X, A= 0]\right] = \mathbb{E}[\mu(X,1)-\mu(X,0)]. \end{align*}

An estimator μ^\widehat\mu of μ\mu would then straightforwardly yield an estimator of the difference in RMST through the so-called G-formula plug-in estimator:

θ^Gformula=1ni=1nμ^(Xi,1)μ^(Xi,0).(18) \widehat{\theta}_{\mathrm{G-formula}} = \frac{1}{n} \sum_{i=1}^n \widehat\mu\left(X_i, 1\right)-\widehat\mu\left(X_i, 0\right). \tag{18}

We would like to stress that a G-formula approach works also in a observational context as the one introduced in . However, because the estimation strategies presented in the next sections relies on estimating nuisance parameters, and that this latter task is often done in the same way as we estimate the conditional response μ\mu, we decided to not delay the introduction of the G-formula any further, and we present below a few estimation methods for μ\mu. These methods are sub-divised in two categories: TT-learners, where μ(,1)\mu(\cdot,1) is estimated separately from μ(,0)\mu(\cdot,0), and _SS-learners, where μ^\widehat\mu is obtained by fitting a single model based on covariates (X,A)(X,A).

Cox’s Model. There are many ways to model μ\mu in a survival context, the most notorious of which being the Cox proportional hazards model (). It relies on a semi-parametric modelling the conditional hazard functions λ(a)(tX)\lambda^{(a)}(t|X) as λ(a)(tX)=λ0(a)(t)exp(Xβ(a)), \lambda^{(a)}(t|X) = \lambda_0^{(a)}(t) \exp(X^\top\beta^{(a)}), where λ0(a)\lambda^{(a)}_0 is a baseline hazard function and β(a)\beta^{(a)} has the same dimension as the vector of covariate XX. The conditional survival function then take the simple form (in the continuous case) S(a)(tX)=S0(a)(t)exp(Xβ(a)), S^{(a)}(t|X) = S^{(a)}_0(t)^{\exp(X^\top\beta^{(a)})}, where S0(a)(t)S^{(a)}_0(t) is the survival function associated with λ0(a)\lambda_0^{(a)}. The vector β(a)\beta^{(a)} is classically estimated by maximizing the so-called partial likelihood function as introduced in the original paper of Cox (): L(β):=Δi=1exp(Xiβ)T~jT~iexp(Xjβ), \mathcal{L}(\beta) := \prod_{\Delta_i = 1} \frac{\exp(X_i^\top \beta)}{\displaystyle\sum_{\widetilde T_j \geqslant\widetilde T_i} \exp(X_j^\top \beta)},

while the cumulative baseline hazard function can be estimated through the Breslow’s estimator (): Λ^0(a)(t)=Δi=1,T~it1T~jT~iexp(Xjβ^(a)) \widehat\Lambda_0^{(a)}(t) = \sum_{\Delta_i = 1, \widetilde T_i \leqslant t} \frac{1}{\displaystyle\sum_{\widetilde T_j \geqslant\widetilde T_i} \exp(X_j^\top \widehat\beta^{(a)})} where β^(a)\widehat\beta^{(a)} is a partial likelihood maximizer. One can show that (β^(a),Λ^0(a))(\widehat\beta^{(a)},\widehat\Lambda_0^{(a)}) is the MLE of the true likelihood, when Λ^0(a)\widehat\Lambda_0^{(a)} is optimized over all step fonctions of the form Λ0(t):=Δi=1hi,hiR+. \Lambda_0(t) := \sum_{\Delta_i=1} h_i, \quad h_i \in \mathbb{R}^+. This fact was already hinted in the original paper by Cox and made rigorous in many subsequent papers, see for instance Fan, Feng, and Wu (). Furthermore, if the true distribution follows a Cox model, then both β^(a)\widehat\beta^{(a)} and Λ^0(a)\widehat\Lambda_0^{(a)} are strongly consistent and asymptotically normal estimator of the true parameters β(a)\beta^{(a)} and Λ(a)\Lambda^{(a)}, see Kalbfleisch and Prentice (), Sec 5.7. When using a TT-learner approach, one simply finds (β^(a),Λ^0(a))(\widehat\beta^{(a)},\widehat\Lambda_0^{(a)}) for a{0,1}a \in \{0,1\} by considering the control group and the treated group separately. When using a SS-learner approach, the treatment status AA becomes a covariate a the model becomes λ(tX,A)=λ0(t)exp(Xβ+αA).(19) \lambda(t|X,A) = \lambda_0(t) \exp(X^\top\beta+\alpha A). \tag{19} for some αR\alpha \in \mathbb{R}. One main advantage of Cox’s model is that it makes it very easy to interpret the effect of a covariate on the survival time. If indeed α>0\alpha > 0, then the treatment has a negative effect of the survival times. Likewise, if βi>0\beta_i >0, then the ii-th coordinate of XX has a negative effect as well. We finally mention that the hazard ratio takes a particularly simple form under the later model since θHR=eα. \theta_{\mathrm{HR}} = e^{\alpha}. In particular, it does not depends on the time horizon τ\tau, and is thus sometimes refered to as proportional hazard. illustrates the estimation of the difference in Restricted Mean Survival Time using G-formula with Cox models.

Figure 4: Illustration of the G-formula for estimating θRMST\theta_{\mathrm{RMST}} in an RCT when only one covariate X1X_1 influences the outcome.

Weibull Model. A popular parametric model for survival is the Weibull Model, which amounts to assume that λ(a)(tX)=λ0(a)(t)exp(Xβ) \lambda^{(a)}(t|X) = \lambda_0^{(a)}(t) \exp(X^\top\beta) where λ0(a)(t)\lambda_0^{(a)}(t) is the instant hazard function of a Weibull distribution, that is to say a function proportional to tγt^{\gamma} for some shape parameter γ>0\gamma >0. We refer to Zhang () for a study of this model.

Survival Forests. On the non-parametric front, we mention the existence of survival forests ().

2.2.2 Censoring unbiased transformations

Censoring unbiased transformations involve applying a transformation to TT. Specifically, we compute a new time TT^* of the form T:=T(T~,X,A,Δ)={ϕ0(T~τ,X,A)ifΔτ=0,ϕ1(T~τ,X,A)ifΔτ=1.(20) T^* := T^*(\widetilde T,X,A,\Delta) = \begin{cases} \phi_0(\widetilde T \wedge \tau,X,A) \quad &\text{if} \quad \Delta^\tau = 0, \\ \phi_1(\widetilde T \wedge \tau,X,A) \quad &\text{if} \quad \Delta^\tau = 1. \end{cases} \tag{20} for two wisely chosen transformations ϕ0\phi_0 and ϕ1\phi_1, and where Δτ:=I{TτC}=Δ+(1Δ)I(T~τ)(21)\Delta^{\tau}:=\mathbb{I}\{T \wedge \tau \leqslant C\} = \Delta+(1-\Delta)\mathbb{I}(\widetilde T \geqslant\tau) \tag{21} is the indicator of the event where the individual is either uncensored or censored after time τ\tau. The idea behind the introduction of Δτ\Delta^\tau is that because we are only interested in computed the expectation of the survival time thresholded by τ\tau, any censored observation coming after time τ\tau can in fact be considered as uncensored (Δτ=1\Delta^\tau = 1).

A censoring unbiased transformation TT^* shall satisfy: for a{0,1}a \in \{0,1\}, it holds E[TA=a,X]=E[T(a)τX]almost surely.(22) \mathbb{E}[T^*|A=a,X] = \mathbb{E}[T(a) \wedge \tau |X] \quad\text{almost surely.} \tag{22} A notable advantage of this approach is that it enables the use of the full transformed dataset (Xi,Ai,Ti)(X_i,A_i,T^*_i) as if no censoring occured. Because it holds E[E[TA=a,X]]=E[I{A=a}P(A=a)T],(23) \mathbb{E}[\mathbb{E}[T^*|A=a,X]] = \mathbb{E}\left[\frac{\mathbb{I}\{A=a\}}{\mathbb{P}(A=a)} T^*\right], \tag{23} there is a very natural way to derive an estimator of the difference in RMST from any censoring unbiased transformation TT^* as: θ^=1ni=1n(Aiπ1Ai1π)Ti(24) \widehat\theta = \frac1n\sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi} \right) T^*_i \tag{24} where π=P(A=1)(0,1)\pi = \mathbb{P}(A=1) \in (0,1) by Assumption and where Ti=T(T~i,Xi,Ai,Δi)T^*_i = T^*(\widetilde T_i,X_i,A_i,\Delta_i). We easily get the following result.

Proposition 3 Under Assumptions and , the estimator θ^\widehat\theta derived as in from a square integrable censoring unbiased transformations satisfying is an unbiased, strongly consistent, and asymptotically normal estimator of the difference in RMST.

Square integrability will be ensured any time the transformnation is bounded, which will always be the case of the ones considered in this work. It is natural in a RCT setting to assume that probability of being treated π\pi is known. If not, it is usual to replace π\pi by its empirical counterpart π^=n1/n\widehat\pi = n_1/n where na=i1{A=a}n_a = \sum_{i} \mathbb{1}\{A=a\}. The resulting estimator takes the form θ^=1n1Ai=1Ti1n0Ai=0Ti.(25) \widehat\theta = \frac1{n_1}\sum_{A_i = 1} T^*_i - \frac1{n_0}\sum_{A_i = 0} T^*_i. \tag{25} Note however that this estimator is slighlty biased due to the estimation of π\pi (see for instance Colnet et al. (), Lemma 2), but it is still strongly consistent and asymptotically normal, and its biased is exponentially small in nn.

Proposition 4 Under Assumptions and , the estimator θ^\widehat\theta derived as in from a square integrable censoring unbiased transformations satisfying is a strongly consistent, and asymptotically normal estimator of the difference in RMST.

The two most popular transformations are Inverse-Probability-of-Censoring Weighting (Koul, Susarla, and Ryzin ()) and Buckley-James (Buckley and James ()), both illustrated in and detailed below. In the former, only non-censored observations are considered and they are weighted while in the latter, censored observations are imputed with an estimated survival time.

Figure 5: Illustration of Inverse-Probability-of-Censoring and Buckley-James transformations.

The Inverse-Probability-of-Censoring Weighted transformation

The Inverse-Probability-of-Censoring Weighted (IPCW) transformation, introduced by (Koul, Susarla, and Ryzin ()) in the context of censored linear regression, involves discarding censored observations and applying weights to uncensored data. More precisely, we let TIPCW=ΔτG(T~τX,A)T~τ,(26) T^*_{\mathrm{IPCW}}=\frac{\Delta^\tau}{G(\widetilde{T}\wedge \tau|X,A)} \widetilde{T} \wedge \tau, \tag{26} where we recall that G(tX,A):=P(CtX,A)G(t|X,A) :=\mathbb{P}(C \geqslant t|X,A) is the left limit of the conditional survival function of the censoring. This estimator assigns higher weights to uncensored subjects within a covariate group that is highly prone to censoring, thereby correcting for conditionally independent censoring and reducing selection bias ().

Proposition 5 Under Assumptions , , , and , the IPCW transform is a censoring unbiased transformation in the sense of .

The proof of is in . The IPCW depends on the unknown conditional survival function of the censoring G(X,A)G(\cdot|X,A), which thus needs to be estimated. Estimating conditional censoring or the conditional survival function can be approached similarly, as both involve estimating a time—whether for survival or censoring. Consequently, we can use semi-parametric methods, such as the Cox model, or non-parametric approaches like survival forests, and we refer to for a development on these approaches. Once an estimator G^(A,X)\widehat G(\cdot|A,X) of the later is provided, one can construct an estimator of the difference in RMST based on or θ^IPCW=1ni=1n(Aiπ1Ai1π)TIPCW,i,(27) \widehat\theta_{\mathrm{IPCW}} = \frac1n \sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi}\right) T^*_{\mathrm{IPCW},i}, \tag{27} or θ^IPCW=1n1Ai=1TIPCW,i1n0Ai=0TIPCW,i.(28) \widehat\theta_{\mathrm{IPCW}} = \frac1{n_1}\sum_{A_i = 1} T^*_{\mathrm{IPCW},i} - \frac1{n_0}\sum_{A_i = 0} T^*_{\mathrm{IPCW},i}. \tag{28} where we recall that na:=#{i[n]  Ai=a}n_a := \#\{i \in [n]~|~A_i=a\}. By , and , we easily deduce that θ^IPCW\widehat\theta_{\mathrm{IPCW}} is asymptotically consistent as soon as G^\widehat G is.

Corollary 1 Under Assumptions, , , and , if G^\widehat G is uniformly weakly (resp. strongly) consistent then so is θ^IPCW\widehat\theta_{\mathrm{IPCW}}, either as in defined in or in .

This result simply comes from the fact that θ^IPCW\widehat\theta_{\mathrm{IPCW}} depends continuously on G^\widehat G and that GG is lower-bounded (Assumption ). Surprisingly, we found limited use of this estimator in the literature, beside its first introduction in Koul, Susarla, and Ryzin (). This could potentially be explained by the fact that, empirically, we observed that this estimator is highly variable. Consequently, we do not explore its properties further and will not include it in the numerical experiments. A related and more popular estimator is the IPCW-Kaplan-Meier, defined as follows.

Definition 3 (IPCW-Kaplan-Meier) We let again G^(X,A)\widehat G(\cdot|X,A) be an estimator of the (left limit of) the conditional censoring survival function and we introduce

DkIPCW(a):=i=1nΔiτG^(T~iτXi,A=a)I(T~i=tk,Ai=a)andNkIPCW(a):=i=1nΔiτG^(T~iτXi,A=a)I(T~itk,Ai=a),\begin{align*} D_k^{\mathrm{IPCW}}(a) &:= \sum_{i=1}^n \frac{\Delta_i^\tau}{\widehat G(\widetilde T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\widetilde T_i = t_k, A_i=a) \\ \quad\text{and}\quad N^{\mathrm{IPCW}}_k(a) &:= \sum_{i=1}^n \frac{\Delta_i^\tau}{\widehat G(\widetilde T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\widetilde T_i \geqslant t_k, A_i=a), \end{align*}

be the weight-corrected numbers of deaths and of individual at risk at time tkt_k. The IPCW version of the KM estimator is then defined as: S^IPCW(tA=a)=tkt(1DkIPCW(a)NkIPCW(a)). \begin{aligned} \widehat{S}_{\mathrm{IPCW}}(t | A=a) &= \prod_{t_k \leqslant t}\left(1-\frac{D_k^{\mathrm{IPCW}}(a)}{N_k^{\mathrm{IPCW}}(a)}\right). \end{aligned}

Note that the quantity π\pi is not present in the definition of DkIPCW(a)D_k^{\mathrm{IPCW}}(a) and NkIPCW(a)N_k^{\mathrm{IPCW}}(a) because it would simply disappear in the ratio DkIPCW(a)/NkIPCW(a)D_k^{\mathrm{IPCW}}(a)/N_k^{\mathrm{IPCW}}(a). The subsequent RMST estimator then take the form θ^IPCWKM=0τS^IPCW(tA=1)S^IPCW(tA=0)dt.(29) \widehat{\theta}_{\mathrm{IPCW-KM}} = \int_{0}^{\tau}\widehat{S}_{\mathrm{IPCW}}(t|A=1)-\widehat{S}_{\mathrm{IPCW}}(t|A=0)dt. \tag{29} Like before for the classical KM estimator, this new reweighted KM estimator enjoys good statistical properties.

Proposition 6 Under Assumptions , , , and , and for all t[0,τ]t \in [0,\tau], the oracle estimator SIPCW(tA=a)S^*_{\mathrm{IPCW}}(t|A=a) defined as in with G^=G\widehat G = G is a stronlgy consistent and asymptotically normal estimator of S(a)(t)S^{(a)}(t) .

The proof of can be found in . Because the evaluation of NkIPCW(a)N_k^{\textrm{IPCW}}(a) now depends on information gathered after time tkt_k (through the computation of the weights), the previous proofs on the absence of bias and on the derivation of the asymptotic variance unfortunately do not carry over in this case. Whether its bias is exponentially small and whether the asymptotic variance can be derived in a closed form are questions left open. We are also not aware of any estimation schemes for the asymptotic variance in this context. In the case where we do not have access to the oracle survival function GG, we can again still achieve consistency if the estimator G^(A,X)\widehat G(\cdot|A,X) that we provide is consistent.

Corollary 2 Under Assumptions , , and , if G^\widehat G is uniformly weakly (resp. strongly) consistent then so is S^IPCW(tA=a)\widehat S_{\mathrm{IPCW}}(t|A=a).

This corollary ensures that the corresponding RMST estimator defined in will be consistent as well.

The Buckley-James transformation

One weakness of the IPCW transformation is that it discards all censored data. The Buckley-James (BJ) transformation, introduced by (Buckley and James ()), takes a different path by leaving all uncensored values untouched, while replacing the censored ones by an extrapolated value. Formally, it is defined as follows:

TBJ=Δτ(T~τ)+(1Δτ)QS(T~τX,A),(30) \begin{aligned} T^*_{\mathrm{BJ}} &= \Delta^\tau (\widetilde{T}\wedge\tau) + (1-\Delta^\tau) Q_S(\widetilde T \wedge \tau|X,A), \end{aligned} \tag{30}

where, for tτt \leqslant\tau, QS(tX,A):=E[TτX,A,Tτ>t]=1S(tX,A)tτS(uX,A) ⁣duQ_S(t|X,A) :=\mathbb{E}[T \wedge \tau | X,A,T \wedge \tau > t]= \frac{1}{S(t|X,A)}\int_{t}^{\tau} S(u|X,A) \mathop{}\!\mathrm{d}u where S(tX,A=a):=P(T(a)>tX)S(t|X,A=a) := \mathbb{P}(T(a) > t|X) is the conditional survival function. We refer again to for a diagram of this transformation.

Proposition 7 Under Assumptions , , and , the BJ transform is a censoring unbiased transformation in the sense of .

The proof of can be found in . Again, the BJ transformation depends on a nuisance parameter (here QS(X,A)Q_S(\cdot|X,A)) that needs to be estimated. We again refer to for a brief overview of possible estimation strategies for QSQ_S. Once provided with an estimator Q^S(A,X)\widehat Q_S(\cdot|A,X), a very natural estimator of the RMST based on the BJ transformation and on or would be θ^BJ=1ni=1n(Aiπ1Ai1π)TBJ,i,(31) \widehat\theta_{\mathrm{BJ}} = \frac1n \sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi}\right) T^*_{\mathrm{BJ},i}, \tag{31} or θ^BJ=1n1Ai=1TBJ,i1n0A0=1TBJ,i.(32) \widehat\theta_{\mathrm{BJ}} = \frac1{n_1} \sum_{A_i=1} T^*_{\mathrm{BJ},i}-\frac1{n_0} \sum_{A_0=1} T^*_{\mathrm{BJ},i}. \tag{32} Like for the IPCW transformation, the BJ transformation yields a consistent estimate of the RMST as soon as the model is well-specified.

Corollary 3 Under Assumptions , , and , if Q^S\widehat Q_S is uniformly weakly (resp. strongly) consistent then so is θ^BJ\widehat\theta_{\mathrm{BJ}} defined as in or .

The proof is again a mere application of Propositions , and , and relies on the continuity of SQSS \mapsto Q_S. The BJ transformation is considered as the best censoring transformation of the original response in the following sense.

Theorem 1 For any transformation TT^* of the form , it holds E[(TBJTτ)2]E[(TTτ)2]. \mathbb{E}[(T^*_{\mathrm{BJ}}-T \wedge \tau)^2] \leqslant\mathbb{E}[(T^*-T \wedge \tau)^2].

This result is stated in Fan and Gijbels () but without a proof. We detail it in for completeness.

2.2.3 Augmented corrections

The main disadvantage of the two previous transformations is that they heavily rely on the specification of good estimator G^\widehat G (for IPCW) or S^\widehat S (for BJ). In order to circumvent this limitation, D. Rubin and Laan () proposed the following transformations, whose expression is based on theory of semi-parametric estimation developed in Laan and Robins (),
TDR=ΔτT~τG(T~τX,A)+(1Δτ)QS(T~τX,A)G(T~τX,A)0T~τQS(tX,A)G(tX,A)2 ⁣dPC(tX,A),(33) T^*_\mathrm{DR} = \frac{\Delta^\tau \widetilde T\wedge \tau}{G(\widetilde T \wedge \tau|X,A)} + \frac{(1-\Delta^\tau)Q_S(\widetilde T \wedge \tau |X,A)}{G(\widetilde T \wedge \tau |X,A)}- \int_0^{\widetilde T \wedge \tau} \frac{Q_S(t|X,A)}{G(t|X,A)^2} \mathop{}\!\mathrm{d}\mathbb{P}_C(t|X,A), \tag{33} where  ⁣dPC(tX,A)\mathop{}\!\mathrm{d}\mathbb{P}_C(t|X,A) is the distribution of CC conditional on the covariates XX and AA. We stress that this distribution is entirely determined by the G(X,A)G(\cdot|X,A), so that this transformation only depends on the knowledge of both conditional survival functions GG and SS, will be thus sometimes denoted TDR(G,S)T^*_\mathrm{DR}(G,S) to underline this dependency. This transformation is not only a censoring unbiased transform in the sense of , but is also doubly robust in the following sense.

Proposition 8 We let F,RF,R be two conditional survival functions. Under Assumptions , , , and , if FF also satisfies Assumption , and if F(X,A)F(\cdot|X,A) is absolutely continuous wrt G(X,A)G(\cdot|X,A), then the transformation TDR=TDR(F,R)T^*_\mathrm{DR} = T^*_\mathrm{DR}(F,R) is a censoring unbiased transformation in the sense of whenever F=GF = G or R=SR=S.

The statement and proof of this results is found in D. Rubin and Laan () in the case where CC and TT are continuous. A careful examination of the proofs show that the proof translates straight away to our discrete setting.

3 Causal survival analysis in observational studies

Unlike RCT, observational data — such as from registries, electronic health records, or national healthcare systems — are collected without controlled randomized treatment allocation. In such cases, treated and control groups are likely unbalanced due to the non-randomized design, which results in the treatment effect being confounded by variables influencing both the survival outcome TT and the treatment allocation AA. To enable identifiability of the causal effect, additional standard assumptions are needed.

Assumption. (Conditional exchangeability / Unconfoundedness) It holds AT(0),T(1)X(34) A \perp\mkern-9.5mu\perp T(0),T(1) | X \tag{34}

Under , the treatment assignment is randomly assigned conditionally on the covariates XX. This assumption ensures that there are no unmeasured confounder as the latter would make it impossible to distinguish correlation from causality.

Assumption. (Positivity / Overlap for treatment) Letting e(X):=P(A=1X)e(X) := \mathbb{P}(A=1|X) be the propensity score, there holds 0<e(X)<1almost surely.(35) 0 < e(X) <1 \quad \text{almost surely.} \tag{35}

The assumption requires adequate overlap in covariate distributions between treatment groups, meaning every observation must have a non-zero probability of being treated. Because Assumption does not hold anymore, neither does the previous idenfiability . In this new context, we can write

θRMST=E[T(1)τT(0)τ]=E[E[T(1)τX]E[T(0)τX]]=E[E[T(1)τX,A=1]E[T(0)τX,A=0]](unconfoundness)=E[E[TτX,A=1]E[TτX,A=0]].(SUTVA)(36) \begin{aligned} \theta_{\mathrm{RMST}} &= \mathbb{E}[T(1) \wedge \tau - T(0) \wedge \tau]\\ &= \mathbb{E}\left[\mathbb{E}[T(1) \wedge \tau|X] - \mathbb{E}[T(0) \wedge \tau|X] \right] && \\ &=\mathbb{E}\left[\mathbb{E}[T(1) \wedge \tau|X, A=1] - \mathbb{E}[T(0) \wedge \tau|X, A=0]\right] && \tiny\text{(unconfoundness)} \\ &= \mathbb{E}\left[\mathbb{E}[T \wedge \tau|X, A=1] - \mathbb{E}[T \wedge \tau|X, A=0]\right]. && \tiny\text{(SUTVA)} \end{aligned} \tag{36} In another direction, one could wish to identify the treatment effect through the survival curve as in : S(a)(t)=P(T(a)>t)=E[P(T>tX,A=a)].(37) S^{(a)}(t) = \mathbb{P}(T(a) > t) = \mathbb{E}\left[\mathbb{P}(T > t | X, A=a)\right]. \tag{37} Again, both identities still depend on the unknown quantity TT and suggest two different estimation strategies. These strategies differ according to the censoring assumptions and are detailed below.

3.1 Independent censoring

illustrates a causal graph in observational survival data with independent censoring (Assumption ).

Figure 6: Causal graph in observational survival data with independent censoring.

Under Assumption , we saw in that the Kaplan-Meier estimator could conveniently handle censoring. Building on , we can write S(1)(t)=E[E[I{A=1,T>t}X]E[I{A=1}X]]=E[AI{T>t}e(X)], S^{(1)}(t) = \mathbb{E}\left[\frac{\mathbb{E}[\mathbb{I}\{A=1,T > t\}|X]}{\mathbb{E}[\mathbb{I}\{A=1\}|X]} \right]=\mathbb{E}\left[\frac{A\mathbb{I}\{T > t\}}{e(X)} \right], which suggests to adapt the classical KM estimator by reweighting it by the propensity score. The use of propensity score in causal inference has been initially introduced by Rosenbaum and Rubin () and further developed in Hirano, Imbens, and Ridder (). It was extended to survival analysis by Xie and Liu () through the adjusted Kaplan-Meier estimator as defined below.

Definition 4 (IPTW Kaplan-Meier estimator) We let e^()\widehat e(\cdot) be an estimator of the propensity score e()e(\cdot). We introduce

DkIPTW(a):=i=1n(ae^(Xi)+1a1e^(Xi))I(T~i=tk,Δi=1,Ai=a)andNkIPTW(a):=i=1n(ae^(Xi)+1a1e^(Xi))I(T~itk,Ai=a),\begin{align*} D_k^{\mathrm{IPTW}}(a) &:= \sum_{i=1}^n \left(\frac{a}{\widehat e(X_i)}+\frac{1-a}{1- \widehat e(X_i)}\right)\mathbb{I}(\widetilde T_i = t_k, \Delta_i = 1, A_i=a) \\ \quad\text{and}\quad N^{\mathrm{IPTW}}_k(a) &:= \sum_{i=1}^n \left(\frac{a}{\widehat e(X_i)}+\frac{1-a}{1- \widehat e(X_i)}\right) \mathbb{I}(\widetilde T_i \geqslant t_k, A_i=a), \end{align*}

be the numbers of deaths and of individual at risk at time tkt_k, reweighted by the propensity score. The Inverse-Probability-of-Treatment Weighting (IPTW) version of the KM estimator is then defined as: S^IPTW(tA=a)=tkt(1DkIPTW(a)NkIPTW(a)).(38) \begin{aligned} \widehat{S}_{\mathrm{IPTW}}(t | A=a) &= \prod_{t_k \leqslant t}\left(1-\frac{D_k^{\mathrm{IPTW}}(a)}{N_k^{\mathrm{IPTW}}(a)}\right). \end{aligned} \tag{38}

We let SIPTW(tA=a)S^*_{\mathrm{IPTW}}(t | A=a) be the oracle KM-estimator defined as above with e^()=e()\widehat e(\cdot) = e(\cdot). Although the reweighting slightly changes the analysis, the good properties of the classical KM carry on to this setting.

Proposition 9 Under Assumptions , , , and The oracle IPTW Kaplan-Meier estimator SIPTW(tA=a)S^*_{\mathrm{IPTW}}(t | A=a) is a strongly consistent and asymptotically normal estimator of S(a)(t)S^{(a)}(t).

The proof of this result simply relies again on the law of large number and the δ\delta-method and can be found in . Because now SIPTWS^*_{\mathrm{IPTW}} is a continuous function of e()e(\cdot), and because ee and 1e1-e are lower-bounded as per Assumptions , we easily derive the following corollary.

Corollary 4 Under the same assumptions as , if e^()\widehat e(\cdot) satisfies e^e0\|\widehat e-e\|_{\infty} \to 0 almost surely (resp. in probability), then the IPTW Kaplan-Meier estimator S^IPTW(tA=a)\hat S_{\mathrm{IPTW}}(t | A=a) is a strongly (resp. weakly) consistent estimator of S(a)(t)S^{(a)}(t).

The resulting RMST estimator simply takes the form: θ^IPTWKM=0τS^IPTW(tA=1)S^IPTW(tA=0)dt.(39) \widehat{\theta}_{\mathrm{IPTW-KM}} = \int_{0}^{\tau}\widehat{S}_{\mathrm{IPTW}}(t|A=1)-\widehat{S}_{\mathrm{IPTW}}(t|A=0)dt. \tag{39} which will be consistent under the same Assumptions as the previous Corollary. Note that, we are not aware of any formal results concerning the bias and the asymptotic variance of the oracle estimator SIPTW(tA=a)S^*_{\mathrm{IPTW}}(t | A=a), and we refer to Xie and Liu () for heuristics concerning these questions.

3.2 Conditional independent censoring

Under Assumptions (uncounfoundedness) and (conditional independent censoring), the causal effect is affected both by confounding variables and by conditional censoring. The associated causal graph is depicted in . In this setting, one can still use the GG-formula exactly as in .

A natural alternative approach is to weight the IPCW and BJ transformations from by the propensity score to disentangle both confounding effects and censoring at the same time.

Figure 7: Causal graph in observational survival data with dependent censoring.

We mention that the G-formula approach developed in does work in that context. In particular, Chen and Tsiatis () prove consistency and asymptotic normality results for Cox estimators in a observational study, and they give an explicit formulation of the asymptotic variance as a function of the parameters of the Cox model. In the non-parametric world, Foster, Taylor, and Ruberg () and Künzel et al. () empirically study this estimator using survival forests, with the former employing it as a T-learner (referred to as Virtual Twins) and the latter as an S-learner.

3.2.1 IPTW-IPCW transformations

One can check that the IPCW transformation as introduced in is also a censoring unbiased transformation in that context.

Proposition 10 Under Assumptions , , , and , the IPTW-IPCW transform is a censoring unbiased transformation in the sense of .

The proof of can be found in . Deriving an estimator of the difference in RMST is however slightly different in that context. In particular, rewrites E[E[TX,A=1]]=E[Ae(X)T], \mathbb{E}[\mathbb{E}[T^*|X,A=1]] = \mathbb{E}\left[\frac{A}{e(X)} T^*\right], Which in turn suggests to define θ^IPTWIPCW=1ni=1n(Ae(X)1A1e(X))TIPCW,i.(40) \widehat\theta_{\mathrm{IPTW-IPCW}} = \frac1n\sum_{i=1}^n \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right) T^*_{\mathrm{IPCW},i}. \tag{40}

This transformation now depends on two nuisance parameters, namely the conditional survival function of the censoring (through TIPCWT^*_{\mathrm{IPCW}}) and the propensity score. Once estimators of these quantities are provided, one could look at the corresponding quantity computed with these estimators.

Proposition 11 Under Assumptions , , , and , and if G^(X,A)\widehat G(\cdot|X,A) and e^()\widehat e (\cdot) are uniformly weakly (resp. strongly) consistent estimators, then estimator defined with e^\widehat e and G^\widehat G is a weakly (resp. strongly) consistent estimator of the difference in RMST.

The proof of can be found in . We can also use the same strategy as for the IPCW transformation and incorporate the new weights into a Kaplan-Meier estimator.

Definition 5 (IPTW-IPCW-Kaplan-Meier) We let again G^(X,A)\widehat G(\cdot|X,A) and e^()\widehat e(\cdot) be estimators of the conditional censoring survival function and of the propensity score. We introduce

DkIPTWIPCW(a):=i=1n(Aie^(Xi)+1Ai1e^(Xi))ΔiτG^(T~iτXi,A=a)I(T~i=tk,Ai=a)andNkIPTWIPCW(a):=i=1n(Aie^(Xi)+1Ai1e^(Xi))ΔiτG^(T~iτXi,A=a)I(T~itk,Ai=a),\begin{align*} D_k^{\mathrm{IPTW-IPCW}}(a) &:= \sum_{i=1}^n \left(\frac{A_i}{\widehat e(X_i)}+\frac{1-A_i}{1-\widehat e(X_i)} \right)\frac{\Delta_i^\tau}{\widehat G(\widetilde T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\widetilde T_i = t_k, A_i=a) \\ \quad\text{and}\quad N^{\mathrm{IPTW-IPCW}}_k(a) &:= \sum_{i=1}^n \left(\frac{A_i}{\widehat e(X_i)}+\frac{1-A_i}{1-\widehat e(X_i)} \right)\frac{\Delta_i^\tau}{\widehat G(\widetilde T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\widetilde T_i \geqslant t_k, A_i=a), \end{align*}

be the weight-corrected numbers of deaths and of individual at risk at time tkt_k. The IPTW-IPCW version of the KM estimator is then defined as: S^IPTWIPCW(tA=a)=tkt(1DkIPTWIPCW(a)NkIPTWIPCW(a)). \begin{aligned} \widehat{S}_{\mathrm{IPTW-IPCW}}(t | A=a) &= \prod_{t_k \leqslant t}\left(1-\frac{D_k^{\mathrm{IPTW-IPCW}}(a)}{N_k^{\mathrm{IPTW-IPCW}}(a)}\right). \end{aligned}

The difference in RMST estimated with IPTW-IPCW-Kaplan-Meier survival curves is then simply as

θ^IPTWIPCWKM=0τS^IPTWIPCW(tA=1)S^IPTWIPCW(tA=0)dt.(41) \widehat{\theta}_{\mathrm{IPTW-IPCW-KM}} = \int_{0}^{\tau}\widehat{S}_{\mathrm{IPTW-IPCW}}(t|A=1)-\widehat{S}_{\mathrm{IPTW-IPCW}}(t|A=0)dt. \tag{41}

Proposition 12 Under Assumptions , , , and , and for all t[0,τ]t \in [0,\tau], if the oracle estimator SIPTWIPCW(tA=a)S^*_{\mathrm{IPTW-IPCW}}(t | A=a) defined as in with G^(A,X)=G(A,X)\widehat G(\cdot|A,X) = G(\cdot|A,X) and e^=e\widehat e = e is a strongly consistent and asymptotically normal estimator of S(a)(t)S^{(a)}(t) .

The proof of can be found in . Under consistency of the estimators of the nuisance parameters, the previous proposition implies that this reweighted Kaplan-Meier is a consistent estimator of the survival curve, which in turn implies consistency of θ^IPTWIPCWKM\widehat{\theta}_{\mathrm{IPTW-IPCW-KM}}.

Corollary 5 Under Assumptions , , , and , and for all t[0,τ]t \in [0,\tau], if the nuisance estimators G^(A,X)\widehat G(\cdot|A,X) and e^\widehat e are weakly (resp. strongly) uniformly consistent, then S^IPTWIPCW(tA=a)\widehat{S}_{\mathrm{IPTW-IPCW}}(t | A=a) is a weakly (resp. strongly) consistent estimator of S(a)(t)S^{(a)}(t).

We are not aware of general formula for the asymptotic variances in this context. We mention nonetheless that Schaubel and Wei () have been able to derive asymptotic laws in this framework in the particular case of Cox-models.

3.2.2 IPTW-BJ transformations

Like IPCW tranformation, BJ transformation is again a censoring unbiased transformation in an observational context.

Proposition 13 Under Assumptions , , , and , the IPTW-BJ transform is a censoring unbiased transformation in the sense of .

The proof of can be found in . The corresponding estimator of the difference in RMST is θ^IPTWBJ=1ni=1n(Ae(X)1A1e(X))TBJ,i.(42) \hat \theta_{\mathrm{IPTW-BJ}} = \frac1n\sum_{i=1}^n \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right) T^*_{\mathrm{BJ},i}. \tag{42}

This transformation depends on the conditional survival function SS (through TBJT^*_{\mathrm{BJ}}) and the propensity score. Consistency of the nuisance parameter estimators implies again consistency of the RMST estimator.

Proposition 14 Under Assumptions , , , and , and if S^(X,A)\widehat S(\cdot|X,A) and e^()\widehat e (\cdot) are uniformly weakly (resp. strongly) consistent estimators, then θ^IPTWBJ\widehat\theta_{\mathrm{IPTW-BJ}} defined with S^\widehat S and e^\widehat e is a weakly (resp. strongly) consistent estimator of the RMST.

The proof of can be found in .

3.2.3 Double augmented corrections

Building on the classical doubly-robust AIPTW estimator from causal inference (), we could incorporate the doubly-robust transformations of to obtain a quadruply robust tranformation ΔQR=ΔQR(G,S,μ,e):=(Ae(X)1A1e(X))(TDR(G,S)μ(X,A))+μ(X,1)μ(X,0), \Delta^*_{\mathrm{QR}} = \Delta^*_{\mathrm{QR}}(G,S,\mu,e) := \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)}\right)(T^*_{\mathrm{DR}}(G,S)-\mu(X,A))+\mu(X,1)-\mu(X,0), where we recall that TDRT^*_{\mathrm{DR}} is defined in . This transformations depends on four nuisance parameters: GG and SS through TDRT^*_{\mathrm{DR}}, and now the propensity score ee and the conditional response μ\mu. This transformation doesn’t really fall into the scope of censoring unbiased transform, but it is easy to show that ΔQR\Delta^*_{\mathrm{QR}} is quadruply robust in the following sense.

Proposition 15 Let F,RF,R be two conditional survival function, pp be a propensity score, and ν\nu be a conditional response. Then, under the same assumption on F,RF,R as in , and under Assumptions , , , and , the transformations ΔQR=ΔQR(F,R,p,ν)\Delta^*_{\mathrm{QR}} = \Delta^*_{\mathrm{QR}}(F,R,p,\nu) satisfies, fo a0,1a \in {0,1}, E[ΔQRX]=E[T(1)τT(0)τX]if{F=GorR=Sandp=eorν=μ. \mathbb{E}[ \Delta^*_\mathrm{QR} |X] = \mathbb{E}[T(1)\wedge \tau-T(0)\wedge \tau |X]\quad\text{if}\quad \begin{cases} F = G \quad &\text{or}\quad R=S \quad \text{and} \\ p=e \quad &\text{or}\quad \nu=\mu. \end{cases}

This result is similar to Ozenne et al. (), Thm 1, and its proof can be found in . Based on estimators (G^,S^,μ^,e^)(\widehat G, \widehat S, \widehat\mu, \widehat e) of (G,S,μ,e)(G,S,\mu,e), one can then propose the following estimator of the RMST, coined the AIPTW-AIPCW estimator in Ozenne et al. (): θ^AIPTWAIPCW:=1ni=1nΔQR,i(G^,S^,μ^,e^)=1ni=1n(Aie^(Xi)1Ai1e^(Xi))(TDR(G^,S^)iμ^(Xi,Ai))+μ^(Xi,1)μ^(Xi,0).(43) \begin{aligned} \widehat\theta_{\mathrm{AIPTW-AIPCW}} &:= \frac1n \sum_{i=1}^n \Delta_{\mathrm{QR},i}^*(\widehat G, \widehat S, \widehat\mu, \widehat e) \\ &=\frac1n \sum_{i=1}^n \left(\frac{A_i}{\widehat e(X_i)}-\frac{1-A_i}{1-\widehat e(X_i)}\right)(T^*_{\mathrm{DR}}(\widehat G,\widehat S)_i-\widehat\mu(X_i,A_i)) + \widehat\mu(X_i,1)-\widehat\mu(X_i,0). \end{aligned} \tag{43} This estimator enjoys good asymptotic properties under parametric models, as detailed in Ozenne et al. ().

4 Implementation

In this section, we first summarize the various estimators and their properties. We then provide custom implementations for all estimators, even those already available in existing packages. These manual implementations serve two purposes: first, to make the methods accessible to the community when no existing implementation is available; and second, to facilitate a deeper understanding of the methods by detailing each step, even when a package solution exists. Finally, we present the packages available for directly computing θRMST\theta_{\mathrm{RMST}}.

4.1 Summary of the estimators

provides an overview of the estimators introduced in this paper, along with the corresponding nuisance parameters needed for their estimation and an overview of their statistical properties in particular regarding their sensitivity to misspecification of the nuisance parameters.

Table 3: Estimators of the difference in RMST and nuisance parameters needed to compute each estimator. Empty boxes indicate that the nuisance parameter is not needed in the estimator thus misspecification has no impact. Estimators in italic are those that are already implemented in available packages.
Estimator RCT Obs Ind Cens Dep Cens Outcome model Censoring model Treatment model Robustness
Unadjusted KM X\color{green}X X\color{green}X
IPCW-KM X\color{green}X X\color{green}X X\color{green}X GG
BJ X\color{green}X X\color{green}X X\color{green}X SS
IPTW-KM X\color{green}X X\color{green}X X\color{green}X ee
IPCW-IPTW-KM X\color{green}X X\color{green}X X\color{green}X X\color{green}X GG ee
IPTW-BJ X\color{green}X X\color{green}X X\color{green}X X\color{green}X SS ee
G-formula X\color{green}X X\color{green}X X\color{green}X X\color{green}X μ\mu
AIPTW-AIPCW X\color{green}X X\color{green}X X\color{green}X X{\color{green}X} S,μS,\mu GG ee X\color{green}X (Prp )

4.2 Implementation of the estimators

Across different implementations, we use the following shared functions provided in the utilitary.R\texttt{utilitary.R} file.

  • estimate_propensity_score\texttt{estimate\_propensity\_score}: function to estimate propensity scores e(X)e(X) using either parametric (i.e. logistic regression with the argument type_of_model = "glm"\texttt{type\_of\_model = "glm"}) or non-parametric methods (i.e. probability forest with the argument type_of_model ="probability forest"\texttt{type\_of\_model ="probability forest"} based on the probability_forest\texttt{probability\_forest} function from the grf () package). This latter can include cross-fitting (n.folds1\texttt{n.folds\>1}).

  • estimate_survival_function\texttt{estimate\_survival\_function}: function to estimate the conditional survival model, which supports either Cox models (argument type_of_model = "cox"\texttt{type\_of\_model = "cox"}) or survival forests (argument type_of_model = "survival forest"\texttt{type\_of\_model = "survival forest"}) which uses the function survival_forest\texttt{survival\_forest} from the grf () package. This latter can include cross-fitting (n.folds1\texttt{n.folds\>1}). The estimation can be done with a single learner (argument learner = "S-learner"\texttt{learner = "S-learner"}) or two learners (argument learner = "T-learner"\texttt{learner = "T-learner"}).

  • estimate_hazard_function\texttt{estimate\_hazard\_function}: function to estimate the instantaneous hazard function by deriving the cumulative hazard function at each time point. This cumulative hazard function is estimated from the negative logarithm of the survival function.

  • Q_t_hat\texttt{Q\_t\_hat}: function to estimate the remaining survival function at all time points and for all individuals which uses the former estimate_survival_function\texttt{estimate\_survival\_function}.

  • Q_Y\texttt{Q\_Y}: function to select the value of the remaining survival function from Q_t_hat\texttt{Q\_t\_hat} at the specific time-to-event.

  • integral_rectangles\texttt{integral\_rectangles}: function to estimate the integral of a decreasing step function using the rectangle method.

  • expected_survival\texttt{expected\_survival}: function to estimate the integral with x,yx,y coordinate (estimated survival function) using the trapezoidal method.

  • integrate\texttt{integrate}: function to estimate the integral at specific time points Y.grid\texttt{Y.grid} of a given integrand\texttt{integrand} function which takes initially its values on times\texttt{times} using the trapezoidal method.

Unadjusted Kaplan-Meier

Although Kaplan-Meier is implemented in the survival\texttt{survival} package (), we provide a custom implementation, Kaplan_meier_handmade\texttt{Kaplan\_meier\_handmade}, for completeness. The difference in Restricted Mean Survival Time, estimated using Kaplan-Meier as in can then be calculated with the RMST_1\texttt{RMST\_1} function.

As an alternative, one can also use the survfit\texttt{survfit} function in the survival package () for Kaplan-Meier and specify the rmean\texttt{rmean} argument equal to τ\tau in the corresponding summary function:

IPCW Kaplan-Meier

We first provide an adjusted.KM\texttt{adjusted.KM} function which is then used in the IPCW_Kaplan_meier\texttt{IPCW\_Kaplan\_meier} function to estimate the difference in RMST θ^IPCW\hat{\theta}_{\mathrm{IPCW}} as in . The survival censoring function G(tX)G(t|X) is computed with the estimate_survival_function\texttt{estimate\_survival\_function} utility function from the utilitary.R\texttt{utilitary.R} file.

One could also use the survfit\texttt{survfit} function in the survival package () in adding IPCW weights for treated and control group and specify the rmean\texttt{rmean} argument equal to τ\tau in the corresponding summary function:

This alternative approach for IPCW Kaplan-Meier would also be valid for IPTW and IPTW-IPCW Kaplan-Meier.

Buckley-James based estimator

The function BJ\texttt{BJ} estimates θRMST\theta_{\mathrm{RMST}} by implementing the Buckley-James estimator as in . It uses two functions available in the utilitary.R\texttt{utilitary.R} file, namely Q_t_hat\texttt{Q\_t\_hat} and Q_Y\texttt{Q\_Y}.

IPTW Kaplan-Meier

The function IPTW_Kaplan_meier\texttt{IPTW\_Kaplan\_meier} implements the IPTW-KM estimator in . It uses the estimate_propensity_score\texttt{estimate\_propensity\_score} function from the utilitary.R\texttt{utilitary.R}.

G-formula

We implement two versions of the G-formula: g_formula_T_learner\texttt{g\_formula\_T\_learner} and g_formula_S_learner\texttt{g\_formula\_S\_learner}. In g_formula_T_learner\texttt{g\_formula\_T\_learner}, separate models estimate survival curves for treated and control groups, whereas g_formula_S_learner\texttt{g\_formula\_S\_learner} uses a single model incorporating both covariates and treatment status to estimate survival time. The latter approach is also available in the RISCA package but is limited to Cox models.

IPTW-IPCW Kaplan-Meier

The IPTW_IPCW_Kaplan_meier\texttt{IPTW\_IPCW\_Kaplan\_meier} function implements the IPTW-IPCW Kaplan Meier estimator from . It uses the utilitary functions from the utilitary.R\texttt{utilitary.R} file estimate_propensity_score\texttt{estimate\_propensity\_score} and estimate_survival_function\texttt{estimate\_survival\_function} to estimate the nuisance parameters, and the function adjusted.KM\texttt{adjusted.KM} which computes an adjusted Kaplan Meier estimator using the appropriate weight.

IPTW-BJ estimator

The IPTW_BJ\texttt{IPTW\_BJ} implements the IPTW-BJ estimator in . It uses the utilitary functions, from the utilitary.R\texttt{utilitary.R} file, estimate_propensity_score\texttt{estimate\_propensity\_score}, Q_t_hat\texttt{Q\_t\_hat} and Q_Y\texttt{Q\_Y} to estimate the nuisance parameters.

AIPTW-AIPCW

The AIPTW_AIPCW\texttt{AIPTW\_AIPCW} function implements the AIPTW_AIPCW estimator in using the utilitary function from the utilitary.R\texttt{utilitary.R} file estimate_propensity_score\texttt{estimate\_propensity\_score}, Q_t_hat\texttt{Q\_t\_hat}, Q_Y\texttt{Q\_Y}, and estimate_survival_function\texttt{estimate\_survival\_function} to estimate the nuisance parameters.

4.3 Available packages

Currently, there are few sustained implementations available for estimating RMST in the presence of right censoring. Notable exceptions include the packages survRM2 (), grf () and RISCA (). Those packages are implemented in the utilitary.R\texttt{utilitary.R} files.

SurvRM2

The difference in RMST with Unadjusted Kaplan-Meier θ^KM\hat \theta_{KM} () can be obtained using the function rmst2\texttt{rmst2} which takes as arguments the observed time-to-event, the status, the arm which corresponds to the treatment and τ\tau.

RISCA

The RISCA package provides several methods for estimating θRMST\theta_{\mathrm{RMST}}. The difference in RMST with Unadjusted Kaplan-Meier θ^KM\hat \theta_{KM} () can be derived using the survfit\texttt{survfit} function from the the survival package () which estimates Kaplan-Meier survival curves for treated and control groups, and then the rmst\texttt{rmst} function calculates the RMST by integrating these curves, applying the rectangle method (type=“s”), which is well-suited for step functions.

The IPTW Kaplan-Meier () can be applied using the ipw.survival\texttt{ipw.survival} and rmst\texttt{rmst} functions. The ipw.survival function requires user-specified weights (i.e. propensity scores). To streamline this process, we define the RISCA_iptw\texttt{RISCA\_iptw} function, which combines these steps and utilizes the estimate_propensity_score\texttt{estimate\_propensity\_score} from the utilitary.R\texttt{utilitary.R} file.

A single-learner version of the G-formula, as introduced in , can be implemented using the gc.survival\texttt{gc.survival} function. This function requires as input the conditional survival function which should be estimated beforehand with a Cox model via the coxph\texttt{coxph} function from the survival\texttt{survival} package (). Specifically, the single-learner approach applies a single Cox model incorporating both covariates and treatment, rather than separate models for each treatment arm. We provide a function RISCA_gf\texttt{RISCA\_gf} that consolidates these steps.

grf

The grf\texttt{grf} package () enables estimation of the difference between RMST using the Causal Survival Forest approach (), which extends the non-parametric causal forest framework to survival data. The RMST can be estimated with the causal_survival_forest\texttt{causal\_survival\_forest} function, requiring covariates XX, observed event times, event status, treatment assignment, and the time horizon τ\tau as inputs. The average_treatment_effect\texttt{average\_treatment\_effect} function then evaluates the treatment effect based on predictions from the fitted forest.

5 Simulations

We compare the behaviors and performances of the estimators through simulations conducted across various experiemental contexts. These contexts include scenarios based on RCTs and observational data, with both independent and dependent censoring. We also use semi-parametric and non-parametric models to generate censoring and survival times, as well as logistic and nonlinear models to simulate treatment assignment.

5.1 RCT

Data Generating Process

We generate RCTs with independent censoring (Scenario 1) and conditionally independent censoring (Scenario 2). We sample nn iid datapoints (Xi,Ai,C,Ti(0),Ti(1))i[n](X_{i},A_{i},C,T_{i}(0), T_{i}(1))_{i \in [n]} where Ti(0),Ti(1)T_{i}(0), T_{i}(1) and CC follows Cox’s models. More specifically, we set

  • XN(μ,Σ)X \sim \mathcal{N}\left(\mu,\Sigma\right) where μ=(1,1,1,1)\mu = (1,1,-1,1) and Σ=Id4\Sigma = \mathrm{Id}_4.

  • The hazard function of T(0)T(0) is λ(0)(tX)=0.01exp{β0X}whereβ0=(0.5,0.5,0.5,0.5).\lambda^{(0)}(t|X)= 0.01 \exp \left\{ \beta_0^\top X\right\} \quad \text{where} \quad \beta_0 = (0.5,0.5,-0.5,0.5).

  • The survival times in the treatment group are given by T(1)=T(0)+10T(1) = T(0) + 10.

  • The hazard function of the censoring time CC is simply taken as λC(tX)=0.03\lambda_C(t|X)=0.03 in Scenario 1, and in Scenario 2 as λC(tX)=0.03exp{βCX}whereβC=(0.7,0.7,0.25,0.1).\lambda_C(t|X)=0.03 \cdot \exp \left\{\beta_C^\top X\right\} \quad \text{where} \quad \beta_C = (0.7,0.7,-0.25,-0.1).

  • The treatment allocation is independent of XX: e(X)=0.5e(X)=0.5.

  • The threshold time τ\tau is set to 25.

The descriptive statistics of the two datasets are displayed in Annex (). The graph of the difference in RMST as a function of τ\tau for the two scenarii are displayed below; θRMST\theta_{\mathrm{RMST}} is the same in both setting.

[1] "The ground truth for RCT scenario 1 and 2 at time 25 is 7.1"

Estimation of the RMST

For each Scenario, we estimate the difference in RMST using the methods summarized in . The methods used to estimate the nuisance components are indicated in brackets: either logistic regression or random forests for propensity scores and either cox models or survival random forests for survival and censoring models. A naive estimator where censored observations are simply removed and the survival time is averaged for treated and controls is also provided for a naive baseline.

shows the distribution of the difference in RMST for 100 simulations in Scenario 1 and different sample sizes: 500, 1000, 2000, 4000. The true value of θRMST\theta_{\mathrm{RMST}} is indicated by red dotted line.

Figure 8: Results of the ATE for the simulation of a RCT with independent censoring.

In this setting, and in accordance with the theory, the simplest estimator (unadjusted KM) performs just as well as the others, and presents an extremely small bias (as derived in ).

The naive estimator is biased, as expected, and the bias in both the G-formula (RISCA) and the manual G-formula S-learner arises because the treatment effect is additive T(1)=T(0)+10T(1) = T(0) + 10 and violates the assumption that TT would follow a Cox model in the variables (X,A)(X,A). However, TA=aT|A=a is a Cox-model for a{0,1}a \in \{0,1\}, which explain the remarkable performance of G-formula (Cox/T-learners) and some of the other models based on a Cox estimation of SS.

Other estimators (IPTW KM (Reg.Log), IPCW KM (Cox), IPTW-IPCW KM (Cox & Log.Reg), IPTW-BJ (Cox & Log.Reg), AIPTW-AIPCW (Cox & Cox & Log.Reg)) involve unnecessary nuisance parameter estimates, such as propensity scores or censoring models. Despite this, their performance remains relatively stable in terms of variability, and there are roughly no differences between using (semi-)parametric or non-parametric estimation methods for nuisance parameters except for IPCW KM and IPTW-IPCW KM where there is a slight bias when using forest-based methods.

Figure 9: Estimation results of the ATE for the simulation of a RCT with dependent censoring.

shows the results for the RCT simulation with conditionally independent censoring (Scenario 2). In this setting, the Naive estimator remains biased. Similarly, both the unadjusted Kaplan-Meier (KM) and its SurvRM2 equivalent, as well as the treatment-adjusted IPTW KM and its RISCA equivalent, are biased due to their failure to account for dependent censoring. As in Scenario 1, G-formula (Cox/ S-learner) and its RISCA equivalent also remain biased. The IPCW KM (Cox) is slightly biased up to 4,000 observations and quite variable due to extreme censoring probabilities. IPTW-IPCW KM (Cox & Log.Reg.) is not biased but shows high variance. In contrast, the Buckley-James estimator BJ (Cox) is unbiased even with as few as 500 observations. The BJ estimator also demonstrates smaller variance than IPCW methods. G-formula (Cox/ T-learners) and AIPCW-AIPTW (Cox & Cox & Log.Reg.) estimators seem to perform well, even in small samples. The forest versions of these estimators seem more biased, except Causal Survival Forest and the AIPTW-AIPCW (Forest). Notably, all estimators exhibit higher variability compared to Scenario 1.

5.2 Observational data

Data Generating Process

As for Scenarii 1 and 2, we carry out two simulations of an observational study with both independent and conditional independent censoring. The only difference lies in the simulation of the propensity score, which is no longer constant. For the simulation, an iid nn-sample (Xi,Ai,C,Ti(0),Ti(1))i[n](X_{i},A_{i},C,T_{i}(0), T_{i}(1))_{i \in [n]} is generated as in , except for the treatment allocation process that is given by:

logit(e(X))=βAXwhereβA=(1,1,2.5,1), \operatorname{logit}(e(X))= \beta_A^\top X \quad \text{where} \quad \beta_A = (-1,-1,-2.5,-1),
where we recall that logit(p)=log(p/(1p))\operatorname{logit}(p) = \log(p/(1-p)). The descriptive statistics for the two observational data with independent (Obs1\texttt{Obs1}) and conditionally independent censoring (Obs2\texttt{Obs2}) are displayed in Appendix (). Note that we did not modify the survival distribution, the target difference in RMST is thus the same.

[1] "The ground truth for Obs scenario 1 at time 25 is 7.1"
[1] "The ground truth for Obs scenario 2 at #time 25 is 7.1"

Estimation of the RMST

below shows the distribution of the estimators of θRMST\theta_{\mathrm{RMST}} for the observational study with independent censoring.

Figure 10: Estimation results of the ATE for the simulation of an observational study with independent censoring.

In the simulation of an observational study with independent censoring, confounding bias is introduced, setting it apart from RCT simulations. As expected, estimators that fail to adjust for this bias, such as unadjusted Kaplan-Meier (KM), IPCW KM (Cox), and their equivalents, are biased. However, estimators like IPTW KM (Log.Reg.), IPTW-IPCW KM (Cox & Log. Reg.) are unbiased, even if the latter estimate unnecessary nuisance components. Results with IPTW BJ (Cox & Log.Reg) are extremely variable.

The top-performing estimators in this scenario are G-formula (Cox/ T-learners) and AIPCW-AIPTW (Cox & Cox & Log.Reg.), which are unbiased even with 500 observations. The former has the lowest variance. All estimators that use forests to estimate nuisance parameters are biased across sample sizes from 500 to 8000. Although Causal Survival Forest and AIPW-AIPCW (Forest) are expected to eventually converge, they remain extremely demanding in terms of sample size. This setting thus highlights that one should either have an a priori knowledge on the specification of the models or large sample size.

below shows the distribution of the θRMST\theta_{\mathrm{RMST}} estimates for the observational study with conditionally independent censoring. The red dashed line represents the true θRMST\theta_{\mathrm{RMST}} for τ=25\tau=25.

Figure 11: Estimation results of the ATE for the simulation of an observational study with dependent censoring.

In the simulation of an observational study with conditionally independent censoring, estimators that do not account for both censoring and confounding bias, such as KM, IPCW KM, IPTW KM, and their package equivalents, are biased. The top-performing estimators in this scenario are G-formula (Cox/ T-learners) and AIPCW-AIPTW (Cox & Cox & Log.Reg.), which are unbiased even with 500 observations. The former has the lowest variance as expected, see . Surprisingly, the G-formula (Cox/S-learner) and its equivalent from the RISCA package perform quite competitively, showing only a slight bias despite the violation of the proportional hazards assumption. All estimators that use forests to estimate nuisance parameters are biased across sample sizes from 500 to 8000. Although Causal Survival Forest and AIPTW-AIPCW (Forest) are expected to eventually converge, they remain extremely demanding in terms of sample size.

5.3 Mispecification of nuisance components

Data Generating Process

We generate an observational study with covariate interactions and conditionally independent censoring. The objective is to assess the impact of misspecifying nuisance components; specifically, we will use models that omit interactions to estimate these components. This approach enables us to evaluate the robustness properties of various estimators. In addition, in this setting forest based methods are expected to behave better.

We generate nn samples (Xi,Ai,C,Ti(0),Ti(1))(X_{i},A_{i},C,T_{i}(0), T_{i}(1)) as follows:

  • XN(μ,Σ)X \sim \mathcal{N}\left(\mu, \Sigma\right) and μ=(0.5,0.5,0.7,0.5)\mu = (0.5,0.5,0.7,0.5), Σ=Id4\Sigma = \mathrm{Id}_4.

  • The hazard function of T(0)T(0) is given by λ(0)(tX)=exp{β0Y}whereβ0=(0.2,0.3,0.1,0.1,1,0,0,0,0,1),andY=(X12,X22,X32,X42,X1X2,X1X3,X1X4,X2X3,X2X4,X3X4). \begin{aligned} \lambda^{(0)}(t|X) = \exp\{\beta_0^\top Y\} \quad \text{where} \quad \beta_0 &= (0.2,0.3,0.1,0.1,1,0,0,0,0,1), \\ \text{and} \quad Y &= (X_1^2,X_2^2,X_3^2,X_4^2,X_1 X_2,X_1X_3,X_1X_4,X_2 X_3,X_2X_4, X_3 X_4). \end{aligned}

  • The distribution of T(1)T(1) is the one of T(0)T(0) but shifted: T(1)=T(0)+1T(1) = T(0)+1.

  • The hazard function of CC is given by λC(tX)=exp{βCY}whereβC=(0.05,0.05,0.1,0.1,0,1,0,1,0,0). \begin{aligned} \lambda_C(t|X) = \exp\{\beta_C^\top Y\} \quad \text{where} \quad \beta_C &= (0.05,0.05,-0.1,0.1,0,1,0,-1,0,0). \end{aligned}

  • The propensity score is logit(e(x))=βAYwhereβA=(0.05,0.1,0.5,0.1,1,0,1,0,0,0). \begin{aligned} \mathrm{logit}(e(x)) = \beta_A^\top Y \quad \text{where} \quad \beta_A= (0.05,-0.1,0.5,-0.1,1,0,1,0,0,0). \end{aligned} When the model is well-specified, the full vector (X,Y)(X,Y) is given as an input of the nuisance parameter models. When it is not, only XX and the first half of YY corresponding to (X12,X22,X32,X42)(X_1^2,X_2^2,X_3^2,X_4^2) is given as an input.

The descriptive statistics are given in Appendix ().

[1] "The ground truth for mis scenario at time 0.45 is 0.26"

Estimation of the RMST

First, we estimate θRMST\theta_{\mathrm{RMST}} without any model misspecification to confirm the consistency of the estimators under correctly specified nuisance models. More specifically, it means that for parametric propensity score models, semi-parametric censoring and survival models, we use models including interactions and squared assuming knowledge on the data generating process.

Next, we introduce misspecification individually for the treatment model, censoring model, and outcome model (), i.e., we use models without interaction to estimate parametric and semi-parametric nuisance components while the data are generated with interactions.
We further examine combined misspecifications for pairs of models: treatment and censoring, treatment and outcome, and outcome and censoring. Finally, we assess the impact of misspecifying all nuisance models simultaneously ().

Figure 12: Estimation results of the ATE for the simulation of an observational study with dependent censoring and non linear relationships.
Figure 13: Estimation results of the ATE for an observational study with dependent censoring in case of a single misspecification.

When there is no misspecification in , as expected, IPTW-BJ (Cox & Log.Reg), G-formula (Cox/ T-learners) and AIPTW-AIPCW (Cox & Cox & Reg.Log) are unbiased. IPTW-IPCW KM (Cox & Log.Reg) exhibits a bias but seems to converge at larger sample size. Regarding forest-based methods, IPTW-BJ (Forest), AIPTW-AIPCW (Forest) and Causal Survival Forest estimate accurately the difference in RMST. Surprisingly, G-formula (Forest/ T-learners), G-formula (Forest/ S-learner) and IPTW-IPCW KM (Forest) exhibit small bias but are expected to eventually converge at large sample size.

shows that AIPTW-AIPCW (Cox & Cox & Reg.Log) is convergent when there is one nuisance misspecification. In contrary, the other estimators are biased when one of its nuisance parameter is misspecified.

Figure 14: Estimation results of the ATE for an observational study with dependent censoring in case of a two or more misspecifications.

shows that, as expected, when all nuisance models are misspecified, all estimators exhibit bias. AIPTW-AIPCW (Cox & Cox & Reg.Log) seems to converge in case where either the outcome and censoring models, or the treatment and censoring models are misspecificed which deviates from initial expectations. It was anticipated that AIPTW-AIPCW would converge solely when both the censoring and treatment models were misspecified.

6 Conclusion

Based on the simulations and theoretical results, it might be advisable to stay away from the IPCW and IPTW-IPCW estimators, as they often exhibit excessive variability. Instead, we recommend implementing BJ which seems like a more stable transformation as IPCW, as well as Causal Survival Forest, G-formula (T-learners), IPTW-BJ, and AIPTW-AIPCW in both their Cox, Logistic Regression and forest versions. By qualitatively combining the results from these more robust estimators, we can expect to gain a fairly accurate understanding of the treatment effect.

It is important to note that our simulations utilize large sample sizes with relatively simple relationships, which may not fully capture the complexity of real-world scenarios. In practice, most survival analysis datasets tend to be smaller and more intricate, meaning the stability of certain estimators observed in our simulations may not generalize to real-world applications. Testing these methods on real-world datasets would provide a more comprehensive evaluation of their performance in practical settings.

An interesting direction for future work would be to focus on variable selection. Indeed, there is no reason to assume that the variables related to censoring should be the same as those linked to survival or treatment allocation. We could explore differentiating these sets of variables and study the impact on the estimators’ variance. Similarly to causal inference settings without survival data, we might expect, for instance, that adding precision variables—-those solely related to the outcome—-could reduce the variance of the estimators.

Additionally, the estimators of the Restricted Mean Survival Time (RMST) provide a valuable alternative to the Hazard Ratio. The analysis and code provided with this article enables further exploration of the advantages of the estimators of RMST for causal analysis in survival studies. This could lead to a deeper understanding of how these estimators can offer more stable and interpretable estimates of treatment effects, particularly in complex real-world datasets.

7 Disclosure

This study was funded by Sanofi. Charlotte Voinot and Bernard Sebastien are Sanofi employees and may hold shares and/or stock options in the company. Clément Berenfeld, Imke Mayer and Julie Josse have nothing to disclose.

References

Aalen, O., O. Borgan, and H. Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Statistics for Biology and Health. Springer New York. https://books.google.de/books?id=wEi26X-VuCIC.
Breslow, N. 1974. “Covariance Analysis of Censored Survival Data.” Biometrics 30 (1): 89–99. http://www.jstor.org/stable/2529620.
Breslow, N., and J. Crowley. 1974. A Large Sample Study of the Life Table and Product Limit Estimates Under Random Censorship.” The Annals of Statistics 2 (3): 437–53. https://doi.org/10.1214/aos/1176342705.
Buckley, and James. 1979. Linear regression with censored data.” Biometrika 66 (3): 429–36. https://doi.org/10.1093/biomet/66.3.429.
Chen, Pei-Yun, and Anastasios A. Tsiatis. 2001. “Causal Inference on the Difference of the Restricted Mean Lifetime Between Two Groups.” Biometrics 57 (4): 1030–38. https://doi.org/10.1111/j.0006-341X.2001.01030.x.
Colnet, Bénédicte, Julie Josse, Gaël Varoquaux, and Erwan Scornet. 2022. “Reweighting the RCT for Generalization: Finite Sample Error and Variable Selection.” arXiv Preprint arXiv:2208.07614.
Cox, D. R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–202. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.
Cui, Yifan, Michael R Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. 2023. Estimating heterogeneous treatment effects with right-censored data via causal survival forests.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85 (2): 179–211. https://doi.org/10.1093/jrsssb/qkac001.
Ebrahimi, Nader, Daniel Molefe, and Zhiliang Ying. 2003. “Identifiability and Censored Data.” Biometrika 90 (3): 724–27.
European Medecines Agency, EMA. 2024. “Reflection Paper on Use of Real-World Data in Non-Interventional Studies to Generate Real-World Evidence.” https://www.ema.europa.eu/en/reflection-paper-use-real-world-data-non-interventional-studies-generate-real-world-evidence-scientific-guideline.
Fan, Jianqing, Yang Feng, and Yichao Wu. 2010. “High-Dimensional Variable Selection for Cox’s Proportional Hazards Model.” In Borrowing Strength: Theory Powering Applications–a Festschrift for Lawrence d. Brown, 6:70–87. Institute of Mathematical Statistics.
Fan, Jianqing, and Irène Gijbels. 1994. “Censored Regression: Local Linear Approximations and Their Applications.” Journal of the American Statistical Association 89 (426): 560–70. https://doi.org/10.1080/01621459.1994.10476781.
Foster, Jared C., Jeremy M. G. Taylor, and Stephen J. Ruberg. 2011. “Subgroup Identification from Randomized Clinical Trial Data.” Statistics in Medicine 30 (24): 2867–80. https://doi.org/10.1002/sim.4322.
Foucher, Yohann, Florent Le Borgne, and Arthur Chatton. 2019. “RISCA: Causal Inference and Prediction in Cohort-Based Analyses.” CRAN: Contributed Packages. The R Foundation. https://doi.org/10.32614/cran.package.risca.
Gill, Richard. 1983. “Large Sample Behaviour of the Product-Limit Estimator on the Whole Line.” The Annals of Statistics, 49–58.
Hajime, Uno, Tian Lu, Horiguchi Miki, Cronin Angel, Battioui Chakib, and Bell James. 2015. “survRM2: Comparing Restricted Mean Survival Time.” CRAN: Contributed Packages. The R Foundation. https://doi.org/10.32614/cran.package.survrm2.
Hernán, Miguel A, and James M Robins. 2010. “Causal Inference.” CRC Boca Raton, FL.
Hirano, Keisuke, Guido W Imbens, and Geert Ridder. 2003. “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.” Econometrica 71 (4): 1161–89.
Howe, Chanelle J, Stephen R. Cole, Bryan Lau, Sonia Napravnik, and Joseph J. Eron. 2016. “Selection Bias Due to Loss to Follow up in Cohort Studies.” Epidemiology 27 1: 91–97. https://api.semanticscholar.org/CorpusID:21993915.
Huitfeldt, A., M. Stensrud, and E. Suzuki. 2019. “On the Collapsibility of Measures of Effect in the Counterfactual Causal Framework.” Emerging Themes in Epidemiology 16: 1–12.
Ishwaran, Hemant, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. 2008. Random survival forests.” The Annals of Applied Statistics 2 (3): 841–60. https://doi.org/10.1214/08-AOAS169.
Kalbfleisch, John D, and Ross L Prentice. 2002. The Statistical Analysis of Failure Time Data. John Wiley & Sons.
Kaplan, E. L., and Paul Meier. 1958. “Nonparametric Estimation from Incomplete Observations.” Journal of the American Statistical Association 53 (282): 457–81. https://doi.org/10.1080/01621459.1958.10501452.
Koul, H., V. Susarla, and J. Van Ryzin. 1981. Regression Analysis with Randomly Right-Censored Data.” The Annals of Statistics 9 (6): 1276–88. https://doi.org/10.1214/aos/1176345644.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65. https://doi.org/10.1073/pnas.1804597116.
Laan, Mark J. van der, and James M. Robins. 2003. “Introduction.” In Unified Methods for Censored Longitudinal Data and Causality, 8–101. New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-21700-0_1.
Martinussen, Torben, Stijn Vansteelandt, and Per Andersen. 2020. “Subtleties in the Interpretation of Hazard Contrasts.” Lifetime Data Analysis 26 (October). https://doi.org/10.1007/s10985-020-09501-5.
Ozenne, Brice Maxime Hugues, Thomas Harder Scheike, Laila Stærk, and Thomas Alexander Gerds. 2020. “On the Estimation of Average Treatment Effects with Right-Censored Time to Event Outcome and Competing Risks.” Biometrical Journal 62 (3): 751–63. https://doi.org/10.1002/bimj.201800298.
Robins, James M., Andrea Rotnitzky, and Lue Ping Zhao. 1994. “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846–66.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. The central role of the propensity score in observational studies for causal effects.” Biometrika 70 (1): 41–55. https://doi.org/10.1093/biomet/70.1.41.
Royston, Patrick, and Mahesh KB Parmar. 2013. “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome.” BMC Medical Research Methodology 13: 1–15.
Rubin, Daniel, and Mark J. van der Laan. 2007. The International Journal of Biostatistics 3 (1). https://doi.org/10.2202/1557-4679.1052.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688–701. https://doi.org/10.1037/h0037350.
Schaubel, Douglas E., and Guanghui Wei. 2011. Double Inverse-Weighted Estimation of Cumulative Treatment Effects under Nonproportional Hazards and Dependent Censoring.” Biometrics 67 (1): 29–38. https://doi.org/10.1111/j.1541-0420.2010.01449.x.
Therneau, Terry M. 2001. “Survival: Survival Analysis.” CRAN: Contributed Packages. The R Foundation. https://doi.org/10.32614/cran.package.survival.
Tibshirani, Julie, Susan Athey, Erik Sverdrup, and Stefan Wager. 2017. “Grf: Generalized Random Forests.” CRAN: Contributed Packages. The R Foundation. https://doi.org/10.32614/cran.package.grf.
Turkson, Anthony, Francis Ayiah-Mensah, and Vivian Nimoh. 2021. “Handling Censoring and Censored Data in Survival Analysis: A Standalone Systematic Literature Review.” International Journal of Mathematics and Mathematical Sciences 2021 (September): 1–16. https://doi.org/10.1155/2021/9307475.
Xie, Jun, and Chaofeng Liu. 2005. “Adjusted Kaplan–Meier Estimator and Log-Rank Test with Inverse Probability of Treatment Weighting for Survival Data.” Statistics in Medicine 24 (20): 3089–3110. https://doi.org/10.1002/sim.2174.
Zhang, Zhongheng. 2016. “Parametric Regression Model for Survival Data: Weibull Regression Model as an Example.” Annals of Translational Medicine 4 (24).
Zhou, M. 1988. “Two-Sided Bias Bound of the Kaplan-Meier Estimator.” Probability Theory and Related Fields 79 (2): 165–73.

8 Appendix A: Proofs

8.1 Proofs of

Proof. (). Consistency is a trivial consequence of the law of large number and the identity . To show that S^KM\widehat S_{\mathrm{KM}} is unbiased, let us introduce Fk\mathcal{F}_k be the filtration generated by the set of variables {Ai,I{T~i=tj},I{T~i=tj,Δi=1}  j[k],i[n]}. \{A_i, \mathbb{I}\{\widetilde T_i = t_j\}, \mathbb{I}\{\widetilde T_i = t_j, \Delta_i=1\}~|~j \in [k], i \in [n]\}. which corresponds to the known information up to time tkt_k, so that Dk(a)D_k(a) is Fk\mathcal{F}_k-measurable but Nk(a)N_k(a) is Fk1\mathcal{F}_{k-1}-measurable. One can write that, for k2k\geqslant 2 E[I{T~i=tk,Δi=1,Ai=a}  Fk1]=E[I{T~i=tk,Δi=1,Ai=a}  I{T~itk},Ai]=I{Ai=a}E[I{Ti=tk,Citk}  I{Titk,Citk},Ai]=I{Ai=a}I{Citk}E[I{Ti=tk}  I{Titk},Ai]=I{T~itk,Ai=a}(1S(a)(tk)S(a)(tk1)),\begin{align*} \mathbb{E}[\mathbb{I}\{\widetilde T_i = t_k, \Delta_i = 1, A_i=a\}~|~\mathcal{F}_{k-1}] &= \mathbb{E}[\mathbb{I}\{\widetilde T_i = t_k, \Delta_i = 1, A_i=a\}~|~\mathbb{I}\{\widetilde T_i \geqslant t_k\},A_i] \\ &= \mathbb{I}\{A_i=a\} \mathbb{E}[\mathbb{I}\{T_i = t_k, C_i \geqslant t_k\}~|~\mathbb{I}\{T_i \geqslant t_k, C_i \geqslant t_k\}, A_i] \\ &= \mathbb{I}\{A_i=a\} \mathbb{I}\{C_i \geqslant t_k\} \mathbb{E}[\mathbb{I}\{T_i = t_k\} ~|~ \mathbb{I}\{T_i \geqslant t_k\}, A_i] \\ &= \mathbb{I}\{\widetilde T_i \geqslant t_k, A_i=a\}\left(1- \frac{S^{(a)}(t_{k})}{S^{(a)}(t_{k-1})}\right), \end{align*} where we used that Ti(a)T_i(a) is idependant from AiA_i by Assumption . We then easily derive from this that E[(1Dk(a)Nk(a))I{Nk(a)>0}|Fk1]=S(a)(tk)S(a)(tk1)I{Nk(a)>0}, \mathbb{E}\left[\left(1-\frac{D_k(a)}{N_k(a)} \right) \mathbb{I}\{N_k(a) >0\}\middle |\mathcal{F}_{k-1}\right] = \frac{S^{(a)}(t_k)}{S^{(a)}(t_{k-1})} \mathbb{I}\{N_k(a) >0\}, and then that E[S^KM(tkA=a)|Fk1]=S(a)(tk)S(a)(tk1)S^KM(tk1A=a)+O(I{Nk(a)=0}), \mathbb{E}\left[\widehat S_{\mathrm{KM}}(t_k|A=a)\middle |\mathcal{F}_{k-1}\right] = \frac{S^{(a)}(t_k)}{S^{(a)}(t_{k-1})} \widehat S_{\mathrm{KM}}(t_{k-1}|A=a) + O(\mathbb{I}\{N_k(a) =0\}),

By induction, we easily find that E[S^KM(tA=a)]=tjtS(a)(tj)S(a)(tj1)+O(tjtP(Nj(a)=0))=S(a)(t)+O(P(Nk(a)=0)) \mathbb{E}[\widehat S_{\mathrm{KM}}(t|A=a)] = \prod_{t_j \leqslant t} \frac{S^{(a)}(t_j)}{S^{(a)}(t_{j-1})} + O\left(\sum_{t_j \leqslant t}\mathbb{P}(N_j(a) =0)\right)= S^{(a)}(t) + O(\mathbb{P}(N_k(a) =0)) where tkt_k is the greatest time such that tktt_k \leqslant t.

Proof. (). The asymptotic normality is a mere consequence of the joint asymptotic normality of (Nk(a),Dk(a))tkt(N_k(a),D_k(a))_{t_k \leqslant t} with an application of the δ\delta-method. To access the asymptotic variance, notice that, using a similar reasonning as in the previous proof: E[(1Dk(a)/Nk(a))2Fk1]=E[1Dk(a)/Nk(a)Fk1(a)]2+1Nk(a)2Var(Dk(a)Fk1)=sk2(a)+sk(a)(1sk(a))Nk(a)I{Nk(a)>0}+O(I{Nk(a)=0}).\begin{align*} \mathbb{E}[(1-D_k(a)/N_k(a))^2|\mathcal{F}_{k-1}] &= \mathbb{E}[1-D_k(a)/N_k(a)|\mathcal{F}_{k-1}(a)]^2+\frac{1}{N_k(a)^2}\mathrm{Var}(D_k(a)|\mathcal{F}_{k-1}) \\ &= s_k^2(a)+ \frac{s_k(a)(1-s_k(a))}{N_k(a)}\mathbb{I}\{N_k(a) > 0\} + O(\mathbb{I}\{N_k(a) = 0\}). \end{align*} Now we know that Nk(a)=nrk(a)+nOP(1)N_k(a) = n r_k(a) + \sqrt{n} O_{\mathbb{P}}(1), with the OP(1)O_{\mathbb{P}}(1) having uniformly bounded moments. So that we deduce that E[(1Dk(a)/Nk(a))2Fk1]=sk2(a)+sk(a)(1sk(a))nrk(a)+1n3/2OP(1), \begin{aligned} \mathbb{E}[(1-D_k(a)/N_k(a))^2|\mathcal{F}_{k-1}] &= s_k^2(a)+ \frac{s_k(a)(1-s_k(a))}{n r_k(a)} + \frac{1}{n^{3/2}} O_{\mathbb{P}}(1), \end{aligned} where OP(1)O_{\mathbb{P}}(1) has again bounded moments. Using this identity, we find that nVarS^KM(tA=a)=n(ESKM(tA=a)2S(a)(t)2)=nS(a)(t)2(E[tkt(1+1n1sk(a)sk(a)rk(a)+1n3/2OP(1))]1). \begin{aligned} n \mathrm{Var}\widehat S_{\mathrm{KM}} (t|A=a) &= n \left(\mathbb{E}S_{\mathrm{KM}} (t|A=a)^2 - S^{(a)} (t)^2 \right) \\ &= n S^{(a)}(t)^2 \left(\mathbb{E}\left[\prod_{t_k \leqslant t} \left(1+\frac1n\frac{1-s_k(a)}{s_k(a) r_k(a)} + \frac{1}{n^{3/2}} O_{\mathbb{P}}(1) \right)\right]-1\right). \end{aligned} Expending the product and using that the OP(1)O_{\mathbb{P}}(1)’s have bounded moments, we finally deduce that E[tkt(1+1n1sk(a)sk(a)rk(a)+1n3/2OP(1))]1=1ntkt1sk(a)sk(a)rk(a)+1n3/2O(1), \begin{aligned} \mathbb{E}\left[\prod_{t_k \leqslant t} \left(1+\frac1n\frac{1-s_k(a)}{s_k(a) r_k(a)} + \frac{1}{n^{3/2}} O_{\mathbb{P}}(1) \right)\right]-1 = \frac1n \sum_{t_k \leqslant t}\frac{1-s_k(a)}{s_k(a) r_k(a)} + \frac{1}{n^{3/2}} O(1), \end{aligned}

nVarS^KM(tA=a)=VKM(tA=a)+O(n1/2), n\mathrm{Var}\widehat S_{\mathrm{KM}} (t|A=a) = V_{\mathrm{KM}}(t|A=a) + O(n^{-1/2}), which is what we wanted to show.

8.2 Proofs of

Proof. (). Assumption allows the tranformation to be well-defined. Furthermore, it holds E[TIPCWA=a,X]=E[Δτ×T~τG(T~τA,X)|A=a,X]=E[Δτ×T(a)τG(T(a)τA,X)|A=a,X]=E[E[I{T(a)τC}×T(a)τG(T(a)τA,X)|A,X,T(1)]|A=a,X]=E[T(a)τA=a,X]=E[T(a)τX].\begin{align*} E[T^*_{\mathrm{IPCW}}|A=a,X] &= E\left[\frac{ \Delta^\tau \times\widetilde T \wedge \tau}{G(\widetilde T \wedge \tau | A,X)} \middle | A = a,X\right] \\ &= E\left[\frac{ \Delta^\tau \times T(a) \wedge \tau}{G(T(a) \wedge \tau | A,X)} \middle | A = a,X\right]\\ &= E\left[ E\left[\frac{ \mathbb{I}\{T(a)\wedge \tau \leqslant C \} \times T(a) \wedge \tau}{G(T(a) \wedge \tau | A,X)}\middle| A, X,T(1) \right] \middle | A = a,X\right] \\ &= E\left[T(a) \wedge \tau|A=a, X\right] \\ &= E\left[T(a) \wedge \tau | X \right]. \end{align*} We used in the second equality that on the event {Δτ=1,A=a}\{\Delta^\tau=1, A=a\}, it holds T~τ=Tτ=T(a)τ\widetilde T \wedge \tau = T \wedge \tau = T(a) \wedge \tau. We used in the fourth equality that G(T(a)τA,X)=E[I{T(a)τC}X,T(a),A]G(T(a) \wedge \tau | A,X) = E[\mathbb{I}\{T(a) \wedge \tau \leqslant C\}|X,T(a),A] thanks to Assumption , and in the last one that AA is idependent from XX and T(a)T(a) thanks to Assumption .

Proof. (). Similarly to the computations done in the proof of , it is easy to show that E[ΔiτG(T~τX,A)I(T~i=tk,A=a)]=P(A=a)P(T(a)=tk), \mathbb{E}\left[\frac{\Delta_i^\tau}{G(\widetilde T \wedge\tau | X,A)} \mathbb{I}(\widetilde T_i = t_k, A=a)\right] = \mathbb{P}(A=a) \mathbb{P}(T(a) = t_k), and likewise that E[ΔiτG(T~τX,A)I(T~itk,A=a)]=P(A=a)P(T(a)tk), \mathbb{E}\left[\frac{\Delta_i^\tau}{G(\widetilde T \wedge\tau | X,A)} \mathbb{I}(\widetilde T_i \geqslant t_k, A=a)\right] = \mathbb{P}(A=a) \mathbb{P}(T(a) \geqslant t_k), so that S^IPCW(t)\widehat S_{\mathrm{IPCW}}(t) converges almost surely towards the product limit tkt(1P(T(a)=tk)P(T(a)tk))=S(a)(t), \prod_{t_k \leqslant t} \left(1-\frac{\mathbb{P}(T(a) = t_k)}{\mathbb{P}(T(a) \geqslant t_k)}\right) = S^{(a)}(t), yielding strong consistency. Asymptotic normality is straightforward.

Proof. (). There holds E[TBJX,A=a]=E[ΔτT(a)τ+(1Δτ)E[Tτ×I{Tτ>C}C,A,X]P(T>CC,A,X)|X,A=a]=E[ΔτT(a)τX]+E[I{Tτ>C}E[Tτ×I{Tτ>C}C,A,X]E[I{TτC}C,A,X]|X,A=a](). \begin{aligned} \mathbb{E}[T_{\mathrm{BJ}}^*|X,A=a] &= \mathbb{E}\left[\Delta^\tau T(a) \wedge \tau + (1-\Delta^\tau)\frac{\mathbb{E}[T \wedge \tau \times \mathbb{I}\{T \wedge \tau > C\}|C,A,X]}{\mathbb{P}(T > C|C,A,X)} \middle | X,A=a \right] \\ &= \mathbb{E}[\Delta^\tau T(a) \wedge \tau | X ] + \underbrace{\mathbb{E}\left[ \mathbb{I}\{T \wedge \tau > C\} \frac{\mathbb{E}[T \wedge \tau \times \mathbb{I}\{T \wedge \tau > C\}|C,A,X]}{\mathbb{E}[\mathbb{I}\{T \wedge \tau \geqslant C\}|C,A,X]}\middle | X,A=a \right]}_{(\star)}. \end{aligned} Now we easily see that conditionning wrt XX in the second term yields ()=E[E[Tτ×I{Tτ>C}C,A,X]|X,A=a]=E[(1Δτ)TτX,A=a]=E[(1Δτ)T(a)τX],\begin{align*} (\star) &= \mathbb{E}\left[ \mathbb{E}[T \wedge \tau \times \mathbb{I}\{T \wedge \tau > C\}|C,A,X] \middle | X,A=a \right] \\ &= \mathbb{E}[(1-\Delta^\tau) T \wedge \tau | X,A=a ] \\ &= \mathbb{E}[(1-\Delta^\tau) T(a) \wedge \tau | X], \end{align*} ending the proof.

Proof. (). We let T=Δτϕ1+(1Δτ)ϕ0T^* = \Delta^\tau \phi_1 + (1-\Delta^\tau)\phi_0 be a transformation of the form . There holds E[(TTτ)2]=E[Δτ(ϕ1Tτ)2]+E[(1Δτ)(ϕ0Tτ)2]. \mathbb{E}[(T^*-T \wedge \tau)^2] = \mathbb{E}[\Delta^\tau (\phi_1-T \wedge \tau)^2] + \mathbb{E}[(1-\Delta^\tau)(\phi_0-T \wedge \tau)^2]. The first term is non negative and is zero for the BJ transformation. Since ϕ0\phi_0 is a function of (T~,X,A)(\widetilde T, X, A) and that T~=C\widetilde T = C on {Δτ=0}\{\Delta^\tau = 0\}, the second term can be rewritten in the following way. We let RR be a generic quantity that does not depend on ϕ0\phi_0. E[(1Δτ)(ϕ0T)2]=E[I{Tτ>C}ϕ022I{Tτ>C}ϕ0Tτ]+R=E[P(Tτ>CC,A,X)ϕ022E[TτI{Tτ>C}C,A,X]ϕ0]+R=E[P(Tτ>CC,A,X)(ϕ0E[TτI{Tτ>C}C,A,X]P(Tτ>CC,A,X))2]+R.\begin{align*} \mathbb{E}&[(1-\Delta^\tau)(\phi_0-T)^2] = \mathbb{E}\left[\mathbb{I}\{T\wedge \tau > C\} \phi_0^2 - 2 \mathbb{I}\{T\wedge \tau > C\} \phi_0 T \wedge \tau\right] + R \\ &= \mathbb{E}\left[ \mathbb{P}(T\wedge \tau > C|C,A,X) \phi_0^2 - 2 \mathbb{E}[T\wedge \tau \mathbb{I}\{T\wedge \tau > C\}|C,A,X] \phi_0 \right] + R \\ &= \mathbb{E}\left[\mathbb{P}(T\wedge \tau > C|C,A,X) \left(\phi_0- \frac{\mathbb{E}[T\wedge \tau \mathbb{I}\{T\wedge \tau > C\}|C,A,X]}{\mathbb{P}(T\wedge \tau > C|C,A,X) }\right)^2\right] + R. \end{align*} Now the first term in the right hand side is always non-negative, and is zero for the BJ tranformation.

8.3 Proofs of

Proof. (). The fact that it is strongly consistent and asymptotically normal is again a simple application of the law of large number and of the δ\delta-method. We indeed find that, for tkτt_k \leqslant\tau E[1e(Xi)1{T~i=tk,Δi=1,Ai=1}]=E[Aie(Xi)1{Ti=tk,Citk}]=E[E[Aie(Xi)1{Ti=tk,Citk}|Xi]]=E[E[Aie(Xi)|Xi]P(Ti=tkXi)P(Citk)]=P(Ti=tk)P(Citk), \begin{aligned} \mathbb{E}\left[\frac{1}{e(X_i)} \mathbb{1}\{\widetilde T_i = t_k, \Delta_i = 1, A_i=1\}\right] &= \mathbb{E}\left[\frac{A_i}{e(X_i)} \mathbb{1}\{T_i = t_k, C_i \geqslant t_k\}\right] \\ &=\mathbb{E}\left[ \mathbb{E}\left[ \frac{A_i}{e(X_i)} \mathbb{1}\{T_i = t_k, C_i \geqslant t_k\} \middle |X_i \right]\right] \\ &= \mathbb{E}\left[ \mathbb{E}\left[\frac{A_i}{e(X_i)}\middle | X_i\right] \mathbb{P}(T_i = t_k |X_i) \mathbb{P}( C_i \geqslant t_k)\right] \\ &= \mathbb{P}(T_i = t_k) \mathbb{P}( C_i \geqslant t_k), \end{aligned} where we used that AA is independent from TT conditionnaly on XX, and that CC is independent from everything. Likewise, one would get that E[1e(Xi)1{T~itk,Ai=1}]==P(Titk)P(Citk). \begin{aligned} \mathbb{E}\left[\frac{1}{e(X_i)} \mathbb{1}\{\widetilde T_i \geqslant t_k, A_i=1\}\right] = &= \mathbb{P}(T_i \geqslant t_k) \mathbb{P}( C_i \geqslant t_k). \end{aligned} Similar computations hold for A=0A=0, ending the proof.

8.4 Proofs of

Proof. (). On the event {Δτ=1,A=1}\{\Delta^\tau=1, A=1\}, there holds T~τ=Tτ=T(1)τ\widetilde T \wedge \tau = T \wedge \tau = T(1) \wedge \tau, whence we find that, E[TIPCWX,A=1]=E[Ae(X)I{T(1)τC}G(T(1)τX,A)T(1)τ|X]=E[Ae(X)E[I{T(1)τC}G(T(1)τX,A)|X,A,T(1)]T(1)τ|X]=E[Ae(X)T(1)τ|X]=E[T(1)τ|X], \begin{aligned} \mathbb{E}[ T^*_{\mathrm{IPCW}}|X,A=1] &= \mathbb{E}\left[\frac{A}{e(X)} \frac{\mathbb{I}\{T(1) \wedge \tau \leqslant C\}}{G(T(1) \wedge \tau|X,A)} T(1) \wedge \tau\middle|X\right] \\ &= \mathbb{E}\left[ \frac{A}{e(X)} \mathbb{E}\left[\frac{\mathbb{I}\{T(1) \wedge \tau \leqslant C\}}{G(T(1) \wedge \tau|X,A)} \middle| X,A, T(1) \right]T(1) \wedge \tau\middle|X\right] \\ &=\mathbb{E}\left[ \frac{A}{e(X)} T(1) \wedge \tau\middle|X\right] \\ &=\mathbb{E}\left[T(1) \wedge \tau\middle|X\right], \end{aligned} and the same holds on the event A=0A=0.

Proof. (). By consistency of G^(X,A)\widehat G(\cdot|X,A) and e^\widehat e and by continuity, it suffices to look at the asymptotic behavior of the oracle estimator θIPTWIPCW=1ni=1n(Aie(Xi)1Ai1e(Xi))ΔiτG(T~iτAi,Xi)T~iτ. \theta^*_{\mathrm{IPTW-IPCW}} = \frac1n\sum_{i=1}^n \left(\frac{A_i}{ e(X_i)}-\frac{1-A_i}{1-e(X_i)} \right)\frac{\Delta_i^\tau}{G(\widetilde T_i \wedge \tau | A_i,X_i)} \widetilde T_i \wedge \tau. The later is converging almost towards its mean, which, following similar computations as in the previous proof, write E[(Ae(X)1A1e(X))ΔτG(T~τA,X)T~τ]=E[(Ae(X)1A1e(X))Tτ]=E[T(1)τ]E[T(0)τ]. \begin{aligned} \mathbb{E}\left[\left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right)\frac{\Delta^\tau}{G(\widetilde T \wedge \tau | A,X)} \widetilde T \wedge \tau\right] &= \mathbb{E}\left[\left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right) T \wedge \tau\right] \\ &= \mathbb{E}\left[T(1) \wedge \tau\right]-\mathbb{E}\left[T(0) \wedge \tau\right]. \end{aligned}

Proof. (). Asymptotic normality comes from a mere application of the δ\delta-method, while strong consistency follows from the law of large number and the follozing computations. Like for the proof of , one find, by first conditionning wrt X,A,T(a)X,A,T(a), that, for tkτt_k \leqslant\tau, E[(Ae(X)+1A1e(X))ΔτG(T~τA,X)I{T~=tk,A=a}]=P(T(a)=tk) \begin{aligned} \mathbb{E}\left[\left(\frac{A}{e(X)}+\frac{1-A}{1-e(X)} \right)\frac{\Delta^\tau}{G(\widetilde T \wedge \tau | A,X)} \mathbb{I}\{\widetilde T = t_k, A=a\}\right] = \mathbb{P}(T(a)=t_k) \end{aligned} and likewise that E[(Ae(X)+1A1e(X))ΔτG(T~τA,X)I{T~tk,A=a}]=P(T(a)tk) \begin{aligned} \mathbb{E}\left[\left(\frac{A}{e(X)}+\frac{1-A}{1-e(X)} \right)\frac{\Delta^\tau}{G(\widetilde T \wedge \tau | A,X)} \mathbb{I}\{\widetilde T \geqslant t_k, A=a\}\right] = \mathbb{P}(T(a)\geqslant t_k) \end{aligned} so that indeed SIPTWIPCW(tA=a)S^*_{\mathrm{IPTW-IPCW}} (t|A=a) converges almost surely towards S(a)(t)S^{(a)}(t).

Proof. (). We write E[TIPTWBJX,A=1]=E[Ae(X)Δτ×T~τ|X]+E[Ae(X)(1Δτ)QS(T~τA,X)|X]. \begin{aligned} \mathbb{E}[ T^*_{\mathrm{IPTW-BJ}}|X,A=1] &= \mathbb{E}\left[\frac{A}{e(X)} \Delta^\tau \times \widetilde T \wedge \tau\middle|X\right] + \mathbb{E}\left[\frac{A}{e(X)} (1-\Delta^\tau) Q_S(\widetilde T \wedge \tau|A,X) \middle| X\right]. \end{aligned} On the event {Δτ=1,A=1}\{\Delta^\tau=1, A=1\}, there holds T~τ=Tτ=T(1)τ\widetilde T \wedge \tau = T \wedge \tau = T(1) \wedge \tau, whence we find that the first term on the the right hand side is equal to E[Ae(X)Δτ×T~τ|X]=E[Ae(X)Δτ×T(1)τ|X]=E[Δτ×T(1)τ|X]. \begin{aligned} \mathbb{E}\left[\frac{A}{e(X)} \Delta^\tau \times \widetilde T \wedge \tau\middle|X\right] &= \mathbb{E}\left[\frac{A}{e(X)} \Delta^\tau \times T(1) \wedge \tau\middle|X\right] \\ &= \mathbb{E}\left[\Delta^\tau \times T(1) \wedge \tau\middle|X\right]. \end{aligned} For the second term in the right hand side, notice that on the event {Δτ=0,A=1}\{\Delta^\tau=0, A=1\}, there holds T~=C<T(1)τ\widetilde T = C < T(1) \wedge \tau, so that E[Ae(X)I{C<T(1)τ}E[T(1)τ×I{C<T(1)τ}X,A,C]P(C<T(1)τC,X,A)|X]=E[Ae(X)E[T(1)τ×I{C<T(1)τ}X,A,C]|X]=E[T(1)τ×I{C<T(1)τ}|X]=E[(1Δτ)T(1)τ|X], \begin{aligned} \mathbb{E}&\left[\frac{A}{e(X)} \mathbb{I}\{C < T(1) \wedge \tau\} \frac{\mathbb{E}[T(1) \wedge \tau \times \mathbb{I}\{C < T(1) \wedge \tau\}|X,A,C]}{\mathbb{P}(C < T(1) \wedge \tau | C,X,A)} \middle| X\right] \\ &= \mathbb{E}\left[\frac{A}{e(X)} \mathbb{E}[T(1) \wedge \tau \times \mathbb{I}\{C < T(1) \wedge \tau\}|X,A,C] \middle| X\right] \\ &= \mathbb{E}\left[T(1) \wedge \tau \times \mathbb{I}\{C < T(1) \wedge \tau\} \middle| X\right] \\ &= \mathbb{E}\left[(1-\Delta^\tau) T(1) \wedge \tau \middle| X\right], \end{aligned} and the sane holds on the event {A=0}\{A=0\}, which ends the proof.

Proof. (). By consistency of G^(X,A)\widehat G(\cdot|X,A) and e^\widehat e and by continuity, it suffices to look at the asymptotic behavior of the oracle estimator θIPTWBJ=1ni=1n(Aie(Xi)1Ai1e(Xi))(Δiτ×T~iτ+(1Δiτ)QS(T~iτAi,Xi)). \theta^*_{\mathrm{IPTW-BJ}} = \frac1n\sum_{i=1}^n \left(\frac{A_i}{ e(X_i)}-\frac{1-A_i}{1-e(X_i)} \right)\left(\Delta_i^\tau \times \widetilde T_i \wedge \tau + (1-\Delta_i^\tau) Q_S(\widetilde T_i \wedge \tau|A_i,X_i)\right). The later is converging almost towards its mean, which, following similar computations as in the previous proof, is simply equal to the difference in RMST.

Proof. (). We can write that ΔQR=Ap(X)(TDR(F,R)ν(X,1))+ν(X,1)(A)(1A1p(X)(TDR(F,R)ν(X,0))+ν(X,0)(B)). \Delta^*_{\mathrm{QR}} = \underbrace{\frac{A}{p(X)}(T^*_{\mathrm{DR}}(F,R)-\nu(X,1))+ \nu(X,1)}_{(\mathrm{A})} - \left(\underbrace{\frac{1-A}{1-p(X)}(T^*_{\mathrm{DR}}(F,R)-\nu(X,0))+ \nu(X,0)}_{(\mathrm{B})} \right). Focusing on term (A)(\mathrm{A}), we easily find that E[(A)X]=E[Ap(X)(TDR(F,R)ν(X,1))+ν(X,1)|X]=e(X)p(X)(μ(X,1)ν(X,1))+ν(X,1). \begin{aligned} \mathbb{E}[(\mathrm{A}) |X] &= \mathbb{E}\left[\frac{A}{p(X)}(T^*_{\mathrm{DR}}(F,R)-\nu(X,1))+ \nu(X,1) \middle|X\right] \\ &= \frac{e(X)}{p(X)}(\mu(X,1)-\nu(X,1)) + \nu(X,1). \end{aligned} Where we used that TDR(F,R)T^*_{\mathrm{DR}}(F,R) is a censoring unbiased transform when F=GF=G or R=SR=S. Now we see that if p(X)=e(X)p(X) = e(X), then E[(A)X]=μ(X,1)ν(X,1)+ν(X,1)=μ(X,1), \mathbb{E}[(\mathrm{A}) |X] = \mu(X,1)-\nu(X,1) + \nu(X,1) = \mu(X,1), and if ν(X,1)=μ(X,1)\nu(X,1) = \mu(X,1), then E[(A)X]=e(X)p(X)×0+μ(X,1)=μ(X,1). \mathbb{E}[(\mathrm{A}) |X] = \frac{e(X)}{p(X)} \times 0 + \mu(X,1) = \mu(X,1). Likewise, we would show that E[(B)X]=μ(X,0)\mathbb{E}[(\mathrm{B}) |X] = \mu(X,0) under either alternative, ending the proof.

9 Appendix B: Descriptive statistics

9.1 RCT

The summary by group of treatment of the generated (observed and unobserved) RCT with independent censoring is displayed below:

[1] "Descriptive statistics for group A=0:   1007"
       X1                X2                X3                X4         
 Min.   :-1.9738   Min.   :-2.0112   Min.   :-4.1033   Min.   :-2.3833  
 1st Qu.: 0.3535   1st Qu.: 0.3657   1st Qu.:-1.7231   1st Qu.: 0.3723  
 Median : 1.0555   Median : 1.0674   Median :-0.9992   Median : 1.0659  
 Mean   : 1.0268   Mean   : 1.0284   Mean   :-1.0232   Mean   : 1.0070  
 3rd Qu.: 1.6908   3rd Qu.: 1.6983   3rd Qu.:-0.3244   3rd Qu.: 1.6702  
 Max.   : 4.2480   Max.   : 3.8138   Max.   : 2.1775   Max.   : 4.1407  
       C                   T1               T0               status      
 Min.   :  0.04363   Min.   : 10.00   Min.   :  0.0024   Min.   :0.0000  
 1st Qu.:  9.28879   1st Qu.: 12.57   1st Qu.:  2.5734   1st Qu.:0.0000  
 Median : 23.92405   Median : 18.51   Median :  8.5114   Median :1.0000  
 Mean   : 33.07905   Mean   : 30.67   Mean   : 20.6739   Mean   :0.6872  
 3rd Qu.: 46.57882   3rd Qu.: 32.86   3rd Qu.: 22.8641   3rd Qu.:1.0000  
 Max.   :219.45849   Max.   :494.85   Max.   :484.8537   Max.   :1.0000  
     T_tild         
 Min.   :  0.00245  
 1st Qu.:  1.98397  
 Median :  5.72068  
 Mean   : 10.41911  
 3rd Qu.: 13.18530  
 Max.   :138.88032  
[1] "Descriptive statistics for group A=1:   993"
       X1                X2                X3                X4         
 Min.   :-2.1498   Min.   :-1.9475   Min.   :-3.6996   Min.   :-1.9002  
 1st Qu.: 0.3748   1st Qu.: 0.3071   1st Qu.:-1.7178   1st Qu.: 0.3025  
 Median : 1.0780   Median : 0.9647   Median :-1.0180   Median : 0.9317  
 Mean   : 1.0554   Mean   : 0.9807   Mean   :-1.0328   Mean   : 0.9813  
 3rd Qu.: 1.7171   3rd Qu.: 1.6588   3rd Qu.:-0.3523   3rd Qu.: 1.6677  
 Max.   : 4.0171   Max.   : 3.5670   Max.   : 2.1602   Max.   : 4.4481  
       C                  T1               T0               status      
 Min.   :  0.1387   Min.   : 10.02   Min.   :  0.0174   Min.   :0.0000  
 1st Qu.:  9.4540   1st Qu.: 12.78   1st Qu.:  2.7849   1st Qu.:0.0000  
 Median : 23.9043   Median : 18.48   Median :  8.4838   Median :1.0000  
 Mean   : 35.6632   Mean   : 29.87   Mean   : 19.8730   Mean   :0.5277  
 3rd Qu.: 49.1845   3rd Qu.: 31.60   3rd Qu.: 21.6038   3rd Qu.:1.0000  
 Max.   :273.1988   Max.   :510.55   Max.   :500.5543   Max.   :1.0000  
     T_tild       
 Min.   : 0.1387  
 1st Qu.: 9.4540  
 Median :13.3066  
 Mean   :16.5426  
 3rd Qu.:19.8679  
 Max.   :99.1196  

Covariates are balanced between groups, and censoring times are the same (independent censoring). However, there are more censored observations in the treated group (A=1A=1) than in the control group (A=0A=0). This is due to the higher instantaneous hazard of the event in the treated group (with T1=T0+10T_1=T_0+10) compared to the constant hazard of censoring.

The summary of the generated (observed and unobserved) RCT with conditionally independent censoring stratified by treatment is displayed below.

[1] "Descriptive statistics for group A=0:   1001"
       X1                X2                X3                X4         
 Min.   :-2.3192   Min.   :-2.4200   Min.   :-3.8789   Min.   :-2.4102  
 1st Qu.: 0.3197   1st Qu.: 0.3678   1st Qu.:-1.6743   1st Qu.: 0.3906  
 Median : 0.9943   Median : 1.0608   Median :-1.0282   Median : 1.0467  
 Mean   : 0.9544   Mean   : 1.0297   Mean   :-1.0356   Mean   : 1.0485  
 3rd Qu.: 1.6353   3rd Qu.: 1.6961   3rd Qu.:-0.3729   3rd Qu.: 1.7529  
 Max.   : 3.8030   Max.   : 3.8728   Max.   : 2.0881   Max.   : 5.3492  
       C                   T1               T0               status      
 Min.   :  0.00073   Min.   : 10.00   Min.   :  0.0043   Min.   :0.0000  
 1st Qu.:  2.21038   1st Qu.: 12.61   1st Qu.:  2.6108   1st Qu.:0.0000  
 Median :  6.47704   Median : 17.95   Median :  7.9450   Median :0.0000  
 Mean   : 14.38359   Mean   : 31.05   Mean   : 21.0531   Mean   :0.4505  
 3rd Qu.: 16.60452   3rd Qu.: 32.74   3rd Qu.: 22.7444   3rd Qu.:1.0000  
 Max.   :233.44943   Max.   :780.35   Max.   :770.3495   Max.   :1.0000  
   status_tau         T_tild         
 Min.   :0.0000   Min.   :  0.00073  
 1st Qu.:0.0000   1st Qu.:  1.19784  
 Median :0.0000   Median :  3.46282  
 Mean   :0.4985   Mean   :  8.45984  
 3rd Qu.:1.0000   3rd Qu.:  8.95527  
 Max.   :1.0000   Max.   :233.44943  
[1] "Descriptive statistics for group A=1:   999"
       X1                X2                X3                X4         
 Min.   :-1.8336   Min.   :-2.2766   Min.   :-4.2739   Min.   :-2.4799  
 1st Qu.: 0.3069   1st Qu.: 0.2735   1st Qu.:-1.5768   1st Qu.: 0.3124  
 Median : 0.9697   Median : 0.9640   Median :-1.0159   Median : 1.0052  
 Mean   : 0.9758   Mean   : 0.9462   Mean   :-0.9749   Mean   : 0.9581  
 3rd Qu.: 1.6410   3rd Qu.: 1.6689   3rd Qu.:-0.3244   3rd Qu.: 1.6111  
 Max.   : 3.8633   Max.   : 4.4451   Max.   : 2.2216   Max.   : 3.9818  
       C                   T1               T0              status      
 Min.   :  0.00565   Min.   : 10.01   Min.   :  0.009   Min.   :0.0000  
 1st Qu.:  3.08859   1st Qu.: 12.95   1st Qu.:  2.955   1st Qu.:0.0000  
 Median :  8.41709   Median : 19.69   Median :  9.686   Median :0.0000  
 Mean   : 17.22128   Mean   : 36.11   Mean   : 26.111   Mean   :0.2272  
 3rd Qu.: 21.75706   3rd Qu.: 38.01   3rd Qu.: 28.014   3rd Qu.:0.0000  
 Max.   :158.59115   Max.   :635.73   Max.   :625.734   Max.   :1.0000  
   status_tau         T_tild         
 Min.   :0.0000   Min.   :  0.00565  
 1st Qu.:0.0000   1st Qu.:  3.08859  
 Median :0.0000   Median :  8.41709  
 Mean   :0.2993   Mean   : 12.33835  
 3rd Qu.:1.0000   3rd Qu.: 15.63307  
 Max.   :1.0000   Max.   :143.60610  

Covariates are balanced between the two groups. However, censoring times differ between groups due to conditionally independent censoring based on covariates and treatment group. Indeed, the distribution of CC is different between the treatment group.

9.2 Observational study with linear relationship

The summary of the generated (observed and unobserved) data set observational study with independent censoring stratified by treatment is displayed below to enhance the difference with the other scenario.

[1] "Descriptive statistics for group A=0:   1112"
       X1                X2                X3                 X4        
 Min.   :-1.8169   Min.   :-1.8541   Min.   :-2.90696   Min.   :-2.477  
 1st Qu.: 0.6277   1st Qu.: 0.5975   1st Qu.:-1.09781   1st Qu.: 0.490  
 Median : 1.2098   Median : 1.2626   Median :-0.54312   Median : 1.129  
 Mean   : 1.2355   Mean   : 1.2654   Mean   :-0.53017   Mean   : 1.123  
 3rd Qu.: 1.8804   3rd Qu.: 1.9466   3rd Qu.: 0.01932   3rd Qu.: 1.776  
 Max.   : 3.9482   Max.   : 4.5541   Max.   : 2.03204   Max.   : 4.380  
       C                 T1               T0               status      
 Min.   :  0.070   Min.   : 10.00   Min.   :  0.0048   Min.   :0.0000  
 1st Qu.:  9.562   1st Qu.: 12.69   1st Qu.:  2.6913   1st Qu.:0.0000  
 Median : 23.028   Median : 17.39   Median :  7.3851   Median :1.0000  
 Mean   : 34.134   Mean   : 29.15   Mean   : 19.1463   Mean   :0.7014  
 3rd Qu.: 47.244   3rd Qu.: 29.66   3rd Qu.: 19.6626   3rd Qu.:1.0000  
 Max.   :341.223   Max.   :704.21   Max.   :694.2068   Max.   :1.0000  
     T_tild        
 Min.   : 0.00479  
 1st Qu.: 2.04822  
 Median : 5.37581  
 Mean   : 9.31404  
 3rd Qu.:11.92959  
 Max.   :99.51749  
[1] "Descriptive statistics for group A=1:   888"
       X1                X2                X3                X4          
 Min.   :-2.6164   Min.   :-2.3108   Min.   :-4.2505   Min.   :-2.60604  
 1st Qu.: 0.1571   1st Qu.: 0.0520   1st Qu.:-2.0830   1st Qu.: 0.08458  
 Median : 0.7372   Median : 0.6735   Median :-1.5464   Median : 0.77157  
 Mean   : 0.7619   Mean   : 0.7002   Mean   :-1.5674   Mean   : 0.75323  
 3rd Qu.: 1.3853   3rd Qu.: 1.3397   3rd Qu.:-0.9975   3rd Qu.: 1.37424  
 Max.   : 3.9018   Max.   : 3.9174   Max.   : 1.0010   Max.   : 4.71048  
       C                   T1               T0               status     
 Min.   :  0.04914   Min.   : 10.01   Min.   :  0.0057   Min.   :0.000  
 1st Qu.: 10.09384   1st Qu.: 13.12   1st Qu.:  3.1180   1st Qu.:0.000  
 Median : 22.87000   Median : 20.25   Median : 10.2451   Median :0.000  
 Mean   : 33.19084   Mean   : 35.79   Mean   : 25.7921   Mean   :0.482  
 3rd Qu.: 46.52774   3rd Qu.: 36.52   3rd Qu.: 26.5228   3rd Qu.:1.000  
 Max.   :271.80285   Max.   :649.50   Max.   :639.5037   Max.   :1.000  
     T_tild         
 Min.   :  0.04914  
 1st Qu.: 10.02109  
 Median : 13.69586  
 Mean   : 17.16939  
 3rd Qu.: 21.85755  
 Max.   :142.50649  

The covariates between the two groups of treatment are unbalanced because of dependent treatment assignation. The mean of X1X1, X2X2, X3X3 and X4X4 is bigger in the control group than in the treated group. The censoring times have the same distribution (independent censoring). There are more censored observation in the treated group (A=1) than in the control group (A=0) for the same reason than in the RCT scenario.

The summary of the generated (observed and unobserved) data set observational study with conditionally independent censoring stratified by treatment is displayed below.

[1] "Descriptive statistics for group A=0:   1103"
       X1                X2                X3                X4         
 Min.   :-1.4656   Min.   :-1.5525   Min.   :-2.7027   Min.   :-2.3218  
 1st Qu.: 0.5603   1st Qu.: 0.5672   1st Qu.:-1.1180   1st Qu.: 0.5605  
 Median : 1.1450   Median : 1.2201   Median :-0.5368   Median : 1.2280  
 Mean   : 1.2034   Mean   : 1.2214   Mean   :-0.5475   Mean   : 1.2243  
 3rd Qu.: 1.8183   3rd Qu.: 1.8923   3rd Qu.:-0.0287   3rd Qu.: 1.8685  
 Max.   : 4.2187   Max.   : 4.1337   Max.   : 2.2278   Max.   : 4.4916  
       C                   T1               T0               status      
 Min.   :  0.00548   Min.   : 10.01   Min.   :  0.0134   Min.   :0.0000  
 1st Qu.:  2.09723   1st Qu.: 12.54   1st Qu.:  2.5405   1st Qu.:0.0000  
 Median :  6.45280   Median : 17.48   Median :  7.4755   Median :0.0000  
 Mean   : 13.25047   Mean   : 29.83   Mean   : 19.8302   Mean   :0.4442  
 3rd Qu.: 15.60958   3rd Qu.: 29.82   3rd Qu.: 19.8173   3rd Qu.:1.0000  
 Max.   :218.33626   Max.   :770.32   Max.   :760.3168   Max.   :1.0000  
   status_tau         T_obs                e            
 Min.   :0.0000   Min.   : 0.00548   Min.   :0.0000053  
 1st Qu.:0.0000   1st Qu.: 1.09858   1st Qu.:0.0235804  
 Median :0.0000   Median : 3.24993   Median :0.1128748  
 Mean   :0.4796   Mean   : 6.52850   Mean   :0.2069473  
 3rd Qu.:1.0000   3rd Qu.: 7.85688   3rd Qu.:0.3272944  
 Max.   :1.0000   Max.   :83.82905   Max.   :0.9881260  
[1] "Descriptive statistics for group A=1:   897"
       X1                 X2                 X3               X4          
 Min.   :-2.51633   Min.   :-3.22213   Min.   :-4.731   Min.   :-2.16630  
 1st Qu.: 0.02828   1st Qu.: 0.05441   1st Qu.:-2.170   1st Qu.: 0.08423  
 Median : 0.71911   Median : 0.68804   Median :-1.597   Median : 0.73754  
 Mean   : 0.70282   Mean   : 0.69876   Mean   :-1.634   Mean   : 0.75928  
 3rd Qu.: 1.35624   3rd Qu.: 1.39794   3rd Qu.:-1.053   3rd Qu.: 1.43619  
 Max.   : 3.81397   Max.   : 3.96002   Max.   : 0.666   Max.   : 3.69206  
       C                 T1                T0                status      
 Min.   :  0.003   Min.   :  10.01   Min.   :   0.0149   Min.   :0.0000  
 1st Qu.:  2.735   1st Qu.:  12.79   1st Qu.:   2.7914   1st Qu.:0.0000  
 Median :  7.839   Median :  19.20   Median :   9.1976   Median :0.0000  
 Mean   : 17.513   Mean   :  37.26   Mean   :  27.2577   Mean   :0.2062  
 3rd Qu.: 18.471   3rd Qu.:  35.39   3rd Qu.:  25.3856   3rd Qu.:0.0000  
 Max.   :821.894   Max.   :1080.05   Max.   :1070.0488   Max.   :1.0000  
   status_tau         T_obs               e          
 Min.   :0.0000   Min.   :  0.003   Min.   :0.01368  
 1st Qu.:0.0000   1st Qu.:  2.735   1st Qu.:0.60675  
 Median :0.0000   Median :  7.839   Median :0.84285  
 Mean   :0.2653   Mean   : 12.026   Mean   :0.75461  
 3rd Qu.:1.0000   3rd Qu.: 14.126   3rd Qu.:0.95929  
 Max.   :1.0000   Max.   :238.172   Max.   :0.99985  

The covariates between the two groups are unbalanced. The censoring time is dependent on the covariates also, as the covariates are unbalanced between the two groups, the censoring time is also unbalanced. In particular, the mean of X1X1, X2X2, X3X3 and X4X4 is bigger in the control group than in the treated group. Also, the number of events is bigger in the control than treated group.

9.3 Observational study with interaction

       X1                X2                X3                 X4         
 Min.   :-2.6711   Min.   :-3.3457   Min.   :-2.56028   Min.   :-2.4285  
 1st Qu.:-0.1776   1st Qu.:-0.1639   1st Qu.: 0.04785   1st Qu.:-0.2016  
 Median : 0.5091   Median : 0.5300   Median : 0.70709   Median : 0.5081  
 Mean   : 0.5249   Mean   : 0.5258   Mean   : 0.70105   Mean   : 0.4771  
 3rd Qu.: 1.2484   3rd Qu.: 1.2120   3rd Qu.: 1.34594   3rd Qu.: 1.1317  
 Max.   : 3.8827   Max.   : 4.0947   Max.   : 3.96683   Max.   : 3.9811  
      tau            A                T0                 T1        
 Min.   :0.5   Min.   :0.0000   Min.   : 0.00000   Min.   : 1.000  
 1st Qu.:0.5   1st Qu.:0.0000   1st Qu.: 0.03313   1st Qu.: 1.033  
 Median :0.5   Median :1.0000   Median : 0.18182   Median : 1.182  
 Mean   :0.5   Mean   :0.5815   Mean   : 0.74724   Mean   : 1.747  
 3rd Qu.:0.5   3rd Qu.:1.0000   3rd Qu.: 0.68860   3rd Qu.: 1.689  
 Max.   :0.5   Max.   :1.0000   Max.   :26.55400   Max.   :27.554  
       C                T_obs           T_obs_tau           status     
 Min.   :  0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000  
 1st Qu.:  0.1364   1st Qu.:0.05019   1st Qu.:0.05019   1st Qu.:0.000  
 Median :  0.5600   Median :0.24031   Median :0.24031   Median :0.000  
 Mean   :  4.9022   Mean   :0.51190   Mean   :0.26715   Mean   :0.437  
 3rd Qu.:  1.8264   3rd Qu.:0.86416   3rd Qu.:0.50000   3rd Qu.:1.000  
 Max.   :880.3211   Max.   :6.82142   Max.   :0.50000   Max.   :1.000  
   status_tau     censor.status         e            
 Min.   :0.0000   Min.   :0.000   Min.   :0.0000279  
 1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.3991987  
 Median :1.0000   Median :1.000   Median :0.5862769  
 Mean   :0.6085   Mean   :0.563   Mean   :0.5814792  
 3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:0.8020782  
 Max.   :1.0000   Max.   :1.000   Max.   :0.9999456  

The summary of the generated (observed and unobserved) data set complex observational study (conditionally independent censoring) stratified by treatment is displayed below.

[1] "Descriptive statistics for group A=0:   837"
       X1                X2                 X3                C           
 Min.   :-2.3131   Min.   :-3.34574   Min.   :-2.1268   Min.   :  0.0000  
 1st Qu.:-0.1246   1st Qu.: 0.01702   1st Qu.:-0.1488   1st Qu.:  0.1712  
 Median : 0.5137   Median : 0.69913   Median : 0.4301   Median :  0.6670  
 Mean   : 0.5486   Mean   : 0.71091   Mean   : 0.4039   Mean   :  4.8886  
 3rd Qu.: 1.2801   3rd Qu.: 1.40030   3rd Qu.: 0.9711   3rd Qu.:  2.1632  
 Max.   : 3.8827   Max.   : 4.09470   Max.   : 2.7399   Max.   :880.3211  
       T1               T0               status          T_obs        
 Min.   : 1.000   Min.   : 0.00000   Min.   :0.000   Min.   :0.00000  
 1st Qu.: 1.027   1st Qu.: 0.02666   1st Qu.:0.000   1st Qu.:0.01878  
 Median : 1.157   Median : 0.15670   Median :1.000   Median :0.09719  
 Mean   : 1.614   Mean   : 0.61354   Mean   :0.687   Mean   :0.28654  
 3rd Qu.: 1.599   3rd Qu.: 0.59850   3rd Qu.:1.000   3rd Qu.:0.33433  
 Max.   :22.331   Max.   :21.33056   Max.   :1.000   Max.   :6.82142  
   status_tau           e            
 Min.   :0.0000   Min.   :0.0000279  
 1st Qu.:1.0000   1st Qu.:0.2268223  
 Median :1.0000   Median :0.4239879  
 Mean   :0.7622   Mean   :0.4100799  
 3rd Qu.:1.0000   3rd Qu.:0.5709441  
 Max.   :1.0000   Max.   :0.9924558  
[1] "Descriptive statistics for group A=1:   1163"
       X1                X2                X3                C           
 Min.   :-2.6711   Min.   :-2.5974   Min.   :-2.5603   Min.   :  0.0000  
 1st Qu.:-0.2185   1st Qu.:-0.2764   1st Qu.: 0.2472   1st Qu.:  0.1165  
 Median : 0.5061   Median : 0.4279   Median : 0.9428   Median :  0.4950  
 Mean   : 0.5079   Mean   : 0.3926   Mean   : 0.9149   Mean   :  4.9120  
 3rd Qu.: 1.2221   3rd Qu.: 1.0453   3rd Qu.: 1.6212   3rd Qu.:  1.6321  
 Max.   : 3.4180   Max.   : 3.2768   Max.   : 3.9668   Max.   :843.0513  
       T1               T0               status           T_obs        
 Min.   : 1.000   Min.   : 0.00000   Min.   :0.0000   Min.   :0.00003  
 1st Qu.: 1.039   1st Qu.: 0.03927   1st Qu.:0.0000   1st Qu.:0.11651  
 Median : 1.206   Median : 0.20588   Median :0.0000   Median :0.49504  
 Mean   : 1.843   Mean   : 0.84346   Mean   :0.2571   Mean   :0.67409  
 3rd Qu.: 1.785   3rd Qu.: 0.78488   3rd Qu.:1.0000   3rd Qu.:1.03678  
 Max.   :27.554   Max.   :26.55400   Max.   :1.0000   Max.   :6.34076  
   status_tau           e          
 Min.   :0.0000   Min.   :0.04422  
 1st Qu.:0.0000   1st Qu.:0.54825  
 Median :0.0000   Median :0.72830  
 Mean   :0.4979   Mean   :0.70483  
 3rd Qu.:1.0000   3rd Qu.:0.89680  
 Max.   :1.0000   Max.   :0.99995  

The observations are the same than the previous scenario: The covariates and the censoring time between the two groups are unbalanced. To be able to evaluate the estimators, we need to know the true θRMST\theta_{\mathrm{RMST}} at time τ\tau.

Reuse