Survival analysis is used to study the occurrence of an event of interest in a population of subjects, together with the time until that event occurs. This time is called the survival time or failure time. Survival analysis is often used in industrial life-testing experiments and in clinical follow-up studies. Example applications include the time until failure of a light bulb, the time until an anomaly occurs in an electronic circuit, the time until relapse of cancer, and the time until pregnancy.
The literature offers many different modeling approaches to survival analysis. Conventional parametric models may impose overly strict assumptions on the distribution of failure times and on the form in which the system features influence the survival time; such assumptions usually oversimplify the experimental evidence, particularly in the case of medical data (Cox & Oakes, 1984). In contrast, semiparametric models make no assumptions about the distribution of failure times, but do make assumptions about how the system features influence the survival time (the usual assumption is proportionality of hazards); furthermore, these models do not usually allow for direct estimation of survival times. Finally, non-parametric models usually allow only a qualitative description of the data at the population level.
Neural networks have recently been used for survival analysis; for a survey of the current use of neural networks, and of some previous attempts at neural network survival modeling, we refer to (Bakker & Heskes, 1999), (Biganzoli et al., 1998), (Eleuteri et al., 2003), (Lisboa et al., 2003), (Neal, 2001), (Ripley & Ripley, 1998), (Schwarzer et al., 2000). Neural networks provide efficient parametric estimates of survival functions and, in principle, the capability to give personalised survival predictions. In a medical context, such information is valuable to both clinicians and patients.
It helps clinicians to choose appropriate treatment and to plan follow-up efficiently. Patients at high risk could be followed up more frequently than those at lower risk, in order to channel valuable resources to those who need them most. For patients, information about their prognosis is also extremely valuable for planning their lives and providing care for their dependents.
In this article we describe a novel neural network model aimed at solving the survival analysis problem in a continuous time setting; we provide details about the Bayesian approach to modeling, and we present a sample application on real data.
Let T denote an absolutely continuous positive random variable, with distribution function P, representing the time of occurrence of an event. The survival function, S(t), is defined as:
S(t) = Pr(T > t) = 1 - P(t),
that is, the probability of surviving beyond time t. We shall generally assume that the survival function also depends on a set of covariates, represented by the vector x (which can itself be assumed to be a random variable). An important function related to the survival function is the hazard rate (Cox & Oakes, 1984), defined as:
h_r(t) = P'(t) / S(t),
where P' is the density associated with P. The hazard rate can be interpreted as the instantaneous force of mortality.
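The definitions above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the failure time follows a Weibull distribution; this choice, the function names and the parameter values are not part of the model described in this article. The sketch evaluates S(t) and the hazard rate as the ratio P'(t)/S(t), and checks the result against the standard identity that the hazard equals minus the derivative of log S(t).

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = Pr(T > t) for a Weibull failure time (illustrative assumption)."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t, shape, scale):
    """P'(t): density of the failure time, valid for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)


def hazard_rate(t, shape, scale):
    """Hazard rate computed directly from its definition P'(t) / S(t)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)


def numerical_hazard(t, shape, scale, eps=1e-6):
    """Hazard approximated as -d/dt log S(t) with a central difference."""
    log_s_hi = math.log(weibull_survival(t + eps, shape, scale))
    log_s_lo = math.log(weibull_survival(t - eps, shape, scale))
    return -(log_s_hi - log_s_lo) / (2.0 * eps)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for the demonstration.
    shape, scale = 1.5, 2.0
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, weibull_survival(t, shape, scale), hazard_rate(t, shape, scale))
```

With shape equal to 1 the Weibull reduces to the exponential distribution and the hazard is constant at 1/scale; with shape greater than 1 the hazard increases with time, which is one way to read the "instantaneous force of mortality" interpretation.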
In many survival analysis applications we do not directly observe realisations of the random variable T; therefore we must deal with a missing data problem. The most common form of missingness is right censoring, i.e., we observe realisations of the random variable:
Z = min(T, C),
where C is a random variable (the censoring time) whose distribution is usually unknown. We shall use a censoring indicator d to denote whether we have observed an event (d = 1) or not (d = 0). Provided that censoring is independent of the event time and non-informative, it can be shown that inference does not depend on the distribution of C (Cox & Oakes, 1984).
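As a minimal sketch of right censoring, the following simulation generates pairs (Z, d) and estimates a constant hazard from them. It assumes exponentially distributed event and censoring times that are independent of each other; these distributions, the function names and the rates are hypothetical choices made for illustration only. Under these assumptions the censored-data log-likelihood is the sum over subjects of d log(rate) - rate Z, which is maximised by the number of observed events divided by the total observed time.

```python
import random


def simulate_right_censored(n, event_rate, censor_rate, seed=0):
    """Return a list of (z, d) pairs with z = min(t, c) and d = 1 if the event was observed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # true event time T (not always observed)
        c = rng.expovariate(censor_rate)  # censoring time C, independent of T
        z = min(t, c)                     # what is actually recorded
        d = 1 if t <= c else 0            # censoring indicator
        data.append((z, d))
    return data


def exponential_rate_mle(data):
    """Maximum likelihood estimate of a constant hazard from censored data."""
    n_events = sum(d for _, d in data)
    total_time = sum(z for z, _ in data)
    return n_events / total_time


def naive_rate_estimate(data):
    """Biased estimate that discards censored subjects instead of modelling them."""
    event_times = [z for z, d in data if d == 1]
    return len(event_times) / sum(event_times)


if __name__ == "__main__":
    sample = simulate_right_censored(5000, event_rate=0.5, censor_rate=0.3, seed=42)
    print("censoring-aware estimate:", exponential_rate_mle(sample))
    print("naive estimate:", naive_rate_estimate(sample))
```

The censoring-aware estimate uses only Z and d and never refers to the distribution of C. The naive estimate, which drops censored subjects, overestimates the hazard because long survival times are the ones most likely to be censored.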
Key Terms in this Chapter
Bayesian Inference: Inference rules which are based on application of Bayes’ theorem and the basic laws of probability calculus.
Random Variable: Measurable function from a sample space to the measurable space of possible values of the variable.
Survival Analysis: Statistical analysis of data represented in terms of realisations of point events. In medical applications the point event is usually the death of an individual or the recurrence of a disease.
Hyperparameter: Parameter in a hierarchical problem formulation. In Bayesian inference, the parameters of a prior.
Neural Networks: A graphical representation of a nonlinear function. Usually represented as a directed acyclic graph. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.
Prior Distribution: Probabilistic representation of prior knowledge.
Censoring: Mechanism which precludes observation of an event. A form of missing data.
Posterior Distribution: Probabilistic representation of knowledge, resulting from combination of prior knowledge and observation of data.