Introduction to Survival Analysis

Survival Function and Hazard Rate

Survival Function

Definition

Let $T$ be a non-negative random variable representing survival time with PDF $f(t)$ and CDF $F(t)$. The survival function of $T$ is defined as
$$S(t) = P(T > t) = 1 - F(t) = \int_t^\infty f(u)\,du$$

The survival function is a function that gives the probability that a patient, device, or other object of interest will survive past a certain time.


Hazard Function

Definition

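Let $T$ be a non-negative random variable representing survival time with PDF $f(t)$ and CDF $F(t)$. The hazard function of $T$ is defined as
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$
where $S(t) = 1 - F(t)$ is the survival function.

The hazard function is the instantaneous rate at which the event occurs at time $t$, given survival up to time $t$.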

Cumulative Hazard Function

The hazard function can alternatively be represented in terms of the cumulative hazard function, defined as
$$H(t) = \int_0^t h(u)\,du = -\log S(t)$$

Facts

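The survival function $S(t)$, the cumulative hazard function $H(t)$, the density (PDF) $f(t)$, the hazard function $h(t)$, and the distribution function (CDF) $F(t)$ of the survival time are related through
$$S(t) = 1 - F(t) = \exp\{-H(t)\}, \qquad f(t) = -\frac{d}{dt}S(t) = h(t)\,S(t), \qquad h(t) = -\frac{d}{dt}\log S(t)$$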
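As a rough numerical illustration of these relations, the sketch below recovers the cumulative hazard, survival function, and density from a hazard function alone. The function names and the example hazards are illustrative assumptions, not part of the notes.

```python
import math

def cumulative_hazard(h, t, steps=10_000):
    """Approximate H(t), the integral of h(u) over [0, t], with the trapezoidal rule."""
    dt = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for k in range(1, steps):
        total += h(k * dt)
    return total * dt

def survival(h, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(h, t))

def density(h, t):
    """f(t) = h(t) * S(t)."""
    return h(t) * survival(h, t)

# Illustrative constant hazard of 0.3 (an exponential model).
print(survival(lambda t: 0.3, 2.0))   # close to exp(-0.6)
```

A constant hazard reproduces the exponential survival function, which is a quick sanity check on the identities.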


Types of Censoring

Right Censoring

Kinds

Type 1 Censoring

Definition

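Type 1 censoring is the case $Y_i = \min(T_i, c_i)$, where the $c_i$'s are constants.

Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.

In a type 1 censoring setting, the censoring times $C_i = c_i$ are fixed constants, not random variables.

The PDF of the observation $(Y_i, \delta_i)$ is derived as
$$f(y_i, \delta_i) = f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i}$$
where $y_i = c_i$ whenever $\delta_i = 0$.

Likelihood of Type 1 Censoring Data

The likelihood function of type 1 censored data is defined as
$$L = \prod_{i=1}^n f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i}$$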
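To make the censored likelihood concrete, here is a minimal sketch for an exponential model with rate $\lambda$, where each uncensored observation contributes $f(y_i) = \lambda e^{-\lambda y_i}$ and each censored one contributes $S(y_i) = e^{-\lambda y_i}$. The exponential choice, the function names, and the data are assumptions made for illustration only.

```python
import math

def censor_type1(true_times, c):
    """Apply a common fixed censoring time c; return observed times and indicators."""
    times = [min(t, c) for t in true_times]
    events = [1 if t <= c else 0 for t in true_times]
    return times, events

def exp_loglik(lam, times, events):
    """log L = sum_i [delta_i * log f(y_i) + (1 - delta_i) * log S(y_i)]."""
    return sum(d * math.log(lam) - lam * y for y, d in zip(times, events))

def exp_mle(times, events):
    """Closed-form maximiser: observed failures divided by total time at risk."""
    return sum(events) / sum(times)

# Made-up survival times, censored at c = 3.0.
times, events = censor_type1([1.2, 3.5, 0.7, 5.0, 2.1], c=3.0)
lam_hat = exp_mle(times, events)
print(times, events, lam_hat)
```

Censored subjects still add their follow-up time to the denominator, which is how the likelihood uses the information they carry.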


Type 2 Censoring

Definition

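Type 2 censoring is the case $C_i = T_{(r)}$ for all $i$.

Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.

In a type 2 censoring setting, we observe only the first $r$ failures out of $n$ experimental units. In other words, for the order statistics $T_{(1)} \le \dots \le T_{(n)}$ of $T_1, \dots, T_n$, we only observe $T_{(1)}, \dots, T_{(r)}$. Here the censoring times $C_i = T_{(r)}$ are not constants but random variables.

Likelihood of Type 2 Censoring Data

The likelihood function of type 2 censored data can be computed with the same expression used for type 1 censored data, but it is easier to work with the joint PDF of the order statistics.

The joint PDF of $T_{(1)}, \dots, T_{(r)}$ is derived as
$$f(t_{(1)}, \dots, t_{(r)}) = \frac{n!}{(n - r)!} \left[\prod_{i=1}^r f(t_{(i)})\right] \left[S(t_{(r)})\right]^{n - r}$$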


Random Censoring

Definition

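Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times, which can be either random variables or constants. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator.

In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$, and the censoring times $C_1, \dots, C_n$ are i.i.d. random variables with PDF $g$ and CDF $G$, independent of the $T_i$'s.

The PDF of the observation is derived as
$$f(y, \delta) = \left[f_\theta(y)\{1 - G_\phi(y)\}\right]^{\delta} \left[g_\phi(y)\, S_\theta(y)\right]^{1 - \delta}$$
where $\theta$ is the parameter of interest and $\phi$ is the nuisance parameter.

Likelihood of Random Censoring Data

The likelihood function of randomly censored data is defined as
$$L(\theta) = \prod_{i=1}^n f(y_i, \delta_i) = c \prod_{i=1}^n f_\theta(y_i)^{\delta_i}\, S_\theta(y_i)^{1 - \delta_i}$$
where $c = \prod_{i=1}^n \{1 - G_\phi(y_i)\}^{\delta_i}\, g_\phi(y_i)^{1 - \delta_i}$ is a constant with respect to $\theta$.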


Left Censoring

Definition

Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times, and $C_1, \dots, C_n$ are the censoring times, which can be either random variables or constants. Let $\delta_i = I(T_i \ge C_i)$ be the censoring indicator.

In a left censoring setting, we observe $(Y_i, \delta_i)$, where $Y_i = \max(T_i, C_i)$. In other words, for a censored subject the event has already occurred before the subject comes under observation.


Interval-Censored Data

Definition

Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times. Interval-censored data are given as an interval, not an exact point in time: we only observe an interval $(L_i, R_i]$ that includes $T_i$. Interval-censored data are divided into four cases.

Case 1 Interval-Censored Data (Current Status Data)

Data are given in the form $(0, C_i]$ or $(C_i, \infty)$, where $C_i$ is a fixed time point.

Case 2 Interval-Censored Data

Data is given the form of where .

Double Censored Data

Data is given the form of where .

If , then it is right-censored data, if , then it is left-censored data, and if , then it is uncensored data.

Panel Data

Observations are made at discrete time points. The period between these observations can be viewed as an interval.


Mean Imputation Method

Definition

The mean imputation method replaces each interval $(L_i, R_i]$ of case 2 interval-censored data with its midpoint $(L_i + R_i)/2$ when the interval is finite. If $R_i = \infty$, the observation is replaced with the left endpoint $L_i$ and treated as right-censored data.
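The rule can be sketched in a few lines. The representation of an interval as a `(left, right)` pair and the function name `mean_impute` are assumptions made for this example.

```python
import math

def mean_impute(intervals):
    """Map each interval (left, right] to a pair (time, event_indicator)."""
    imputed = []
    for left, right in intervals:
        if math.isinf(right):
            # Unbounded interval: keep the left endpoint as a right-censored time.
            imputed.append((left, 0))
        else:
            # Finite interval: use the midpoint as if it were an exact event time.
            imputed.append(((left + right) / 2.0, 1))
    return imputed

result = mean_impute([(2.0, 4.0), (1.0, math.inf), (0.0, 3.0)])
print(result)
```

The output can then be passed to any method for right-censored data, at the cost of ignoring the uncertainty inside each interval.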


Parametric Models

Distributions

Survival Models based on Distributions

Kinds

Exponential Distribution

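The exponential distribution assumes that the hazard function is constant over time. The distribution is memoryless: $P(T > t + s \mid T > s) = P(T > t)$.

If the survival time follows an exponential distribution with rate $\lambda > 0$, then

  • PDF: $f(t) = \lambda e^{-\lambda t}$
  • Hazard function: $h(t) = \lambda$
  • Survival function: $S(t) = e^{-\lambda t}$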

Gamma Distribution

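If the survival time follows a gamma distribution with shape $k > 0$ and rate $\lambda > 0$, the PDF is $f(t) = \frac{\lambda^k}{\Gamma(k)}\, t^{k-1} e^{-\lambda t}$. The hazard function and the survival function do not have closed-form expressions in general.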

Weibull Distribution

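The Weibull distribution uses one more parameter than the exponential distribution to control the shape of the hazard function. If $\alpha > 1$ the hazard function is increasing, if $\alpha = 1$ it is constant, and if $\alpha < 1$ it is decreasing.

If the survival time follows a Weibull distribution, the PDF is
$$f(t) = \alpha \lambda (\lambda t)^{\alpha - 1} \exp\{-(\lambda t)^\alpha\}$$
where $\lambda > 0$ is a scale parameter and $\alpha > 0$ is a shape parameter.

  • Hazard function: $h(t) = \alpha \lambda (\lambda t)^{\alpha - 1}$
  • Survival function: $S(t) = \exp\{-(\lambda t)^\alpha\}$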

Rayleigh Distribution

The PDF of survival time follows Rayleigh distribution The Hazard Function The Survival Function

Log-Normal Distribution

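We assume that the survival time follows a log-normal distribution, $\log T \sim N(\mu, \sigma^2)$ or $T \sim LN(\mu, \sigma^2)$. The hazard function of the distribution is hump-shaped.

  • PDF: $f(t) = \frac{1}{t \sigma \sqrt{2\pi}} \exp\left\{-\frac{(\log t - \mu)^2}{2\sigma^2}\right\}$
  • Survival function: $S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)$, where $\Phi$ is the CDF of the standard normal distribution

The hazard function does not have a closed-form expression.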

Gompertz Distribution

The Hazard Function where and

Gompertz-Makeham Distribution

The Hazard Function


Survival Models based on Log-Lifetime

Definition

Assume that the log survival time can be modeled with the log-linear model
$$\log T = \mu + \sigma W$$
where $\sigma$ is the scale parameter, $\mu$ is the location parameter, and $W$ follows some well-known distribution.

Kinds

Standard Gumbel Distribution

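If $W$ follows the standard Gumbel distribution (extreme value distribution), then $f_W(w) = \exp(w - e^w)$. If $\lambda = e^{-\mu}$ and $\alpha = 1/\sigma$, then $T$ follows a Weibull distribution with survival function $S(t) = \exp\{-(\lambda t)^\alpha\}$.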

Normal Distribution

If follows the standard normal distribution, then follows log-normal distribution where and

Logistic Distribution

If follows a Logistic Distribution, then follows Log-Logistic Distribution where and .

Gumbel Distribution

If follows a Gumbel Distribution (generalized extreme value distribution), then where and .

Special Cases

If then If then If then If then

Exponential-F Distribution

If follows a F-Distribution , then follows a generalized F-distribution.

Special Cases

If then follows Log-Logistic Distribution If then follows generalized gamma distribution If then follows Log-Normal Distribution


Survival Models with Surviving Fractions

Definition

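Consider an event such as death from a specific cause or the incidence of a particular disease.
Define a binary variable $U$, where $U = 1$ indicates that an individual will eventually experience the event, and $U = 0$ indicates that the individual will never experience the event.
Let $T$ denote the time of occurrence of the event with PDF $f$ and CDF $F$; $T$ is defined only when $U = 1$, and $T$ and the censoring time $C$ are independent. We observe $(Y, \delta)$, where $Y = \min(T, C)$ and $\delta = I(T \le C)$.

Suppose that the probability of $U = 1$ is $p$, $0 < p \le 1$.

No Covariate Case

The likelihood is defined as
$$L = \prod_{i=1}^n \{p\, f(y_i)\}^{\delta_i} \{(1 - p) + p\, S(y_i)\}^{1 - \delta_i}$$
where $S = 1 - F$.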

With Covariates Case

The likelihood is defined as


Nonparametric Methods: One Sample

Life Tables

Empirical Survival Function

Definition

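Let $T_1, \dots, T_n$ be i.i.d. random variables with the survival function $S(t)$. The empirical survival function is defined as
$$S_n(t) = \frac{1}{n} \sum_{i=1}^n I(T_i > t)$$
Since each $I(T_i > t)$ is a Bernoulli trial with success probability $S(t)$,
$$n S_n(t) \sim \text{Bin}(n, S(t)), \qquad E[S_n(t)] = S(t), \qquad \text{Var}[S_n(t)] = \frac{S(t)\{1 - S(t)\}}{n}$$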


Reduced Sample Estimator

Definition

Notations

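Assume the intervals have equal length, and use the following notation:

  • $I_i = [t_{i-1}, t_i)$: the $i$-th interval, where $i = 1, \dots, k$
  • $n_i$: the number of individuals alive at the beginning of $I_i$
  • $d_i$: the number of deaths during $I_i$
  • $c_i$: the number of individuals censored during $I_i$
  • $p_i = P(T > t_i \mid T > t_{i-1})$, where $T$ is the survival time.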

Estimation of Reduced Sample Estimator

Reduced sample method estimates the Survival Function as where  is the number of deaths in the interval , and is the number of uncensored data in 

The drawback of the reduced sample method is that it ignores the information contained in censored observations; it is therefore usually a biased estimator that underestimates the survival function $S(t)$.


Life Table Estimator

Definition

Notations

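Assume the intervals have equal length, and use the following notation:

  • $I_i = [t_{i-1}, t_i)$: the $i$-th interval, where $i = 1, \dots, k$
  • $n_i$: the number of individuals alive at the beginning of $I_i$
  • $d_i$: the number of deaths during $I_i$
  • $c_i$: the number of individuals censored during $I_i$
  • $p_i = P(T > t_i \mid T > t_{i-1})$, where $T$ is the survival time.

Estimation of Life Table Estimator

The life table estimator is derived from the expression
$$S(t_k) = \prod_{i=1}^k p_i$$
where $p_i = P(T > t_i \mid T > t_{i-1})$.

The life table estimator estimates the survival function as
$$\hat{S}(t_k) = \prod_{i=1}^k \hat{p}_i = \prod_{i=1}^k \left(1 - \frac{d_i}{n_i'}\right)$$
where $n_i' = n_i - c_i/2$ is called the effective sample size.

We assume that, on average, those individuals who were censored during $I_i$ were at risk for half of the interval.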
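The computation can be sketched as follows, assuming each interval is summarised by a triple (number alive at the start, deaths, censored). The function name and the table values are made up for illustration and do not come from the notes.

```python
def life_table_survival(rows):
    """rows: list of (n_i, d_i, c_i) per interval; returns the estimated S at each interval end."""
    surv = 1.0
    estimates = []
    for n_i, d_i, c_i in rows:
        n_eff = n_i - c_i / 2.0        # effective sample size: censored count as half at risk
        surv *= 1.0 - d_i / n_eff      # multiply by the estimated conditional survival
        estimates.append(surv)
    return estimates

# Illustrative table: each row starts with the survivors of the previous interval.
rows = [(100, 10, 4), (86, 8, 6), (72, 6, 2)]
estimates = life_table_survival(rows)
print(estimates)
```

Setting every censored count to zero reduces the estimate to the ordinary product of interval survival proportions.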

Variance of Life Table Estimator

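For a given effective sample size $n_i'$, assume that the number of deaths satisfies $d_i \sim \text{Bin}(n_i', q_i)$, where $q_i = 1 - p_i$. Since $\log \hat{S}(t_k) = \sum_{i=1}^k \log \hat{p}_i$, under the assumption of the independence of the $\hat{p}_i$'s, the variance of the log of the life table estimator is approximated as
$$\text{Var}[\log \hat{S}(t_k)] \approx \sum_{i=1}^k \frac{q_i}{n_i' p_i}$$
Then, by the delta method, Greenwood's formula is derived as
$$\widehat{\text{Var}}[\hat{S}(t_k)] = \hat{S}(t_k)^2 \sum_{i=1}^k \frac{\hat{q}_i}{n_i' \hat{p}_i} = \hat{S}(t_k)^2 \sum_{i=1}^k \frac{d_i}{n_i'(n_i' - d_i)}$$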

Confidence Interval for Life Table Estimator

Under the asymptotic normality of the estimator, we can use $\hat{S}(t) \pm z_{\alpha/2} \sqrt{\widehat{\text{Var}}[\hat{S}(t)]}$ as a confidence interval for $S(t)$. However, this interval can take values outside $[0, 1]$. To avoid this problem, the log-log transformation is used.

Log-log Transformation

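To guarantee that the CI for $S(t)$ lies within $[0, 1]$, use the log-log transformation. Let $\hat{\theta} = \log\{-\log \hat{S}(t)\}$. Then, by the delta method,
$$\widehat{\text{Var}}(\hat{\theta}) \approx \frac{\widehat{\text{Var}}[\hat{S}(t)]}{\{\hat{S}(t) \log \hat{S}(t)\}^2}$$
i.e. the CI for $\theta = \log\{-\log S(t)\}$ is $\hat{\theta} \pm z_{\alpha/2}\, \widehat{\text{se}}(\hat{\theta})$, and the CI for $S(t)$ is calculated as
$$\left[\hat{S}(t)^{\exp\{z_{\alpha/2} \widehat{\text{se}}(\hat{\theta})\}},\ \hat{S}(t)^{\exp\{-z_{\alpha/2} \widehat{\text{se}}(\hat{\theta})\}}\right]$$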
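A small sketch of the transformation, assuming an estimate $\hat{S}(t)$ and its estimated variance are already available. With $\hat{\theta} = \log\{-\log \hat{S}(t)\}$, the delta method gives a standard error of $\sqrt{\widehat{\text{Var}}[\hat{S}(t)]} / |\hat{S}(t) \log \hat{S}(t)|$, and back-transforming the interval for $\theta$ raises $\hat{S}(t)$ to a power. The input values are invented for the example.

```python
import math

def plain_ci(s_hat, var_s, z=1.96):
    """Untransformed normal-approximation interval; may leave [0, 1]."""
    half = z * math.sqrt(var_s)
    return s_hat - half, s_hat + half

def loglog_ci(s_hat, var_s, z=1.96):
    """Interval built on theta = log(-log S) and mapped back to the S scale."""
    se_theta = math.sqrt(var_s) / abs(s_hat * math.log(s_hat))
    lower = s_hat ** math.exp(z * se_theta)
    upper = s_hat ** math.exp(-z * se_theta)
    return lower, upper

# Illustrative inputs: S_hat = 0.3 with estimated variance 0.057.
print(plain_ci(0.3, 0.057))
print(loglog_ci(0.3, 0.057))
```

With these inputs the untransformed interval has a negative lower limit, while the log-log interval stays inside $(0, 1)$.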

Examples

Consider a life table

where is the Reduced Sample Estimator and is the life table estimator.

The survival functions are estimated as


Kaplan-Meier Estimator


Definition

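Consider a random censoring case with observations $(Y_i, \delta_i)$, $i = 1, \dots, n$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. Let the distinct failure times be $t_1 < t_2 < \dots < t_k$, where $k \le n$. Let $d_j$ be the number of deaths at $t_j$, and $n_j = |R(t_j)|$ be the number of individuals alive just before $t_j$, where the set $R(t_j) = \{i : Y_i \ge t_j\}$ is called the risk set at $t_j$.

The Kaplan-Meier estimator is derived from the expression
$$S(t) = \prod_{j : t_j \le t} p_j$$
where $p_j = P(T > t_j \mid T > t_{j-1})$.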

General Case

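As an estimator of the conditional survival probability $p_j$ at the failure time $t_j$, consider
$$\hat{p}_j = 1 - \frac{d_j}{n_j}$$
where $d_j$ is the number of deaths at $t_j$ and $n_j$ is the number at risk at $t_j$.

The Kaplan-Meier estimator is defined with the estimated $\hat{p}_j$'s:
$$\hat{S}(t) = \prod_{j : t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
The cumulative hazard function is estimated by the Nelson-Aalen estimator, following the same logic:
$$\hat{H}(t) = \sum_{j : t_j \le t} \frac{d_j}{n_j}$$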

No Ties Case

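When there are no ties in the observations, $Y_{(1)} < \dots < Y_{(n)}$, the failure times are equal to the observations, $t_i = Y_{(i)}$, the number of deaths is equal to the censoring indicator, $d_i = \delta_{(i)}$, and $n_i = n - i + 1$. Thus, the Kaplan-Meier estimator is defined as
$$\hat{S}(t) = \prod_{i : Y_{(i)} \le t} \left(1 - \frac{1}{n - i + 1}\right)^{\delta_{(i)}} = \prod_{i : Y_{(i)} \le t} \left(\frac{n - i}{n - i + 1}\right)^{\delta_{(i)}}$$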
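The product-limit computation can be sketched as below for right-censored data given as observed times and event indicators (1 for a death, 0 for censoring). At each distinct failure time it multiplies the running estimate by one minus deaths over the number at risk. The function name and the five observations are illustrative assumptions; a censored observation tied with a death is kept in the risk set at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] for each distinct failure time t_j."""
    failure_times = sorted({y for y, d in zip(times, events) if d == 1})
    surv, curve = 1.0, []
    for t in failure_times:
        n_j = sum(1 for y in times if y >= t)                             # size of the risk set
        d_j = sum(1 for y, d in zip(times, events) if y == t and d == 1)  # deaths at t
        surv *= 1.0 - d_j / n_j
        curve.append((t, surv))
    return curve

# Made-up data: deaths at 2, 3 and 5; censoring at 3 and 8.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(curve)
```

Without censoring the same function returns the empirical survival function evaluated at the observed times.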

Properties

Self-Consistency

An estimator is self-consistent if where

The Kaplan-Meier estimator is the unique self-consistent estimator for where is the largest observation.

Generalized MLE

The Kaplan-Meier estimator gives the Generalized Maximum Likelihood Estimation of the Survival Function .

Strong Consistency

The Kaplan-Meier estimator converges to $S(t)$ uniformly, almost surely.

Proof

Consider a function and decompose it to the sum of the subsurvival functions and . where is the uncensored case and is the censored case.

Then, the survival function can be expressed as a function of the subsurvival functions.

Define the empirical subsurvival functions and . The Kaplan-Meier estimator also can be expressed as a function of the empirical subsurvival functions.

By Glivenko-Cantelli theorem, and for all . Since is a continuous function of and ,

Asymptotic Normality

Kaplan-Meier estimator has asymptotic normality. where , , and .

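The variance of the estimator is estimated by Greenwood's formula
$$\widehat{\text{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{j : t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$$
where $d_j$ is the number of deaths and $n_j$ the number at risk at the failure time $t_j$. For the no ties case, the formula is
$$\widehat{\text{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{i : Y_{(i)} \le t} \frac{\delta_{(i)}}{(n - i)(n - i + 1)}$$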
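A minimal sketch of Greenwood's formula, $\hat{S}(t)^2 \sum_{t_j \le t} d_j / \{n_j (n_j - d_j)\}$, where $d_j$ deaths occur among $n_j$ subjects at risk at each distinct failure time $t_j$. The function name and data are illustrative, and skipping the term when everyone at risk dies is an assumption of this sketch (the estimate is then zero).

```python
def greenwood_variance(times, events, t):
    """Return (S_hat(t), estimated variance of S_hat(t)) for right-censored data."""
    failure_times = sorted({y for y, d in zip(times, events) if d == 1 and y <= t})
    surv, acc = 1.0, 0.0
    for t_j in failure_times:
        n_j = sum(1 for y in times if y >= t_j)
        d_j = sum(1 for y, d in zip(times, events) if y == t_j and d == 1)
        surv *= 1.0 - d_j / n_j
        if n_j > d_j:                      # skip the undefined term when S_hat drops to 0
            acc += d_j / (n_j * (n_j - d_j))
    return surv, surv ** 2 * acc

s_hat, var_hat = greenwood_variance([2, 3, 3, 5, 8], [1, 1, 0, 1, 0], t=5)
print(s_hat, var_hat)
```

The square root of the returned variance is the standard error used in the normal-approximation confidence interval.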

Examples

case where

Facts

The Kaplan-Meier estimator is self-consistent and asymptotically normal, and it is the generalized MLE.

If there is no censoring, the Kaplan-Meier estimator reduces to the empirical survival function.


Hazard Function Estimators

Nelson-Aalen Estimator

Definition

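Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.

The Nelson-Aalen estimator estimates the cumulative hazard function as
$$\hat{H}(t) = \sum_{j : t_j \le t} \frac{d_j}{n_j}$$
where $t_j$ are the distinct failure times, $d_j$ is the number of deaths at $t_j$, and $n_j$ is the number at risk at $t_j$. It is derived from the Kaplan-Meier estimator through the approximation $-\log(1 - x) \approx x$.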


Peterson Estimator

Definition

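Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.

The Peterson estimator estimates the cumulative hazard function as
$$\hat{H}(t) = -\log \hat{S}(t) = -\sum_{j : t_j \le t} \log\left(1 - \frac{d_j}{n_j}\right)$$
where $\hat{S}$ is the Kaplan-Meier estimator, $d_j$ is the number of deaths at the failure time $t_j$, and $n_j$ is the number at risk at $t_j$.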
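The two cumulative hazard estimators can be compared on the same data. The sketch below takes the Nelson-Aalen estimate as the sum of $d_j / n_j$ over failure times up to $t$ and the Peterson estimate as minus the log of the Kaplan-Meier estimate. Names and data are illustrative, and the code assumes $d_j < n_j$ so the logarithm is finite.

```python
import math

def cumulative_hazard_estimates(times, events, t):
    """Return (Nelson-Aalen, Peterson) estimates of H(t)."""
    failure_times = sorted({y for y, d in zip(times, events) if d == 1 and y <= t})
    nelson_aalen, peterson = 0.0, 0.0
    for t_j in failure_times:
        n_j = sum(1 for y in times if y >= t_j)
        d_j = sum(1 for y, d in zip(times, events) if y == t_j and d == 1)
        nelson_aalen += d_j / n_j
        peterson += -math.log(1.0 - d_j / n_j)   # assumes d_j < n_j
    return nelson_aalen, peterson

na, pe = cumulative_hazard_estimates([2, 3, 3, 5, 8], [1, 1, 0, 1, 0], t=5)
print(na, pe)
```

Because $-\log(1 - x) \ge x$, the Peterson estimate is never smaller than the Nelson-Aalen estimate; the two are close when each $d_j / n_j$ is small.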


Robust Estimators

Estimators for Survival Function

Definition

Mean of Survival Time

Without Censoring

where is the empirical CDF

With Censoring

where is calculated by the Kaplan-Meier Estimator and is the jump size at .

The asymptotic variance of the parameter is obtained by And the asymptotic variance is estimated by where and

Median of Survival Time

A reasonable estimator for is where is the Kaplan-Meier Estimator

If does not have a unique solution, then is defined as the midpoint of the interval constituting of the solutions.

However, this estimator overestimates the true parameter, so a linearly smoothed version of the estimator is used to estimate the median.

The asymptotic variance of the estimated parameter is obtained by where is estimated by Greenwood’s formula, and may be estimated by Kernel Estimation.


Bayes Estimator for Survival Function

Definition

The Bayes estimator of with the Squared Error Loss and a Dirichlet process prior is given by where the squared loss is where is any non-negative non-decreasing function, the parameter which is a finite non-negative measure on , and .

Examples

If , then In many cases,


Nonparametric Density Estimation

Kernel Estimation for Survival Analysis

Definition

The Kaplan-Meier estimator is a step function, so it is difficult to compute its quantile function and density function. Kernel density estimation is used to turn it into a smooth function.

Let be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is censoring indicator.

Without Censoring

For the complete data, the Kernel Density Estimation is defined as where is the kernel, is the scaled kernel, and is a smoothing parameter.

The kernel estimator for the Distribution Function is defined as where

With Censoring

For the censored data, the weights for each observation is defined as a jump size in Kaplan-Meier Estimator. where is the jump size at in Kaplan-Meier Estimator.

Thus, the Kernel Density Estimation for the Survival Function is


Nonparametric Methods: Two Samples

Gehan Test

Definition

For the first sample, let be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is censoring indicator.

For the second sample, let be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is censoring indicator.

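Gehan's test is an extension of the Wilcoxon rank-sum test to censored data. Writing $(X_i, \delta_i)$, $i = 1, \dots, m$, for the observations in the first sample and $(Y_j, \epsilon_j)$, $j = 1, \dots, n$, for those in the second sample, the test statistic of Gehan's test is defined as
$$U = \sum_{i=1}^m \sum_{j=1}^n U_{ij}$$
where $U_{ij} = +1$ if $X_i$ is known to exceed $Y_j$ ($X_i > Y_j$ with $\epsilon_j = 1$, or $X_i = Y_j$ with $\delta_i = 0$ and $\epsilon_j = 1$), $U_{ij} = -1$ if $Y_j$ is known to exceed $X_i$ ($X_i < Y_j$ with $\delta_i = 1$, or $X_i = Y_j$ with $\delta_i = 1$ and $\epsilon_j = 0$), and $U_{ij} = 0$ otherwise.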

Under the null hypothesis , let be the combined samples and define , , and where is the indicator set for sample 1.

Then, the variance of is calculated as and


Hypothesis Test for a 2 by 2 Contingency Table

Definition

The hypothesis test for a contingency table is used to determine if there’s a significant association between two categorical variables

Consider a Contingency Table

DeadAlive
Population 1
Population 2
i.e. and

We want to test . Let , , , and

Uncorrelated Chi-squared Test

The test statistic is defined as

We reject if

Yates’ Corrected Chi-squared Test

The test statistic is defined as

We reject if

Fisher’s Exact Test

The test statistic is defined as where ,

We reject if

Corrected Chi-squared Test

The test statistic is defined as

We reject if

Liddell’s Exact Test

The test statistic is defined as

We reject if

Exact Unconditional Test

The test statistic is defined as where

We reject if

Approximate Unconditional Test

The test statistic is defined as where

We reject if


Mantel-Haenszel Test

Definition

Consider a sequence of contingency tables

DeadAlive
Treatment 1
Treatment 2
where is the indicator of hospital

Under the null hypothesis , where and , the test statistic for Mantel-Haenszel test is defined as where and .


Log-Rank Test

Definition

The log-rank test is a test to compare the survival functions of two samples. The test uses the sequence of Mantel-Haenszel statistics of the tables at each uncensored event time.
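As a sketch of that construction: at each distinct failure time, form the 2 by 2 table of deaths and survivors by group, then accumulate the observed minus expected deaths in group 1 together with the hypergeometric variance. The function name, the standardized form of the statistic, and the data are assumptions of this example.

```python
import math

def log_rank(times1, events1, times2, events2):
    """Return (z, two-sided p-value) of the log-rank statistic for two samples."""
    pooled = [(y, e, 1) for y, e in zip(times1, events1)]
    pooled += [(y, e, 2) for y, e in zip(times2, events2)]
    failure_times = sorted({y for y, e, _ in pooled if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in failure_times:
        n1 = sum(1 for y, _, g in pooled if y >= t and g == 1)    # at risk, group 1
        n2 = sum(1 for y, _, g in pooled if y >= t and g == 2)    # at risk, group 2
        d1 = sum(1 for y, e, g in pooled if y == t and e == 1 and g == 1)
        d = sum(1 for y, e, _ in pooled if y == t and e == 1)     # deaths in both groups
        n = n1 + n2
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * n1 * n2 * (n - d) / (n * n * (n - 1))
    z = obs_minus_exp / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return z, p_value

# Made-up data: every subject in group 1 fails before anyone in group 2.
z, p = log_rank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(z, p)
```

The square of `z` is the chi-squared form of the statistic with one degree of freedom. With samples this small the normal approximation is only indicative.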

Examples

The Mantel-Haenszel statistic of the data is calculated as
and the p-value is

Tarone-Ware Test

Definition

The Tarone-Ware test is the generalization of the Mantel-Haenszel Test. where is the weight for each table.

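If $w_j = 1$, then it is the Mantel-Haenszel statistic; if $w_j = n_j$, then it is the Gehan statistic; and if $w_j = \sqrt{n_j}$, then it is the Tarone-Ware statistic, where $n_j$ is the total number at risk in the $j$-th table.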


Nonparametric Methods: K Samples

Generalized Gehan Test

Definition

For the -th sample, let , where , be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is the censoring indicator.

The generalized Gehan’s test is an extension of Gehan Test used for more than two sample case. Under the null hypothesis , the test statistic of generalized Gehen’s test is defined as where is the statistic of thet Gehan Test.

Then, where , , and


Generalized Mantel-Haenszel Test

Definition

For the -th sample, let , where , be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is the censoring indicator.

The generalized Mantel-Haenszel test extends the Mantel-Haenszel test to more than two samples.

Let be the combined samples

For each uncensored time point, construct a table.

Dead
Alive

Under the null hypothesis , the test statistic of generalized Mantel-Haenszel test is defined as where , are the -length vector and matrix, calculated with the first population is deleted data (corner-point constraint).

The and is defined as where where


Nonparametric Methods: Regression

Cox Proportional Hazards Model


Definition

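The Cox proportional hazards model assumes that covariates act multiplicatively on the hazard function.

Let $T_1, \dots, T_n$ be independent survival times and $C_1, \dots, C_n$ be i.i.d. censoring times. We observe $(Y_i, \delta_i, z_i)$, where $Y_i = \min(T_i, C_i)$, $\delta_i = I(T_i \le C_i)$ is the censoring indicator, and $z_i$ is the covariate vector. Then, the Cox proportional hazards model is defined as
$$h(t \mid z) = h_0(t) \exp(\beta^\top z)$$
where $h_0(t)$ is called the baseline hazard function, i.e. the hazard at $z = 0$.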

Conditional Likelihood

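Let $Y_{(1)} < \dots < Y_{(n)}$ (no ties case), and let $R_i = \{j : Y_j \ge Y_i\}$ be the risk set at $Y_i$. For each uncensored time $Y_i$, the conditional probability that individual $i$ fails at $Y_i$, given that exactly one individual in $R_i$ fails at $Y_i$, is
$$\frac{h_0(Y_i) \exp(\beta^\top z_i)}{\sum_{j \in R_i} h_0(Y_i) \exp(\beta^\top z_j)} = \frac{\exp(\beta^\top z_i)}{\sum_{j \in R_i} \exp(\beta^\top z_j)}$$
Taking the product of these conditional probabilities gives the conditional likelihood
$$L_c(\beta) = \prod_{i \in D} \frac{\exp(\beta^\top z_i)}{\sum_{j \in R_i} \exp(\beta^\top z_j)}$$
where $D$ is the index set of uncensored samples.

$L_c$ is not a likelihood in the ordinary sense. However, Cox suggested treating the conditional likelihood as an ordinary likelihood to find the maximum likelihood estimate.

Since there is no analytic solution for the MLE, iterative methods such as the Newton-Raphson method are used to estimate the coefficient $\beta$.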
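A minimal Newton-Raphson sketch for a single covariate and no tied failure times is shown below. Each uncensored subject contributes $\exp(\beta z_i) / \sum_{j \in R_i} \exp(\beta z_j)$ to the conditional likelihood, where $R_i$ is the set of subjects still at risk at that failure time. The function names, the fixed iteration count, and the data are assumptions made for illustration; real software adds step-halving and convergence checks.

```python
import math

def score_and_information(beta, times, events, z):
    """Score and observed information of the log conditional likelihood (one covariate)."""
    score, info = 0.0, 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue                      # censored subjects add no factor of their own
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * z[j] for wj, j in zip(w, risk))
        s2 = sum(wj * z[j] ** 2 for wj, j in zip(w, risk))
        score += z[i] - s1 / s0           # covariate minus its weighted mean over the risk set
        info += s2 / s0 - (s1 / s0) ** 2  # weighted variance over the risk set
    return score, info

def cox_newton_raphson(times, events, z, iterations=25):
    beta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(beta, times, events, z)
        beta += score / info
    return beta

times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 0]
z = [1, 0, 1, 0, 0, 1]
beta_hat = cox_newton_raphson(times, events, z)
print(beta_hat, math.exp(beta_hat))       # coefficient and estimated hazard ratio
```

At convergence the score is numerically zero, and the inverse of the information gives an estimated variance for the coefficient.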

The hazard ratio $\exp(\beta_k)$ represents the relative change in the hazard rate for a one-unit increase in the covariate $z_k$.

Goodness-of-Fit Test

For testing the null hypothesis $H_0: \beta = 0$, Cox suggested the Rao (score) test.

Asymptotic Normality of MLE

$$\hat{\beta} \overset{a}{\sim} N\left(\beta,\ I(\hat{\beta})^{-1}\right)$$
where $I(\hat{\beta})$ is the observed Fisher information.

Estimation of Survival Function

Under the Cox proportional hazards model, To estimate , we can use for but we still need to estimate , , or .

Breslow suggested the estimators of and as If

If , then is the Kaplan-Meier Estimator

It has a few drawbacks

  • can take negative values.

Tsiatis suggested a non-negative version of where

Link suggested using the linear smooth of .

Discrete on Grouped Data

When data is discrete or grouped, there are ties at each failure. Denote the ordered discrete failure time by and let be the risk set at , be the death set at , and .

Cox suggested combining the all possible permutations. However, it is computationally infeasible. where , and is the size subset of

Peto suggested an alternative likelihood in which, instead of combining all possible permutations, every tied failure at a given time makes the same contribution.

Time Dependent Covariates

In the case, the covariate depends on time. We observe and the conditional likelihood defined as where is the indicator set for uncensored samples, and is the risk set.

Facts

Any two individuals have hazard functions that are constant multiples of one another.

The survival function of the Cox proportional hazards model is a family of Lehmann alternatives,
$$S(t \mid z) = S_0(t)^{\exp(\beta^\top z)}$$
where $S_0(t) = \exp\left\{-\int_0^t h_0(u)\,du\right\}$.

If , , where is the indicator set for sample 1, and there are no ties, then Cox test is exactly equal to the Mantel-Haenszel Test.


Linear Models

Accelerated Life Model

Definition

Consider the random variable represents survival time with Hazard Function with , , and And assume that the survival time of individual with covariate is defined as If then the covariate accelerates the time to failure. The model based on this assumption is called accelerated failure time (AFT) model.

Under AFT model

Let , then where .

Assume that , where is a Random Variable represents error term. Then, the AFT model becomes

Relationship between and . where

If then , and if , then where .


Miller Estimator

Definition

Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator

Suppose a Simple Linear Regression model

With no censoring present, the least squares estimators of the parameters are obtained by minimizing where is the Empirical Distribution Function of where .

With censoring present, Miller proposed to minimize where is the Kaplan-Meier Estimator based on and the weights is its jump size.

If the last observation is censored, then . Hence, change the last observation to be uncensored, so that .


Buckley-James Estimator

Definition

Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator

If we can observe the true survival time , we can make a model However, we can’t observe , but only censored , and Buckley and James proposed an Unbiased Estimator for Since we also can not observe , we estimate it again. where , is the Kaplan-Meier Estimator based on and the weights is its jump size.

The variance of the estimator is estimated by where , , , and


Koul-Susarla-Van Ryzin Estimator

Definition

Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator

Suppose a Simple Linear Regression model

Koul-Susarla-Van Ryzin proposed an Unbiased Estimator for Since is unknown, it should be estimated. Authors suggested to use Kaplan-Meier Estimator of with data as an estimator.


Goodness-of-Fit Tests

Graphical Validity Tests for Survival Model

Definition

If the selected model holds, a plot of the data resembles a straight line; if the model fails, the plot resembles a curved line.

There are two types of plots: survival plots and hazard plots.

One Sample

Exponential Distribution

Weibull Distribution

Log-Normal Distribution

where is the Probit.

Two to K Samples

For parametric models, repeat one sample methods on each sample.

The validity of the Cox Proportional Hazards Model can be checked by the Lehmann alternatives property of the Survival Function.

Regression

Linear Model

Ordinary residuals may be used in model checking.

Cox Proportional Hazard Model

where is the Kaplan-Meier Estimator based on and .

Also, the estimated cumulative hazard function and the covariates shouldn’t have any systematic pattern.


Goodness-of-Fit Tests for Survival Model

Definition

No Censoring Case

Kolmogorov–Smirnov Test

Definition

The Kolmogorov–Smirnov test (KS test) is a non-parametric test for the equality of continuous distribution functions.

The Kolmogorov–Smirnov test statistic for a given CDF $F_0$ is defined as
$$D_n = \sup_t |F_n(t) - F_0(t)|$$
where $F_n$ is the empirical distribution function based on the i.i.d. random variables $T_1, \dots, T_n$.
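For uncensored data, the largest gap between the empirical distribution function and a hypothesised CDF can only occur at a data point, so it is enough to check just before and just after each jump. The sketch below does this; the function name and the sample are illustrative assumptions.

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at the i-th ordered value.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

# Four points placed at the midpoints of the quartiles of Uniform(0, 1).
d_n = ks_statistic([0.125, 0.375, 0.625, 0.875], lambda x: x)
print(d_n)
```

Under censoring, the empirical distribution function is replaced by one minus the Kaplan-Meier estimator, which this sketch does not cover.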


Cramer-von Mises Test

Definition

The Cramer-von Mises test is a non-parametric test for the equality of continuous distribution functions.

The Cramer-von Mises test statistic for a given CDF $F_0$ is defined as
$$W_n^2 = n \int_{-\infty}^{\infty} \{F_n(t) - F_0(t)\}^2\, dF_0(t)$$
where $F_n$ is the empirical distribution function based on the i.i.d. random variables $T_1, \dots, T_n$.


Censoring Case

Generalized Kolmogorov–Smirnov Test

The generalized Kolmogorov–Smirnov test uses Kaplan-Meier Estimator instead of Empirical Distribution Function used for Kolmogorov–Smirnov Test where is the Empirical Distribution Function based on the i.i.d. random variables .

Generalized Cramer-von Mises Test

The generalized Cramer-von Mises test uses Kaplan-Meier Estimator instead of Empirical Distribution Function used for Cramer-von Mises Test where is the Empirical Distribution Function based on the i.i.d. random variables .


Miscellaneous Topics

Multivariate Survival Model

Kinds

Copula Model

Definition

Let be i.i.d. survival time with CDF , PDF , and Survival Function , be marginal survival function, be marginal CDF, and be i.i.d. censoring time.

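The copula model defines the joint survival function as a copula function whose arguments are the marginal survival functions.

In the bivariate case,
$$S(t_1, t_2) = C(S_1(t_1), S_2(t_2))$$
where $C$ is the copula function with uniform marginals and $S_1$, $S_2$ are the marginal survival functions.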

Copula Functions

Clayton’s copula function where

Crowder’s copula function

Hougaard’s copula function


Competing Risks Model

Definition

The competing risks (multiple modes of failure) model is designed to accommodate multiple causes of the same event.

Let be the survival time and failure mode (competing risk type) for each individual, where

The mode-specific hazard function is defined as where

The marginal hazard function and marginal cumulative hazard function are defined as

Likelihood

We can observe , where , is censoring indicator, and is failure mode.

The likelihood function is defined as where

Estimators

The Nelson-Aalen Estimator for competing risk model is defined as where

The survival function is estimated by

The sub-distribution function is estimated by where


Examples

When , where

where

where


Basic Issues in Clinical Trials

Observational Study

Definition

An observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher.

Types

Case-Control Study

Definition

A case-control study is a retrospective study that compares two groups of people: those with a specific outcome (cases) and similar people without the outcome (controls).

Examples

Suppose that researchers want to study the relationship between smoking and lung cancer. They identify 100 lung cancer patients (cases) and 100 matched individuals without lung cancer (controls). They then collect data on past smoking habits for both groups and compare the prevalence of smoking between cases and controls.


Cohort Study

Definition

A cohort study is a prospective study that follows a group of individuals (cohort) over time to determine the incidence of a specific outcome.

Examples

Suppose that researchers want to study the relationship between smoking and lung cancer. They follow a group of 10,000 people for 20 years, comparing smokers to non-smokers to determine the incidence of lung cancer.


Cross-Sectional Study

Definition

A cross-sectional study analyzes data from a population at a specific point in time. It provides a snapshot of the prevalence of an outcome in a population.

Examples

Suppose that researchers want to study the relationship between smoking and lung cancer. They survey 1,000 people in a city, collecting data on their current smoking habits and presence of lung cancer.


Relative Risk

Definition

Consider a Contingency Table

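|         | Event | Non-event |
|---------|-------|-----------|
| Group 1 | $a$   | $b$       |
| Group 2 | $c$   | $d$       |

Point Estimation

The relative risk (RR) is estimated by
$$\widehat{RR} = \frac{a / (a + b)}{c / (c + d)}$$

Confidence Interval

The confidence interval for the relative risk is estimated by the delta method. The confidence interval for $\log RR$ is defined as
$$\log \widehat{RR} \pm z_{\alpha/2} \sqrt{\frac{1}{a} - \frac{1}{a + b} + \frac{1}{c} - \frac{1}{c + d}}$$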
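A small sketch of the calculation, assuming the table cells are labelled $a$, $b$ for group 1 (event, non-event) and $c$, $d$ for group 2. The labels, the function name, and the counts are assumptions of this example. The interval is built for $\log RR$ with the delta-method standard error $\sqrt{1/a - 1/(a+b) + 1/c - 1/(c+d)}$ and then exponentiated.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Return (RR estimate, lower limit, upper limit) from a 2 by 2 table."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Made-up counts: 20 of 100 events in group 1, 10 of 100 in group 2.
rr, rr_lower, rr_upper = relative_risk_ci(20, 80, 10, 90)
print(rr, rr_lower, rr_upper)
```

Working on the log scale keeps both limits positive, and the interval is symmetric around the estimate only after taking logs.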

Facts

Relative risk can be estimated in a cohort study or an experimental study; it cannot be estimated from a case-control study.


Odds Ratio

Definition

Consider a Contingency Table

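|         | Event | Non-event |
|---------|-------|-----------|
| Group 1 | $a$   | $b$       |
| Group 2 | $c$   | $d$       |

Point Estimation

The odds ratio (OR) is estimated by
$$\widehat{OR} = \frac{a d}{b c}$$

Confidence Interval

The confidence interval for the odds ratio is estimated by the delta method. The confidence interval for $\log OR$ is defined as
$$\log \widehat{OR} \pm z_{\alpha/2} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$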
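The same kind of sketch works for the odds ratio, again assuming cells labelled $a$, $b$ (group 1: event, non-event) and $c$, $d$ (group 2). The labels, the function name, and the counts are illustrative assumptions. The delta-method standard error of $\log OR$ is $\sqrt{1/a + 1/b + 1/c + 1/d}$.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR estimate, lower limit, upper limit) from a 2 by 2 table."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log)

# Made-up counts, the same table as a cohort with 20/100 and 10/100 events.
or_hat, or_lower, or_upper = odds_ratio_ci(20, 80, 10, 90)
print(or_hat, or_lower, or_upper)
```

The estimate does not change if rows and columns are exchanged, which is why it remains usable when sampling is done on the outcome.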

Facts

Odds ratio can be used for Case-Control Study.


Tests of Association

Kinds

Independence Test for Two Discrete Variables

Definition

Let category variables and and consider a null hypothesis are independent. Then, the test Statistic, which follows Chi-squared Distribution, is defined as where , , ,


Fisher’s Exact Test

The test statistic is defined as where ,

We reject if


McNemar's Test

Definition

McNemar’s test is used to analyze paired nominal data (same samples are used for both conditions), particularly in before-and-after studies or matched-pair designs.

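The test statistic is defined as
$$\chi^2 = \frac{(b - c)^2}{b + c}$$
where $b$ and $c$ are the two discordant cell counts, i.e. the numbers of pairs whose outcome differs between the two conditions. Under the null hypothesis, the statistic approximately follows a chi-squared distribution with 1 degree of freedom.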

Examples

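|                               | After treatment / No insomnia | After treatment / Insomnia |
|-------------------------------|-------------------------------|----------------------------|
| Before treatment / No insomnia | 45                            | 15                         |
| Before treatment / Insomnia    | 25                            | 15                         |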

Consider the null hypothesis $H_0$: the treatment is not effective. At the 5% significance level, we cannot reject the null hypothesis.
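The conclusion can be checked with a short calculation. The sketch assumes the uncorrected statistic $(b - c)^2 / (b + c)$ on the two discordant counts of the table (15 and 25) and a chi-squared reference distribution with 1 degree of freedom; whether the notes intended the continuity-corrected version is not recoverable, and the function name is illustrative.

```python
import math

def mcnemar(b, c):
    """Return (statistic, p-value) of the uncorrected McNemar test."""
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Discordant pairs from the insomnia table: 15 (no insomnia -> insomnia), 25 (insomnia -> no insomnia).
stat, p_value = mcnemar(15, 25)
print(stat, p_value)
```

The statistic is 2.5, below the 5% critical value of about 3.84, which agrees with not rejecting the null hypothesis. The concordant cells (45 and 15) do not enter the test.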


Confusion Matrix

Definition

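|                 | Predicted Positive (PP) | Predicted Negative (PN) |
|-----------------|-------------------------|-------------------------|
| Actual Positive | True Positive (TP)      | False Negative (FN)     |
| Actual Negative | False Positive (FP)     | True Negative (TN)      |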

Metrics

Accuracy

Definition

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Recall

Definition

Recall, sensitivity, or true positive rate is the proportion of actual positive cases that are correctly predicted as positive: $TP / (TP + FN)$.


Precision

Definition

Precision is the proportion of cases predicted as positive that are actually positive: $TP / (TP + FP)$.


Specificity

Definition

Specificity is the proportion of actual negative cases that are correctly predicted as negative: $TN / (TN + FP)$.


Type 1 Error

Definition


Type 2 Error

Definition


F-Score

Definition

F1 Score

The harmonic mean of Precision and Recall.

F-beta score

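$$F_\beta = (1 + \beta^2)\, \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$
where $\beta > 0$.

Recall is considered $\beta$ times as important as precision.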
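The metrics in this section can all be computed from the four cells of the confusion matrix. The sketch below uses the standard definitions, with $F_\beta = (1 + \beta^2)\,PR / (\beta^2 P + R)$ for precision $P$ and recall $R$. The function name and the counts are made up for illustration.

```python
def classification_metrics(tp, fn, fp, tn, beta=1.0):
    """Compute common metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f_beta": (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall),
    }

metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
print(metrics)
```

With `beta=1.0` the last entry is the F1 score; a larger `beta` moves it toward recall and a smaller one toward precision.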


Positive Predictive Value

Definition

where

Positive predictive value (in medical statistics and epidemiology) is the proportion of cases predicted as positive that are actually positive.


Receiver Operating Characteristic Curve

Definition

A receiver operating characteristic curve (ROC curve) is a plot of the true positive rate against the false positive rate at each threshold setting.

AUC

The area under the ROC curve is called the AUC (area under the curve).
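As a sketch of how the curve and its area are obtained, the code below sweeps the threshold over the observed scores, records the (false positive rate, true positive rate) pair at each one, and integrates with the trapezoidal rule. The function names, scores, and labels are illustrative assumptions.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), starting at (0, 0), for thresholds in decreasing order."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(points, auc(points))
```

For these scores the area equals the share of (positive, negative) pairs in which the positive case has the higher score, here 8 out of 9.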
