Introduction to Survival Analysis
Survival Function and Hazard Rate
Survival Function
Definition
Let $T$ be a non-negative random variable representing survival time with PDF $f(t)$ and CDF $F(t)$. The survival function of $T$ is defined as
$$S(t) = P(T > t) = 1 - F(t) = \int_t^\infty f(u)\,du$$
The survival function gives the probability that a patient, device, or other object of interest survives past time $t$.
Hazard Function
Definition
Hazard Function
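Let $T$ be a non-negative random variable representing survival time with PDF $f(t)$ and CDF $F(t)$. The hazard function of $T$ is defined as
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$
where $S(t)$ is the survival function.
The hazard function is the instantaneous rate at which the event occurs at time $t$, given survival up to $t$.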
Cumulative Hazard Function
The hazard function can alternatively be represented in terms of the cumulative hazard function, defined as
$$H(t) = \int_0^t h(u)\,du$$
Facts
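The survival function $S(t)$, the cumulative hazard function $H(t)$, the density (PDF) $f(t)$, the hazard function $h(t)$, and the distribution function (CDF) $F(t)$ of survival time are related through
$$S(t) = 1 - F(t) = \exp(-H(t)), \qquad f(t) = h(t)\,S(t), \qquad h(t) = -\frac{d}{dt}\log S(t)$$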
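The relations between $S$, $H$, $h$, and $f$ can be checked numerically. The sketch below is only an illustration: it assumes a Weibull hazard $h(t) = \alpha\lambda(\lambda t)^{\alpha-1}$ with arbitrarily chosen parameter values, and the function names are made up for this example.

```python
import math

def weibull_hazard(t, lam, alpha):
    """h(t) for an assumed Weibull form with scale lam and shape alpha."""
    return alpha * lam * (lam * t) ** (alpha - 1)

def weibull_survival(t, lam, alpha):
    """S(t) = exp(-(lam * t) ** alpha) for the same assumed form."""
    return math.exp(-((lam * t) ** alpha))

def weibull_pdf(t, lam, alpha):
    """f(t) = h(t) * S(t)."""
    return weibull_hazard(t, lam, alpha) * weibull_survival(t, lam, alpha)

def cumulative_hazard(t, lam, alpha, steps=20000):
    """H(t) = integral of h(u) over [0, t], approximated by the midpoint rule."""
    width = t / steps
    return width * sum(
        weibull_hazard((k + 0.5) * width, lam, alpha) for k in range(steps)
    )

t, lam, alpha = 2.0, 0.5, 1.5   # illustrative values only
H = cumulative_hazard(t, lam, alpha)
S = weibull_survival(t, lam, alpha)
gap_survival = abs(math.exp(-H) - S)          # checks S(t) = exp(-H(t))

# central difference of F(t) = 1 - S(t) should reproduce f(t) = h(t) S(t)
eps = 1e-6
f_numeric = (weibull_survival(t - eps, lam, alpha)
             - weibull_survival(t + eps, lam, alpha)) / (2 * eps)
gap_density = abs(f_numeric - weibull_pdf(t, lam, alpha))
print(gap_survival, gap_density)              # both should be close to zero
```

Both gaps are numerical-approximation errors, so they are small but not exactly zero.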
Types of Censoring
Right Censoring
Kinds
Type 1 Censoring
Definition
Type 1 Censoring
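Right censoring when $C_i = c_i$ (the $C_i$'s are constants)
Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.
In a type 1 censoring setting, the censoring times are fixed constants, not random variables.
The PDF of the observation $(Y_i, \delta_i)$ is derived as
$$f(y_i, \delta_i) = f(y_i)^{\delta_i}\, S(c_i)^{1 - \delta_i}$$
Likelihood of Type 1 Censoring Data
The likelihood function of type 1 censored data is defined as
$$L = \prod_{i=1}^{n} f(y_i)^{\delta_i}\, S(c_i)^{1 - \delta_i}$$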
Type 2 Censoring
Definition
Type 2 Censoring
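Right censoring when $C_i = T_{(r)}$
Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.
In a type 2 censoring setting, we observe only the first $r$ failures out of $n$ experimental units. In other words, for the order statistics $T_{(1)} \le \dots \le T_{(n)}$ of $T_1, \dots, T_n$, we only observe $T_{(1)}, \dots, T_{(r)}$. Here the censoring times $C_i = T_{(r)}$ are not constants but random variables.
Likelihood of Type 2 Censoring Data
The likelihood function of type 2 censored data can be computed using the same equation as for type 1 censored data, but computing the joint PDF of the order statistics is easier.
The joint PDF of $T_{(1)}, \dots, T_{(r)}$ is derived as
$$f(t_{(1)}, \dots, t_{(r)}) = \frac{n!}{(n-r)!} \left[\prod_{i=1}^{r} f(t_{(i)})\right] \left[S(t_{(r)})\right]^{n-r}$$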
Random Censoring
Definition
Random Censoring
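Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times, which can be either random variables or constants. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator.
In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$, and the censoring times $C_i$ are i.i.d. random variables with PDF $g$ and CDF $G$, independent of the survival times.
The PDF of the observation $(Y_i, \delta_i)$ is derived as
$$f(y_i, \delta_i) = \left[f(y_i; \theta)\,(1 - G(y_i; \phi))\right]^{\delta_i} \left[g(y_i; \phi)\, S(y_i; \theta)\right]^{1 - \delta_i}$$
where $\theta$ is the parameter of interest and $\phi$ is the nuisance parameter.
Likelihood of Random Censoring Data
The likelihood function of randomly censored data is defined as
$$L(\theta) = c \prod_{i=1}^{n} f(y_i; \theta)^{\delta_i}\, S(y_i; \theta)^{1 - \delta_i}$$
where $c$ is a constant that does not depend on $\theta$.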
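As an illustration of how this likelihood is used, assume (purely for the example) exponential survival times with rate $\lambda$, so that $f(y) = \lambda e^{-\lambda y}$ and $S(y) = e^{-\lambda y}$. The likelihood then reduces to $\lambda^{\sum \delta_i} e^{-\lambda \sum y_i}$, which is maximized at $\hat\lambda = \sum \delta_i / \sum y_i$. The data and function names below are invented for this sketch.

```python
import math

def censored_log_likelihood(rate, times, events):
    """Log-likelihood of right-censored data under an assumed exponential model.

    Each uncensored observation contributes log f(y) = log(rate) - rate * y,
    each censored observation contributes log S(y) = -rate * y.
    """
    return sum(d * math.log(rate) - rate * y for y, d in zip(times, events))

def exponential_mle(times, events):
    """Closed-form maximizer: number of observed events / total follow-up time."""
    return sum(events) / sum(times)

# toy data: observed times y_i and censoring indicators delta_i (1 = event observed)
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 0, 1, 1, 0]

rate_hat = exponential_mle(times, events)
print(rate_hat)  # 3 events over 25 time units
```

Evaluating `censored_log_likelihood` at values slightly below and above `rate_hat` gives a quick sanity check that the closed form is indeed the maximizer.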
Left Censoring
Definition
Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times, and $C_1, \dots, C_n$ are the censoring times, which can be either random variables or constants. Let $\delta_i = I(C_i \le T_i)$ be the censoring indicator.
In a left censoring setting, we observe $(Y_i, \delta_i)$, where $Y_i = \max(T_i, C_i)$. In other words, for a censored subject the event has already occurred before the subject comes under observation.
Interval-Censored Data
Definition
Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times. Interval-censored data are given as an interval, not an exact point in time: we only observe an interval that contains $T_i$. Interval-censored data are divided into four cases.
Case 1 Interval-Censored Data (Current Status Data)
Data is given the form of or , where is a fixed time point.
Case 2 Interval-Censored Data
Data is given the form of where .
Double Censored Data
Data is given the form of where .
If , then it is right-censored data, if , then it is left-censored data, and if , then it is uncensored data.
Panel Data
Observations are made at discrete time points. The period between these observations can be viewed as an interval.
Mean Imputation Method
Definition
The mean imputation method substitutes the interval $(L, R]$ of case 2 interval-censored data with its midpoint $(L + R)/2$ if the interval is finite. If $R = \infty$, the observation is substituted with the left endpoint $L$ and treated as right-censored data.
Parametric Models
Distributions
Survival Models based on Distributions
Kinds
Exponential Distribution
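The exponential distribution assumes that the hazard function is constant regardless of time. The distribution is memoryless: $P(T > t + s \mid T > s) = P(T > t)$.
The PDF of survival time follows the exponential distribution
$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0,\ \lambda > 0$$
The hazard function is $h(t) = \lambda$ and the survival function is $S(t) = e^{-\lambda t}$.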
Gamma Distribution
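The PDF of survival time follows the gamma distribution
$$f(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-\lambda t}, \quad t \ge 0$$
The hazard function and survival function do not have closed-form expressions.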
Weibull Distribution
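The Weibull distribution uses one more parameter than the exponential distribution to control the shape of the hazard function. If $\alpha > 1$ then the hazard function is increasing, if $\alpha = 1$ then it is constant, and if $\alpha < 1$ then it is decreasing.
The PDF of survival time follows the Weibull distribution
$$f(t) = \alpha\lambda(\lambda t)^{\alpha - 1} \exp\left(-(\lambda t)^{\alpha}\right)$$
where $\lambda$ is a scale parameter and $\alpha$ is a shape parameter.
The hazard function is $h(t) = \alpha\lambda(\lambda t)^{\alpha - 1}$ and the survival function is $S(t) = \exp\left(-(\lambda t)^{\alpha}\right)$.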
Rayleigh Distribution
The PDF of survival time follows Rayleigh distribution The Hazard Function The Survival Function
Log-Normal Distribution
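We assume that the survival time $T$ follows a log-normal distribution, i.e. $\log T \sim N(\mu, \sigma^2)$. The hazard function of the distribution is hump-shaped.
The PDF of survival time follows the log-normal distribution
$$f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log t - \mu)^2}{2\sigma^2}\right)$$
The survival function is
$$S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)$$
where $\Phi$ is the CDF of the standard normal distribution.
The hazard function does not have a closed-form expression.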
Gompertz Distribution
The Hazard Function where and
Gompertz-Makeham Distribution
The Hazard Function
Survival Models based on Log-Lifetime
Definition
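Assume that the log survival time can be modeled with the log-linear model
$$\log T = \mu + \sigma W$$
where $\sigma$ is the scale parameter, $\mu$ is the location parameter, and $W$ follows some well-known distribution.
Kinds
Standard Gumbel Distribution
If $W$ follows the standard Gumbel distribution (extreme value distribution), then $f_W(w) = \exp(w - e^{w})$. If $\lambda = e^{-\mu}$ and $\alpha = 1/\sigma$, then $T$ follows a Weibull distribution with survival function $\exp\left(-(\lambda t)^{\alpha}\right)$.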
Normal Distribution
If follows the standard normal distribution, then follows log-normal distribution where and
Logistic Distribution
If follows a Logistic Distribution, then follows Log-Logistic Distribution where and .
Gumbel Distribution
If follows a Gumbel Distribution (generalized extreme value distribution), then where and .
Special Cases
If then If then If then If then
Exponential-F Distribution
If follows a F-Distribution , then follows a generalized F-distribution.
Special Cases
If then follows Log-Logistic Distribution If then follows generalized gamma distribution If then follows Log-Normal Distribution
Survival Models with Surviving Fractions
Definition
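Consider an event such as death from a specific cause or the incidence of a particular disease.
Define a binary variable $U$, where $U = 1$ indicates that an individual will eventually experience the event, and $U = 0$ indicates that the individual will never experience the event.
Let $T$ denote the time of occurrence of the event with PDF $f$ and CDF $F$; $T$ is defined only when $U = 1$, and $T$ and the censoring time $C$ are independent. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. Suppose that the probability of $U = 1$ is $p$, $0 < p < 1$.
No Covariate Case
The likelihood is defined as
$$L = \prod_{i=1}^{n} \left[p\, f(y_i)\right]^{\delta_i} \left[(1 - p) + p\, S(y_i)\right]^{1 - \delta_i}$$
where $S = 1 - F$.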
With Covariates Case
The likelihood is defined as
Nonparametric Methods: One Sample
Life Tables
Empirical Survival Function
Definition
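Let $T_1, \dots, T_n$ be i.i.d. random variables with the survival function $S(t)$. Then the empirical survival function is defined as
$$S_n(t) = \frac{1}{n} \sum_{i=1}^{n} I(T_i > t)$$
Since $I(T_i > t)$ is a Bernoulli trial with success probability $S(t)$,
$$E[S_n(t)] = S(t), \qquad \operatorname{Var}[S_n(t)] = \frac{S(t)\,(1 - S(t))}{n}$$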
Reduced Sample Estimator
Definition
Notations
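Assume the intervals have equal length and use the following notation:
- $I_j = (\tau_{j-1}, \tau_j]$: the $j$-th interval, where $j = 1, 2, \dots$ and $\tau_0 = 0$
- $n_j$: the number of individuals alive at the beginning of $I_j$
- $d_j$: the number of deaths during $I_j$
- $c_j$: the number of individuals censored during $I_j$
- $p_j$: $P(T > \tau_j \mid T > \tau_{j-1})$, where $T$ is the survival time.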
Estimation of Reduced Sample Estimator
Reduced sample method estimates the Survival Function as where is the number of deaths in the interval , and is the number of uncensored data in
The drawback of the reduced sample method is that it ignores the information contained in censored observations; therefore it is usually a biased estimator (an underestimate) of the survival function.
Life Table Estimator
Definition
Notations
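Assume the intervals have equal length and use the following notation:
- $I_j = (\tau_{j-1}, \tau_j]$: the $j$-th interval, where $j = 1, 2, \dots$ and $\tau_0 = 0$
- $n_j$: the number of individuals alive at the beginning of $I_j$
- $d_j$: the number of deaths during $I_j$
- $c_j$: the number of individuals censored during $I_j$
- $p_j$: $P(T > \tau_j \mid T > \tau_{j-1})$, where $T$ is the survival time.
Estimation of Life Table Estimator
The life table estimator is derived from the expression
$$S(\tau_k) = \prod_{j=1}^{k} p_j$$
where $p_j = P(T > \tau_j \mid T > \tau_{j-1})$.
The life table estimator estimates the survival function as
$$\hat S(\tau_k) = \prod_{j=1}^{k} \hat p_j, \qquad \hat p_j = 1 - \frac{d_j}{n_j'}$$
where $n_j' = n_j - c_j / 2$ is called the effective sample size.
We assume that, on average, those individuals who were censored during $I_j$ were at risk for half the interval.
Variance of Life Table Estimator
For a given $j$, assume that $d_j \sim \text{Bin}(n_j', q_j)$, where $q_j = 1 - p_j$. Since $\log \hat S(\tau_k) = \sum_{j=1}^{k} \log \hat p_j$, under the assumption of independence of the $\hat p_j$'s, the variance of the log of the life table estimator is approximated as
$$\operatorname{Var}\left(\log \hat S(\tau_k)\right) \approx \sum_{j=1}^{k} \frac{q_j}{n_j'\, p_j}$$
Then, by the delta method, Greenwood's formula is derived as
$$\widehat{\operatorname{Var}}\left(\hat S(\tau_k)\right) \approx \hat S(\tau_k)^2 \sum_{j=1}^{k} \frac{\hat q_j}{n_j'\, \hat p_j}$$
Confidence Interval for Life Table Estimator
Under the asymptotic normality of the estimator, we can use $\hat S(\tau_k) \pm z_{\alpha/2}\, \widehat{\text{se}}(\hat S(\tau_k))$ as a confidence interval for $S(\tau_k)$. However, this interval can take values outside $[0, 1]$. To avoid this problem, the log-log transformation is used.
Log-log Transformation
To guarantee that the CI of $S(\tau_k)$ lies within $[0, 1]$, use the log-log transformation. Let $\hat V = \log(-\log \hat S(\tau_k))$. Then, by the delta method,
$$\widehat{\operatorname{Var}}(\hat V) \approx \frac{\widehat{\operatorname{Var}}(\hat S(\tau_k))}{\left[\hat S(\tau_k) \log \hat S(\tau_k)\right]^2}$$
i.e. the CI for $V$ is $\hat V \pm z_{\alpha/2}\, \widehat{\text{se}}(\hat V)$, and the CI for $S(\tau_k)$ is calculated as
$$\left[\hat S(\tau_k)^{\exp(z_{\alpha/2} \widehat{\text{se}}(\hat V))},\ \hat S(\tau_k)^{\exp(-z_{\alpha/2} \widehat{\text{se}}(\hat V))}\right]$$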
Examples
Consider a life table
where is the Reduced Sample Estimator and is the life table estimator. The survival functions are estimated as
Kaplan-Meier Estimator
Kaplan-Meier Estimator
Definition
Kaplan-Meier Estimator
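Consider the random censoring case with observations $(Y_i, \delta_i)$, $i = 1, \dots, n$. Assume that $Y_{(1)} \le \dots \le Y_{(n)}$, and the distinct failure times are $t_1 < \dots < t_k$ where $k \le n$. Let $d_j$ be the number of deaths at $t_j$, and $n_j = |R_j|$ be the number alive just before $t_j$, where the set $R_j = \{i : Y_i \ge t_j\}$ is called the risk set at $t_j$. We only observe $(Y_i, \delta_i)$.
The Kaplan-Meier estimator is derived from the expression
$$S(t_j) = \prod_{i=1}^{j} p_i$$
where $p_i = P(T > t_i \mid T > t_{i-1})$.
General Case
As an estimator of $p_j$, consider
$$\hat p_j = 1 - \frac{d_j}{n_j}$$
The Kaplan-Meier estimator is defined with the estimated $p_j$'s
$$\hat S(t) = \prod_{j : t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
The cumulative hazard function is estimated by the Nelson-Aalen estimator by the same logic
$$\hat H(t) = \sum_{j : t_j \le t} \frac{d_j}{n_j}$$
No Ties Case
When there are no ties among the observations, $Y_{(1)} < \dots < Y_{(n)}$, the failure times are equal to the observations, the number of deaths is equal to the censoring indicator $\delta_{(i)}$, and $n_i = n - i + 1$. Thus, the Kaplan-Meier estimator is defined as
$$\hat S(t) = \prod_{i : Y_{(i)} \le t} \left(1 - \frac{\delta_{(i)}}{n - i + 1}\right) = \prod_{i : Y_{(i)} \le t} \left(\frac{n - i}{n - i + 1}\right)^{\delta_{(i)}}$$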
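The product-limit construction can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the function name and the toy data are invented, and an observation censored at a failure time is counted as still at risk at that time (a common convention, assumed here).

```python
def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] at each distinct failure time t_j.

    times  : observed times Y_i = min(T_i, C_i)
    events : censoring indicators delta_i (1 = death observed, 0 = censored)
    """
    data = sorted(zip(times, events))
    n = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i            # everyone with observed time >= t
        deaths = 0
        while i < n and data[i][0] == t:
            deaths += data[i][1]   # count deaths among observations tied at t
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

# toy data: one tie between a death and a censored observation at time 5
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 4))
```

With these data the factors are 5/6, 4/5, 2/3, and 0, so the curve steps through 5/6, 2/3, 4/9, and finally 0 because the largest observation is a death.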
Properties
Self-Consistency
An estimator is self-consistent if where
The Kaplan-Meier estimator is the unique self-consistent estimator for where is the largest observation.
Generalized MLE
The Kaplan-Meier estimator gives the Generalized Maximum Likelihood Estimation of the Survival Function .
Strong Consistency
The Kaplan-Meier estimator uniformly Almost Surely converges to
Proof
Consider a function and decompose it to the sum of the subsurvival functions and . where is the uncensored case and is the censored case.
Then, the survival function can be expressed as a function of the subsurvival functions.
Define the empirical subsurvival functions and . The Kaplan-Meier estimator also can be expressed as a function of the empirical subsurvival functions.
By Glivenko-Cantelli theorem, and for all . Since is a continuous function of and ,
Asymptotic Normality
Kaplan-Meier estimator has asymptotic normality. where , , and .
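The variance of the estimator is estimated by Greenwood’s formula
$$\widehat{\operatorname{Var}}(\hat S(t)) = \hat S(t)^2 \sum_{j : t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$$
For the no ties case, the formula is
$$\widehat{\operatorname{Var}}(\hat S(t)) = \hat S(t)^2 \sum_{i : Y_{(i)} \le t} \frac{\delta_{(i)}}{(n - i)(n - i + 1)}$$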
Examples
case where
Facts
The Kaplan-Meier estimator is self-consistent and asymptotically normal, and it is the generalized MLE.
If there is no censoring, the Kaplan-Meier estimator is just the empirical survival function.
Hazard Function Estimators
Nelson-Aalen Estimator
Definition
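Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.
The Nelson-Aalen estimator estimates the cumulative hazard function as
$$\hat H(t) = \sum_{i : Y_{(i)} \le t} \frac{\delta_{(i)}}{n - i + 1}$$
It is derived from the Kaplan-Meier estimator.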
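A minimal sketch of the Nelson-Aalen sum, written in the general form "deaths at $t_j$ divided by the number at risk at $t_j$" so that ties are also handled. The function name and data are made up for illustration.

```python
def nelson_aalen(times, events):
    """Return [(t_j, H_hat(t_j))] at each distinct failure time t_j."""
    data = sorted(zip(times, events))
    n = len(data)
    cumulative_hazard = 0.0
    curve = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i                 # observations with time >= t
        deaths = 0
        while i < n and data[i][0] == t:
            deaths += data[i][1]
            i += 1
        if deaths > 0:
            cumulative_hazard += deaths / at_risk   # increment d_j / n_j
            curve.append((t, cumulative_hazard))
    return curve

times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
na_curve = nelson_aalen(times, events)
print(na_curve[-1])   # final value is 1/6 + 1/5 + 1/3 + 1
```

The increments here are 1/6, 1/5, 1/3, and 1, one for each distinct failure time.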
Peterson Estimator
Definition
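Suppose that $T_1, \dots, T_n$ are i.i.d. random variables representing survival times with CDF $F$, and $C_1, \dots, C_n$ are the censoring times. Let $\delta_i = I(T_i \le C_i)$ be the censoring indicator. In a random censoring setting, we only observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$.
The Peterson estimator estimates the cumulative hazard function as
$$\hat H(t) = -\log \hat S(t)$$
where $\hat S(t)$ is the Kaplan-Meier estimator. It is derived from the Kaplan-Meier estimator.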
Robust Estimators
Estimators for Survival Function
Definition
Mean of Survival Time
Without Censoring
where is the empirical CDF
With Censoring
where is calculated by the Kaplan-Meier Estimator and is the jump size at .
The asymptotic variance of the parameter is obtained by And the asymptotic variance is estimated by where and
Median of Survival Time
A reasonable estimator for is where is the Kaplan-Meier Estimator
If does not have a unique solution, then is defined as the midpoint of the interval constituting of the solutions.
However, the estimator over-estimate the true parameter, so linear smooth of the estimator is used to estimate the
The asymptotic variance of the estimated parameter is obtained by where is estimated by Greenwood’s formula, and may estimated by Kernel Estimation.
Bayes Estimator for Survival Function
Definition
The Bayes estimator of with the Squared Error Loss and a Dirichlet process prior is given by where the squared loss is where is any non-negative non-decreasing function, the parameter which is a finite non-negative measure on , and .
Examples
If , then
In many cases,
Nonparametric Density Estimation
Kernel Estimation for Survival Analysis
Definition
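The Kaplan-Meier estimator is a step function, so it is difficult to compute its quantile function and density function. Kernel density estimation is used to smooth it.
Let $T_1, \dots, T_n$ be i.i.d. survival times with distribution $F$, and $C_1, \dots, C_n$ be i.i.d. censoring times with distribution $G$. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ is the censoring indicator.
Without Censoring
For complete data, the kernel density estimator is defined as
$$\hat f(t) = \frac{1}{n} \sum_{i=1}^{n} K_h(t - T_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - T_i}{h}\right)$$
where $K$ is the kernel, $K_h(u) = K(u/h)/h$ is the scaled kernel, and $h > 0$ is a smoothing parameter.
The kernel estimator for the distribution function is defined as
$$\hat F(t) = \frac{1}{n} \sum_{i=1}^{n} W\left(\frac{t - T_i}{h}\right)$$
where $W(u) = \int_{-\infty}^{u} K(v)\,dv$.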
With Censoring
For the censored data, the weights for each observation is defined as a jump size in Kaplan-Meier Estimator. where is the jump size at in Kaplan-Meier Estimator.
Thus, the Kernel Density Estimation for the Survival Function is
Nonparametric Methods: Two Samples
Gehan Test
Definition
For the first sample, let be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is censoring indicator.
For the second sample, let be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is censoring indicator.
Gehan’s test is an extension of the Wilcoxon rank-sum test to censored data. The test statistic of Gehan’s test is defined as where .
Under the null hypothesis , let be the combined samples and define , , and where is the indicator set for sample 1.
Then, the variance of is calculated as and
Hypothesis Test for a 2 by 2 Contingency Table
Definition
The hypothesis test for a contingency table is used to determine if there’s a significant association between two categorical variables
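Consider a contingency table

| | Dead | Alive | Total |
|---|---|---|---|
| Population 1 | $a$ | $b$ | $n_1$ |
| Population 2 | $c$ | $d$ | $n_2$ |
| Total | $m_1$ | $m_2$ | $n$ |

i.e. $a \sim \text{Bin}(n_1, p_1)$ and $c \sim \text{Bin}(n_2, p_2)$. We want to test $H_0: p_1 = p_2$. Let $n_1 = a + b$, $n_2 = c + d$, $m_1 = a + c$, and $m_2 = b + d$.
Uncorrected Chi-squared Test
The test statistic is defined as
$$\chi^2 = \frac{n\,(ad - bc)^2}{n_1 n_2 m_1 m_2}$$
We reject $H_0$ if $\chi^2 > \chi^2_{1, \alpha}$.
Yates’ Corrected Chi-squared Test
The test statistic is defined as
$$\chi^2_c = \frac{n\,(|ad - bc| - n/2)^2}{n_1 n_2 m_1 m_2}$$
We reject $H_0$ if $\chi^2_c > \chi^2_{1, \alpha}$.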
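The two chi-squared statistics can be sketched as follows, assuming the usual forms $n(ad - bc)^2 / (n_1 n_2 m_1 m_2)$ and its Yates-corrected variant for a table with cells $a, b, c, d$. The function name and the counts are hypothetical. For one degree of freedom the upper tail probability equals $\operatorname{erfc}(\sqrt{x/2})$, so no external library is needed.

```python
import math

def chi_squared_2x2(a, b, c, d, yates=False):
    """Chi-squared statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    n1, n2 = a + b, c + d          # row totals
    m1, m2 = a + c, b + d          # column totals
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)   # continuity correction, floored at zero
    statistic = n * diff ** 2 / (n1 * n2 * m1 * m2)
    p_value = math.erfc(math.sqrt(statistic / 2))   # P(chi2_1 > statistic)
    return statistic, p_value

# hypothetical counts: 10 dead / 20 alive versus 20 dead / 10 alive
plain_stat, plain_p = chi_squared_2x2(10, 20, 20, 10)
yates_stat, yates_p = chi_squared_2x2(10, 20, 20, 10, yates=True)
print(plain_stat, plain_p)
print(yates_stat, yates_p)
```

The corrected statistic is never larger than the uncorrected one, so the Yates version is the more conservative of the two.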
Fisher’s Exact Test
The test statistic is defined as where ,
We reject if
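A sketch of the exact calculation, assuming the standard conditional argument: with all margins fixed, the upper-left cell count follows a hypergeometric distribution under the null hypothesis, and a one-sided p-value sums the probabilities of tables at least as extreme. The function names and the counts are invented for illustration.

```python
from math import comb

def hypergeometric_pmf(x, n1, n2, m1):
    """P(upper-left cell = x) given row totals n1, n2 and first column total m1."""
    return comb(n1, x) * comb(n2, m1 - x) / comb(n1 + n2, m1)

def fisher_one_sided(a, b, c, d):
    """One-sided p-value P(A >= a) for the table [[a, b], [c, d]]."""
    n1, n2, m1 = a + b, c + d, a + c
    largest = min(n1, m1)          # largest value the upper-left cell can take
    return sum(hypergeometric_pmf(x, n1, n2, m1) for x in range(a, largest + 1))

# hypothetical table: 3 of 4 dead in population 1, 1 of 4 dead in population 2
p_value = fisher_one_sided(3, 1, 1, 3)
print(p_value)   # 16/70 + 1/70
```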
Corrected Chi-squared Test
The test statistic is defined as
We reject if
Liddell’s Exact Test
The test statistic is defined as
We reject if
Exact Unconditional Test
The test statistic is defined as where
We reject if
Approximate Unconditional Test
The test statistic is defined as where
We reject if
Mantel-Haenszel Test
Definition
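Consider a sequence of contingency tables

| | Dead | Alive | Total |
|---|---|---|---|
| Treatment 1 | $a_k$ | $b_k$ | $n_{1k}$ |
| Treatment 2 | $c_k$ | $d_k$ | $n_{2k}$ |
| Total | $m_{1k}$ | $m_{2k}$ | $n_k$ |

where $k = 1, \dots, K$ is the index of the hospital. Under the null hypothesis $H_0: p_{1k} = p_{2k}$ for all $k$, where $a_k \sim \text{Bin}(n_{1k}, p_{1k})$ and $c_k \sim \text{Bin}(n_{2k}, p_{2k})$, the test statistic for the Mantel-Haenszel test is defined as
$$MH = \frac{\sum_{k=1}^{K} \left(a_k - E_0(a_k)\right)}{\sqrt{\sum_{k=1}^{K} \operatorname{Var}_0(a_k)}}$$
where $E_0(a_k) = \dfrac{n_{1k} m_{1k}}{n_k}$ and $\operatorname{Var}_0(a_k) = \dfrac{n_{1k} n_{2k} m_{1k} m_{2k}}{n_k^2 (n_k - 1)}$.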
Log-Rank Test
Definition
The log-rank test is a test to compare the survival functions of two samples. The test uses the sequence of Mantel-Haenszel statistics of the tables at each uncensored event time.
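The construction can be sketched as below: at every distinct uncensored time a 2 by 2 table (deaths versus survivors, sample 1 versus sample 2) is formed from the subjects still at risk, and the observed minus expected deaths in sample 1 are accumulated together with the hypergeometric variance. This is an illustrative sketch with made-up names and data, using the normal approximation for the p-value.

```python
import math

def log_rank_test(times1, events1, times2, events2):
    """Return (z, two-sided p-value) comparing two right-censored samples."""
    sample = [(t, e, 1) for t, e in zip(times1, events1)]
    sample += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in sample if e == 1})

    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for s, _, g in sample if s >= t and g == 1)   # at risk, sample 1
        n2 = sum(1 for s, _, g in sample if s >= t and g == 2)   # at risk, sample 2
        deaths = sum(1 for s, e, _ in sample if s == t and e == 1)
        deaths1 = sum(1 for s, e, g in sample if s == t and e == 1 and g == 1)
        n = n1 + n2
        observed += deaths1
        expected += deaths * n1 / n
        if n > 1:   # a table with a single subject carries no variance
            variance += deaths * n1 * n2 * (n - deaths) / (n * n * (n - 1))

    z = (observed - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value

# toy data with no censoring: sample 1 fails at times 1 and 3, sample 2 at 2 and 4
z, p_value = log_rank_test([1, 3], [1, 1], [2, 4], [1, 1])
print(z, p_value)
```

In this toy example the observed deaths in sample 1 are 2 against an expected 4/3, with total variance 13/18, so $z^2 = 8/13$.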
Examples
The Mantel-Haenszel statistic of the data is calculated as and the p-value is
Tarone-Ware Test
Definition
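The Tarone-Ware test is a generalization of the Mantel-Haenszel test.
$$TW = \frac{\sum_{k} w_k \left(a_k - E_0(a_k)\right)}{\sqrt{\sum_{k} w_k^2 \operatorname{Var}_0(a_k)}}$$
where $w_k$ is the weight for each table, and $a_k$, $E_0(a_k)$, and $\operatorname{Var}_0(a_k)$ are the observed count, null expectation, and null variance of the $k$-th table as in the Mantel-Haenszel statistic.
If $w_k = 1$, then it is the MH statistic; if $w_k = n_k$, then it is the Gehan statistic; and if $w_k = \sqrt{n_k}$, then it is the Tarone-Ware statistic, where $n_k$ is the total number at risk in the $k$-th table.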
Nonparametric Methods: K Samples
Generalized Gehan Test
Definition
For the -th sample, let , where , be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is the censoring indicator.
The generalized Gehan’s test is an extension of the Gehan test to more than two samples. Under the null hypothesis , the test statistic of the generalized Gehan’s test is defined as where is the statistic of the Gehan test.
Then, where , , and
Generalized Mantel-Haenszel Test
Definition
For the -th sample, let , where , be i.i.d. survival time with a distribution , and be i.i.d. censoring time with a distribution . We can observe where and is the censoring indicator.
The generalized Mantel-Haenszel test is an extension of Mantel-Haenszel Test used for more than two sample case.
Let be the combined samples
For each uncensored time point, construct a table.
Dead Alive Under the null hypothesis , the test statistic of generalized Mantel-Haenszel test is defined as where , are the -length vector and matrix, calculated with the first population is deleted data (corner-point constraint).
The and is defined as where where
Nonparametric Methods: Regression
Cox Proportional Hazards Model
Cox Proportional Hazards Model
Definition
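The Cox proportional hazards model assumes that covariates affect the hazard function.
Let $T_1, \dots, T_n$ be i.i.d. survival times, and $C_1, \dots, C_n$ be i.i.d. censoring times. We observe $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ is the censoring indicator, together with covariates $z_i$. Then, the Cox proportional hazards model is defined as
$$h(t \mid z) = h_0(t) \exp(\beta^\top z)$$
where $h_0(t)$ is called the baseline hazard function, i.e. the hazard at $z = 0$.
Conditional Likelihood
Let $Y_{(1)} < \dots < Y_{(n)}$ (no ties case), and let $R_{(i)} = \{j : Y_j \ge Y_{(i)}\}$ be the risk set. For each uncensored time $Y_{(i)}$,
$$P\left(\text{one death in } [Y_{(i)}, Y_{(i)} + \Delta t) \mid R_{(i)}\right) \approx \sum_{j \in R_{(i)}} h_0(Y_{(i)}) \exp(\beta^\top z_j)\, \Delta t$$
Therefore,
$$P\left(\text{death of } (i) \text{ at } Y_{(i)} \mid \text{one death in } R_{(i)} \text{ at } Y_{(i)}\right) = \frac{\exp(\beta^\top z_{(i)})}{\sum_{j \in R_{(i)}} \exp(\beta^\top z_j)}$$
Taking the product of these conditional probabilities gives a conditional likelihood
$$L_c(\beta) = \prod_{i \in U} \frac{\exp(\beta^\top z_{(i)})}{\sum_{j \in R_{(i)}} \exp(\beta^\top z_j)}$$
where $U$ is the index set of uncensored samples.
$L_c(\beta)$ is not a likelihood in the usual sense. However, Cox suggested treating the conditional likelihood as an ordinary likelihood to find the maximum likelihood estimate.
Since there is no analytic solution for the MLE, iterative methods such as the Newton–Raphson method are used to estimate the coefficient $\beta$.
The hazard ratio $\exp(\beta_k)$ represents the relative change in the hazard rate for a one-unit increase in the covariate $z_k$.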
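A minimal sketch of the Newton–Raphson iteration for a single covariate and no ties. It assumes the usual form of the conditional (partial) log-likelihood, $\sum_{i \in U} [\beta z_i - \log \sum_{j \in R_i} e^{\beta z_j}]$; the function names and the four-subject data set are made up for illustration.

```python
import math

def score_and_information(beta, times, events, z):
    """First derivative and negative second derivative of the partial log-likelihood."""
    score = 0.0
    information = 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if d_i == 0:
            continue                      # censored subjects contribute no factor
        risk = [z_j for t_j, z_j in zip(times, z) if t_j >= t_i]   # risk set at t_i
        weights = [math.exp(beta * z_j) for z_j in risk]
        s0 = sum(weights)
        s1 = sum(w * z_j for w, z_j in zip(weights, risk))
        s2 = sum(w * z_j * z_j for w, z_j in zip(weights, risk))
        mean = s1 / s0                    # weighted mean of z over the risk set
        score += z_i - mean
        information += s2 / s0 - mean * mean
    return score, information

def fit_cox_single_covariate(times, events, z, tol=1e-10, max_iter=50):
    """Newton-Raphson for one coefficient, started from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, times, events, z)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# toy data: four uncensored subjects, covariate z is a group indicator
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
z = [1, 0, 1, 0]
beta_hat = fit_cox_single_covariate(times, events, z)
print(beta_hat, math.exp(beta_hat))   # coefficient and estimated hazard ratio
```

For this toy data set the score equation can be solved by hand: with $u = e^{\beta}$ it reduces to $u^2 - u - 4 = 0$, so $\hat\beta = \log\left((1 + \sqrt{17})/2\right)$, which the iteration reproduces.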
Goodness-of-Fit Test
For testing the null hypothesis , Cox suggested the Rao Test.
Asymptotic Normality of MLE
where is the observed Fisher Information
Estimation of Survival Function
Under the Cox proportional hazards model, To estimate , we can use for but we still need to estimate , , or .
Breslow suggested the estimators of and as If
If , then is the Kaplan-Meier Estimator
It has a few drawbacks
- can take negative values.
Tsiatis suggested a non-negative version of where
Link suggested using the linear smooth of .
Discrete or Grouped Data
When data is discrete or grouped, there are ties at each failure. Denote the ordered discrete failure time by and let be the risk set at , be the death set at , and .
Cox suggested combining the all possible permutations. However, it is computationally infeasible. where , and is the size subset of
Peto suggested an alternative likelihood that instead of all possible permutations, use the same contribution.
Time Dependent Covariates
In the case, the covariate depends on time. We observe and the conditional likelihood defined as where is the indicator set for uncensored samples, and is the risk set.
Facts
Any two individuals have hazard functions that are constant multiples of one another.
The survival function of the Cox proportional hazards model is a family of Lehmann alternatives.
$$S(t \mid z) = S_0(t)^{\exp(\beta^\top z)}$$
where $S_0(t) = \exp\left(-\int_0^t h_0(u)\,du\right)$.
If , , where is the indicator set for sample 1, and there are no ties, then the Cox test is exactly equal to the Mantel-Haenszel test.
Linear Models
Accelerated Life Model
Definition
Consider the random variable represents survival time with Hazard Function with , , and And assume that the survival time of individual with covariate is defined as If then the covariate accelerates the time to failure. The model based on this assumption is called accelerated failure time (AFT) model.
Under AFT model
Let , then where .
Assume that , where is a Random Variable represents error term. Then, the AFT model becomes
Relationship between and . where
If then , and if , then where .
Miller Estimator
Definition
Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator
Suppose a Simple Linear Regression model
With no censoring present, the least squares estimators of the parameters are obtained by minimizing where is the Empirical Distribution Function of where .
With censoring present, Miller proposed to minimize where is the Kaplan-Meier Estimator based on and the weights is its jump size.
If the last observation is censored, then . Hence, change the last observation to be uncensored, so that .
Buckley-James Estimator
Definition
Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator
If we can observe the true survival time , we can make a model However, we can’t observe , but only censored , and Buckley and James proposed an Unbiased Estimator for Since we also can not observe , we estimate it again. where , is the Kaplan-Meier Estimator based on and the weights is its jump size.
The variance of the estimator is estimated by where , , , and
Koul-Susarla-Van Ryzin Estimator
Definition
Let be i.i.d. survival time, and be i.i.d. censoring time. We can observe , where and is censoring indicator
Suppose a Simple Linear Regression model
Koul-Susarla-Van Ryzin proposed an Unbiased Estimator for Since is unknown, it should be estimated. Authors suggested to use Kaplan-Meier Estimator of with data as an estimator.
Goodness-of-Fit Tests
Graphical Validity Tests for Survival Model
Definition
If the selected model holds, a plot of the data resembles a straight line, and if model fails, a plot resembles a curved line.
There are two types of plots, survival plots and hazard plots .
One Sample
Exponential Distribution
Weibull Distribution
Log-Normal Distribution
where is the Probit.
Two to K Samples
For parametric models, repeat one sample methods on each sample.
The validity of the Cox Proportional Hazards Model can be checked by the Lehmann alternatives property of the Survival Function.
Regression
Linear Model
Ordinary residual may be used in model checking.
Cox Proportional Hazard Model
where is the Kaplan-Meier Estimator based on and .
Also, the estimated cumulative hazard function and the covariates shouldn’t have any systematic pattern.
Goodness-of-Fit Tests for Survival Model
Definition
No Censoring Case
Kolmogorov–Smirnov Test
Definition
The Kolmogorov–Smirnov test (KS test) is a non-parametric test for the equality of continuous distribution functions.
The Kolmogorov–Smirnov test statistic for a given CDF $F_0$ is defined as
$$D_n = \sup_t \left|F_n(t) - F_0(t)\right|$$
where $F_n$ is the empirical distribution function based on the i.i.d. random variables $T_1, \dots, T_n$.
Cramer-von Mises Test
Definition
The Cramer-von Mises test is a non-parametric test for the equality of continuous distribution functions.
The Cramer-von Mises test statistic for a given CDF $F_0$ is defined as
$$W_n^2 = n \int_{-\infty}^{\infty} \left(F_n(t) - F_0(t)\right)^2 dF_0(t)$$
where $F_n$ is the empirical distribution function based on the i.i.d. random variables $T_1, \dots, T_n$.
Censoring Case
Generalized Kolmogorov–Smirnov Test
The generalized Kolmogorov–Smirnov test uses Kaplan-Meier Estimator instead of Empirical Distribution Function used for Kolmogorov–Smirnov Test where is the Empirical Distribution Function based on the i.i.d. random variables .
Generalized Cramer-von Mises Test
The generalized Cramer-von Mises test uses Kaplan-Meier Estimator instead of Empirical Distribution Function used for Cramer-von Mises Test where is the Empirical Distribution Function based on the i.i.d. random variables .
Miscellaneous Topics
Multivariate Survival Model
Kinds
Copula Model
Definition
Let be i.i.d. survival time with CDF , PDF , and Survival Function , be marginal survival function, be marginal CDF, and be i.i.d. censoring time.
A copula model defines the joint survival function as a copula function whose arguments are the marginal survival functions.
When , where is the copula function with uniform marginals.
Copula Functions
Clayton’s copula function where
Crowder’s copula function
Hougaard’s copula function
Competing Risks Model
Definition
The competing risks (multiple modes of failure) model is designed to accommodate multiple causes of the same event.
Let be the survival time and failure mode (competing risk type) for each individual, where
The mode-specific hazard function is defined as where
The marginal hazard function and marginal cumulative hazard function are defined as
Likelihood
We can observe , where , is censoring indicator, and is failure mode.
The likelihood function is defined as where
Estimators
The Nelson-Aalen Estimator for competing risk model is defined as where
The survival function is estimated by
The sub-distribution function is estimated by where
Examples
When , where
where
where
Basic Issues in Clinical Trials
Observational Study
Definition
An observational study draws inferences from a sample to population where the independent variable is not under the control of the researcher.
Types
Case-Control Study
Definition
A case-control study is a retrospective study that compares two groups of people: those with a specific outcome (cases) and similar people without the outcome (controls).
Examples
Suppose that researchers want to study the relationship between smoking and lung cancer. They identify 100 lung cancer patients (cases) and 100 matched individuals without lung cancer (controls). They then collect data on past smoking habits for both groups and compare the prevalence of smoking between cases and controls.
Cohort Study
Definition
A cohort study is a prospective study that follows a group of individuals (cohort) over time to determine the incidence of a specific outcome.
Examples
Suppose that researchers want to study the relationship between smoking and lung cancer. They follow a group of 10,000 people for 20 years, comparing smokers to non-smokers to determine the incidence of lung cancer.
Cross-Sectional Study
Definition
A cross-sectional study analyzes data from a population at a specific point in time. It provides a snapshot of the prevalence of an outcome in a population.
Examples
Suppose that researchers want to study the relationship between smoking and lung cancer. They survey 1,000 people in a city, collecting data on their current smoking habits and presence of lung cancer.
Relative Risk
Definition
Consider a Contingency Table
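
| | Event | Non-event |
|---|---|---|
| Group 1 | $a$ | $b$ |
| Group 2 | $c$ | $d$ |

Point Estimation
The relative risk (RR) is estimated by
$$\widehat{RR} = \frac{a / (a + b)}{c / (c + d)}$$
Confidence Interval
The confidence interval for the relative risk is estimated by the delta method. The confidence interval for $\log RR$ is defined as
$$\log \widehat{RR} \pm z_{\alpha/2} \sqrt{\frac{1}{a} - \frac{1}{a + b} + \frac{1}{c} - \frac{1}{c + d}}$$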
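A sketch of the point estimate and the delta-method interval, assuming the usual standard error of $\log \widehat{RR}$, $\sqrt{1/a - 1/(a+b) + 1/c - 1/(c+d)}$, for a table with cells $a, b$ (group 1) and $c, d$ (group 2). The function name and counts are hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(a, b, c, d, level=0.95):
    """Relative risk of group 1 versus group 2 with a log-scale confidence interval."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(0.5 + level / 2)      # e.g. about 1.96 for 95%
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# hypothetical cohort: 20 of 100 events in group 1, 10 of 100 in group 2
rr, rr_lower, rr_upper = relative_risk_ci(20, 80, 10, 90)
print(rr, rr_lower, rr_upper)
```

Building the interval on the log scale and exponentiating keeps both limits positive.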
Facts
Relative risk should be used in a cohort study or an experimental study; it cannot be used for a case-control study.
Odds Ratio
Definition
Consider a Contingency Table
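
| | Event | Non-event |
|---|---|---|
| Group 1 | $a$ | $b$ |
| Group 2 | $c$ | $d$ |

Point Estimation
The odds ratio (OR) is estimated by
$$\widehat{OR} = \frac{a / b}{c / d} = \frac{ad}{bc}$$
Confidence Interval
The confidence interval for the odds ratio is estimated by the delta method. The confidence interval for $\log OR$ is defined as
$$\log \widehat{OR} \pm z_{\alpha/2} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$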
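The same calculation for the odds ratio, assuming the usual standard error of $\log \widehat{OR}$, $\sqrt{1/a + 1/b + 1/c + 1/d}$. Names and counts are again hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio for the table [[a, b], [c, d]] with a log-scale confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# hypothetical table: 20 events / 80 non-events versus 10 events / 90 non-events
or_hat, or_lower, or_upper = odds_ratio_ci(20, 80, 10, 90)
print(or_hat, or_lower, or_upper)
```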
Facts
The odds ratio can be used for a case-control study.
Tests of Association
Kinds
Independence Test for Two Discrete Variables
Definition
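Let $X$ and $Y$ be categorical variables with $r$ and $c$ categories, and consider the null hypothesis $H_0$: $X$ and $Y$ are independent. Then, the test statistic, which asymptotically follows a chi-squared distribution with $(r - 1)(c - 1)$ degrees of freedom under $H_0$, is defined as
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n}$, $n_{i\cdot} = \sum_j O_{ij}$, $n_{\cdot j} = \sum_i O_{ij}$, $n = \sum_i \sum_j O_{ij}$, and $O_{ij}$ is the observed count in cell $(i, j)$.
Fisher’s Exact Test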
The test statistic is defined as where ,
We reject if
McNemar's Test
Definition
McNemar’s test is used to analyze paired nominal data (same samples are used for both conditions), particularly in before-and-after studies or matched-pair designs.
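The test statistic is defined as
$$\chi^2 = \frac{(b - c)^2}{b + c}$$
where $b$ and $c$ are the two discordant cell counts; under the null hypothesis it approximately follows a chi-squared distribution with 1 degree of freedom.
Examples

| | After treatment / No insomnia | After treatment / Insomnia |
|---|---|---|
| Before treatment / No insomnia | 45 | 15 |
| Before treatment / Insomnia | 25 | 15 |

Consider the null hypothesis $H_0$: the treatment is not effective. The statistic is $(15 - 25)^2 / (15 + 25) = 2.5$, which is below the 5% critical value $\chi^2_{1, 0.05} \approx 3.84$. We cannot reject the null hypothesis.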
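The insomnia example can be reproduced with a few lines, assuming the uncorrected statistic $(b - c)^2 / (b + c)$ on the two discordant cells (15 and 25) and a chi-squared reference distribution with one degree of freedom. The function name is made up for this sketch.

```python
import math

def mcnemar_test(b, c):
    """McNemar statistic (no continuity correction) and its 1-df p-value.

    b, c are the discordant counts: subjects whose status differs before and after.
    """
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))   # P(chi2_1 > statistic)
    return statistic, p_value

# discordant cells from the table: 15 (no insomnia -> insomnia), 25 (insomnia -> no insomnia)
statistic, p_value = mcnemar_test(15, 25)
print(statistic, p_value)
```

The statistic is 2.5 and the p-value is above 0.05, which agrees with the conclusion that the null hypothesis cannot be rejected.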
Confusion Matrix
Definition

| | Predicted Positive (PP) | Predicted Negative (PN) |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |

Metrics
Accuracy
Definition
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall
Definition
$$\text{Recall} = \frac{TP}{TP + FN}$$
Recall, sensitivity, or true positive rate is the proportion of correctly predicted cases among all actual positive cases.
Precision
Definition
$$\text{Precision} = \frac{TP}{TP + FP}$$
Precision is the proportion of actually positive cases among the cases predicted as positive.
Specificity
Definition
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Specificity is the proportion of correctly predicted cases among all actual negative cases.
Type 1 Error
Definition
Type 2 Error
Definition
F-Score
Definition
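F1 Score
The harmonic mean of precision and recall.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
F-beta score
$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$
where $\beta > 0$.
Recall is considered $\beta$ times as important as precision.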
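The metrics above can be computed directly from the four cells of the confusion matrix. The sketch below uses the standard definitions and hypothetical counts; the function name is invented for the example.

```python
def confusion_metrics(tp, fn, fp, tn, beta=1.0):
    """Common metrics from the cells of a 2 by 2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    precision = tp / (tp + fp)         # positive predictive value
    specificity = tn / (tn + fp)       # true negative rate
    f_beta = ((1 + beta ** 2) * precision * recall
              / (beta ** 2 * precision + recall))
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_beta": f_beta,
    }

# hypothetical counts: 40 TP, 10 FN, 20 FP, 30 TN
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(name, round(value, 4))
```

With `beta=1.0` the last entry is the F1 score, which here equals $2 \cdot TP / (2 \cdot TP + FP + FN) = 80/110$.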
Positive Predictive Value
Definition
where
Positive predictive value (in medical statistics and epidemiology) means that the rate of actually positive cases out of cases predicted as positive.
Receiver Operating Characteristic Curve
Definition
A receiver operating characteristic curve (ROC curve) is a plot of the true positive rate against the false positive rate at each threshold setting.
AUC
The area under the ROC curve is called the AUC (area under the curve).
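A sketch of how the curve and its area can be computed from labels and scores: each distinct score is used as a threshold, the pair (false positive rate, true positive rate) is recorded, and the area is obtained with the trapezoidal rule. The function names and data are made up for illustration.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR), predicting positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                      # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# toy data: 1 = actual positive, 0 = actual negative
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

In this example 8 of the 9 positive-negative pairs are ranked correctly, and the trapezoidal area gives the same value, 8/9.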

