Probability and Distributions

The Probability Set Function

Sigma-Algebra

Definition

A sigma-algebra $\mathcal{F}$ is a Family of Sets satisfying the following properties

  • $\Omega \in \mathcal{F}$
  • $A \in \mathcal{F} \implies A^{c} \in \mathcal{F}$ (closed under complementation)
  • $A_{1}, A_{2}, \dots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_{i} \in \mathcal{F}$ (closed under countable unions)

where $\Omega$ is a universal set

Intersection of Sigma-Fields

Consider a set of sigma-fields on ,

  • is a sigma-field on .
  • is a sigma-field on .
  • is a sigma-field on

Facts

If a set of subsets of a universal set is both a Pi-System and a Lambda-System, then it is a sigma-algebra on that universal set.

Link to original

Probability Measure

Definition

The Measure with normality, i.e. a measure $P$ satisfying $P(\Omega) = 1$

Link to original

Inclusion-Exclusion Formula

Definition

Link to original

Increasing and Decreasing Sequence of Events

Definition

Facts

Let be an increasing sequence of events, then

Let be a decreasing sequence of events, then

Link to original

Boole's Inequality

Definition

Link to original

Conditional Probability

Definition

Facts

Link to original

Bayes Theorem

Definition

Discrete Case

where by Law of Total Probability

is called a prior probability, and is called a posterior probability
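In symbols (the standard discrete statement, reconstructed here since the displays were lost; the event names $A_{j}$ and $B$ are illustrative): $$P(A_{j}|B) = \frac{P(B|A_{j})P(A_{j})}{\sum_{i} P(B|A_{i})P(A_{i})}$$ where the denominator is given by the Law of Total Probability, $P(A_{j})$ is the prior probability, and $P(A_{j}|B)$ is the posterior probability.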

Continuous Case

where the parameter is not a constant, but an unknown quantity that follows a certain distribution with its own parameter.

is called a prior probability, is called a likelihood, is called an evidence or marginal likelihood, and is called a posterior probability

Parameter-Centric Notation

Examples

Consider a random variable that follows a Binomial Distribution and a prior distribution that follows a Beta Distribution where .

The PDFs are defined accordingly. Then, by Bayes theorem, the posterior follows. Under Squared Error Loss, the Bayes Estimator is the mean of the posterior distribution.
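A minimal sketch of the conjugate computation, assuming $X \sim \operatorname{Bin}(n, p)$ with prior $p \sim \operatorname{Beta}(\alpha, \beta)$ (symbols chosen here for illustration): $$\pi(p|x) \propto p^{x}(1-p)^{n-x} \cdot p^{\alpha-1}(1-p)^{\beta-1} = p^{x+\alpha-1}(1-p)^{n-x+\beta-1}$$ so $p|x \sim \operatorname{Beta}(x+\alpha,\, n-x+\beta)$, and the posterior mean, hence the Bayes estimator under squared error loss, is $\frac{x+\alpha}{n+\alpha+\beta}$.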

Link to original

Statistical Independence

Definition

Statistical Independence of Events

Suppose a Probability Space . The events are independent if

Statistical Independence of Two Random Variables

Rigorous

Suppose a Probability Space . random variables are independent if are independent,

Casual

Two random variables are independent if and only if

Statistical Independence of Random Variables

Mutually Independent

A collection of random variables are mutually independent if and only if where is marginal CDF of

Or equivalently, a collection of random variables are mutually independent if and only if where is marginal PDF of

Pointwise Independent

A collection of random variables are Pointwise independent if and only if where is marginal CDF of and is joint pdf of and .

Statistical Independence of Stochastic Processes

Facts

Random variables are independent

where is a function of only, and is a function of only.

Random variables are independent

Random variables are independent

Random variables are independent

If Moment Generating Function exists and Random variables are independent

Mutually independence pairwise independence

If is mutually independent, then

If random variables are independent, are Borel measurable, so are

Let Random variables . and are independent

If and follow normal distributions, and are independent

Link to original

Random Variables

Random Variable

Definition

A random variable is a function whose inverse function is -measurable for the two measurable spaces and .

The inverse image of an arbitrary Borel set of Codomain is an element of sigma field .

Notations

Consider a probability space

  • Outcomes:
  • Set of outcomes (Sample space):
  • Events:
  • Set of events (Sigma-Field):
  • Probabilities:
  • Random variable:

For a random variable on a Probability Space

For a random variable on Probability Space and another random variable on Probability Space

Link to original

Transclude of Density-Function

Distribution Function

Definition

A distribution function is a function for the Random Variable on Probability Space

Facts

Proof By definition, So, defining is the Equivalence Relation to defining

Since is a Pi-System, is uniquely determined by by Extension from Pi-System

Now, is a Measurable Space induced by . Therefore, defining on is equivalent to the defining on

Distribution function(CDF) has the following properties

  • Monotonic increasing:
  • Right-continuous:

If a function satisfies the following properties, then is a distribution function(CDF) of some Random Variable

  • Monotonic increasing:
  • Right-continuous:

Link to original

Transformation of Random Variable

Definition

A new Random Variable can be defined by applying a function to the outcomes of a Random Variable

Discrete

Transformation Technique

Suppose is Bijective function

Continuous

Let be the PDF of Random Variable with a support , be a differentiable Bijective function, and

Transformation Technique

where

CDF Technique

If is increasing, then

If is decreasing, then

Link to original

Expectation of a Random Variable

Expected Value

Definition

The expected value of the Random Variable on Probability Space

Continuous

where is a PDF of Random Variable

The expected value of the Random Variable when its CDF is absolutely continuous over the real line,

Discrete

where is a PMF of Random Variable

The expected value of the Random Variable when its CDF is a step function over the real line; in other words, it has at most countably many jumps.

Expected Value of a Function

Continuous

Discrete

Properties

Linearity

Random Variables

If , then

Matrix of Random Variables

Let be a matrices of random variables, be matrices of constants, and a matrix of constant. Then,

Notations

| Expression | Discrete | Continuous |
| --- | --- | --- |
| Expression for the event and the probability | | |
| Expression for the measurable space and the distribution | | |
Link to original

Some Special Expectations

Moment

Definition

-th moment

Facts

Link to original

Central Moment

Definition

-th central moment

A Moment of a random variable about its mean

Link to original

Moment Generating Function

Definition

Univariate

Let be a Random Variable

Then, is called the moment generating function of the

Calculations of Moments

Moment can be calculated using the mgf

Uniqueness of MGF

Let be random variables with mgf , respectively Then, where

Let be a Random Variable If , where , then determine the distribution uniquely.

Multivariate

Let be a Random Vector where is a constant vector.

Calculations of Multivariate Moments

Moment can be calculated using the mgf

Marginal MGF

Facts

The mgf may not exist.

Let be a Random Vector. If exists and can be expressed as then and are statistically independent.

Let be a Random Vector, , and , then and is independent where and are functions.

Link to original

Characteristic Function

Definition

Link to original

Cumulant Generating Function

Definition

Calculations of Cumulants

the -th cumulant can be calculated using cgf

Link to original

Important Inequalities

Markov's Inequality

Definition

Let $X$ be a non-negative Random Variable, and $a > 0$, then $$P(X \geq a) \leq \frac{E[X]}{a}$$ where $\frac{E[X]}{a}$ serves as an upper bound for the probability

Extended Version for Non-negative Functions

Let $g(X)$ be a non-negative Function of a Random Variable $X$, and $a > 0$, then $$P(g(X) \geq a) \leq \frac{E[g(X)]}{a}$$ where $\frac{E[g(X)]}{a}$ serves as an upper bound for the probability
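A minimal simulation sketch of the bound (Python; the $\operatorname{Exp}(1)$ variable and the threshold $a = 3$ are illustrative choices, not from the original):

```python
import random

# Empirically check Markov's inequality P(X >= a) <= E[X] / a
# for a non-negative random variable; here X ~ Exponential(1).
random.seed(0)
n, a = 100_000, 3.0
xs = [random.expovariate(1.0) for _ in range(n)]
p_tail = sum(x >= a for x in xs) / n   # empirical P(X >= a)
bound = (sum(xs) / n) / a              # empirical E[X] / a
print(f"P(X >= {a}) ~ {p_tail:.4f} <= bound {bound:.4f}")
```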

Link to original

Chebyshev's Inequality

Definition
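The standard statement, reconstructed since the display was lost: for a Random Variable $X$ with mean $\mu$ and finite variance $\sigma^{2}$, and any $k > 0$, $$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^{2}}$$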

Link to original

Jensen's Inequality

Definition

Let be a Convex Function on an interval , be a Random Variable with support , and , then

Facts

By Jensen’s Inequality, the following relation is satisfied: arithmetic mean $\geq$ geometric mean $\geq$ harmonic mean

Link to original

Multivariate Distribution

Random Vector

Definition

Column vector whose components are random variables .

Link to original

Multivariate Distribution

Definition

Joint CDF

The Distribution Function of a Random Vector is defined as where

Joint PDF

The joint probability Distribution Function of discrete Random Vector

The joint probability Distribution Function of continuous Random Vector

Expected Value of a Multivariate Function

Continuous

$$E[g(\mathbf{X})] = \idotsint\limits_{\mathbb{R}^{n}} g(\mathbf{x})f(\mathbf{x})\,dx_{1} \cdots dx_{n}$$

Discrete

Marginal Distribution of a Multivariate Function

Marginal CDF

Marginal PDF

$$f_{X_{1}}(x_{1}) = \idotsint\limits_{\mathbb{R}^{n-1}} f(x_{1}, x_{2}, \dots, x_{n})\,dx_{2} \cdots dx_{n}$$

Conditional Distribution of a Multivariate Function

Properties

Linearity

If , then

Link to original

Transformation of Random Vector

Definition

Discrete

Let be a Random Vector with joint pmf and support , and be a vector Bijective transformation

Continuous

Let be a Random Vector with joint pdf and support , be a vector Bijective transformation, and

One-to-One Transformation Case

where is a Jacobian Matrix, and

Many-to-One Transformation Case

Let be a many-to-one transformation, , be a partition of a set whose each partition’s transformation result is one-to-one Then, where is a Jacobian Matrix for each partition, and , and is a union of the pairwise disjoint sets

CDF Technique

If is increasing, then

If is decreasing, then

Link to original

Conditional Distribution

Conditional Distribution

Definition

where

Link to original

Conditional Expectation

Definition

Conditional Expectation of a Random Variable

Conditional Expectation of a Function

Conditional Expectation with respect to a Sub-Sigma-Algebra

Consider a Probability Space , a Random Variable , and a sub-Sigma-Algebra . A conditional expectation of given , denoted as is a function which satisfies:

Existence of Conditional Distribution

Define a measure and its restriction where , and . Then, is absolutely continuous with respect to , and by the Radon–Nikodym Theorem there exists a Radon–Nikodym derivative satisfying the required identity; it is called the conditional expectation of given .

Conditional Expectation with respect to a Random Variable

Suppose a Probability Space , and random variables and . The conditional expectation of given , denoted as is defined as where is the sigma-field generated by random variable

Facts

If is -measurable, then , where is sub-Sigma-Field

Since is -measurable, -measurable function s.t. By the property of conditional expectation, .

Link to original

Conditional Variance

Definition

Link to original

Law of Total Expectation

Definition

Proof

By definition of conditional expectation Since is sub-Sigma-Field, and hold.

Then, By definition of Expected Value

Link to original

Law of Total Variance

Definition

Link to original

The Correlation Coefficient

Covariance

Definition

Properties

Covariance with Itself

Covariance of Linear Combinations

Link to original

Pearson Correlation Coefficient

Definition

Facts

Let is linear in , that is , then the Conditional Expectation and Conditional Variance will be

Link to original

Independent Random Variable

Independent and Identically Distributed Random Variable

Definition

A Random Vector is independent and identically distributed if each Random Variable has the same Distribution and are mutually independent.

Facts

If is i.i.d. Random Vector, then

Link to original

Extension to Several Random Variables

Covariance Matrix

Definition

Variance-Covariance Matrix

Let be an dimensional Random Vector with finite variance , and , then the variance-covariance matrix of is

Cross-Covariance Matrix

Let be dimensional random vectors with finite variance , and , then the cross-covariance matrix of is

Properties

Let be a Random Vector with finite variance , and be a matrix of constants, then

Let and be random vectors. where and are matrices of constants.

Facts

Every variance-covariance matrix is Positive Semi-Definite Matrix

Let be a Random Vector such that no element of is a linear combination of the remaining elements. Then, the Variance-Covariance Matrix of the random vector is Positive-Definite Matrix.

Link to original

Some Special Distributions

Uniform Distribution

Definition

where and are the minimum and maximum values.

Properties

PDF

CDF

Mean

Variance

Link to original

Bernoulli Distribution

Definition

where is a probability of success

number of success in a single trial with success probability

Bernoulli Process

The i.i.d. Random Vector with Bernoulli distribution

Properties

PDF

Mean

Variance

Link to original

Binomial Distribution

Definition

where is the length of bernoulli process, and is a probability of success

The number of successes in length bernoulli process with success probability

Properties

PDF

Mean

Variance

MGF

Facts

Let be independent random variables following binomial distribution, then

Link to original

Negative Binomial Distribution

Definition

where is a number of successes, and is a probability in each experiment

The number of failures in a bernoulli process before the -th success

Properties

PDF

Mean

Variance

MGF

Link to original

Geometric Distribution

Definition

where is a probability in each experiment

The number of failures before the 1st success

Properties

PDF

Mean

Variance

Link to original

Trinomial Distribution

Definition

where is the number of trials, is a probability of category , and is a probability of category

Distribution that expands the outcomes of a binomial distribution to three

Properties

PDF

MGF

Marginal PDF

Conditional PDF

Link to original

Multinomial Distribution

Definition

where is the number of trials, is a probability of category

Distribution that describes the probability of observing a specific combination of outcomes

Properties

PDF

where , , and

MGF

Marginal PDF

Each one-variable marginal pdf is Binomial Distribution, each two-variables marginal pdf is Trinomial Distribution, and so on.

Link to original

Hypergeometric Distribution

Definition

where is the population size, is the number of success, and is the number of trials,

Distribution that describes the probability of successes in draws, without replacement, from a population that contains successes.

Draws without replacement version of Binomial Distribution

Properties

PDF

Mean

Variance

Link to original

Poisson Distribution

Poisson Distribution

Definition

where is the average number of occurrences in a fixed interval of time

The number of occurrences in a fixed interval of time with mean

Properties

PDF

where

Mean

Variance

MGF

Summation

Let , and let the ‘s be independent, then
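In symbols (a standard result, reconstructed): if $X_{i} \sim \operatorname{Poisson}(\lambda_{i})$ independently, then $$\sum_{i=1}^{n} X_{i} \sim \operatorname{Poisson}\left(\sum_{i=1}^{n} \lambda_{i}\right)$$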

Link to original

The Gamma, Chi-Squared, and Beta Distribution

Gamma Function

Definition

where

where

Gamma function is used as an extension of the factorial function

Properties

Link to original

Gamma Distribution

Definition

where is the shape parameter, and is the scale parameter

Distribution that models the waiting time until occurring th events in a Poisson process with mean

Properties

PDF

where is the Gamma Function

Mean

Variance

MGF

where and

Sum of independent Gamma distributions

Let the ‘s be independent gamma random variables

Scaling

Let , then,

Let , then,

Relationship with Poisson Distribution

Let be the time until the -th event in a poisson process with rate , then, the cdf and pdf of is

Link to original

Exponential Distribution

Definition

where is the average number of events per fixed unit of time

Time interval between events in a Poisson process with mean

Properties

PDF

CDF

Mean

Variance

Memorylessness Property

Link to original

Chi-squared Distribution

Definition

where is the degrees of freedom

squared sum of independent standard normal distributions

Properties

PDF

Mean

Variance

MGF

Additivity

Let the ‘s be independent chi-squared random variables

Facts

Let and , and is independent of , then

Let , where are quadratic forms in , where each element of the is a Random Sample from If , then

  • are independent
Link to original

Beta Distribution

Definition

where are the shape parameters

Properties

PDF

Mean

Variance

Derivation from Gamma Distribution

Let and be independent gamma random variables, then

Link to original

Dirichlet Distribution

Definition

where is the number of categories, and ‘s are the concentration parameters

A multivariate generalization of the Beta Distribution

Properties

PDF

Facts

If , then the Dirichlet distribution is a beta distribution.

Link to original

The Normal Distribution

Normal Distribution

Definition

where is the location parameter(mean), and is the scale parameter(variance)

Standard Normal Distribution

Properties

PDF

Mean

Variance

MGF

Higher Order Moments

Sum of Normally Distributed Random Variables

Let be independent random variables following normal distribution, then

Relationship with Chi-squared Distribution

Let be a standard normal distribution, then

Facts

Let , then

Link to original

The Multivariate Normal Distribution

Multivariate Normal Distribution

Definition

where is the number of dimensions, is the vector of location parameters, and is the vector of scale parameters

Standard Multivariate Normal Distribution

PDF

MGF

Properties

PDF

Mean

Variance

MGF

Affine Transformation

Let be a Random Variable following multivariate normal distribution, be a matrix, and be a dimensional vector, then

Relationship with Chi-squared Distribution

Suppose be a Random Variable following multivariate normal distribution, then

Facts

Let , be an , and be a -dimensional vector, then

Let , , , , where is and is vectors. Then, and are independent

Let , , and , where is , is matrices. and are independent

Link to original

t-and F-Distribution

Student's t-Distribution

Definition

Let be a standard normal distribution, be a Chi-squared Distribution, and be independent, then where is the degrees of freedom

Properties

PDF

Mean

Variance

Link to original

F-Distribution

Definition

Let be independent random variables following Chi-squared distributions, then where are the degrees of freedoms

Properties

PDF

$$f(x) = \frac{\Gamma\left(\frac{r_{1} + r_{2}}{2}\right) \left(\frac{r_{1}}{r_{2}}\right)^{r_{1}/2} x^{r_{1}/2 - 1}}{\Gamma\left(\frac{r_{1}}{2}\right) \Gamma\left(\frac{r_{2}}{2}\right) \left(\frac{r_{1}}{r_{2}}x + 1\right)^{(r_{1} + r_{2})/2}}$$

Mean

$$E(X) = \frac{r_{2}}{r_{2} - 2}, \quad r_{2} > 2$$

Variance

$$\operatorname{Var}(X) = \frac{2 r_{2}^{2}(r_{1} + r_{2} - 2)}{r_{1} (r_{2} - 2)^{2}(r_{2} - 4)}, \quad r_{2} > 4$$

Link to original

Student's Theorem

Definition

Let be i.i.d. Random Vector following Normal Distribution, , and , then

  • and are independent
Link to original

Consistency, and Limiting Distributions

Convergence in Probability

Convergence in Probability

Definition

Let be a Sequence of random variables and be a Random Variable, then converges in probability to if , and denoted by

Facts

If a Sequence of random variables converges Almost Surely to , then it also converges in probability to .

Convergence in Probability implies convergence in distribution

Link to original

Convergence in probability has Linearity

  • Additivity
  • Homogeneity

Continuous Mapping Theorem

Definition

Continuous functions preserve convergences (in probability, Almost Surely, or in distribution) of a Sequence of random variables to limits.

Consider a Sequence of random variables defined on same Probability Space, and a Continuous Function on the space. Then,

Link to original

Link to original

Consistency

Definition

A Statistic is called consistent estimator of if converges in probability to

Link to original

Converge in Distribution

Convergence in Distribution

Definition

Let be a Sequence of random variables with CDF , be a Random Variable with cdf , and be a set of every point at which is continuous, then converges in distribution to if , and denoted by . is called the limiting distribution of or asymptotic distribution of

Facts

Convergence in Probability implies convergence in distribution

Continuous Mapping Theorem

Definition

Continuous functions preserve convergences (in probability, Almost Surely, or in distribution) of a Sequence of random variables to limits.

Consider a Sequence of random variables defined on same Probability Space, and a Continuous Function on the space. Then,

Link to original

Slutzky Theorem

Definition

Let be Sequence of random variables, be a Random Variable, be constants, then

Link to original

Let be a sequence of distributions. Then, The convergence of the KL-Divergence to zero implies that the JS-Divergence also converges to zero. The convergence of the JS-Divergence to zero is equivalent to the convergence of the Total Variation Distance to zero. The convergence of the Total Variation Distance to zero implies that the Wasserstein Distance also converges to zero. The convergence of the Wasserstein Distance to zero is equivalent to the Convergence in Distribution of the sequence.

Link to original

Link to original

Boundedness in Probability

Definition

Facts

Let be a Random Vector, and be a Random Variable. If , then is bounded in probability

Link to original

Big O Notation

Definition

Let be the rate of convergence, and be a Sequence, then

Link to original

Big O in Probability Notation

Definition

Extension of Big O Notation for Random Variable

Let be the rate of convergence, and be a Sequence of random variables, then

Facts

If , then is called Boundedness in Probability

Link to original

Delta Method

Definition

Univariate Delta Method

Let be a sequence of random variables satisfying , be a differentiable function at , and , then

Proof

By Taylor Series approximation

where holds by the assumption. By the continuous mapping theorem, , which is a function of , also converges to a random variable, so it is bounded in probability by the property of convergent random vectors

Therefore, by the property of sequence of random variables bounded in probability

Examples

Estimation of the sample variance of Bernoulli Distribution

Let by CLT

by Delta method

Let , then

Therefore, the sample mean and variance follow such distributions
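A simulation sketch of this example (Python; with $g(p) = p(1-p)$ the delta method gives $\sqrt{n}(g(\hat{p}) - g(p)) \xrightarrow{D} N(0, (1-2p)^{2}p(1-p))$; the value $p = 0.3$ is an illustrative choice):

```python
import random

random.seed(0)
p, n, reps = 0.3, 2_000, 5_000
g = lambda t: t * (1.0 - t)                       # variance functional of Bernoulli(p)
true_var = (1.0 - 2.0 * p) ** 2 * p * (1.0 - p)   # delta-method asymptotic variance

def scaled_error() -> float:
    phat = sum(random.random() < p for _ in range(n)) / n
    return n ** 0.5 * (g(phat) - g(p))

errs = [scaled_error() for _ in range(reps)]
emp_var = sum(e * e for e in errs) / reps - (sum(errs) / reps) ** 2
print(emp_var, true_var)  # the two variances should be close
```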

Visualization

(Figure: the variance plotted as a function of the mean, together with the sample mean, the sample variance computed from the sample mean, and the first-order approximation line at the mean.)

As the sample size grows, the variance of the sample mean shrinks. So, the sample variance can be well approximated by the first-order approximation.

Link to original

MGF Technique

Definition

Let be a Sequence of random variables with MGF and be a Random Variable with MGF , then

A technique calculating limiting distribution using Moment Generating Function

Facts

Link to original

Central Limit Theorem

Central Limit Theorem

Definition

Let be i.i.d. Random Sample from a distribution with mean , variance , then

For i.i.d random variables that have finite variance, the sample mean converges to Normal Distribution.
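A small simulation sketch (Python; the Uniform(0, 1) population, with mean $1/2$ and variance $1/12$, is an illustrative choice):

```python
import random

# Illustrate the CLT: standardized means of i.i.d. Uniform(0,1) draws
# should be approximately N(0, 1).
random.seed(0)
n, reps = 200, 10_000
mu, sigma2 = 0.5, 1.0 / 12.0

def standardized_mean() -> float:
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma2 / n) ** 0.5

zs = [standardized_mean() for _ in range(reps)]
# Compare the empirical tail with the N(0,1) tail P(Z > 1.96) ~ 0.025
print(sum(z > 1.96 for z in zs) / reps)
```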

Link to original

Asymptotics for Multivariate Distributions

Asymptotics for Multivariate Distributions

Definition

Convergence in Probability

Let be a Sequence of random vectors and be a Random Vector, then converges in probability to if , and denoted by

Let be a Sequence of random vectors and be a Random Vector, then

Convergence in Distribution

Let be a Sequence of random vectors with CDF , be a Random Variable with cdf , and be a set of every point at which is continuous, then converges in distribution to if , and denoted by . is also called limiting distribution of or asymptotic distribution of

Continuous Mapping Theorem

Let be a Continuous Function

MGF Technique

Let be a Sequence of random vectors with MGF and be a Random Vector with MGF , then

Central Limit Theorem

Let be i.i.d. Sequence of random vectors from a distribution with mean , variance-covariance matrix , then

For i.i.d random variables that have finite variance, the sample mean converges to Normal Distribution.

Delta Method

Let be a Sequence of p-dimensional random vectors with , , be a matrix, and , then

Facts

Let , then

Let be a Sequence of p-dimensional random vectors, , be a constant matrix, and be a dimensional constant vector, then

Link to original

Some Elementary Statistical Inferences

Sample and Statistic

Random Sample

Definition

A realization of i.i.d. Random Vector

Properties

Let be random samples, be constant matrix, and be constant matrix, then the Expected Value and Covariance Matrix of random vectors become

Link to original

Statistic

Definition

A quantity computed from values in a Random Sample

Link to original

Bias of an Estimator

Definition

Let be a Random Sample from where is a parameter, and be a Statistic.

An estimator is unbiased if its bias is equal to zero for all values of the parameter .

Link to original

Order Statistic

Order Statistic

Definition

Let be Random Sample from a population with a PDF and CDF . And let be the smallest of ‘s, be the 2nd smallest of ‘s , , and be the largest of ‘s, then is called the order statistic of

Properties

Joint PDF

where

Marginal PDF

Joint PDF of and

Suppose

Joint PDF of

Suppose where

Link to original

Quantiles

Definition

Let be a Random Variable with cdf , and be a -th quantile of

Properties

Estimator of Quantile

Let be order statistics, then we can define an estimator of quantile using the order statistic where

and we call it the -th sample quantile

Confidence Intervals of Quantiles

where

Select satisfying the equation

Link to original

Tolerance Limits for Distributions

Tolerance Limits for Distributions

Definition

Let be Random Sample from a distribution with cdf , and be Order Statistic, then If , then is tolerance limits for of the probability for the distribution of

Properties

Joint PDF

Computation of

where

A probability that the tolerance interval contains of the probability

Link to original

Introduction to Hypothesis Testing

Hypothesis Testing

Definition

Types of Errors

| | $H_{0}$ is true | $H_{1}$ is true |
| --- | --- | --- |
| Not reject $H_{0}$ | correct | type 2 error ($\beta$) |
| Reject $H_{0}$ | type 1 error ($\alpha$) | correct |

Let the set where is rejected be a rejection region and a statistic for testing be a test statistic, then the size is the maximum of probability of type 1 error a test of size is a power of the test is the probability of the type 2 error is

p-value

The probability of obtaining test results at least as extreme as the result of actually observed under

Link to original

Goodness-of-Fit Test for Multinomial Distribution

Definition

Let be a Multinomial Distribution and consider a null hypothesis . Then, the test Statistic, which follows Chi-squared Distribution, is defined as

Link to original

Homogeneity Test for Multinomial Distribution

Definition

Let and consider a null hypothesis . Then, the test Statistic, which follows Chi-squared Distribution, is defined as where

Link to original

Independence Test for Two Discrete Variables

Definition

Let category variables and and consider a null hypothesis are independent. Then, the test Statistic, which follows Chi-squared Distribution, is defined as where , , ,

Link to original

The Method of Monte Carlo

Inverse Transform Sampling

Definition

we can generate with CDF by using inverse function of CDF and uniform random variable
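A minimal sketch (Python; assuming the target is $\operatorname{Exp}(\lambda)$, whose inverse CDF is $F^{-1}(u) = -\ln(1-u)/\lambda$; the choice of distribution is illustrative):

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse transform sampling: U ~ Uniform(0,1), return F^{-1}(U)."""
    u = random.random()
    return -math.log(1.0 - u) / lam

random.seed(0)
draws = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(draws) / len(draws))  # ~ 1/lambda = 0.5
```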

Facts

Let and be continuous CDF, then

Link to original

Box-Muller Transformation

Definition

Let , , and , then and

The method that generates pairs of i.i.d. normal distribution by using two uniform distributions or Exponential Distribution
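A sketch of the transformation in its standard form, assuming independent $U_{1}, U_{2} \sim \operatorname{Unif}(0, 1)$:

```python
import math
import random

def box_muller() -> tuple[float, float]:
    """Generate a pair of independent N(0, 1) draws from two uniforms."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))  # radius; 1 - u1 avoids log(0)
    theta = 2.0 * math.pi * u2                # uniform angle on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

random.seed(0)
pairs = [box_muller() for _ in range(50_000)]
zs = [z for pair in pairs for z in pair]
print(sum(zs) / len(zs))  # ~ 0, the standard normal mean
```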

Link to original

Accept Reject Generation Algorithm

Definition

Let be independent random variables, be a PDF which we want to sample, and

1. Generate and .
2. If , accept; otherwise, go to step 1.
3. Repeat steps 1 and 2.
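A minimal sketch, assuming a Beta(2, 2) target sampled with a Uniform(0, 1) proposal and envelope constant $M = 1.5 \geq \sup_{x} f(x)/g(x)$ (all of these choices are illustrative, not from the original):

```python
import random

def target_pdf(x: float) -> float:
    """Beta(2, 2) density on (0, 1): f(x) = 6 x (1 - x)."""
    return 6.0 * x * (1.0 - x)

M = 1.5  # envelope: f(x) <= M * g(x), with g the Uniform(0, 1) pdf

def accept_reject() -> float:
    while True:
        y = random.random()          # step 1: Y ~ g (proposal)
        u = random.random()          #         U ~ Uniform(0, 1)
        if u <= target_pdf(y) / M:   # step 2: accept with prob f(Y) / (M g(Y))
            return y                 # otherwise loop back to step 1

random.seed(0)
samples = [accept_reject() for _ in range(10_000)]
print(sum(samples) / len(samples))  # ~ 0.5, the Beta(2, 2) mean
```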

Link to original

Bootstrap Procedures

Bootstrapping

Definition

Let be a random variable with , be a Random Sample of , and be a point estimator of Then, the bootstrap sample is -dimensional sample vector drawn with replacement from the vector of original samples

Bootstrap Confidence Interval

Draw a large number of bootstrap samples and calculate the confidence interval from them: compute the point estimator on each bootstrap sample, and define the order statistics of these estimators. Then, the bootstrap confidence interval for the parameter is formed from the corresponding order statistics (sample quantiles) of the bootstrap estimates.
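A sketch of the percentile bootstrap interval described above (Python; the normal data and the 95% level are illustrative choices):

```python
import random

def percentile_bootstrap_ci(data, stat, n_boot=5000, alpha=0.05):
    """Percentile bootstrap CI: resample with replacement, compute the
    statistic on each resample, and take the alpha/2 and 1 - alpha/2
    order statistics of the bootstrap estimates."""
    n = len(data)
    boot = sorted(
        stat([random.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(50)]
mean = lambda xs: sum(xs) / len(xs)
print(percentile_bootstrap_ci(sample, mean))  # CI for the population mean
```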

Link to original

Maximum Likelihood Methods

Maximum Likelihood Estimation

Likelihood Function

Definition

Log-Likelihood Function

Link to original

Maximum Likelihood Estimation

Definition

MLE is the method of estimating the parameters of an assumed Distribution

Let be Random Sample with PDF , where , then the MLE of is estimated as

Regularity Conditions

  • R0: The pdfs are distinct, i.e.
  • R1: The pdfs have same supports
  • R2: The true value is an interior point in
  • R3: The pdf is twice differentiable with respect to
  • R4:
  • R5: The pdf is three times differentiable with respect to , , and interior point

Properties

Functional Invariance

If is the MLE for , then is the MLE of

Consistency

Under R0 ~ R2 Regularity Conditions, let be a true parameter, is differentiable with respect to , then has a solution such that

Asymptotic Normality

Under the R0 ~ R5 Regularity Conditions, let be Random Sample with PDF , where , be a consistent Sequence of solutions of MLE equation , and , then where is the Fisher Information.

By the asymptotic normality, the MLE estimator is asymptotically efficient under R0 ~ R5 Regularity Conditions

Asymptotic Confidence Interval

By the asymptotic normality of MLE, Thus, confidence interval of for is
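In symbols (a standard form, reconstructed since the display was lost): since $\sqrt{n}(\hat{\theta} - \theta_{0}) \xrightarrow{D} N(0, 1/I(\theta_{0}))$, an approximate $(1-\alpha)$ confidence interval for $\theta_{0}$ is $$\hat{\theta} \pm z_{\alpha/2} \frac{1}{\sqrt{n I(\hat{\theta})}}$$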

Delta method for MLE Estimator

Under the R0 ~ R5 Regularity Conditions, let be a continuous function and , then

Facts

Under R0 and R1 regularity conditions, let be a true parameter, then

Link to original

Rao-Cramer Lower Bound and Efficiency

Bartlett Identities

Definition

First Bartlett Identity

where is a Likelihood Function and is a Score Function

Second Bartlett Identity

where is a Likelihood Function

Link to original

Score Function

Definition

The gradient of the log-likelihood function with respect to the parameter vector. The score indicates the steepness of the log-likelihood function

Facts

The score will vanish at a local Extremum

Link to original

Fisher Information

Definition

Fisher Information

$$\begin{aligned} I(\theta) &:= E\left[ \left( \frac{\partial \ln f(X|\theta)}{\partial \theta} \right)^{2} \right] = \int_{\mathbb{R}} \left( \frac{\partial \ln f(x|\theta)}{\partial \theta} \right)^{2} f(x|\theta)\,dx\\ &= -E\left[ \frac{\partial^{2} \ln f(X|\theta)}{\partial \theta^{2}} \right] = -\int_{\mathbb{R}} \frac{\partial^{2} \ln f(x|\theta)}{\partial \theta^{2}} f(x|\theta)\,dx \end{aligned}$$

by the second Bartlett identity. Equivalently, $$I(\theta) = \operatorname{Var}\left( \frac{\partial \ln f(X|\theta)}{\partial \theta} \right) = \operatorname{Var}(s(\theta|x))$$ where $s(\theta|x)$ is a Score Function

Fisher Information Matrix

Let $\mathbf{X}$ be a Random Vector with PDF $f(x|\boldsymbol{\theta})$, where $\boldsymbol{\theta} \in \Omega \subset \mathbb{R}^{p}$, then the Fisher information matrix for $\boldsymbol{\theta}$ is a $p \times p$ matrix defined as $$I(\boldsymbol{\theta}) := \operatorname{Cov}\left( \frac{\partial}{\partial \boldsymbol{\theta}} \ln f(x|\boldsymbol{\theta}) \right) = E\left[ \left( \frac{\partial}{\partial \boldsymbol{\theta}} \ln f(x|\boldsymbol{\theta}) \right) \left( \frac{\partial}{\partial \boldsymbol{\theta}} \ln f(x|\boldsymbol{\theta}) \right)^{\intercal} \right] = -E\left[ \frac{\partial^{2}}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}^{\intercal}} \ln f(x|\boldsymbol{\theta}) \right]$$ and the $jk$-th element of $I(\boldsymbol{\theta})$ is $I_{jk} = -E\left[ \frac{\partial^{2}}{\partial \theta_{j} \partial \theta_{k}} \ln f(x|\boldsymbol{\theta}) \right]$

Properties

Chain Rule

The information in a length $n$ Random Sample $X_{1}, X_{2}, \dots, X_{n}$ is $n$ times the information in a single sample $X_{i}$: $I_{\mathbf{X}}(\theta) = n I_{X_{1}}(\theta)$

Facts

In a location model, the information does not depend on the location parameter: $$I(\theta) = \int_{-\infty}^{\infty} \left( \frac{f'(z)}{f(z)} \right)^{2} f(z)\,dz$$

Link to original

Rao-Cramer Lower Bound

Definition

Let be Random Sample with PDF with the R0 ~ R4 regularity conditions and be a Statistic with , then

Facts

Let be an unbiased estimator of , then . Thus, by a Rao-Cramer Lower Bound,

Link to original

Efficiency

Definition

Efficient Estimator

where is a Fisher Information

An unbiased estimator is called an efficient estimator if its variance is the Rao-Cramer Lower Bound

Efficiency

Let be an unbiased estimator, then the efficiency of , is

Asymptotic Efficiency

Let be i.i.d. random variables with PDF , and be an estimator satisfying , then

  • the asymptotic efficiency of is
  • If , then is called asymptotically efficient
  • Assume that another estimator such that Then, the asymptotic relative efficiency (ARE) of to is

Examples

In a Laplace distribution, the asymptotic relative efficiency of the sample median to the sample mean is 2. So, the sample median is 2 times more efficient than the sample mean as an estimator of the location parameter.

In a Normal distribution, the asymptotic relative efficiency of the sample mean to the sample median is $\pi/2 \approx 1.57$. So, the sample mean is about 1.57 times more efficient than the sample median as an estimator of the location parameter.

Link to original

Newton's Method

Definition

An iterative algorithm for finding the roots of a differentiable function, which are solution to the equation

Algorithm

Find the next point such that the Taylor series of the given point is 0 Taylor first approximation: The point such that the Taylor series is 0:

multivariate version:

In convex optimization,

Find the minimum point^[its derivative is 0] of Taylor quadratic approximation. Taylor quadratic approximation: The derivative of the quadratic approximation: The minimum point of the quadratic approximation^[the point such that the derivative of the quadratic approximation is 0]: multivariate version:

Examples

Solution of a linear system

Solve with an MSE loss

The cost function is and its gradient and hessian are ,

Then, solution is If is invertible, is a Least Square solution.
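A minimal univariate sketch (Python; finding the root of the illustrative function $f(x) = x^{2} - 2$, i.e. $\sqrt{2}$):

```python
def newton(f, fprime, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Newton's method: iterate x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of f(x) = x^2 - 2 starting from x0 = 1: converges to sqrt(2).
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
```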

Link to original

Fisher's Scoring Method

Definition

Fisher’s scoring method is a variation of Newton–Raphson method that uses Fisher Information instead of the Hessian Matrix

where is the Fisher Information.

Link to original

Maximum Likelihood Tests

Likelihood Ratio Test

Definition

Let be Random Sample with PDF where , , where . is called a likelihood ratio, where are MLE under and respectively. If is true, then will be close to , while if is not true, then the will be close to

Therefore, LRT rejects when , where the is determined by the level alpha condition.

By the Wilks’ Theorem, the likelihood ratio test Statistic for the hypothesis is given by

Visualization

The comparison of the Likelihood Ratio Test, Wald Test, and Score Test

Link to original

Wilks' Theorem

Definition

Single Parameter

Under R0 ~ R5 regularity conditions, let , where is the likelihood ratio, and the is the MLE of

Multiparameter

Under R0 ~ R5 regularity conditions, let , where is the likelihood ratio, and the is the MLE of without constraints.

Link to original

Wald Test

Definition

the Wald test Statistic for the hypothesis is given by

Link to original

Score Test

Definition

the Score test Statistic for the hypothesis is given by

Link to original

Multiparameter Case

Multiparametric Maximum Likelihood Estimation

Definition

Multiparameter MLE

Let be Random Sample with PDF , where , then the MLE of is estimated as

Properties

Multiparameter Asymptotic Normality

Under the R0 ~ R5 Regularity Conditions, let be Random Sample with PDF , where , then

  • has solution such that

Delta Method for Multiparameter MLE

Multiparameter Rao-Cramer Lower Bound

Let be Random Sample with PDF , be an unbiased estimator of , then . Thus, by a Rao-Cramer Lower Bound, where is -th diagonal element of

If holds, then is called an efficient estimator of

In a location-scale model, the information of the parameters does not depend on the location parameter.

Link to original

Multiparametric Likelihood Ratio Test

Definition

Let be Random Sample from PDF , where , be the number of parameters, be the number of constraints (restrictions) , where is differentiable function , where the dimension of is .

Now, consider a test statistic to test the hypothesis If is true, then will be close to , while if is not true, then the will be close to Therefore, LRT rejects when , where the is determined by the level alpha condition.

By the Wilks’ Theorem, the likelihood ratio test Statistic for the hypothesis is given by

Link to original

EM Algorithm

Expectation-Maximization Algorithm

Definition

Let be an observed data, be an unobserved (latent) variable, are independent, be a joint pdf of , be a joint pdf of , be a conditional pdf of given

By the definition of a conditional pdf, we have the identity

The goal of the EM algorithm is maximizing the observed likelihood using the complete likelihood .

Using the definition conditional pdf, we derive the identity for an arbitrary but fixed

$$\begin{aligned} \ln L(\boldsymbol{\theta}|\mathbf{X}) &= \int \ln[h(\mathbf{X}, \mathbf{Z} | \boldsymbol{\theta})] k(\mathbf{Z} | \mathbf{X}, \boldsymbol{\theta}_{0})\,d\mathbf{Z} - \int \ln[k(\mathbf{Z} | \mathbf{X}, \boldsymbol{\theta})] k(\mathbf{Z} | \mathbf{X}, \boldsymbol{\theta}_{0})\,d\mathbf{Z}\\ &= E_{\boldsymbol{\theta}_{0}}[\ln L^{c}(\boldsymbol{\theta}|\mathbf{X}, \mathbf{Z}) | \boldsymbol{\theta}_{0}, \mathbf{X}] - E_{\boldsymbol{\theta}_{0}}[\ln k(\mathbf{Z} | \mathbf{X}, \boldsymbol{\theta}) | \boldsymbol{\theta}_{0}, \mathbf{X}] \end{aligned}$$

Let the first term of the RHS be a quasi-likelihood function $$Q(\boldsymbol{\theta} | \boldsymbol{\theta}_{0}, \mathbf{X}) := E_{\boldsymbol{\theta}_{0}}[\ln L^{c}(\boldsymbol{\theta}|\mathbf{X}, \mathbf{Z}) | \boldsymbol{\theta}_{0}, \mathbf{X}]$$ The EM algorithm maximizes $Q(\boldsymbol{\theta} | \boldsymbol{\theta}_{0}, \mathbf{X})$ instead of maximizing $\ln L(\boldsymbol{\theta}|\mathbf{X})$

Algorithm

1. Expectation Step: Compute $$Q(\boldsymbol{\theta} | \hat{\boldsymbol{\theta}}^{(m)}, \mathbf{X}) := E_{\hat{\boldsymbol{\theta}}^{(m)}}[\ln L^{c}(\boldsymbol{\theta}|\mathbf{X}, \mathbf{Z}) | \hat{\boldsymbol{\theta}}^{(m)}, \mathbf{X}]$$ where $m = 0, 1, \dots$, and the expectation is taken under the conditional pdf $k(\mathbf{Z} | \mathbf{X}, \hat{\boldsymbol{\theta}}^{(m)})$
2. Maximization Step: $$\hat{\boldsymbol{\theta}}^{(m+1)} = \underset{\boldsymbol{\theta}}{\operatorname{arg max}}\, Q(\boldsymbol{\theta} | \hat{\boldsymbol{\theta}}^{(m)}, \mathbf{X})$$

Properties

Convergence

The Sequence of estimates $\hat{\boldsymbol{\theta}}^{(m)}$ satisfies $$L(\hat{\boldsymbol{\theta}}^{(m+1)}|\mathbf{X}) \geq L(\hat{\boldsymbol{\theta}}^{(m)}|\mathbf{X})$$ Therefore the Sequence of EM estimates converges to an (at least local) optimum

Link to original
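A compact sketch of the two steps on an illustrative toy model (not from the original): a mixture $\pi N(\mu_{1}, 1) + (1-\pi) N(\mu_{0}, 1)$ of two known unit-variance normals with unknown weight $\pi$, where the E-step reduces to posterior membership probabilities and the M-step to their average:

```python
import math
import random

def normal_pdf(x: float, mu: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def em_mixing_weight(xs, mu0=0.0, mu1=4.0, n_iter=100) -> float:
    """EM for the weight pi in pi*N(mu1, 1) + (1 - pi)*N(mu0, 1)."""
    pi = 0.5                                   # initial guess
    for _ in range(n_iter):
        # E-step: posterior probability that each x came from component 1
        resp = [pi * normal_pdf(x, mu1) /
                (pi * normal_pdf(x, mu1) + (1 - pi) * normal_pdf(x, mu0))
                for x in xs]
        # M-step: maximizing Q gives the average responsibility
        pi = sum(resp) / len(resp)
    return pi

random.seed(0)
data = [random.gauss(4.0, 1.0) if random.random() < 0.3 else random.gauss(0.0, 1.0)
        for _ in range(2_000)]
print(em_mixing_weight(data))  # ~ 0.3, the true mixing weight
```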

Sufficiency

Decision Function

Definition

Let be Random Sample with PDF , where , be a Statistic for the parameter , and be a function of the observed value of the statistic , then The function is called a decision function or a decision rule.

Link to original

Loss Function

Definition

Let be a parameter, be a Statistic for the parameter , and be a Decision Function

A loss function is a non-negative function defined as

It indicates the difference or discrepancy between and

Examples

Link to original

Risk Function

Definition

Risk function is an expectation of Loss Function

Link to original

Minimum Variance Unbiased Estimator

Definition

An estimator satisfying the following is the minimum variance unbiased estimator (MVUE) for where is an unbiased estimator

An Unbiased Estimator that has lower variance than any other unbiased estimator for the parameter

Facts

A minimum variance unbiased estimator does not always exist.

If some unbiased estimator’s variance is equal to the Rao-Cramer Lower Bound for all , then it is a minimum variance unbiased estimator.

Link to original

Minimax Estimator

Definition

An estimator satisfying the following is the minimax estimator of

Link to original

A Sufficient Statistic for a Parameter

Sufficient Statistic

Definition

Let be a Random Sample with PDF , where , and be a Statistic with PDF . The is a sufficient statistic for if and only if

No other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter.

Facts

Any one-to-one transformation of a sufficient statistic is also sufficient statistic

MLE is a function of sufficient statistic

Let be a Random Sample with PDF , where , be a Statistic with PDF , and be a unique MLE of , then is a function of

Link to original

Neyman's Factorization Theorem

Definition

Let be a Random Sample with PDF , where , then The statistic is a Sufficient Statistic for if and only if where are non-negative functions, and does not depend on

Link to original

Rao-Blackwell Theorem

Definition

Let be a Random Sample with PDF , where , be a Sufficient Statistic for , be a Unbiased Estimator for . And define , then is an Unbiased Estimator, and it has a lower variance than

Link to original

Completeness and Uniqueness

Complete Statistic

Definition

Let be a Random Variable with PDF , where . A complete statistic for satisfies the condition then, is called a complete family.

Link to original

Uniformly Minimum Variance Unbiased Estimator

Definition

where is an unbiased estimator, and is the parameter.

An Unbiased Estimator that has lower variance than any other unbiased estimator for all possible values of the parameter

An MVUE for all possible values of the parameters

Link to original

Lehmann-Scheffe Theorem

Definition

Let be a Random Sample with PDF , where , be a complete sufficient Statistic for , then If , then is the unique UMVUE of

Facts

Let be a complete sufficient Statistic for , be a Unbiased Estimator of

Link to original

The Exponential Class of Distribution

Regular Exponential Class

Definition

A PDF of form is said to be a member of the regular exponential class if

  • , support of , does not depend on
is a non-trivial (non-constant) continuous function of
  • If is a continuous random variable, then and is a continuous function of
  • If is a discrete random variable, then is non-trivial function of

Properties

Joint PDF

Let be a Random Sample from a regular exponential class, then the joint pdf of is

Monotone Likelihood Ratio

Let be a Random Sample from a regular exponential class If is monotone function of , then likelihood ratio has Monotone Likelihood Ratio property in

Facts

Let be a Random Sample from a regular exponential class, and be a Statistic, then

  • The pdf of is where is a function of only.
  • is a complete sufficient Statistic
Link to original

Function of a Parameter

Multiparametric Sufficiency

Definition

Joint Sufficient Statistic

Let be a Random Sample with PDF , where , , where is a Statistic, and is a pdf of . The statistic is jointly sufficient for if and only if , where does not depend on

if and only if where does not depend on

Completeness

Let be a family of pdfs of random variables If , then the family of pdfs is called complete family, and are complete statistics for

m-Dimensional Regular Exponential Class

Let be a Random Variable with PDF , where , and be a support of the . If can be written as then, it is called -dimensional exponential class. Further, it is called a regular case if the following are satisfied

  • , support of , does not depend on
  • contains m-dimensional open rectangle
  • are functionally independent and continuous function of
  • If is a continuous random variable, then and are continuous function
  • If is a discrete random variable, then are non-trivial function of

Let be a Random Sample from a m-dimensional regular exponential class, then the joint pdf of is

, where , is a joint complete sufficient Statistic for

the joint pdf of is , where does not depend on

k-Dimensional Random Vector with p-Dimensional Parameters

Exponential Class

Let be a -dimensional random vector with pdf , where

Link to original

Minimum Sufficiency and Ancillary Statistic

Minimal Sufficient Statistic

Definition

A Sufficient Statistic is called minimal sufficient statistic (MSS) if it has the minimum dimension among all sufficient statistics

Facts

complete sufficient Statistic minimal sufficient statistic

Link to original

Ancillary Statistic

Definition

Let be a Random Sample with PDF , where is called ancillary statistic to if its distribution does not depend on the

Link to original

Location Parameter

Definition

Let , where , be Random Sample from a pdf If , then is called location parameter

Link to original

Location Invariant Statistic

Definition

Let be a Random Sample from a location model, and be a statistic such that , then . So, the distribution of does not depend on . In this case, is Ancillary Statistic to , and it is called location-invariant statistic.

Link to original

Scale Parameter

Definition

Let , where , be Random Sample from a pdf If , then is called scale parameter

Link to original

Scale-Invariant Statistic

Definition

Let be a Random Sample from a scale model, and be a statistic such that , then . So, the distribution of does not depend on . In this case, is Ancillary Statistic to , and it is called scale-invariant statistic.

Link to original

Location-Scale Family

Definition

Let , where , be Random Sample from a pdf If , then is called location parameter and is called scale parameter

Link to original

Location- and Scale-Invariant Statistic

Definition

Let be a Random Sample from a location-scale model, and be a statistic such that , then . So, the distribution of does not depend on . In this case, is Ancillary Statistic to , and it is called location- and scale-invariant statistic.

Link to original

Sufficiency, Completeness, and Independence

Basu's Theorem

Definition

Let be a Random Sample with PDF , where , be a complete sufficient Statistic for , and be an Ancillary Statistic to , then and are independent.

Link to original

Optimal Tests of Hypothesis

Most Powerful Test

Best Critical Region

Definition

Let be a subset of sample space. is a best critical region of size for testing the simple hypothesis test if

  • (level alpha condition)
  • (most powerful)
Link to original

Most Powerful Test

Definition

A test with the Best Critical Region is called most powerful (MP) test

Link to original

Neyman-Pearson Theorem

Definition

The Neyman-Pearson theorem states that the Most Powerful Test for choosing between two simple hypotheses is based on the likelihood ratio.

Let be random samples with PDF , where , then the Likelihood Function of is defined as Consider the simple null hypothesis against the simple alternative hypothesis and define the likelihood ratio test statistic.

Let , and be a subset of the sample space such that

  • where

Then, is a Best Critical Region of size for the hypothesis test.

Link to original

Uniformly Most Powerful Tests

Uniformly Most Powerful Critical Region

Definition

If is a Best Critical Region of size for every in , then is called a uniformly most powerful critical region

Link to original

Uniformly Most Powerful Test

Definition

A test defined by Uniformly Most Powerful Critical Region is called a uniformly most powerful (UMP) test

Simple vs Composite

Consider is a simple hypothesis and is a composite hypothesis. Fix some satisfying and obtain a most powerful test using Neyman-Pearson Theorem and generalize it to all satisfying conditions

Composite vs Composite

Karlin-Rubin Theorem

Definition

Consider the hypothesis and , and assume that the likelihood ratio has MLR property in some Sufficient Statistic If the likelihood ratio is decreasing (increasing) in the statistic Then the Uniformly Most Powerful Test of level for the hypothesis is rejecting if , where is determined by

Link to original

Link to original

Monotone Likelihood Ratio

Definition

has a monotone likelihood ratio (MLR) property in If is a monotone function of

Link to original

Karlin-Rubin Theorem

Definition

Consider the hypothesis and , and assume that the likelihood ratio has MLR property in some Sufficient Statistic If the likelihood ratio is decreasing (increasing) in the statistic Then the Uniformly Most Powerful Test of level for the hypothesis is rejecting if , where is determined by

Link to original

The Sequential Probability Ratio Test

Sequential Probability Ratio Test

Definition

Let be Random Sample with PDF , where , and be the Likelihood Function Consider a simple test We observe likelihood ratio sequentially i.e. , and reject if and only if , where , and do not reject if and only if , where

i.e. we continue to observe as long as , and stop otherwise.

Choice of and

Let be type 1 and 2 error, respectively, then Hence, Therefore, By the inequality, we choose
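In symbols (the standard Wald bounds; the notation $\Lambda_{n} = L(\theta_{1})/L(\theta_{0})$ is chosen here for illustration): sampling continues while $B < \Lambda_{n} < A$, $H_{0}$ is rejected when $\Lambda_{n} \geq A$ and accepted when $\Lambda_{n} \leq B$, with $$A \approx \frac{1-\beta}{\alpha}, \qquad B \approx \frac{\beta}{1-\alpha}$$ where $\alpha$ and $\beta$ are the target type 1 and type 2 error probabilities.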

Link to original

Minimax and Classification Procedures

Minimax Procedures

Definition

We want to test Let be a Loss Function such that and , be a rejection region, be an acceptance region, and be a Risk Function Then, The minimax procedure is to find such that is minimized

The minimax solution is the region , where satisfies

Link to original

Inferences About Normal Linear Models

Quadratic Form

Definition

where is a Symmetric Matrix

A mapping where is a Module on Commutative Ring that has the following properties.

Matrix Expressions

Facts

Let be a Random Vector and be a symmetric matrix of constants. If and , then the expectation of the quadratic form is

Let be a Random Vector and be a symmetric matrix of constants. If and , , and , where , then the variance of the quadratic form is where is the column vector of diagonal elements of .

If and ‘s are independent, then If and ‘s are independent, then

Let , , where is a Symmetric Matrix and , then the MGF of is where ‘s are non-zero eigenvalue of

Let , where is Positive-Definite Matrix, then

Let , , where is a Symmetric Matrix and , then where

Let , , where are symmetric matrices, then are independent if and only if

Let , where are quadratic forms in Random Sample from If and is non-negative, then

  • are independent

Let , , where , where , then

Link to original

One-way ANOVA

One-way ANOVA

One-way ANOVA is used to analyze the significance of differences of means between groups. Let an -th response of the -th group be where

We want to test the null hypothesis i.e. there is no treatment effect. where is the total number of observations, represents the mean of the -th group and represents the overall mean, of the numerator indicates between-group variance (SSB) and of the denominator indicates within-group variance (SSW)
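In symbols (the standard one-way ANOVA statistic, reconstructed; assuming $k$ groups and $n$ total observations): $$F = \frac{SSB/(k-1)}{SSW/(n-k)} \sim F(k-1,\, n-k) \quad \text{under } H_{0}$$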

The Likelihood Ratio Test rejects if

Link to original

Noncentral Chi-squared and F distributions

Noncentral Chi-squared Distribution

Definition

Let be independent normal distributions , then where is degree of freedom, and is non-centrality parameter.

Properties

PDF

MGF

Link to original

Noncentral F-Distribution

Definition

Let be independent random variables following Noncentral Chi-squared Distribution and Chi-squared Distribution, respectively, then where are degree of freedoms and is non-centrality parameter.

Properties

PDF

Link to original

Multiple Comparisons

Multiple Comparisons

Definition

Assume that is rejected, and we interested in the confidence interval for some linear combination of parameters

Let ‘s be random samples from We want to compute the confidence interval of , where

Type 1 Error

Consider a multiple testing problem , and let be an event of rejecting even though is true (type 1 error). If a test satisfies for level , then the test is said to be controlled at that level. Under the independence of each test, the probability of at least one type 1 error in multiple testing is . Thus, the probability of type 1 error approaches 1 as the number of tests increases.

Fixed Constants Case

The ‘s are fixed constants. The confidence interval for each is where

Scheffe’s Simultaneous Confidence Interval

The ‘s are allowed to have any real numbers. The simultaneous confidence interval for all is where

Link to original

The Analysis of Variance

Two-way ANOVA

Definition

Two-way ANOVA Without Replications

Let ‘s be Random Sample from . The can be decomposed as sum of means (global mean -th level effect of factor -th level effect of factor ). In this setup, we assume that

We want to test . where , and

The Likelihood Ratio Test rejects if

Under , the follows Noncentral F-Distribution where

Therefore, the power of the test is

Two-way ANOVA With Equal Replications

Let ‘s be Random Sample from . The can be decomposed as sum of means (global mean + -th level effect of factor + -th level effect of factor + interaction effect of -th level of factor and the -th level of factor ). In this setup, we assume that

We want to test . where , , and

The Likelihood Ratio Test rejects if

Under , the follows Noncentral F-Distribution where

Therefore, the power of the test is

If is not rejected, then we continue to test or

Two-way ANOVA with a Regression Model

where and are dummy variables representing categories of the two factors, is the number of categories for the first factor, and is the number of categories for the second factor

In this setting, a corner-point constraint is used for both factors.

The null hypotheses can be tested by Deviance

Deviance Test for Two-way ANOVA

For a two-way ANOVA with factors and , we have three null hypotheses to test:

  • i.e. there is no treatment effect of factor
  • i.e. there is no treatment effect of factor
  • i.e. there is no interaction effect between factor and They can be tested with the Deviance.

If is known, the test statistic for is defined as And reject the if

If is unknown, the test statistic for is defined as And reject the if

If is known, the test statistic for is defined as And reject the if

If is unknown, the test statistic for is defined as And reject the if

If is known, the test statistic for is defined as And reject the if

If is unknown, the test statistic for is defined as And reject the if

Link to original

Nonparametric Statistics

Functional (Statistics)

Definition

is called a functional if is a function of CDF or PDF

Let be Random Sample from a cdf and be functional, then the empirical Distribution Function of at is , and is called an induced estimator of .

Link to original

Location Model

Definition

Location Functional

Let be continuous Random Sample with CDF . If satisfies the following, then it is called location functional

Location Model

Let be a location functional, then is called to follow a location model with if

  • , where ‘s are i.i.d. with PDF and

Facts

Mean functional is a locational functional

Let be Random Variable with CDF and PDF which is symmetric about . If is a location functional, then

Link to original

Sample Median and Sign Test

Sign Test

Definition

Let , where ‘s are i.i.d. with CDF and PDF with median . i.e. median of ‘s are .

We want to test , which is called sign test Consider a statistic , which is called sign statistic We reject if .

Distribution of Sign Statistic

under . Therefore, . Also, by CLT,

Consider a sequence of local alternatives The efficacy of the sign statistic is

Composite Sign Test

Consider a composite hypothesis Since the power function is non-decreasing, the sign test of size is

Facts

Consider a test and the test statistic , then Also, the power function of the test is a non-decreasing function of

Consider a sequence of hypothesis , where is called a local alternative, then where is a CDF of standard normal distribution, and

Let be random variables from location model, where ‘s are i.i.d. with CDF , PDF , and median 0, and be the sample median of , then where

Link to original

Efficacy of Sign Test

Definition

Let be a test statistic and , then the efficacy of is defined as where can be interpreted as the rate of change of the mean of

Link to original

Asymptotic Relative Efficiency

Definition

Let be two test statistics, then the asymptotic relative efficiency of to is the ratio between those efficacies

Link to original

Signed-Rank Wilcoxon

Signed-Rank Wilcoxon Test

Definition

Let be Random Sample from cdf , symmetric pdf , and median . Consider a Sign Test

The signed-rank Wilcoxon test statistic for the test is defined as where is the rank of , among

We reject if , where is determined by the level

takes one of . So, we can rewrite the statistic as where is an observation such that

Let be the sum of the ranks of positive ‘s, then So.

can be written as where are called the Walsh averages

In general, if the median of is , then

Consider a sequence of local alternatives The efficacy of the modified signed-rank Wilcoxon test statistic is
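A small sketch of computing the signed-rank statistic $W^{+}$, the sum of the ranks of $|X_{i} - \theta_{0}|$ over the observations with $X_{i} - \theta_{0} > 0$ (Python; ties in the absolute values are ignored for brevity):

```python
def signed_rank_statistic(xs, theta0: float = 0.0) -> int:
    """Wilcoxon signed-rank W+: rank |x - theta0| and sum the ranks
    of the observations with x - theta0 > 0 (ties ignored)."""
    diffs = [x - theta0 for x in xs if x != theta0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank + 1 for rank, i in enumerate(order) if diffs[i] > 0)

print(signed_rank_statistic([1.8, -0.5, 2.3, 0.7, -1.1]))  # -> 11
```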

Facts

Assume that is symmetric, then under , are independent of

Let be Random Sample from cdf , symmetric pdf , and median , and , then under

  • is distribution free

Consider a sequence of hypothesis , then where is a CDF of standard normal distribution, and

The estimator , which can be a solution of the equation , is called the Hodges-Lehmann estimator. The Hodges-Lehmann estimator is an estimator of the median of the Walsh averages.

Let be random variables from location model, where ‘s are i.i.d. with CDF , symmetric PDF , and median 0, then where

Link to original

Mann-Whitney-Wilcoxon Procedure

Mann-Whitney-Wilcoxon Procedure

Definition

Let be Random Sample from a distribution with cdf , and be Random Sample from a distribution with cdf Consider a test

The Mann-Whitney-Wilcoxon test statistic for the test is defined as where is the rank of in the combined (pooled) sample of size

We reject when , where is defined by the level

Therefore, the Mann-Whitney-Wilcoxon test statistic is decomposed as where

So, We reject when

Also, the power function of the test is defined as

Consider a sequence of local alternatives , and assume that , where The efficacy of the modified Mann-Whitney-Wilcoxon test statistic is

Facts

Let be Random Sample from a distribution with cdf , be Random Sample from a distribution with cdf , and then under

  • is distribution free

Consider a sequence of hypothesis , then where is a CDF of standard normal distribution, and

Link to original

Measures of Association

Kendall's Tau

Definition

Let be a Random Sample from a bivariate distribution with cdf We want to test are independent

Let be independent pairs with the same bivariate distribution. If , then we say these pairs are concordant If , then pairs are discordant

Kendall’s is defined as

is bounded by . If are independent, then

Consider another statistic

, so it is an unbiased estimator of
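A sketch of the sample version of Kendall's statistic, counting concordant minus discordant pairs over all $\binom{n}{2}$ pairs (Python; ties are not handled):

```python
from itertools import combinations

def kendall_tau(pairs) -> float:
    """Sample Kendall's tau: (concordant - discordant) / C(n, 2)."""
    n = len(pairs)
    s = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        s += (x1 - x2) * (y1 - y2) > 0   # concordant pair
        s -= (x1 - x2) * (y1 - y2) < 0   # discordant pair
    return 2 * s / (n * (n - 1))

print(kendall_tau([(1, 2), (2, 3), (3, 1), (4, 4)]))  # -> 1/3
```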

Facts

Let be a Random Sample from a bivariate distribution with cdf , and , then under are independent

  • is distribution free with a symmetric pdf
Link to original

Spearman's Rho

Definition

Let be a Random Sample from a bivariate distribution with cdf

Instead of a sample Pearson Correlation Coefficient, we define another statistic using the ranks of samples which is called Spearman’s

is bounded by . If are independent, then

Facts

Let be a Random Sample from a bivariate distribution with cdf , and , then under are independent

  • is distribution free with a symmetric pdf
Link to original