Convex Function

Definition

If $f$ is differentiable on a convex domain, then

  • $f$ is convex $\iff$ $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y$ in the domain
  • if $f$ is twice differentiable, $f$ is convex $\iff$ $\nabla^2 f(x) \succeq 0$ for all $x$ in the domain
Link to original

Convex Set

Definition

A set of points $C$ is convex if it contains every line segment between two points in the set: for all $x, y \in C$ and $\theta \in [0, 1]$, $\theta x + (1 - \theta) y \in C$.

Link to original

Feasible Region

Definition

The set of all points of an optimization problem that satisfy the problem’s constraints.

Link to original

Convex Optimization

Definition

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad h_j(x) = 0$$

An optimization problem in which the objective function $f$ is a Convex Function and the feasible set is a Convex Set.

Facts

Every local minimum is a global minimum

The convex feasible set condition is equivalent to the following conditions

  • the inequality constraint functions $g_i$ are Convex Functions
  • the equality constraint functions $h_j$ are affine functions
Link to original

Gradient Descent

Definition

An iterative optimization algorithm for finding a local minimum of a differentiable function $f$

$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$

where $\alpha$ is a learning rate

Examples

Solution of a linear system

Solve $Ax = b$ with an MSE loss

The cost function is $f(x) = \frac{1}{2} \| Ax - b \|^2$ and its gradient is

$$\nabla f(x) = A^\top (Ax - b)$$

Then, the solution is obtained by iterating

$$x_{k+1} = x_k - \alpha A^\top (A x_k - b)$$
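
The update above can be sketched in a few lines; the matrix, learning rate, and iteration count below are illustrative choices, not part of the original note.

```python
import numpy as np

def gradient_descent_lsq(A, b, lr=0.01, iters=5000):
    """Minimize f(x) = 0.5 * ||Ax - b||^2 by gradient descent."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)   # gradient of the MSE cost
        x = x - lr * grad          # descent step with fixed learning rate
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = gradient_descent_lsq(A, b)
# converges to the solution of Ax = b, here (0.8, 1.4)
```

With a fixed learning rate, convergence requires $\alpha < 2 / \lambda_{\max}(A^\top A)$; the value used here satisfies that for this matrix.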

Link to original

Newton's Method

Definition

An iterative algorithm for finding the roots of a differentiable function $f$, which are the solutions to the equation $f(x) = 0$

Algorithm

Find the next point $x_{k+1}$ such that the Taylor series at the given point is 0

First-order Taylor approximation:

$$f(x) \approx f(x_k) + f'(x_k)(x - x_k)$$

The point such that the Taylor series is 0:

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

multivariate version:

$$x_{k+1} = x_k - J_f(x_k)^{-1} f(x_k)$$

In convex optimization,

Find the minimum point^[its derivative is 0] of the Taylor quadratic approximation.

Taylor quadratic approximation:

$$f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2} f''(x_k)(x - x_k)^2$$

The derivative of the quadratic approximation:

$$f'(x_k) + f''(x_k)(x - x_k)$$

The minimum point of the quadratic approximation^[the point such that the derivative of the quadratic approximation is 0]:

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$

multivariate version:

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

Examples

Solution of a linear system

Solve $Ax = b$ with an MSE loss

The cost function is $f(x) = \frac{1}{2} \| Ax - b \|^2$, and its gradient and Hessian are

$$\nabla f(x) = A^\top (Ax - b), \qquad \nabla^2 f(x) = A^\top A$$

Then, the solution of a single Newton step is

$$x_1 = x_0 - (A^\top A)^{-1} A^\top (A x_0 - b) = (A^\top A)^{-1} A^\top b$$

If $A^\top A$ is invertible, $x_1 = (A^\top A)^{-1} A^\top b$ is a Least Square solution.
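
Because the MSE cost is quadratic, a single Newton step from any starting point reaches the least-squares solution. A minimal sketch; the matrix and vector are made-up data:

```python
import numpy as np

def newton_lsq(A, b, x0):
    """One Newton step on f(x) = 0.5*||Ax - b||^2: the quadratic Taylor
    model is exact for a quadratic cost, so one step suffices."""
    grad = A.T @ (A @ x0 - b)          # gradient at x0
    H = A.T @ A                        # constant Hessian
    return x0 - np.linalg.solve(H, grad)

A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]])  # overdetermined system
b = np.array([3.0, 5.0, 2.2])
x = newton_lsq(A, b, np.zeros(2))
# x equals the least-squares solution (A^T A)^{-1} A^T b
```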

Link to original

Quasi-Newton Method

Definition

Secant equation

$$B_{k+1} s_k = y_k \qquad \text{where } s_k = x_{k+1} - x_k, \; y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

Methods

  • Broyden’s Method
  • Symmetric Rank-One Method
  • Davidon–Fletcher–Powell Formula
  • Broyden–Fletcher–Goldfarb–Shanno Algorithm

Facts

DFP and BFGS are dual to each other: each update is obtained from the other by exchanging $B_k \leftrightarrow H_k$ and $s_k \leftrightarrow y_k$

Link to original

Broyden's Method

Definition

Solve $f(x) = 0$ using Newton’s Method with an approximation of the Jacobian Matrix

Algorithm

The essence of Broyden’s method is to approximate the Jacobian Matrix of Newton’s Method using the average rate of change

Replace the derivative or gradient with an approximation $B_k \approx J_f(x_k)$. Then,

$$x_{k+1} = x_k - B_k^{-1} f(x_k)$$

Define

$$s_k = x_{k+1} - x_k, \qquad y_k = f(x_{k+1}) - f(x_k)$$

Then, we want to find $B_{k+1}$ satisfying the following expression by the above approximation

$$B_{k+1} s_k = y_k$$

To ensure stability of the update, take a minimum modification of $B_k$. In other words, solve the minimization problem using the method of Lagrange multipliers

$$\min_{B_{k+1}} \| B_{k+1} - B_k \|_F \quad \text{subject to} \quad B_{k+1} s_k = y_k$$

Then, the solution is

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^\top}{s_k^\top s_k}$$

Now the updating formula becomes

$$x_{k+1} = x_k - B_k^{-1} f(x_k)$$

where $B_{k+1}^{-1}$ can be easily calculated by the Sherman–Morrison Formula

$$B_{k+1}^{-1} = B_k^{-1} + \frac{(s_k - B_k^{-1} y_k)\, s_k^\top B_k^{-1}}{s_k^\top B_k^{-1} y_k}$$
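
A minimal sketch of the method on a two-dimensional root-finding problem; the test system and the choice of the exact Jacobian at the start point as $B_0$ are illustrative assumptions:

```python
import numpy as np

def broyden(f, x0, B0, iters=50, tol=1e-12):
    """Broyden's method: root finding with a rank-one secant update
    of the Jacobian approximation B instead of recomputing it."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float).copy()
    fx = f(x)
    for _ in range(iters):
        s = np.linalg.solve(B, -fx)            # Newton-like step: B s = -f(x)
        x = x + s
        fx_new = f(x)
        y = fx_new - fx
        B += np.outer(y - B @ s, s) / (s @ s)  # secant update: B_new s = y
        fx = fx_new
        if np.linalg.norm(fx) < tol:
            break
    return x

# solve x0^2 + x1^2 = 4 and x0 = x1, starting from (1, 2)
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J0 = np.array([[2.0, 4.0], [1.0, -1.0]])       # Jacobian at the start point
sol = broyden(f, [1.0, 2.0], J0)
# sol approaches (sqrt(2), sqrt(2))
```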

Link to original

Symmetric Rank-One Method

Definition

The Quasi-Newton Method with a symmetry condition on the Hessian approximation

Algorithm

Define

$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

Then, the secant equation is

$$B_{k+1} s_k = y_k$$

To assure the symmetry of the approximation of the Hessian matrix, let the perturbation matrix be a symmetric rank-one matrix

$$B_{k+1} = B_k + \sigma v v^\top$$

Then,

$$\sigma v v^\top s_k = y_k - B_k s_k$$

Now we can express $\sigma v (v^\top s_k) = y_k - B_k s_k$. Since the $v$ satisfying this is unique up to scaling (because $v$ is a scaling of $y_k - B_k s_k$), we can get $\sigma (v^\top s_k)^2$ by multiplying $s_k^\top$ on both sides

$$\sigma (v^\top s_k)^2 = s_k^\top (y_k - B_k s_k)$$

By substituting into the original expression, and since the denominator is a scalar,

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k}$$
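
The final formula can be checked directly. The sketch below (with made-up $s$ and $y$) applies one SR1 update and includes the standard safeguard of skipping the update when the denominator is near zero, an assumption not stated in the note:

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """One symmetric rank-one update of the Hessian approximation B."""
    r = y - B @ s                         # residual of the secant equation
    denom = r @ s
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B                          # skip a numerically unstable update
    return B + np.outer(r, r) / denom

B = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([2.0, 1.5])
B_new = sr1_update(B, s, y)
# B_new is symmetric and satisfies the secant equation B_new @ s == y
```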

Link to original

Broyden–Fletcher–Goldfarb–Shanno Algorithm

Definition

The quasi-Newton method with symmetric and positive-definite conditions on the Hessian approximation. It updates the inverse of the Hessian directly.

Algorithm

Define

$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

Then, the secant equation is

$$B_{k+1} s_k = y_k$$

We want to find the inverse $H_{k+1} = B_{k+1}^{-1}$ satisfying the following expression with the symmetric and positive-definite condition, by the definition

$$H_{k+1} y_k = s_k$$

So, solve the minimization problem with the symmetric and positive-definite conditions using the method of Lagrange multipliers

$$\min_{H} \| H - H_k \|_W \quad \text{subject to} \quad H = H^\top \ \text{and} \ H y_k = s_k$$

where $\| \cdot \|_W$ is a weighted Frobenius norm

Then, the solution is

$$H_{k+1} = (I - \rho_k s_k y_k^\top) H_k (I - \rho_k y_k s_k^\top) + \rho_k s_k s_k^\top \qquad \text{where} \quad \rho_k = \frac{1}{y_k^\top s_k}$$

Therefore, the updating formula is

$$x_{k+1} = x_k - \alpha_k H_k \nabla f(x_k)$$

And by the Sherman–Morrison Formula, the corresponding update of the Hessian approximation is

$$B_{k+1} = B_k + \frac{y_k y_k^\top}{y_k^\top s_k} - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k}$$
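
A compact sketch of the full loop, pairing the inverse-Hessian update with a backtracking (Armijo) line search; the quadratic test problem and the curvature safeguard `s @ y > 0` are conventional choices assumed here:

```python
import numpy as np

def bfgs(f, grad, x0, iters=100, tol=1e-8):
    """BFGS: maintain an inverse-Hessian approximation H and update it
    with the rank-two formula after each accepted step."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)
    g = grad(x)
    for _ in range(iters):
        p = -H @ g                        # quasi-Newton search direction
        alpha = 1.0                       # backtracking (Armijo) line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if s @ y > 1e-12:                 # curvature condition keeps H positive-definite
            rho = 1.0 / (s @ y)
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x

# minimize f(x) = 0.5 x^T Q x - b^T x; the minimizer solves Q x = b
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
xmin = bfgs(lambda x: 0.5 * x @ Q @ x - b @ x, lambda x: Q @ x - b, np.zeros(2))
# xmin approaches the solution of Q x = b, here (0, 2)
```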

Link to original

Davidon–Fletcher–Powell Formula

Definition

The quasi-Newton method with symmetric and positive-definite conditions. It updates the Hessian approximation and finds its inverse using the Sherman–Morrison Formula.

Algorithm

Define

$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

Then, the secant equation is

$$B_{k+1} s_k = y_k$$

We want to find $B_{k+1}$ satisfying the secant equation with the symmetric and positive-definite condition

So, solve the minimization problem with the symmetric and positive-definite conditions using the method of Lagrange multipliers

$$\min_{B} \| B - B_k \|_W \quad \text{subject to} \quad B = B^\top \ \text{and} \ B s_k = y_k$$

where $\| \cdot \|_W$ is a weighted Frobenius norm

Then, the solution is

$$B_{k+1} = (I - \rho_k y_k s_k^\top) B_k (I - \rho_k s_k y_k^\top) + \rho_k y_k y_k^\top \qquad \text{where} \quad \rho_k = \frac{1}{y_k^\top s_k}$$

Therefore, the updating formula is

$$x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)$$

And by the Sherman–Morrison Formula, the inverse $H_k = B_k^{-1}$ is updated directly by

$$H_{k+1} = H_k + \frac{s_k s_k^\top}{s_k^\top y_k} - \frac{H_k y_k y_k^\top H_k}{y_k^\top H_k y_k}$$
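
The inverse update can be verified to satisfy the secant equation in its inverse form $H_{k+1} y_k = s_k$; a small sketch with made-up vectors:

```python
import numpy as np

def dfp_inverse_update(H, s, y):
    """DFP update of the inverse-Hessian approximation H."""
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

H = np.eye(2)
s = np.array([0.5, 1.0])
y = np.array([1.0, 3.0])
H_new = dfp_inverse_update(H, s, y)
# H_new is symmetric and satisfies H_new @ y == s
```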

Link to original

Broyden Family Method

Definition

The Broyden family update is obtained by combining the BFGS and the DFP updates

Algorithm

$$B_{k+1}^{\phi} = (1 - \phi)\, B_{k+1}^{\mathrm{BFGS}} + \phi\, B_{k+1}^{\mathrm{DFP}}$$

where $\phi$ is a scalar parameter

Facts

If the parameter $\phi$ of the Broyden Family takes the particular value $\phi_k = \frac{s_k^\top y_k}{s_k^\top y_k - s_k^\top B_k s_k}$, then the update is equal to the Symmetric Rank-One Method

Link to original

Method of Lagrange Multipliers

Definition

The method of Lagrange multipliers is a method for finding the local maxima and minima of a function subject to equality constraints

Solution

Given the general problem

$$\min_x f(x) \quad \text{subject to} \quad g(x) = 0$$

In order to find the maximum or minimum of a function $f$ subject to the equality constraint $g(x) = 0$, find the stationary points of

$$\mathcal{L}(x, \lambda) = f(x) + \lambda\, g(x)$$

considered as a function of $x$ and the Lagrange multiplier $\lambda$. In other words, all partial derivatives should be zero

$$\nabla_x \mathcal{L} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = 0$$

The solution of the constrained optimization problem is always a Saddle Point of the Lagrangian function $\mathcal{L}$.

Facts

At the solution, the gradient of the constraint and the gradient of the function should be in the same or opposite directions: $\nabla f(x) = -\lambda \nabla g(x)$.

Proof

If $\nabla f$ and $\nabla g$ don’t have the same or opposite directions, then there is a direction decreasing $f$ subject to $g(x) = 0$ (a direction along the constraint surface with a negative component along $\nabla f$)

If $\nabla f$ and $\nabla g$ have the same or opposite direction, then there is no direction decreasing $f$ subject to $g(x) = 0$
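
For a quadratic objective with a linear constraint, the stationarity conditions form a linear system; a worked sketch (the particular $f$ and $g$ are made up for illustration):

```python
import numpy as np

# minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0
# L(x, y, lam) = x^2 + y^2 + lam*(x + y - 1); setting all partials to zero:
#   2x + lam = 0,  2y + lam = 0,  x + y = 1
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(K, rhs)
# stationary point: x = y = 0.5 with multiplier lam = -1
```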

Link to original

Infeasible Start Newton's Method

Definition

A variant of Newton’s Method for equality-constrained minimization whose iterates need not start from a feasible point.

Algorithm

Solve the minimization problem with the equality condition

$$\min_x f(x) \quad \text{subject to} \quad Ax = b$$

Use a Newton’s Method to find the optimal point satisfying the above conditions

The quadratic Taylor approximation of $f$ at the point $x$ is

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^\top \Delta x + \frac{1}{2} \Delta x^\top \nabla^2 f(x)\, \Delta x$$

The minimization problem is

$$\min_{\Delta x} f(x) + \nabla f(x)^\top \Delta x + \frac{1}{2} \Delta x^\top \nabla^2 f(x)\, \Delta x \quad \text{subject to} \quad A(x + \Delta x) = b$$

The Lagrangian function is

$$\mathcal{L}(\Delta x, w) = f(x) + \nabla f(x)^\top \Delta x + \frac{1}{2} \Delta x^\top \nabla^2 f(x)\, \Delta x + w^\top (A(x + \Delta x) - b)$$

where $w$ is a Lagrangian multiplier

The optimality condition is

$$\nabla^2 f(x)\, \Delta x + \nabla f(x) + A^\top w = 0, \qquad A(x + \Delta x) = b$$

In a matrix form,

$$\begin{bmatrix} \nabla^2 f(x) & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ b - Ax \end{bmatrix}$$

we solve this linear equation to find the Newton direction $\Delta x$
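
For a quadratic objective, the KKT system above is exact, so one step reaches the constrained optimum even from an infeasible start; a sketch with an assumed toy problem:

```python
import numpy as np

def infeasible_newton_step(hess, grad, A, b, x):
    """One infeasible-start Newton step: solve the KKT system
    [[H, A^T], [A, 0]] [dx, w] = [-grad, b - A x]."""
    n, m = len(x), A.shape[0]
    K = np.block([[hess(x), A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad(x), b - A @ x])
    return x + np.linalg.solve(K, rhs)[:n]

# minimize 0.5*||x||^2 subject to x0 + x1 = 1, from an infeasible start
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x1 = infeasible_newton_step(lambda x: np.eye(2), lambda x: x,
                            A, b, np.array([3.0, -1.0]))
# one step lands on the constrained optimum (0.5, 0.5)
```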

Link to original

Backtracking Line Search

Definition

A method to determine the amount to move along a given search direction (the learning rate).

Algorithm

We want to find a learning rate $\alpha$ minimizing the cost function along the search direction $\Delta x$

$$\min_{\alpha > 0} f(x + \alpha \Delta x)$$

The optimal $\alpha$ could be obtained by using grid search over an interval. However, it causes substantial computational costs.

Instead, identify a value of $\alpha$ that provides a reasonable amount of improvement in the objective function, rather than finding the actual minimizing value

The backtracking line search starts with a large $\alpha$ and iteratively shrinks it until the value is small enough to provide a sufficient decrease in the objective function.

Calculate the gradient of the objective function with respect to the learning rate $\alpha$ at the point $\alpha = 0$

$$\frac{d}{d\alpha} f(x + \alpha \Delta x) \Big|_{\alpha = 0} = \nabla f(x)^\top \Delta x$$

and define a parameter $c \in (0, 1)$ which modifies the slope of this tangent line

Now, the line $\ell(\alpha) = f(x) + c\, \alpha\, \nabla f(x)^\top \Delta x$ is used to check whether the decrease in the objective function is enough

Start from a large $\alpha$, usually $\alpha = 1$, and iteratively shrink it by multiplying by $\beta \in (0, 1)$ until the value is lower than the line

$$f(x + \alpha \Delta x) \le f(x) + c\, \alpha\, \nabla f(x)^\top \Delta x$$

If the condition is satisfied, then use the resulting $\alpha$ as the learning rate
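
The loop above in a few lines; the quadratic test function and the parameter values $c = 10^{-4}$, $\beta = 0.5$ are conventional defaults assumed here:

```python
import numpy as np

def backtracking(f, grad_x, x, direction, alpha=1.0, c=1e-4, beta=0.5):
    """Shrink alpha until the sufficient-decrease (Armijo) condition
    f(x + a*d) <= f(x) + c*a*grad(x)^T d holds."""
    fx = f(x)
    slope = grad_x @ direction            # directional derivative at x
    while f(x + alpha * direction) > fx + c * alpha * slope:
        alpha *= beta                     # shrink the step
    return alpha

f = lambda x: (x ** 2).sum()
x = np.array([2.0, 0.0])
g = 2 * x                                 # gradient of f at x
a = backtracking(f, g, x, -g)
# alpha = 1 overshoots past the minimum, so one halving is needed: a == 0.5
```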

Link to original

Duality

Definition

Optimization problems may be viewed from either the primal problem or the dual problem.

Properties

Consider the following optimization problem

The original problem is called a Primal problem

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad h_j(x) = 0$$

Let $\mathcal{D}$ be the domain of the problem and $\mathcal{F}$ be its feasible set

The Lagrangian function corresponding to the primal problem is called a Lagrangian primal function

$$\mathcal{L}(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x), \qquad \lambda_i \ge 0$$

where the variable $x$ is called a Primal variable, and $\lambda, \nu$ are called Dual variables

Since $\lambda_i g_i(x) \le 0$ and $h_j(x) = 0$ by the constraints, the Lagrangian primal function is always lower than or equal to $f(x)$ for all $x$ in the feasible set

$$\mathcal{L}(x, \lambda, \nu) \le f(x)$$

Let a primal optimal value $p^* = \min_{x \in \mathcal{F}} f(x)$ be the feasible optimal value of the primal problem

Let a Lagrange dual function be the minimum of the Lagrangian primal function for all $x$ in the domain

$$d(\lambda, \nu) = \min_{x \in \mathcal{D}} \mathcal{L}(x, \lambda, \nu)$$

We say another optimization problem for the Lagrange dual function a Dual problem

$$\max_{\lambda, \nu} d(\lambda, \nu) \quad \text{subject to} \quad \lambda \ge 0$$

Since the Lagrange dual function minimizes the Lagrangian primal function over $x$, it is always lower than or equal to the primal optimal value

$$d(\lambda, \nu) \le \mathcal{L}(x^*, \lambda, \nu) \le f(x^*) = p^*$$

Let a dual optimal value $d^*$ be the maximum of the dual problem

Then, by the above inequality, $d^*$ becomes the lower bound of $p^*$. This property is called a Weak Duality

$$d^* \le p^*$$

and the difference $p^* - d^*$ between the primal optimal value and the dual optimal value is called a Duality Gap

If the dual optimal value and the primal optimal value are the same ($d^* = p^*$), then this property is called a Strong Duality

Solution for the Problem Satisfying Strong Duality

  • Find the Lagrange dual function $d(\lambda, \nu)$ by using the gradient condition $\nabla_x \mathcal{L} = 0$
  • Find the dual optimal variables $\lambda^*, \nu^*$ from the dual problem by using the gradient condition
  • Obtain the primal optimal variable $x^*$ using the dual optimal variables

Visual Explanation of Duality

Consider an optimization problem

$$\min_x f(x) \quad \text{subject to} \quad g(x) \le 0$$

Let a set $G = \{ (u, t) : u = g(x),\ t = f(x),\ x \in \mathcal{D} \}$

And let $p^*$ be the primal optimal value

By definition of the Lagrange dual function,

$$d(\lambda) = \min \{\, t + \lambda u : (u, t) \in G \,\}$$

Consider the $(u, t)$ plane, where the feasible region corresponds to the part of $G$ with $u \le 0$

The optimal value $p^*$ is given by the tangent horizontal line that indicates the minimum value of $t$ when $u \le 0$

Since $\lambda \ge 0$, the line $t + \lambda u = d(\lambda)$ always has non-positive slope and supports $G$ from below. So, for a given $\lambda$, the line provides a lower bound of the objective function

The figure shows the hyperplanes for different values of $\lambda$. We see that $\lambda^*$ gives the best lower bound $d^*$; we also see that Strong Duality does not hold in this example because $d^* < p^*$

Optimality Property

If we have feasible solutions to the primal and dual problems such that their respective objective functions are equal, then these solutions are optimal to their respective problems

Proof: Let a primal variable $\hat{x}$ and dual variables $(\hat{\lambda}, \hat{\nu})$ be feasible values satisfying

$$f(\hat{x}) = d(\hat{\lambda}, \hat{\nu})$$

Since the primal optimal value minimizes $f$ over the feasible set, the value with the primal variable $\hat{x}$ is always greater than or equal to $p^*$, and with weak duality

$$d(\hat{\lambda}, \hat{\nu}) \le d^* \le p^* \le f(\hat{x})$$

And by the assumption, the first term and the last term of the inequality are the same, so every inequality becomes an equality.

Then, by the equality $d(\hat{\lambda}, \hat{\nu}) = d^*$, the dual variables are the dual optimal, and by the equality $f(\hat{x}) = p^*$, the primal variable is the primal optimal. And the two optimal values are the same.

Therefore, the values are optimal.

Facts

If the primal is a minimization problem, then the dual is a maximization problem, and vice versa

The solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal

For the minimization primal problem, the dual optimal is a lower bound of the primal optimal
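
Weak duality can be seen numerically on a one-dimensional example; the problem min $x^2$ subject to $1 - x \le 0$ (with $p^* = 1$) is a made-up illustration:

```python
import numpy as np

# primal: min x^2 subject to 1 - x <= 0, so p* = 1 at x = 1
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x)
# dual function: d(lam) = min_x L(x, lam) = lam - lam^2/4 (attained at x = lam/2)
lams = np.linspace(0.0, 4.0, 401)
d = lams - lams ** 2 / 4.0
p_star = 1.0
d_star = d.max()            # dual optimal value, attained at lam = 2
# every d(lam) is a lower bound on p*, and here the gap closes: d* == p* == 1
```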

Link to original

Duality Gap

Definition

The difference between the primal and dual solutions

Link to original

Weak Duality

Definition

For a minimization problem, the solution to the primal problem is always greater than or equal to the solution to the dual problem: $p^* \ge d^*$

The Duality Gap of an optimization problem is always greater than or equal to zero

Link to original

Strong Duality

Definition

The primal optimal objective and the dual optimal objective are equal. The Duality Gap of an optimization problem is zero.

Facts

If strong duality holds, the dual optimal value equals the primal optimal value: $d^* = p^*$

Link to original

Slater's Condition

Definition

For a convex problem, Strong Duality holds if there exists a strictly feasible point $x$ such that $g_i(x) < 0$ for all $i$ and $h_j(x) = 0$ for all $j$

Link to original

Karush–Kuhn–Tucker Conditions

Definition

By allowing inequality constraints, the KKT conditions generalize the method of Lagrange multipliers, which allows only equality constraints.

Given the general problem

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad h_j(x) = 0$$

The KKT conditions are

  • Stationarity: $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) + \sum_j \nu_j \nabla h_j(x) = 0$
  • Primal feasibility: $g_i(x) \le 0, \; h_j(x) = 0$
  • Dual feasibility: $\lambda_i \ge 0$
  • Complementary slackness: $\lambda_i g_i(x) = 0$

Facts

At the optimum of an active constraint, the gradient of the inequality constraint and the gradient of the function should be in opposite directions: $\nabla f(x) = -\lambda \nabla g(x)$ with $\lambda \ge 0$.

If $\nabla f$ and $\nabla g$ don’t point in opposite directions, then there is a direction decreasing both $f$ and $g$

For a problem with Strong Duality, $(x, \lambda, \nu)$ are optimal $\iff$ they satisfy the KKT conditions

For a convex optimization, $(x, \lambda, \nu)$ satisfying the KKT conditions $\Rightarrow$ Strong Duality holds and the values are optimal

Proof: Since $f$ and $g_i$ are all convex functions and $h_j$ are affine, the Lagrangian primal function $\mathcal{L}(x, \lambda, \nu)$ is also convex in $x$ for fixed $\lambda \ge 0, \nu$

Let $(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$ be the values satisfying the KKT conditions. By the stationarity condition, $\nabla_x \mathcal{L}(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) = 0$. So, $\tilde{x}$ is a local minimum of $\mathcal{L}$, and by the convexity, it is also a global minimum. Therefore, $d(\tilde{\lambda}, \tilde{\nu}) = \mathcal{L}(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$^[Lagrange dual function]

Also, by the complementary slackness and primal feasibility conditions, $\tilde{\lambda}_i g_i(\tilde{x}) = 0$ and $h_j(\tilde{x}) = 0$. So, $\mathcal{L}(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) = f(\tilde{x})$. Therefore, the dual optimal value and the primal optimal value are the same. In other words, the values satisfy the Strong Duality condition

$$d(\tilde{\lambda}, \tilde{\nu}) = f(\tilde{x})$$

and by the optimality property above, the values are optimal

For a problem with Slater’s Condition (ensuring Strong Duality) and convexity, $(x, \lambda, \nu)$ satisfying the KKT conditions $\iff$ the values are optimal

Proof: Let $x^*$ and $(\lambda^*, \nu^*)$ be the primal and dual solutions

Since $(\lambda^*, \nu^*)$ are the dual optimal, the dual optimal value can be expressed as

$$d^* = d(\lambda^*, \nu^*) = \min_x \mathcal{L}(x, \lambda^*, \nu^*)$$

Since $x^*$ is a feasible value, the Lagrangian at $x^*$ is always greater than or equal to its minimum over $x$, and by the constraints of the Lagrangian primal function, $\lambda_i^* g_i(x^*) \le 0$ and $h_j(x^*) = 0$. So,

$$\min_x \mathcal{L}(x, \lambda^*, \nu^*) \le \mathcal{L}(x^*, \lambda^*, \nu^*) \le f(x^*) = p^*$$

In summary

$$d^* \le \mathcal{L}(x^*, \lambda^*, \nu^*) \le p^*$$

Since Slater’s Condition ensures Strong Duality $d^* = p^*$, the inequalities become equalities. Therefore, $\sum_i \lambda_i^* g_i(x^*) = 0$, so the complementary slackness condition is satisfied. Also, $x^*$ minimizes $\mathcal{L}(x, \lambda^*, \nu^*)$, so $\nabla_x \mathcal{L}(x^*, \lambda^*, \nu^*) = 0$ and the stationarity condition is satisfied. Finally, since $x^*$ and $(\lambda^*, \nu^*)$ are the primal and dual solutions, they satisfy the primal and dual feasibility conditions by the constraints of the Lagrangian primal and dual problems
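
The conditions can be checked mechanically for a candidate point; the small projection problem below is a made-up illustration:

```python
import numpy as np

# min (x1 - 1)^2 + (x2 - 2)^2  subject to  g(x) = x1 + x2 - 2 <= 0
# the unconstrained minimum (1, 2) is infeasible, so the constraint is active;
# candidate from stationarity + the active constraint: x = (0.5, 1.5), lam = 1
x = np.array([0.5, 1.5])
lam = 1.0
g = x[0] + x[1] - 2.0                       # primal feasibility: g <= 0
grad_f = 2.0 * (x - np.array([1.0, 2.0]))
grad_g = np.array([1.0, 1.0])
stationarity = grad_f + lam * grad_g        # should vanish at the optimum
```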

Link to original

Barrier Method

Definition

An interior point optimization technique used to solve constrained optimization problems. It involves adding a barrier or penalty term to the objective function that penalizes solutions for being outside the feasible region.

Algorithm

Consider the optimization problem with inequality and equality constraints

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad Ax = b$$

The KKT Conditions of the problem are the following

  • $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) + A^\top \nu = 0$^[Stationarity]
  • $g_i(x) \le 0, \; Ax = b$^[Primal feasibility]
  • $\lambda_i \ge 0$^[Dual feasibility]
  • $\lambda_i g_i(x) = 0$^[Complementary slackness]

Interior point methods solve the problem whose complementary slackness condition is relaxed

$$\lambda_i g_i(x) = -\mu$$

where $\mu > 0$ is a positive parameter

Substituting $\lambda_i = \mu / (-g_i(x))$, the stationarity condition becomes

$$\nabla f(x) + \mu \nabla \phi(x) + A^\top \nu = 0$$

where $\phi(x) = -\sum_i \log(-g_i(x))$ is called a logarithmic barrier function

As $\mu \to 0$, the scaled logarithmic barrier function $\mu\, \phi(x)$ functions like an indicator function of the feasible region.

Now, by the definition of the log function, $g_i(x)$ should be negative, so the primal feasibility condition for the inequality constraints holds. Also, by the relaxed complementary slackness condition, $\lambda_i = \mu / (-g_i(x))$ is positive, so the dual feasibility condition holds

Now the conditions that have to be satisfied are the following

$$\nabla f(x) + \mu \nabla \phi(x) + A^\top \nu = 0, \qquad Ax = b$$

and the problem with inequality constraints becomes the problem with only the equality constraint

$$\min_x f(x) + \mu\, \phi(x) \quad \text{subject to} \quad Ax = b$$

It can be solved using Newton’s Method

Newton’s Method’s approximation becomes unstable if $\mu$ is too small and the current point is far from optimal. So, we initially choose a large $\mu$ and iteratively shrink it.
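
A one-dimensional sketch of this path-following scheme for min $x$ subject to $1 - x \le 0$ (a made-up problem with no equality constraint); the damping that halves a step until the iterate stays strictly feasible is a standard safeguard assumed here:

```python
def centering(mu, x0, iters=100, tol=1e-12):
    """Damped Newton on the barrier objective F(x) = x - mu*log(x - 1)
    for the problem: min x subject to 1 - x <= 0."""
    x = x0
    for _ in range(iters):
        g = 1.0 - mu / (x - 1.0)        # F'(x)
        if abs(g) < tol:
            break
        h = mu / (x - 1.0) ** 2         # F''(x) > 0
        step = -g / h
        while x + step <= 1.0:          # keep the iterate strictly feasible
            step *= 0.5
        x += step
    return x

# path following: start with a large mu and shrink it geometrically,
# warm-starting each centering from the previous central point
x, mu = 2.0, 1.0
for _ in range(20):
    x = centering(mu, x)
    mu *= 0.5
# x tracks the central path x = 1 + mu toward the optimum x* = 1
```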

Analytic Central Point of Feasible Region

If $\mu$ is very large, then the optimization problem becomes the following.

$$\min_x -\sum_i \log(-g_i(x)) \quad \text{subject to} \quad Ax = b$$

Since $\log$ is a monotonically increasing function, the problem can be transformed as

$$\max_x \prod_i (-g_i(x)) \quad \text{subject to} \quad Ax = b$$

So, it finds the analytic central point of the feasible region (the center of gravity of the feasible region)

Phase 1 method

In the first iteration, we have to find an initial point satisfying the strict inequality constraints $g_i(x) < 0$

To ensure the condition, transform the optimization problem only at the initial stage.

$$\min_{x, s} s \quad \text{subject to} \quad g_i(x) \le s, \quad Ax = b$$

If the optimal value satisfies $s < 0$, the corresponding $x$ is strictly feasible

Now the problem can be solved using Infeasible Start Newton’s Method

Link to original

Primal-Dual Interior-Point Method

Definition

An interior point optimization technique used to solve constrained optimization problems. It converges faster and achieves higher accuracy than the Barrier Method

Algorithm

Consider the optimization problem with an inequality constraint

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0$$

The KKT Conditions of the problem are the following

  • $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$^[Stationarity]
  • $g_i(x) \le 0$^[Primal feasibility]
  • $\lambda_i \ge 0$^[Dual feasibility]
  • $\lambda_i g_i(x) = 0$^[Complementary slackness]

Interior point methods solve the problem whose complementary slackness condition is relaxed

$$\lambda_i g_i(x) = -\mu$$

where $\mu > 0$

Instead of substituting $\lambda_i = \mu / (-g_i(x))$ into the stationarity condition like the Barrier Method, the primal-dual interior-point method directly solves the relaxed system with Newton’s Method

Now, the conditions that have to be satisfied are the following

$$r_\mu(x, \lambda) = \begin{bmatrix} \nabla f(x) + \sum_i \lambda_i \nabla g_i(x) \\ -\operatorname{diag}(\lambda)\, g(x) - \mu \mathbf{1} \end{bmatrix} = 0, \qquad \lambda > 0, \; g(x) < 0$$

and the problem with inequality constraints becomes the root-finding problem with equality

$$r_\mu(x, \lambda) = 0$$

It can be solved using Newton’s Method

Link to original