Definition
Hard-Margin Support Vector Machine

Assume that we have a learning set $\mathcal{L} = \{(\mathbf{x}_i, y_i) : \mathbf{x}_i \in \mathbb{R}^p,\ y_i \in \{-1, +1\},\ i = 1, \dots, n\}$. Suppose the two given classes of data can be separated by a hyperplane $\{\mathbf{x} : \mathbf{w}^\top \mathbf{x} + b = 0\}$ without error. Such a hyperplane is called a separating hyperplane (SH).
Define $d_{+}$ as the distance from the SH to the closest point with $y_i = +1$, $d_{-}$ as the distance from the SH to the closest point with $y_i = -1$, and the margin of the SH as $d_{+} + d_{-}$. A SH which maximizes its margin ($d_{+} + d_{-}$) is called an optimal separating hyperplane (OSH). To find the OSH, set the linear constraints $\mathbf{w}^\top \mathbf{x}_i + b \ge \delta_{+}$ for $y_i = +1$ and $\mathbf{w}^\top \mathbf{x}_i + b \le -\delta_{-}$ for $y_i = -1$, where the minimum distances $\delta_{+}$ and $\delta_{-}$ are arbitrary and may differ; since the scale of $(\mathbf{w}, b)$ is free, both can be normalized to $1$.
Let $H_{+} : \mathbf{w}^\top \mathbf{x} + b = 1$ and $H_{-} : \mathbf{w}^\top \mathbf{x} + b = -1$. Then the points lying on either $H_{+}$ or $H_{-}$ are called support vectors. If $\mathbf{x}_{+} \in H_{+}$ and $\mathbf{x}_{-} \in H_{-}$, then $d_{+} = d_{-} = 1/\|\mathbf{w}\|$ and the margin is $2/\|\mathbf{w}\|$.
Therefore, the OSH can be obtained by the convex optimization problem
$$\min_{\mathbf{w},\, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n.$$
It can be solved by the method of Lagrange multipliers with the primal Lagrangian
$$L_P(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \right], \qquad \alpha_i \ge 0.$$
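Setting the partial derivatives of $L_P$ to zero gives the stationarity conditions used below:
$$\frac{\partial L_P}{\partial \mathbf{w}} = \mathbf{0} \ \Rightarrow \ \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial L_P}{\partial b} = 0 \ \Rightarrow \ \sum_{i=1}^{n} \alpha_i y_i = 0,$$
and substituting them back into $L_P$ eliminates $\mathbf{w}$ and $b$, leaving a function of $\boldsymbol{\alpha}$ alone.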
By duality of the optimization problem, the dual optimization problem is defined as
$$\max_{\boldsymbol{\alpha}} \ L_D(\boldsymbol{\alpha}) \quad \text{subject to} \quad \alpha_i \ge 0, \ i = 1, \dots, n, \quad \sum_{i=1}^{n} \alpha_i y_i = 0,$$
where the dual Lagrangian function is defined as
$$L_D(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^\top \mathbf{x}_j.$$
The primal optimization problem is convex and satisfies the KKT conditions. Thus strong duality holds, and the solution of the dual problem coincides with the solution of the primal problem.
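For completeness, the KKT conditions of the hard-margin problem take the standard form (complementary slackness is what forces $\alpha_i = 0$ for every point strictly outside the margin):
$$\begin{aligned}
&\text{stationarity:} && \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0,\\
&\text{primal feasibility:} && y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \ge 0,\\
&\text{dual feasibility:} && \alpha_i \ge 0,\\
&\text{complementary slackness:} && \alpha_i \left[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \right] = 0, \qquad i = 1, \dots, n.
\end{aligned}$$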
The optimal parameters are obtained as
$$\hat{\mathbf{w}} = \sum_{i=1}^{n} \hat{\alpha}_i y_i \mathbf{x}_i, \qquad \hat{b} = \frac{1}{|S|} \sum_{i \in S} \left( y_i - \hat{\mathbf{w}}^\top \mathbf{x}_i \right),$$
where $S = \{ i : \hat{\alpha}_i > 0 \}$ is an index set of support vectors.
The optimal hyperplane can be written as
$$\hat{\mathbf{w}}^\top \mathbf{x} + \hat{b} = \sum_{i \in S} \hat{\alpha}_i y_i \mathbf{x}_i^\top \mathbf{x} + \hat{b} = 0,$$
and the classification rule is given by
$$\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \hat{\mathbf{w}}^\top \mathbf{x} + \hat{b} \right).$$
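As a minimal numerical sketch of the dual problem above (the toy data, the generic SLSQP routine, and the $10^{-6}$ threshold for identifying support vectors are assumptions made for illustration, not part of the text), one can maximize $L_D$ directly and recover $\hat{\mathbf{w}}$ and $\hat{b}$:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.3, size=(10, 2)),
               rng.normal([-2.0, -2.0], 0.3, size=(10, 2))])
y = np.array([1.0] * 10 + [-1.0] * 10)
n = len(y)

# Q_ij = y_i y_j x_i^T x_j, so L_D(alpha) = sum(alpha) - 1/2 alpha^T Q alpha.
Yx = y[:, None] * X
Q = Yx @ Yx.T

def neg_dual(alpha):
    # Minimizing -L_D is the same as maximizing L_D.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                               # alpha_i >= 0
res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha_hat = res.x

# Support vectors: alpha_i > 0 (a small numerical threshold is assumed).
S = alpha_hat > 1e-6
w_hat = (alpha_hat * y) @ X                  # w = sum_i alpha_i y_i x_i
b_hat = np.mean(y[S] - X[S] @ w_hat)         # b averaged over support vectors
print("w:", w_hat, "b:", b_hat)
print("predictions:", np.sign(X @ w_hat + b_hat))
```

In practice dedicated QP solvers (e.g., SMO) are used; this sketch is only meant to mirror the formulas above.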
Soft-Margin Support Vector Machine

The soft-margin SVM relaxes the separability requirement and solves
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,$$
where $C > 0$ is the regularization parameter, and $\xi_i$ is called a slack variable. If $\xi_i > 1$, the point is out of the margin and on the wrong side of the SH. On the other hand, if $0 < \xi_i \le 1$, then the point is within the margin but on the correct side.
The Lagrangian primal function is defined as
$$L_P(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i,$$
with $\alpha_i \ge 0$ and $\mu_i \ge 0$.
Setting the derivatives of $L_P$ with respect to $\mathbf{w}$, $b$, and $\xi_i$ to zero gives $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$, $\sum_{i=1}^{n} \alpha_i y_i = 0$, and $C - \alpha_i - \mu_i = 0$. The dual function is then defined as
$$L_D(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^\top \mathbf{x}_j,$$
and the dual optimization problem is defined as
$$\max_{\boldsymbol{\alpha}} \ L_D(\boldsymbol{\alpha}) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \ i = 1, \dots, n, \quad \sum_{i=1}^{n} \alpha_i y_i = 0.$$
The primal optimization problem is convex and satisfies the KKT conditions. Thus strong duality holds, and the solution of the dual problem coincides with the solution of the primal problem.
The optimal parameters are obtained as
$$\hat{\mathbf{w}} = \sum_{i=1}^{n} \hat{\alpha}_i y_i \mathbf{x}_i, \qquad \hat{b} = \frac{1}{|S|} \sum_{i \in S} \left( y_i - \hat{\mathbf{w}}^\top \mathbf{x}_i \right),$$
where $S = \{ i : 0 < \hat{\alpha}_i < C \}$ is an index set of support vectors (those lying exactly on the margin).
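A brief sketch using scikit-learn's SVC (the overlapping blob data and $C = 1$ are arbitrary illustrative choices). Its `dual_coef_` attribute stores $\hat{\alpha}_i y_i$ for the support vectors, so $\hat{\mathbf{w}} = \sum_i \hat{\alpha}_i y_i \mathbf{x}_i$ can be rebuilt from it:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes (hypothetical data), so slack variables are needed.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(50, 2)),
               rng.normal([-1.0, -1.0], 1.0, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1.0)   # C plays the role of the regularization parameter
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so
# w_hat = sum_{i in S} alpha_i y_i x_i can be recomputed from them.
w_hat = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_hat, clf.coef_))          # matches the fitted hyperplane
print("support vectors per class:", clf.n_support_)
```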
Non-Linear Support Vector Machine

A non-linear SVM finds an optimal separating hyperplane in a high-dimensional feature space $\mathcal{H}$, into which the inputs are mapped by $\boldsymbol{\phi} : \mathbb{R}^p \to \mathcal{H}$. This is accomplished by the kernel trick: instead of computing inner products $\boldsymbol{\phi}(\mathbf{x}_i)^\top \boldsymbol{\phi}(\mathbf{x}_j)$ in $\mathcal{H}$ explicitly, they are computed using a non-linear kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}(\mathbf{x}_i)^\top \boldsymbol{\phi}(\mathbf{x}_j)$ evaluated in the input space.
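A small numeric check of the kernel trick for a degree-2 polynomial kernel (the explicit feature map $\boldsymbol{\phi}$ below is one standard choice, written out here only for illustration):

```python
import numpy as np

# For 2-D inputs, K(x, z) = (x^T z)^2 equals phi(x)^T phi(z) with
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2): the inner product in the
# feature space H is obtained without ever forming phi explicitly.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def K(x, z):
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))   # inner product computed in H
print(K(x, z))           # same value, computed in the input space
```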
Hard-Margin Non-Linear Support Vector Machine
If the data can be separated in $\mathcal{H}$, then the dual optimization problem is defined as
$$\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad \alpha_i \ge 0, \ i = 1, \dots, n, \quad \sum_{i=1}^{n} \alpha_i y_i = 0,$$
where $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}(\mathbf{x}_i)^\top \boldsymbol{\phi}(\mathbf{x}_j)$ is the kernel function.
The optimal separating hyperplane in $\mathcal{H}$ is
$$\sum_{i \in S} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{b} = 0,$$
and the decision rule is defined as
$$\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i \in S} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{b} \right).$$
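As a sketch of this kernel decision rule (the circle-separable toy data, the RBF kernel with $\gamma = 1$, and the very large $C$ used to approximate a hard margin are all assumptions for illustration), the sum $\sum_{i \in S} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{b}$ can be rebuilt from a fitted model's support vectors and dual coefficients:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Data separable by a circle (hypothetical); the RBF feature space separates it.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

# A very large C approximates the hard-margin case.
clf = SVC(kernel="rbf", gamma=1.0, C=1e6).fit(X, y)

# Evaluate sum_{i in S} alpha_i y_i K(x_i, x) + b for a new point x.
x_new = np.array([[0.2, 0.1]])
K_vals = rbf_kernel(clf.support_vectors_, x_new, gamma=1.0)  # K(x_i, x)
f = clf.dual_coef_ @ K_vals + clf.intercept_
print(np.sign(f), clf.decision_function(x_new), clf.predict(x_new))
```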
Soft-Margin Non-Linear Support Vector Machine
In the non-separable case, the dual problem is defined as
$$\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \ i = 1, \dots, n, \quad \sum_{i=1}^{n} \alpha_i y_i = 0,$$
where $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}(\mathbf{x}_i)^\top \boldsymbol{\phi}(\mathbf{x}_j)$.
The optimal separating hyperplane in $\mathcal{H}$ is
$$\sum_{i \in S} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{b} = 0,$$
and the decision rule is defined as
$$\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i \in S} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{b} \right).$$
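In the non-separable case one keeps a finite $C$; a brief sketch (the noisy ring data and the particular $C$, $\gamma$ values are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Noisy, overlapping ring data (hypothetical), so some slack is unavoidable.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
radius = np.linalg.norm(X, axis=1) + rng.normal(0.0, 0.3, size=200)  # label noise
y = np.where(radius > 1.0, 1, -1)

# Finite C trades margin width against total slack; gamma sets the RBF width.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)

# Smaller C -> wider margin, more slack, and usually more support vectors.
clf_small_c = SVC(kernel="rbf", C=0.01, gamma=1.0).fit(X, y)
print("support vectors per class (C=0.01):", clf_small_c.n_support_)
```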