Machine Learning

general problem format

given input $x$, want output $y = f(x)$

  • traditional approach
    hand-craft the function $f$
  • machine learning
    build another function (a learning algorithm) and use it to generate an approximation $\hat{f}$ of $f$

machine learning is about building a function $\hat{f}$ from

  • training data
  • a class of allowed functions

use a predefined algorithm to compute $\hat{f}$ with the goal $\hat{f} \approx f$

simplification using feature vector

convert the input into a 1-D vector of numbers, making machine learning generally applicable

  • feature vector $\mathbf{x} \in X \subseteq \mathbb{R}^d$
  • training set $T = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$
  • hypothesis space $H$: the class of allowed functions

the predefined algorithm produces $h \in H$ so that $h(\mathbf{x}_n) \approx y_n$

loss

$\ell(y, \hat{y})$: an estimate of the error of predicting $\hat{y}$ when the true value is $y$

zero-one loss

$\ell(y, \hat{y}) = \begin{cases} 0 & \hat{y} = y \\ 1 & \hat{y} \neq y \end{cases}$

quadratic loss

$\ell(y, \hat{y}) = \|y - \hat{y}\|^2$

empirical risk

average loss over the training set: $L_T(h) = \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, h(\mathbf{x}_n))$
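
As a concrete illustration, here is a minimal numpy sketch of the two losses and the empirical risk; the function names and the toy constant predictor are illustrative, not from the notes.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    # 1 when the prediction is wrong, 0 when it is right
    return (y_true != y_pred).astype(float)

def quadratic_loss(y_true, y_pred):
    # squared distance between truth and prediction
    return (y_true - y_pred) ** 2

def empirical_risk(h, T, loss):
    # average loss of hypothesis h over the training set T = (X, y)
    X, y = T
    return np.mean(loss(y, h(X)))

# toy check: a constant predictor on a tiny training set
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])
h = lambda X: np.full_like(X, y.mean())          # always predict the mean label
print(empirical_risk(h, (X, y), quadratic_loss)) # mean squared error
```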

three types of machine learning problems

classification problem

assign a label from a finite set $Y$ to given data

  • classifier (predictor): the produced function $h : X \to Y$

$\mathcal{T}$: the set of all possible training sets

signature of machine learning function

  • learn (or train): produce a classifier $h$ from a training set $T \in \mathcal{T}$
  • inference: apply the classifier to any data point
    • testing: apply the classifier to unseen data

a classifier defines a partition of the input space $X$, one region per label

regression problem

given data, return a vector of real values $\mathbf{y} \in \mathbb{R}^m$

clustering problem

group given data

supervised/unsupervised machine learning

supervised learning: classification, regression (training data includes the desired outputs $y_n$)

unsupervised learning: clustering (training data has no outputs)

polynomial data fitting

not machine learning, but similar in structure

  • number of monomials of degree up to $k$ in $d$ variables: $\binom{d+k}{k}$, which grows quickly with $d$ and $k$

interpolation in polynomial data fitting

achieve zero loss on the training set by passing through every data point

  • a polynomial with as many coefficients as data points can always interpolate (see the sketch after this list)
  • interpolating noisy data is overfitting: the fit follows the noise rather than the trend
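
A small numpy sketch of this effect using np.polyfit on assumed toy data: the degree-7 fit interpolates all 8 points exactly, yet behaves badly between and beyond the samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(8)

# low-degree fit: small but nonzero training error, smooth curve
p3 = np.polyfit(x, y, deg=3)

# degree 7 with 8 points interpolates exactly (zero training loss) ...
p7 = np.polyfit(x, y, deg=7)

# ... but oscillates between the samples and explodes outside [0, 1]
x_test = np.linspace(-0.2, 1.2, 5)
print(np.polyval(p3, x_test))
print(np.polyval(p7, x_test))
```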

k-nearest neighbors predictor

remember the whole training set

return the average of the $y_n$s corresponding to the $k$ closest $\mathbf{x}_n$s to the query $\mathbf{x}$

  • useful for both classification and regression (see the sketch after this list)
  • smaller $k$ results in worse overfitting
  • good interpolation and poor extrapolation
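
A minimal sketch of a k-NN predictor in numpy; the helper name knn_predict and the toy data are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    # distances from the query x to every remembered training point
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]          # indices of the k closest points
    return y_train[nearest].mean()       # average of their y values

# regression use: average of neighbor values
# classification use: average one-hot labels, or take a majority vote
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])
print(knn_predict(X_train, y_train, np.array([0.2, 0.9]), k=3))
```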

Voronoi diagram

partition of space into cells, one per training point, each containing the locations closest to that point; the decision regions of a 1-nearest-neighbor classifier form the Voronoi diagram of the training set

[figure: example of a Voronoi diagram]

gradient descent

iterate $\mathbf{z}_{t+1} = \mathbf{z}_t - \alpha_t \nabla f(\mathbf{z}_t)$ to minimize a differentiable function $f$

stochastic gradient descent (SGD)

group the training set randomly into mini-batches and take one descent step with the gradient of each mini-batch (see the sketch after this list)

  • the mini-steps are in the right direction on average: each mini-batch gradient is an unbiased estimate of the full gradient
  • epoch: one pass that uses all of the training data once
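
A minimal SGD loop in numpy matching the description above (random mini-batches, fixed step size); the signature of sgd and the least-squares demo are assumptions for illustration.

```python
import numpy as np

def sgd(grad, z0, X, y, batch_size=32, alpha=0.1, epochs=10, seed=0):
    # grad(z, X_batch, y_batch) returns the mini-batch gradient at z
    rng = np.random.default_rng(seed)
    z = z0.copy()
    N = len(X)
    for _ in range(epochs):                 # one epoch = one pass over T
        order = rng.permutation(N)          # regroup mini-batches randomly
        for start in range(0, N, batch_size):
            batch = order[start:start + batch_size]
            z -= alpha * grad(z, X[batch], y[batch])  # fixed step size
    return z

# demo: least-squares regression, gradient of mean((X z - y)^2)
X = np.random.default_rng(1).standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
g = lambda z, Xb, yb: 2 * Xb.T @ (Xb @ z - yb) / len(Xb)
print(sgd(g, np.zeros(3), X, y))  # approaches [1, -2, 0.5]
```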

step size $\alpha_t$

  • fixed: $\alpha_t = \alpha$

  • decreasing: e.g. $\alpha_t \propto 1/t$

  • momentum: add a multiple of the previous step to the current step

  • line search: pick $\alpha_t$ that (approximately) minimizes $f$ along the descent direction

subgradient

any vector $\mathbf{g}$ with $f(\mathbf{z}') \geq f(\mathbf{z}) + \mathbf{g}^T(\mathbf{z}' - \mathbf{z})$ for all $\mathbf{z}'$; generalizes the gradient to convex functions that are not differentiable everywhere

logistic-regression classifier

  • score-based: compute a score $p \in (0, 1)$ for the positive class and threshold it at $1/2$

score function for linear boundary

base the score on the signed distance of $\mathbf{x}$ to a hyperplane

hyperplane $H = \{\mathbf{x} : \mathbf{w}^T\mathbf{x} + b = 0\}$

  • $\mathbf{w}$ is perpendicular to $H$

  • distance of $H$ from the origin: $|b| / \|\mathbf{w}\|$

  • signed distance of $\mathbf{x}$ from $H$: $\Delta(\mathbf{x}) = \dfrac{\mathbf{w}^T\mathbf{x} + b}{\|\mathbf{w}\|}$ (see the sketch after this list)
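
A small numeric check of these formulas; the values of $\mathbf{w}$ and $b$ are arbitrary examples.

```python
import numpy as np

def signed_distance(x, w, b):
    # positive on the side w points to, negative on the other side
    return (w @ x + b) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -5.0
print(signed_distance(np.array([0.0, 0.0]), w, b))  # -1.0 = -|b| / ||w||
print(signed_distance(np.array([3.0, 4.0]), w, b))  # +4.0: other side
```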

logistic function

$\sigma(a) = \dfrac{1}{1 + e^{-a}}$

then the score function is $p = \sigma(a(\mathbf{x}))$ with activation $a(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$

  • the activation is the signed distance scaled by $\|\mathbf{w}\|$: $a(\mathbf{x}) = \|\mathbf{w}\| \, \Delta(\mathbf{x})$ (sketch below)
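
A minimal sketch of the resulting score function; the hyperplane parameters reuse the toy values above.

```python
import numpy as np

def logistic(a):
    # squashes any activation into a score in (0, 1); logistic(0) = 0.5
    return 1.0 / (1.0 + np.exp(-a))

def score(x, w, b):
    # activation w.x + b = signed distance to the hyperplane, times ||w||
    return logistic(w @ x + b)

w, b = np.array([3.0, 4.0]), -5.0
print(score(np.array([0.0, 0.0]), w, b))  # < 0.5: negative side
print(score(np.array([3.0, 4.0]), w, b))  # > 0.5: positive side
```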

softmax function

for $K$-class problems, apply the softmax function to the vector of activations: $\mathrm{softmax}(\mathbf{a})_j = \dfrac{e^{a_j}}{\sum_{k=1}^{K} e^{a_k}}$

cross entropy loss for binary classification

cross entropy loss of assigning score $p$ to a point whose true label is $y \in \{0, 1\}$:

$\ell(y, p) = -[\, y \log p + (1 - y) \log(1 - p) \,]$

  • differentiable, so we can do gradient descent
  • the base of the logarithm does not matter (it only rescales the loss by a constant)
  • the resulting risk function is weakly convex

cross entropy loss for the $K$-class case

  • true label $y \in \{1, \dots, K\}$
  • prediction $\mathbf{p} = \mathrm{softmax}(\mathbf{a})$
  • $\mathbf{q}$: one-hot encoding of $y$

$\ell(y, \mathbf{p}) = -\sum_{j=1}^{K} q_j \log p_j = -\log p_y$ (see the sketch after this list)
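
A numpy sketch of both cross-entropy losses; the clipping constants are an assumed numerical-stability detail, and the final line checks the $-\log p_y$ identity.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}, p = predicted probability of class 1
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def softmax(a):
    e = np.exp(a - a.max())          # shift for numerical stability
    return e / e.sum()

def cross_entropy(q, p):
    # q: one-hot encoding of the true label, p: softmax output
    return -np.sum(q * np.log(p + 1e-12))

a = np.array([2.0, 0.5, -1.0])       # K = 3 activations
p = softmax(a)
q = np.array([1.0, 0.0, 0.0])        # true label is class 0
print(cross_entropy(q, p), -np.log(p[0]))  # identical: -log p_y
```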

support vector machine (SVM)

binary support vector machine

separating hyperplane for binary support vector machine: $\mathbf{w}^T\mathbf{x} + b = 0$, with labels $y_n \in \{-1, +1\}$

  • decision rule: $\hat{y} = \mathrm{sign}(\mathbf{w}^T\mathbf{x} + b)$

  • margin for $(\mathbf{x}_n, y_n)$ with parameters $(\mathbf{w}, b)$: $\mu_n = y_n \dfrac{\mathbf{w}^T\mathbf{x}_n + b}{\|\mathbf{w}\|}$ (positive iff the sample is classified correctly)

    • margin of training set: $\mu = \min_n \mu_n$

    • linearly separable if $\mu > 0$

hinge loss

$\ell_n(\mathbf{w}, b) = \max(0,\, 1 - y_n(\mathbf{w}^T\mathbf{x}_n + b))$

reference margin

$\mu^* = 1 / \|\mathbf{w}\|$, where

  • separating hyperplane: $\mathbf{w}^T\mathbf{x} + b = 0$

empirical risk of binary support vector machine

$L_T(\mathbf{w}, b) = \dfrac{1}{N} \sum_{n=1}^{N} \ell_n(\mathbf{w}, b) + \lambda \|\mathbf{w}\|^2$

  • bigger $\lambda$, smaller $\|\mathbf{w}\|$, hence larger reference margin $\mu^*$

soft linear support vector machine

minimize the hinge-loss risk above by (stochastic) subgradient descent, since the hinge function is not differentiable at its kink

subgradient of the hinge function

for $h(z) = \max(0, 1 - z)$: $-1$ for $z < 1$, $0$ for $z > 1$, and any value in $[-1, 0]$ at the kink $z = 1$ (see the sketch below)
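
A sketch of soft-SVM training by stochastic subgradient descent under the formulas above (batch size 1, fixed step size); the function names and hyperparameters are illustrative.

```python
import numpy as np

def hinge_subgradient(w, b, x, y, lam):
    # subgradient of  max(0, 1 - y(w.x + b)) + lam ||w||^2  at (w, b)
    if y * (w @ x + b) < 1:           # margin violated: hinge part is active
        return 2 * lam * w - y * x, -y
    return 2 * lam * w, 0.0           # hinge part is flat here (pick 0)

def soft_svm(X, y, lam=0.01, alpha=0.1, epochs=100, seed=0):
    # labels y must be in {-1, +1}
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for n in rng.permutation(len(X)):   # SGD with mini-batches of size 1
            gw, gb = hinge_subgradient(w, b, X[n], y[n], lam)
            w -= alpha * gw
            b -= alpha * gb
    return w, b
```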

support vector machine with kernel

representer theorem

for a loss function in the form

$L_T(\mathbf{w}, b) = g(\mathbf{w}^T\mathbf{x}_1 + b, \dots, \mathbf{w}^T\mathbf{x}_N + b) + r(\|\mathbf{w}\|)$

where $r$ is increasing, there exists a minimizer s.t.

$\mathbf{w} = \sum_{n=1}^{N} \alpha_n \mathbf{x}_n$

  • proof: by writing $\mathbf{w} = \mathbf{w}_\parallel + \mathbf{w}_\perp$, with $\mathbf{w}_\parallel$ in the span of the $\mathbf{x}_n$ and $\mathbf{w}_\perp$ orthogonal to that span,

    and proving by contradiction: dropping $\mathbf{w}_\perp$ leaves every $\mathbf{w}^T\mathbf{x}_n + b$ unchanged (since $\mathbf{w}_\perp^T\mathbf{x}_n = 0$) but decreases $\|\mathbf{w}\|$, so a minimizer with $\mathbf{w}_\perp \neq \mathbf{0}$ could be improved

support vector

samples that are misclassified, or classified correctly with a margin not larger than the reference margin $\mu^*$

only support vectors contribute to $\mathbf{w} = \sum_n \alpha_n \mathbf{x}_n$ (every other sample has $\alpha_n = 0$)

kernel for support vector machine

replace every inner product $\mathbf{x}^T\mathbf{x}'$ with $K(\mathbf{x}, \mathbf{x}')$, where the kernel

$K(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}')$

for some feature map $\boldsymbol{\phi}$

$K$ is a kernel of $\boldsymbol{\phi}$

  • where $\boldsymbol{\phi}$ may map into a space of much higher (even infinite) dimension

  • Mercer’s condition: $K$ is a valid kernel iff for every $f$ s.t.

    $\int f(\mathbf{x})^2 \, d\mathbf{x}$ is finite, $\iint K(\mathbf{x}, \mathbf{x}') \, f(\mathbf{x}) \, f(\mathbf{x}') \, d\mathbf{x} \, d\mathbf{x}' \geq 0$

  • Gaussian kernel: $K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\dfrac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}\right)$ (see the sketch below)

    • radial basis function (RBF) SVM
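
A sketch of the Gaussian-kernel decision value; the support vectors and coefficients below are made-up values for illustration, not a fitted model.

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def rbf_svm_score(x, X_sv, y_sv, alpha, b, sigma=1.0):
    # decision value; only support vectors (alpha_n > 0) contribute
    return sum(a * yn * gaussian_kernel(x, xn, sigma)
               for a, yn, xn in zip(alpha, y_sv, X_sv)) + b

# hypothetical support vectors and coefficients
X_sv = np.array([[0.0, 0.0], [1.0, 1.0]])
y_sv = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])
print(rbf_svm_score(np.array([0.9, 0.9]), X_sv, y_sv, alpha, b=0.0))  # > 0: class +1
```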