- Machine Learning
- polynomial data fitting
- k-nearest neighbors predictor
- gradient descent
- logistic-regression classifier
- support vector machine (SVM)
Machine Learning
general problem format
given input $x$, want output $y$
- traditional approach
hand-craft the function $f$
- machine learning
build another function (a learner) and use it to generate an approximation $h$ of $f$
machine learning is about building a function $h$ from
- training data $T$
- a class of allowed functions $H$
use a predefined algorithm to compute $h \in H$ with the goal: $h(x) \approx f(x)$ for new inputs $x$
simplification using feature vector
convert the input into a 1-D vector, making machine learning generally applicable
- feature vector $\mathbf{x} \in X \subseteq \mathbb{R}^d$
- training set $T = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$
- hypothesis space $H$: the class of allowed functions
the predefined algorithm produces $h \in H$ so that $h(\mathbf{x}_n) \approx y_n$
loss $\ell(y, \hat{y})$
estimate of the error of predicting $\hat{y}$ when the true value is $y$
zero-one loss: $\ell(y, \hat{y}) = 0$ if $y = \hat{y}$, else $1$
quadratic loss: $\ell(y, \hat{y}) = \|y - \hat{y}\|^2$
empirical risk $L_T(h)$
average loss over the training set: $L_T(h) = \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, h(\mathbf{x}_n))$
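A minimal sketch of these definitions in code; the function names are illustrative, not from the notes:

```python
import numpy as np

def zero_one_loss(y, y_hat):
    # 0 if the prediction matches the true value, 1 otherwise
    return 0.0 if y == y_hat else 1.0

def quadratic_loss(y, y_hat):
    # squared Euclidean distance between truth and prediction
    return float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

def empirical_risk(h, T, loss):
    # average loss of predictor h over the training set T = [(x_n, y_n), ...]
    return sum(loss(y, h(x)) for x, y in T) / len(T)

T = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
h = lambda x: 2.0                            # a toy constant predictor
print(empirical_risk(h, T, quadratic_loss))  # (1 + 1 + 0) / 3 = 0.667
```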
three types of machine learning problems
classification problem
assign a label $y$ from a finite set $Y$ to given data $\mathbf{x}$
- classifier (predictor): the produced function $h: X \to Y$
- $\mathcal{T}$: the set of all possible training sets
signature of the machine-learning function: $\mathcal{T} \to H$, mapping a training set to a classifier
- learn (or train): produce the classifier $h$ from the training set $T$
- inference: apply the classifier to any data
- testing: apply the classifier to unseen data
a classifier defines a partition of $X$ into regions, one per label
regression problem
given data $\mathbf{x}$, return a vector $y \in \mathbb{R}^m$
clustering problem
group given data
supervised / unsupervised machine learning
supervised learning: classification, regression
unsupervised learning: clustering
polynomial data fitting
not machine learning but similar
- number of monomials in a polynomial of degree $k$ in $d$ variables: $\binom{k+d}{d}$
interpolation in polynomial data fitting
achieve zero loss: $L_T(h) = 0$ (the fit passes through every training point)
- with at least as many coefficients as data points, one can always interpolate
- overfitting
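A small numpy sketch of the interpolation/overfitting trade-off; the data and degrees are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    risk = np.mean((y - np.polyval(coeffs, x)) ** 2)
    print(f"degree {degree}: training risk = {risk:.2e}")

# degree 9 has 10 coefficients for 10 points, so it interpolates
# (training risk ~ 0) but oscillates between samples: overfitting
```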
k-nearest neighbors predictor
remember the whole training set
return the average of the $y_n$'s corresponding to the $k$ closest $\mathbf{x}_n$'s to the query $\mathbf{x}$
- useful for both classification and regression
- smaller $k$ results in worse overfitting
- good interpolation and poor extrapolation
Voronoi diagram: for $k = 1$, the prediction is constant within each Voronoi cell of the training points
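A minimal k-NN sketch under a Euclidean metric, assuming the averaging form above (rounding the average would give a binary classifier); names are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    # distances from the query x to every stored training point
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]        # indices of the k closest x_n's
    return y_train[nearest].mean()     # average of the corresponding y_n's

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])
print(knn_predict(X_train, y_train, np.array([1.6]), k=3))  # 2/3
```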
gradient descent
stochastic gradient descent (SGD)
randomly group the training set into mini-batches; take a descent step using the gradient of each mini-batch's risk
- mini-steps are in the right direction on average
- epoch: using all data once
step size
fixed
decreasing
momentum
line search
subgradient: generalizes the gradient to non-differentiable convex functions
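A minimal SGD sketch on a least-squares problem; the fixed step size, batch size, and toy data are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(100)

w = np.zeros(3)
alpha, batch = 0.1, 10
for epoch in range(50):                  # one epoch = one pass over the data
    order = rng.permutation(len(y))      # regroup into random mini-batches
    for start in range(0, len(y), batch):
        idx = order[start:start + batch]
        grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= alpha * grad                # step along the mini-batch gradient
print(w)                                 # close to w_true
```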
logistic-regression classifier
- score-based
score function for linear boundary
signed distance to hyperplane
hyperplane: $\{\mathbf{x} : \mathbf{w}^T \mathbf{x} + b = 0\}$
$\mathbf{w}$ is perpendicular to the hyperplane
distance of the hyperplane from the origin: $\frac{|b|}{\|\mathbf{w}\|}$
signed distance of $\mathbf{x}$ from the hyperplane: $\Delta(\mathbf{x}) = \frac{\mathbf{w}^T \mathbf{x} + b}{\|\mathbf{w}\|}$
logistic function $f(z) = \frac{1}{1 + e^{-z}}$
then the score function is $s(\mathbf{x}) = f(\mathbf{w}^T \mathbf{x} + b)$
- the activation $z = \mathbf{w}^T \mathbf{x} + b = \|\mathbf{w}\| \, \Delta(\mathbf{x})$ is the signed distance scaled by $\|\mathbf{w}\|$
softmax function: $\mathrm{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$
applying the softmax function to the activations generalizes the logistic score to multiple classes
cross entropy loss for binary classification
cross-entropy loss of assigning score $p$ to a point whose true label is $y \in \{0, 1\}$: $\ell(y, p) = -y \log p - (1 - y) \log(1 - p)$
- differentiable so we can do gradient descent
- the base of the logarithm does not matter (it only scales the loss by a constant)
- resulting risk function is weakly convex
cross-entropy loss for the $K$-class case: $\ell(y, \mathbf{p}) = -\sum_{k=1}^{K} y_k \log p_k$
- true label $y \in \{1, \dots, K\}$
- prediction $\mathbf{p} = \mathrm{softmax}(\mathbf{z})$
- $\mathbf{y} = (y_1, \dots, y_K)$: one-hot encoding of $y$
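A minimal logistic-regression sketch trained by full-batch gradient descent on the binary cross-entropy risk; the step size, iteration count, and toy data are illustrative:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # true labels in {0, 1}

w, b, alpha = np.zeros(2), 0.0, 0.5
for _ in range(500):
    p = logistic(X @ w + b)                 # scores in (0, 1)
    g = p - y                               # d(cross-entropy)/d(activation)
    w -= alpha * X.T @ g / len(y)
    b -= alpha * g.mean()

pred = logistic(X @ w + b) > 0.5
print("training accuracy:", (pred == y.astype(bool)).mean())
```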
support vector machine (SVM)
binary support vector machine
separating hyperplane for the binary support vector machine, with labels $y_n \in \{-1, +1\}$
decision rule: $h(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^T \mathbf{x} + b)$
margin of $(\mathbf{x}_n, y_n)$ for the hyperplane with parameters $\mathbf{w}, b$: $\mu_n(\mathbf{w}, b) = y_n \, \frac{\mathbf{w}^T \mathbf{x}_n + b}{\|\mathbf{w}\|}$
margin of the training set: $\mu(\mathbf{w}, b) = \min_n \mu_n(\mathbf{w}, b)$
linearly separable if $\mu(\mathbf{w}, b) > 0$ for some $\mathbf{w}, b$
hinge loss: $\ell(y_n, \mathbf{x}_n) = \max(0, 1 - y_n (\mathbf{w}^T \mathbf{x}_n + b))$
reference margin $\mu^* = \frac{1}{\|\mathbf{w}\|}$
where the loss is zero iff the sample's margin is at least $\mu^*$
- zero loss on all samples $\Rightarrow$ separating hyperplane
empirical risk of the binary support vector machine: $L_T(\mathbf{w}, b) = \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, \mathbf{x}_n) + \frac{C_0}{2} \|\mathbf{w}\|^2$
- bigger $C_0$, smaller $\|\mathbf{w}\|$, hence a larger reference margin $\mu^*$
soft linear support vector machine
subgradient of the hinge function $\max(0, 1 - z)$: $-1$ for $z < 1$, $0$ for $z > 1$, any value in $[-1, 0]$ at $z = 1$
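A minimal soft linear SVM sketch: subgradient descent on the hinge risk plus the regularizer; the values of $C_0$, the step size, and the toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)  # labels in {-1, +1}

w, b = np.zeros(2), 0.0
C0, alpha = 0.01, 0.1
for _ in range(1000):
    margins = y * (X @ w + b)
    active = margins < 1.0                      # samples with nonzero hinge loss
    # subgradient of (1/N) sum_n max(0, 1 - y_n (w.x_n + b)) + (C0/2) ||w||^2
    gw = C0 * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
    gb = -y[active].sum() / len(y)
    w -= alpha * gw
    b -= alpha * gb

print("training accuracy:", (np.sign(X @ w + b) == y).mean())
```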
support vector machine with kernel
representer theorem
for a loss function in the form $L_T(\mathbf{w}, b) = g(\|\mathbf{w}\|) + \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, \mathbf{w}^T \mathbf{x}_n + b)$
where $g$ is increasing, there is a minimizer
$\mathbf{w}^* = \sum_{n=1}^{N} \alpha_n \mathbf{x}_n$ s.t. $L_T(\mathbf{w}^*, b^*) \leq L_T(\mathbf{w}, b)$ for all $\mathbf{w}, b$
proof: by writing $\mathbf{w} = \mathbf{w}_s + \mathbf{w}_\perp$ with $\mathbf{w}_s \in \mathrm{span}\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ and $\mathbf{w}_\perp$ orthogonal to it,
and proving by contradiction that $\mathbf{w}_\perp = \mathbf{0}$ at a minimizer
support vector
samples that are misclassified, or classified correctly with a margin not larger than the reference margin $\mu^*$
only support vectors contribute to $\mathbf{w}^* = \sum_n \alpha_n \mathbf{x}_n$ (i.e., $\alpha_n \neq 0$ only for support vectors)
kernel for support vector machine
$h(\mathbf{x}) = \mathrm{sign}\big( \sum_{n=1}^{N} \alpha_n K(\mathbf{x}_n, \mathbf{x}) + b \big)$, where kernel $K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T \phi(\mathbf{x}')$
for some feature map $\phi: X \to \mathbb{R}^D$
e.g., $K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T \mathbf{x}')^2$ is a kernel of the quadratic map $\phi$
where $\phi(\mathbf{x})$ collects all degree-2 monomials of $\mathbf{x}$
Mercer's condition: there exists $\phi$ s.t. $K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T \phi(\mathbf{x}')$ iff, for every $g$ such that $\int g(\mathbf{x})^2 \, d\mathbf{x}$
is finite, $\iint K(\mathbf{x}, \mathbf{x}') \, g(\mathbf{x}) \, g(\mathbf{x}') \, d\mathbf{x} \, d\mathbf{x}' \geq 0$
Gaussian kernel $K(\mathbf{x}, \mathbf{x}') = \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2 \sigma^2} \right)$
- radial basis function (RBF) SVM
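A minimal sketch of prediction with the Gaussian (RBF) kernel in the representer-theorem form; the coefficients $\alpha_n$ would normally come from training and are set by hand here for illustration:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def kernel_predict(X_train, alphas, b, x):
    # h(x) = sign(sum_n alpha_n K(x_n, x) + b)
    s = sum(a * gaussian_kernel(xn, x) for a, xn in zip(alphas, X_train))
    return np.sign(s + b)

X_train = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([-1.0, 1.0])      # hypothetical coefficients, not trained
print(kernel_predict(X_train, alphas, b=0.0, x=np.array([1.9, 1.9])))  # 1.0
```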