From SIGGRAPH
Machine Learning
Machine learning is a field of computer science that uses statistical methods to let computer systems learn from data, rather than being explicitly programmed in advance.
Types of ML
- Supervised
    - Classification
        - Digit Recognition
        - Spam Detection
        - Face Detection
    - Regression
        - Human Face/Pose Estimation
        - Model Estimation
        - Data Consolidation
- Unsupervised
    - Clustering
        - Group Points According to X
        - Image Segmentation using NCuts
    - Dimensionality Reduction
        - Manifold Learning
- Weakly/Semi-supervised
    - Some of the data are labeled (supervised), the rest are unlabeled (unsupervised)
- Reinforcement Learning
    - Supervision: a sparse reward for a sequence of decisions
PS: Segmentation + Classification in Real Images
Evaluation: confusion matrix, ROC curve, precision, recall, etc.
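As a quick illustration of how precision and recall fall out of a binary confusion matrix, a minimal NumPy sketch (the labels below are made up for the example):

```python
import numpy as np

# Made-up binary labels, for illustration only (1 = positive class).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Entries of the binary confusion matrix.
tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many are right
recall = tp / (tp + fn)     # of the actual positives, how many are found
print(f"precision={precision:.2f}, recall={recall:.2f}")  # 0.75, 0.75
```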
Learning a Function
$y = f_w(x)$, where $y$ is the prediction, $f$ is the prediction method, $w$ are the parameters, and $x$ is the input.
Learning a Linear Separator/Classifier
$y = f(w_1x_1 + w_2x_2) = \mathcal H(w_1x_1 + w_2x_2)$
where $\mathcal H$ is a fixed non-linearity and $w_1, w_2$ are learned from data.
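For concreteness, a minimal sketch of such a separator, taking $\mathcal H$ to be the Heaviside step function (a common reading of this notation; the weights below are hypothetical, not learned):

```python
import numpy as np

def heaviside(a):
    """Fixed non-linearity H: step function mapping scores to {0, 1}."""
    return (a >= 0).astype(int)

def linear_classifier(x, w):
    """y = H(w1*x1 + w2*x2) for a batch of 2-D points x, shape (N, 2)."""
    return heaviside(x @ w)

w = np.array([1.0, -1.0])  # hypothetical weights; normally learned from data
x = np.array([[2.0, 1.0], [0.5, 3.0]])
print(linear_classifier(x, w))  # [1 0]
```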
Combining Simple Functions/Classifiers
Regression
1. Least Squares fitting
- Assumption: linear function
- Reminder: Linear Classifier
1. Sum of squared errors (MSE without the mean)
$y^i = \mathbf w^T\mathbf x^i + \epsilon^i$
Loss function: sum of squared errors
$L(\mathbf w) = \sum_{i=1}^N(\epsilon^i)^2$
Expanded: $L(w_0, w_1) = \sum_{i=1}^N[y^i - (w_0x^i_0 + w_1x^i_1)]^2$
2. Q: what is the best (or least bad) value of $\mathbf w$?
$\mathbf y = \mathbf X\mathbf w + \boldsymbol\epsilon$
$L(\mathbf w) = \boldsymbol\epsilon^T\boldsymbol\epsilon$
Minimize $L(\mathbf w)$:
$L(\mathbf w) = (\mathbf y - \mathbf X\mathbf w)^T(\mathbf y - \mathbf X\mathbf w) = \|\mathbf y - \mathbf X\mathbf w\|^2$
$\nabla L = -2\mathbf X^T(\mathbf y - \mathbf X\mathbf w) = 0 \Rightarrow \mathbf w^* = (\mathbf X^T\mathbf X)^{-1}\mathbf X^T\mathbf y$
Code:
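A minimal NumPy sketch of the closed-form fit $\mathbf w^* = (\mathbf X^T\mathbf X)^{-1}\mathbf X^T\mathbf y$ above, on synthetic data (solving the normal equations with `np.linalg.solve` rather than forming the inverse explicitly):

```python
import numpy as np

# Synthetic 1-D regression data: y = 2 + 3*x + noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 + 3 * x + 0.1 * rng.standard_normal(100)

# Design matrix with a bias column: X = [1, x].
X = np.column_stack([np.ones_like(x), x])

# w* = (X^T X)^{-1} X^T y, solved as a linear system for stability.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # approximately [2, 3]
```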
- Hyperparameters
- Overfitting and underfitting
- Hyperparameter tuning
- Cross-validation to choose a suitable $\lambda$
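The notes don't spell out where $\lambda$ enters; a common reading is the ridge (L2) penalty weight in regularized least squares. Under that assumption, a minimal sketch of choosing $\lambda$ by k-fold cross-validation:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """w* = (X^T X + lambda * I)^{-1} X^T y  (regularized least squares)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean held-out squared error of the ridge fit over k folds."""
    folds = np.array_split(np.random.permutation(len(y)), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ w) ** 2))
    return np.mean(errs)

# Pick the lambda with the lowest cross-validated error, e.g.:
# best_lam = min([0.0, 0.01, 0.1, 1.0, 10.0], key=lambda l: cv_error(X, y, l))
```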
2. Nonlinear error function and gradient descent
Extension #1: Logistic Regression
Use a sigmoidal function $g(\alpha) = \frac{1}{1 + \exp(-\alpha)}$.
Extension #2: Classification with multiple classes
$C$ classes: one-of-$C$ coding (one-hot encoding)
Matrix notation:
$\mathbf Y = \begin{bmatrix} (\mathbf y^1)^T \\ \vdots \\ (\mathbf y^N)^T \end{bmatrix} = \begin{bmatrix} \mathbf y_1 \mid \cdots \mid \mathbf y_C \end{bmatrix}$, where $\mathbf y_c = \begin{bmatrix} y_c^1 & \cdots & y_c^N \end{bmatrix}^T$ and $\mathbf W = \begin{bmatrix} \mathbf w_1 \mid \cdots \mid \mathbf w_C \end{bmatrix}$
Loss function:
$L(\mathbf W) = \sum_{c=1}^C(\mathbf y_c - \mathbf X\mathbf w_c)^T(\mathbf y_c - \mathbf X\mathbf w_c)$
Least squares fit:
$\mathbf w_c^* = (\mathbf X^T\mathbf X)^{-1}\mathbf X^T\mathbf y_c$
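A minimal sketch of the multi-class fit: build the one-hot matrix $\mathbf Y$ and solve for all columns of $\mathbf W$ at once, $\mathbf W^* = (\mathbf X^T\mathbf X)^{-1}\mathbf X^T\mathbf Y$ (data and labels here are placeholders):

```python
import numpy as np

def one_hot(labels, C):
    """Y (N x C): Y[i, c] = 1 iff example i belongs to class c."""
    Y = np.zeros((len(labels), C))
    Y[np.arange(len(labels)), labels] = 1
    return Y

def least_squares_classifier(X, labels, C):
    """Fit all C weight columns at once: W* = (X^T X)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X, X.T @ one_hot(labels, C))

# Predict by taking the class with the largest linear score:
# y_hat = np.argmax(X_test @ W, axis=1)
```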
Logistic vs Linear Regression ($C > 2$ classes)
Gradient of the cross-entropy
- $L(\mathbf w) = -\sum_{i=1}^N\left[y^i\log g(\mathbf w^T \mathbf x^i) + (1-y^i)\log (1-g(\mathbf w^T\mathbf x^i))\right]$
- $\nabla L(\mathbf w^*) = 0$ has no closed-form solution, so minimize by gradient descent (a sketch follows the update rule below):
- Initialize: $\mathbf x_0$
- Update: $\mathbf x_{i+1} = \mathbf x_i - \alpha \nabla f(\mathbf x_i)$
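A minimal sketch of that loop for the cross-entropy loss above (the gradient $\mathbf X^T(g(\mathbf X\mathbf w) - \mathbf y)$ follows from $g'(\alpha) = g(\alpha)(1-g(\alpha))$; learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logistic_gd(X, y, alpha=0.1, iters=1000):
    """Minimize the cross-entropy loss above by gradient descent.

    The gradient is X^T (g(Xw) - y), using g'(a) = g(a)(1 - g(a)).
    """
    w = np.zeros(X.shape[1])               # initialize w_0
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y)  # gradient of L(w)
        w = w - alpha * grad               # w_{i+1} = w_i - alpha * grad
    return w
```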
The XOR problem (not linearly separable: no single linear classifier can solve it)
3. Perceptron training (a simple neural network)
Multi-Layer Perceptron (MLP)
- Input vector
- Hidden layer(s)
- Output
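Tying this back to the XOR problem: a single linear classifier fails on XOR, but a small MLP with one hidden layer solves it. A minimal NumPy sketch trained by gradient descent (architecture, seed, and hyperparameters are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR data: not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 4))  # input (2) -> hidden (4)
b1 = np.zeros(4)
W2 = rng.standard_normal((4, 1))  # hidden (4) -> output (1)
b2 = np.zeros(1)

alpha = 1.0
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass (squared-error loss for simplicity).
    dp = (p - y) * p * (1 - p)      # gradient at the output layer
    dh = (dp @ W2.T) * h * (1 - h)  # gradient at the hidden layer
    W2 -= alpha * (h.T @ dp)
    b2 -= alpha * dp.sum(axis=0)
    W1 -= alpha * (X.T @ dh)
    b1 -= alpha * dh.sum(axis=0)

print(np.round(p.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```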