Machine Learning Notes [Week 3]

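I. Logistic Regression

How it differs from linear regression:

Problem type	Output type	Examples
Regression	Continuous real value	House price prediction, temperature prediction
Classification	Discrete class (0 or 1)	Disease or no disease, ad click or no click, pass or fail

We want a model that takes an input $x$ and outputs a probability:
$h_\theta(x) = P(y=1 \mid x;\theta)$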

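Applications

Logistic regression is used for binary classification tasks, for example:

Whether an email is spam
Whether a patient has a disease
Whether a borrower will default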

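II. Hypothesis Function

The main difference from linear regression is that the output must lie in [0, 1].

This is achieved with the sigmoid function (also called the logistic function):
$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$
where:

$g(z)$ is the sigmoid function
The output $h_\theta(x)$ is the probability that the input belongs to the positive class ($y = 1$)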

Python implementation:

import numpy as np

def sigmoid(z):
    # Map any real z (scalar or array) into the interval (0, 1)
    return 1 / (1 + np.exp(-z))
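To get a feel for the function, the sketch below evaluates a sigmoid at a few points. It redefines `sigmoid` so the block is self-contained, and it compares the result against `scipy.special.expit`, SciPy's built-in logistic function, which appears here only as a cross-check and is not part of the original notes.

```python
import numpy as np
from scipy.special import expit  # SciPy's logistic function, used only as a cross-check

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Sample points chosen symmetrically around zero (illustrative values)
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
values = sigmoid(z)
print(values)

# The hand-written version should agree with SciPy's implementation
print(np.allclose(values, expit(z)))
```

The output rises monotonically from near 0 to near 1 and equals 0.5 at $z = 0$, which is what makes 0.5 a natural decision threshold.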

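III. Classification Decision

The logistic regression model outputs a probability, which is usually converted to a class label with a threshold of 0.5:

$h_\theta(x) \ge 0.5$ ⇒ predict 1
$h_\theta(x) < 0.5$ ⇒ predict 0

Decision boundary:

The boundary is the set of points where $h_\theta(x) = 0.5$. Since $g(z) = 0.5$ exactly when $z = 0$, this is:
$\theta^T x = 0$
It is a line (or, in higher dimensions, a hyperplane) that splits the input space into two classes.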
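The thresholding rule can be sketched as a small prediction helper. The function name `predict`, the parameter vector, and the sample points below are hypothetical values chosen for illustration; `X` is assumed to already contain the intercept column $x_0 = 1$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # X is assumed to include the intercept column x0 = 1
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical parameters: the decision boundary is x1 + x2 = 3
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([
    [1.0, 1.0, 1.0],   # x1 + x2 = 2, below the boundary
    [1.0, 2.0, 2.0],   # x1 + x2 = 4, above the boundary
    [1.0, 1.5, 1.5],   # x1 + x2 = 3, exactly on the boundary
])
predictions = predict(theta, X)
print(predictions)

# Thresholding the probability at 0.5 matches checking the sign of theta^T x
print(np.array_equal(predictions, (X @ theta >= 0).astype(int)))
```

A point exactly on the boundary has $h_\theta(x) = 0.5$ and is assigned to class 1 under the $\ge$ convention used above.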

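IV. Cost Function

The squared-error cost from linear regression is a poor fit for classification: combined with the sigmoid hypothesis, it yields a non-convex function of $\theta$. Logistic regression uses the log loss instead.

For a single example:
$\text{Cost}(h_\theta(x), y) = \begin{cases} - \log(h_\theta(x)) & \text{if } y = 1 \\ - \log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
The two cases combine into one expression, and averaging over all $m$ training examples gives:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))\right]$
This cost is convex, so it can be optimized with gradient descent.

For each example:

If $y = 1$: the loss is $-\log(h_\theta(x))$, which approaches 0 as $h_\theta(x) \to 1$ and grows without bound as $h_\theta(x) \to 0$
If $y = 0$: the loss is $-\log(1 - h_\theta(x))$, which behaves the opposite way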

Python implementation:

def compute_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    epsilon = 1e-5  # avoid log(0)
    cost = (-1 / m) * (y.T @ np.log(h + epsilon) + (1 - y).T @ np.log(1 - h + epsilon))
    return cost.item()  # return a plain scalar rather than a 1x1 array
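A few numbers make the shape of the log loss concrete. The helper `sample_cost` below is a hypothetical name introduced for this illustration; it evaluates the per-example cost for a handful of predicted probabilities.

```python
import numpy as np

def sample_cost(h, y):
    # Log loss for one example with predicted probability h and label y (0 or 1)
    return -np.log(h) if y == 1 else -np.log(1 - h)

for h in (0.9, 0.5, 0.1):
    print(f"h = {h}: cost if y=1 is {sample_cost(h, 1):.3f}, "
          f"cost if y=0 is {sample_cost(h, 0):.3f}")
```

When the true label is 1, a confident correct prediction ($h = 0.9$) costs about 0.105, while a confident wrong one ($h = 0.1$) costs about 2.303, so the loss penalizes confident mistakes far more heavily than it rewards confident hits.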

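V. Optimizing the Parameters with Gradient Descent

Because the logistic regression cost is convex, gradient descent applies. Update every $\theta_j$ simultaneously:
$\theta_j := \theta_j - \alpha \cdot \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)}$

Vectorized form:
$\theta := \theta - \frac{\alpha}{m} X^T (g(X\theta) - y)$
where $g(X\theta)$ applies the sigmoid element-wise, giving the vector of predictions $h_\theta(x^{(i)})$ for all $m$ examples.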

Vectorized Python implementation:

def gradient(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return (1 / m) * (X.T @ (h - y))
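The update rule can be turned into a complete training loop. What follows is a minimal sketch, not the notes' own training code: the toy data, the learning rate `alpha=0.1`, the iteration count, and the helper names `cost` and `gradient_descent` are all assumptions made for illustration, and the arrays are kept 1-D for simplicity.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)))

def gradient(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def gradient_descent(X, y, alpha=0.1, num_iters=500):
    theta = np.zeros(X.shape[1])
    history = [cost(theta, X, y)]
    for _ in range(num_iters):
        # Vectorized update: theta := theta - alpha * (1/m) X^T (g(X theta) - y)
        theta = theta - alpha * gradient(theta, X, y)
        history.append(cost(theta, X, y))
    return theta, history

# Toy data (an assumption for illustration): label is 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
X = np.c_[np.ones(len(features)), features]  # prepend intercept column x0 = 1
y = (features[:, 0] + features[:, 1] > 0).astype(float)

theta, history = gradient_descent(X, y)
print(f"cost: {history[0]:.3f} -> {history[-1]:.3f}")
```

Recording the cost at every iteration is a cheap way to confirm that the learning rate is small enough: with a suitable `alpha` the recorded values should never increase.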

VI. Training Example (Using sklearn Data)

from sklearn.datasets import make_classification
from scipy.optimize import minimize

# Generate synthetic data
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X = np.c_[np.ones((X.shape[0], 1)), X]  # prepend the intercept column x0 = 1
y = y.reshape(-1, 1)
theta_init = np.zeros((X.shape[1], 1))

# Wrap the cost and gradient for minimize, which works with 1-D parameter vectors
def cost_func(t):
    return compute_cost(t.reshape(-1, 1), X, y)

def grad_func(t):
    return gradient(t.reshape(-1, 1), X, y).flatten()

# Optimize
result = minimize(fun=cost_func, x0=theta_init.flatten(), jac=grad_func)
theta_optimized = result.x.reshape(-1, 1)

VII. Visualizing the Decision Boundary

import matplotlib.pyplot as plt

def plot_decision_boundary(X, y, theta):
    plt.scatter(X[:, 1], X[:, 2], c=y.flatten(), cmap='bwr')
    x_vals = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
    # On the boundary theta0 + theta1*x1 + theta2*x2 = 0, so solve for x2
    y_vals = -(theta[0] + theta[1] * x_vals) / theta[2]
    plt.plot(x_vals, y_vals, 'g--')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Decision Boundary')
    plt.grid(True)
    plt.show()

plot_decision_boundary(X, y, theta_optimized)

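VIII. Overfitting vs. Underfitting

Underfitting

The model is too simple to fit the training data well.
Training error is high, and the model generalizes poorly.

Overfitting

The model is too complex (for example, a high-degree polynomial): training error is low, but performance on new data is poor.
Generalization is weak.

Visual comparison:

Underfitting: the model is a straight line
Good fit: the model is a smooth curve
Overfitting: the model is a rapidly oscillating curve that passes exactly through every training point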
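The three cases can be reproduced numerically. The sketch below is an illustration under assumed conditions: it fits polynomials of degree 1, 3, and 9 to noisy samples of a sine curve (synthetic data invented for this example) with `np.polyfit`, then reports the mean squared error on the training points and on separate held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): noisy samples of sin(2*pi*x)
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree family contains the lower-degree ones. The held-out error is the quantity to watch: a gap between the two columns is the practical symptom of overfitting.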

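Two main ways to address overfitting

Method 1: Reduce the number of features (manually or with PCA)

Remove noisy features
Apply dimensionality reduction (e.g., PCA)

Method 2: Regularization

Penalize parameters that grow too large
Keep the model from becoming overly complex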

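IX. Polynomial Regression

Polynomial regression uses higher-order features, such as:
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \cdots$
Higher-order models overfit easily, so they are typically paired with regularization.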
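The polynomial hypothesis is still linear in $\theta$, so it can be evaluated as a matrix-vector product once the powers of $x$ are laid out as columns. A minimal sketch using `np.vander`; the helper name `polynomial_design_matrix` and the values of `x` and `theta` are hypothetical.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    # Columns are x^0, x^1, ..., x^degree
    return np.vander(x, N=degree + 1, increasing=True)

x = np.array([0.0, 1.0, 2.0])
X_poly = polynomial_design_matrix(x, degree=3)
print(X_poly)

# Hypothetical parameters theta_0..theta_3
theta = np.array([1.0, 2.0, 0.5, -1.0])
h = X_poly @ theta  # h(x) = theta_0 + theta_1 x + theta_2 x^2 + theta_3 x^3
print(h)
```

Because the model is linear in the expanded features, the cost functions and gradient updates used for ordinary linear regression apply unchanged to `X_poly`.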

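X. Regularization

Add a penalty term (the squared L2 norm of the parameters) to the cost function to keep the parameters from growing too large:

1. Regularized cost function for linear regression:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

First term: the model's prediction error

Second term: the sum of squared parameters, which discourages large values

$\lambda$ is the regularization coefficient (it controls the strength of the penalty)

Note: $\theta_0$ is not regularized

2. The corresponding gradient updates (with regularization):

$j = 0$ (bias term):

$\theta_0 := \theta_0 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$

$j \ge 1$:

$\theta_j := \theta_j - \alpha \cdot \left[ \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]$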
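The two update rules can be written as one vectorized step. This is a sketch under stated assumptions: the function name `regularized_step` and the small data set are hypothetical, and the first column of `X` is assumed to be the intercept column of ones.

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lambda_):
    # One gradient descent step for L2-regularized linear regression
    m = len(y)
    error = X @ theta - y            # h_theta(x) - y for every example
    grad = (X.T @ error) / m         # unregularized gradient
    reg = (lambda_ / m) * theta      # penalty gradient (a new array, theta is untouched)
    reg[0] = 0.0                     # theta_0 is not regularized
    return theta - alpha * (grad + reg)

# Hypothetical data: intercept column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta_start = np.array([0.5, 0.5])

no_penalty = regularized_step(theta_start, X, y, alpha=0.1, lambda_=0.0)
with_penalty = regularized_step(theta_start, X, y, alpha=0.1, lambda_=3.0)
print(no_penalty, with_penalty)
```

For $j \ge 1$ the update can also be read as $\theta_j := \theta_j (1 - \alpha \frac{\lambda}{m}) - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$: each step first shrinks $\theta_j$ slightly and then applies the usual gradient step.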

XI. Regularization in Logistic Regression

The same penalty applies to logistic regression:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$
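Differentiating this cost term by term gives the gradient descent updates. They have the same form as the regularized linear regression updates, except that $h_\theta(x)$ is now the sigmoid hypothesis. Written out under the same convention that $\theta_0$ is not penalized (and with $x_0^{(i)} = 1$):

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right], \quad j \ge 1
\end{aligned}
```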

Python implementation (regularized logistic regression):

def cost_regularized(theta, X, y, lambda_):
    m = len(y)
    h = sigmoid(X @ theta)
    # Penalize theta_1..theta_n only; theta_0 is excluded
    reg_term = (lambda_ / (2 * m)) * np.sum(np.square(theta[1:]))
    return (-1 / m) * (y.T @ np.log(h + 1e-5) + (1 - y).T @ np.log(1 - h + 1e-5)) + reg_term

def gradient_regularized(theta, X, y, lambda_):
    m = len(y)
    h = sigmoid(X @ theta)
    grad = (1 / m) * (X.T @ (h - y))
    reg = (lambda_ / m) * theta
    reg[0] = 0  # theta_0 is not regularized
    return grad + reg

XII. Polynomial Features with sklearn

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Build polynomial features
poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(X)

# Ridge regression (L2 regularization)
model = Ridge(alpha=1.0)  # alpha plays the role of λ
model.fit(X_poly, y)

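XIII. Training Set vs. Validation Set vs. Test Set

Training set: used to fit the model
Validation set (cross validation set): used to choose hyperparameters such as λ and model complexity
Test set: used to estimate the final model's generalization performance

A common split is 60% / 20% / 20%.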
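One way to obtain a 60/20/20 split is to call sklearn's `train_test_split` twice: first hold out 40% of the data, then cut that held-out part in half. The placeholder arrays and the `random_state` value below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 examples with 2 features and alternating binary labels
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Step 1: 60% training, 40% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Step 2: split the held-out 40% evenly into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Fixing `random_state` keeps the split reproducible, which matters when comparing models across runs.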

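XIV. Model Selection and Evaluation Workflow

Model selection steps:

Train several models on the training set, each with a different value of λ
Evaluate each model on the validation set and pick the best λ
Use the test set to estimate the generalization error of the final model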
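The three steps above can be sketched with sklearn's `LogisticRegression`. This swaps in sklearn's solver for the hand-written optimizer used earlier in these notes; its `C` parameter is the inverse regularization strength, so `C = 1 / λ` is used here. The synthetic data set and the candidate λ values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# 60% / 20% / 20% split into training, validation, and test sets
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # candidate values, chosen arbitrarily
models = {}
val_scores = {}
for lam in lambdas:
    # Step 1: train one model per lambda on the training set
    model = LogisticRegression(C=1.0 / lam, max_iter=1000)
    model.fit(X_train, y_train)
    models[lam] = model
    # Step 2: score it on the validation set
    val_scores[lam] = model.score(X_val, y_val)

best_lambda = max(val_scores, key=val_scores.get)

# Step 3: touch the test set only once, with the selected model
test_accuracy = models[best_lambda].score(X_test, y_test)
print(f"best lambda = {best_lambda}, test accuracy = {test_accuracy:.3f}")
```

The test set stays out of the selection loop on purpose: if it were used to pick λ, the reported test accuracy would no longer be an unbiased estimate of generalization performance.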
