Common Activation Functions in Neural Networks 14 - The Mish Function

Contents

- Mish
  - Function and derivative
  - Plots of the function and its derivative
  - Pros and cons
  - Mish in PyTorch
  - Mish in TensorFlow

Mish

Paper

https://arxiv.org/pdf/1908.08681

Function and derivative

The Mish function:
$$
\begin{aligned}
\text{Mish}(x) &= x \cdot \tanh\!\bigl(\text{softplus}(x)\bigr) \\
&= x \cdot \tanh\!\Bigl(\ln\!\bigl(1+e^{x}\bigr)\Bigr)
\end{aligned}
$$
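The definition maps directly onto code. Below is a minimal NumPy sketch that assumes you want it to stay finite for large |x|. It computes softplus as `np.logaddexp(0, x)`, which avoids the overflow `np.log(1 + np.exp(x))` runs into once `x` goes past roughly 709 in float64. The helper names `softplus` and `mish` are illustrative only.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus: ln(1 + e^x) computed as logaddexp(0, x)."""
    return np.logaddexp(0.0, x)


def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    x = np.asarray(x, dtype=np.float64)
    return x * np.tanh(softplus(x))


# Evaluating at |x| = 1000 produces no overflow warning,
# unlike the naive np.log(1 + np.exp(x)).
print(mish(np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])))
```

As an example, `mish(1.0)` comes out at about 0.8651 and `mish(-1.0)` at about -0.3034.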
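Derivative of the Mish function

Given (with $\operatorname{softplus}(x)=\ln(1+e^{x})$ and $\sigma$ denoting the sigmoid function):
$$
\begin{aligned}
\frac{d}{dx}\tanh(x) &= 1 - \tanh^2(x) \\[2mm]
\frac{d}{dx}\operatorname{softplus}(x) &= \sigma(x) = \frac{1}{1+e^{-x}}
\end{aligned}
$$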
See also:

Common Activation Functions in Neural Networks 2 - tanh (Hyperbolic Tangent)
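You can sanity-check both identities numerically. The sketch below uses illustrative helper names. It compares a central finite difference against each closed form over a grid on [-5, 5], and the largest error should be on the order of round-off.

```python
import numpy as np


def central_diff(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def softplus(x):
    """softplus(x) = ln(1 + e^x), computed stably."""
    return np.logaddexp(0.0, x)


def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


xs = np.linspace(-5.0, 5.0, 101)

# d/dx tanh(x) = 1 - tanh^2(x)
err_tanh = np.max(np.abs(central_diff(np.tanh, xs) - (1 - np.tanh(xs) ** 2)))
# d/dx softplus(x) = sigmoid(x)
err_softplus = np.max(np.abs(central_diff(softplus, xs) - sigmoid(xs)))

print(f"max error, tanh':     {err_tanh:.2e}")
print(f"max error, softplus': {err_softplus:.2e}")
```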

Then, applying the product rule and then the chain rule:
$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}x}\text{Mish}(x)
&= \frac{\mathrm{d}}{\mathrm{d}x}\Bigl[x \cdot \tanh\!\Bigl(\ln\!\bigl(1+e^{x}\bigr)\Bigr)\Bigr] \\[2mm]
&= \Bigl(\frac{\mathrm{d}}{\mathrm{d}x}x\Bigr)\cdot\tanh\bigl(\ln(1+e^{x})\bigr) + x \cdot \frac{\mathrm{d}}{\mathrm{d}x}\tanh\bigl(\ln(1+e^{x})\bigr) \\[2mm]
&= \tanh\bigl(\ln(1+e^{x})\bigr) + x \cdot \bigl(1-\tanh^2(\ln(1+e^{x}))\bigr)\cdot\frac{1}{1+e^{-x}} \\[2mm]
&= \tanh\bigl(\ln(1+e^{x})\bigr) + x \cdot \bigl(1-\tanh^2(\ln(1+e^{x}))\bigr)\cdot\sigma(x)
\end{aligned}
$$
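The final expression can be checked in the same way. This sketch (function names are illustrative) compares the analytic derivative with a central finite difference of Mish itself. It also confirms that at $x=0$ the second term drops out and the derivative equals $\tanh(\ln 2)=\frac{4-1}{4+1}=0.6$.

```python
import numpy as np


def softplus(x):
    """softplus(x) = ln(1 + e^x), computed stably."""
    return np.logaddexp(0.0, x)


def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))


def mish_grad(x):
    """Derived formula: tanh(sp) + x * (1 - tanh^2(sp)) * sigmoid(x), sp = softplus(x)."""
    t = np.tanh(softplus(x))
    s = 1.0 / (1.0 + np.exp(-x))
    return t + x * (1.0 - t ** 2) * s


xs = np.linspace(-6.0, 6.0, 121)
h = 1e-5
numeric = (mish(xs + h) - mish(xs - h)) / (2 * h)  # central difference
max_err = np.max(np.abs(numeric - mish_grad(xs)))

print(f"max |analytic - numeric| = {max_err:.2e}")
print(f"Mish'(0) = {mish_grad(0.0):.6f}")  # tanh(ln 2) = 0.6
```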

Plots of the function and its derivative

Plotting code

import numpy as np
from matplotlib import pyplot as plt


def mish(x):
    """Mish(x) = x * tanh(softplus(x))"""
    sp = np.logaddexp(0, x)             # softplus(x) = ln(1 + e^x), overflow-safe
    return x * np.tanh(sp)


def mish_derivative(x):
    """Mish'(x) = tanh(softplus(x)) + x * (1 - tanh²(softplus(x))) * sigmoid(x)"""
    sp = np.logaddexp(0, x)             # softplus(x)
    t = np.tanh(sp)                     # tanh(softplus(x))
    s = 1 / (1 + np.exp(-x))            # sigmoid(x)
    return t + x * (1 - t ** 2) * s


x = np.linspace(-4, 4, 1000)
y = mish(x)
y1 = mish_derivative(x)

plt.figure(figsize=(12, 8))
ax = plt.gca()
plt.plot(x, y, label='Mish')
plt.plot(x, y1, label='Derivative', linestyle='--')
plt.title('Mish Activation Function and its Derivative')

# Hide the top/right spines and move the remaining axes through the origin
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

plt.legend(loc='upper left')
plt.savefig('./mish.jpg', dpi=300)
plt.show()

[Figure: Mish activation function and its derivative (mish.jpg)]

Pros and cons

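Advantages of Mish
1. Smooth, with no kink: Mish is continuous and differentiable everywhere on the real line, unlike ReLU at 0. This supports stable gradient flow and helps mitigate vanishing gradients.
2. Non-monotonic: on the negative half-axis the curve dips to a minimum (near x ≈ -1.19) and then rises again. Small negative inputs therefore still receive a nonzero gradient, which is credited with better gradient flow and greater expressiveness.
3. Unbounded above, bounded below: positive inputs have no saturation region (the slope approaches 1), which avoids vanishing gradients and suits deep networks. Below, the function is bounded at about -0.31.
4. Empirical performance: on benchmarks such as ImageNet and COCO, Mish often outperforms ReLU, Swish, and other activations, though not in every case.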
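Points 2 and 3 can be made concrete with a brute-force search over the negative half-axis. The sketch below uses a dense grid rather than an analytic solution to locate the minimum of Mish. The result should fall near x ≈ -1.19 with a value of about -0.309, which matches the lower bound of roughly -0.31 given above. The function rises on both sides of that point.

```python
import numpy as np


def mish(x):
    """Mish(x) = x * tanh(softplus(x)), with a stable softplus."""
    return x * np.tanh(np.logaddexp(0.0, x))


# Dense grid over the negative half-axis, where the minimum lies
xs = np.linspace(-5.0, 0.0, 500_001)
ys = mish(xs)
i = int(np.argmin(ys))
x_min, y_min = xs[i], ys[i]
print(f"minimum ~ {y_min:.4f} at x ~ {x_min:.4f}")

# Non-monotonic: values on both sides of x_min are larger than the minimum
print(f"Mish(-3) = {mish(-3.0):.4f}, Mish(0) = {mish(0.0):.4f}")
```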
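Disadvantages of Mish
1. Higher compute cost: compared with ReLU, Mish needs an extra softplus, a tanh, and a multiplication, so inference latency is somewhat higher.
2. Memory usage: backpropagation has to cache intermediate results, so activation memory is higher than for ReLU.
3. Not a universal win: in some lightweight or real-time tasks, the accuracy gain may not justify the extra compute. Verify experimentally.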
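To get a rough sense of point 1, here is a NumPy micro-benchmark comparing ReLU and Mish on one million elements. Treat it as a sketch only. Absolute timings and the ratio depend on the hardware, the NumPy build, and the array size, and fused framework kernels in PyTorch or TensorFlow will behave differently.

```python
import timeit

import numpy as np


def relu(x):
    """ReLU(x) = max(x, 0)."""
    return np.maximum(x, 0.0)


def mish(x):
    """Mish(x) = x * tanh(softplus(x)), with a stable softplus."""
    return x * np.tanh(np.logaddexp(0.0, x))


rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# Best-of-5 wall-clock time for 10 calls of each function
t_relu = min(timeit.repeat(lambda: relu(x), number=10, repeat=5))
t_mish = min(timeit.repeat(lambda: mish(x), number=10, repeat=5))
print(f"ReLU: {t_relu:.4f}s  Mish: {t_mish:.4f}s  ratio: {t_mish / t_relu:.1f}x")
```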

Mish in PyTorch

Code

import torch
import torch.nn.functional as F

# Fix the random seed for reproducibility
torch.manual_seed(1024)               # CPU
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(1024)  # GPU, if available

x = torch.randn(2, dtype=torch.float32)
mish_x = F.mish(x)                    # built-in Mish; the module form is torch.nn.Mish()

print(f"x:\n{x}")
print(f"mish_x:\n{mish_x}")

Sample output:
x:
tensor([-1.4837,  0.2671])
mish_x:
tensor([-0.2991,  0.1826])
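As a cross-check, the sketch below assumes PyTorch 1.9 or later, where `F.mish` is available. It compares the built-in against the explicit formula x * tanh(softplus(x)), and compares autograd's gradient against the derivative derived earlier. Everything runs in float64 to keep round-off small. The grid and tolerances are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

# float64 grid with gradient tracking enabled
x = torch.linspace(-6.0, 6.0, steps=121, dtype=torch.float64, requires_grad=True)

# Built-in Mish vs. the explicit formula x * tanh(softplus(x))
y_builtin = F.mish(x)
y_manual = x * torch.tanh(F.softplus(x))

# Gradient from autograd vs. the hand-derived derivative
(grad_autograd,) = torch.autograd.grad(y_builtin.sum(), x)
with torch.no_grad():
    t = torch.tanh(F.softplus(x))
    grad_formula = t + x * (1 - t ** 2) * torch.sigmoid(x)

max_value_err = (y_builtin - y_manual).abs().max().item()
max_grad_err = (grad_autograd - grad_formula).abs().max().item()
print(f"max value error: {max_value_err:.2e}, max grad error: {max_grad_err:.2e}")
```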

Mish in TensorFlow

Environment

python: 3.10.9
tensorflow: 2.19.0

Code

import tensorflow as tf


def mish(x):
    """Mish(x) = x * tanh(softplus(x))"""
    return x * tf.math.tanh(tf.math.softplus(x))


# A fixed (not random) input tensor with the same rounded values as the PyTorch example
x = tf.constant([-1.4837, 0.2671], dtype=tf.float32)
mish_x = mish(x)

print(f"x:\n{x.numpy()}")
print(f"mish_x:\n{mish_x.numpy()}")

Sample output:
x:
[-1.4837  0.2671]
mish_x:
[-0.29912373  0.18255362]
