TensorFlow Deep Learning in Practice: DCGAN Explained and Implemented

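- 0. Introduction
- 1. DCGAN architecture
- 2. Building a DCGAN to generate handwritten digit images
  - 2.1 Generator and discriminator architectures
  - 2.2 Building the DCGAN
- Related links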

0. Introduction

The Deep Convolutional Generative Adversarial Network (DCGAN) is a deep learning model based on the Generative Adversarial Network (GAN) and is used mainly for image generation. It combines the strengths of Convolutional Neural Networks (CNNs) and GANs to generate higher-quality images more efficiently.

1. DCGAN architecture

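DCGAN brings the structure of CNNs into the GAN. Its core design idea is to build the networks from convolutional layers only, without pooling layers or fully connected classification layers. Downsampling (reducing the spatial dimensions) is done through the stride of the convolutions, and upsampling (increasing them) through transposed convolutions.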
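The shape arithmetic behind strided and transposed convolutions can be checked with a few lines of plain Python. This is a minimal sketch assuming Keras-style padding="same", where a stride-s convolution maps a side length n to ceil(n / s) and a stride-s transposed convolution maps it to n × s; the helper names conv_same_out and conv_transpose_same_out are hypothetical.

```python
import math


def conv_same_out(n: int, stride: int) -> int:
    """Side length after a Keras-style Conv2D with padding="same"."""
    return math.ceil(n / stride)


def conv_transpose_same_out(n: int, stride: int) -> int:
    """Side length after a Keras-style Conv2DTranspose with padding="same"."""
    return n * stride


# Downsampling path (discriminator): each stride-2 convolution halves the side.
print([28, conv_same_out(28, 2), conv_same_out(14, 2)])  # [28, 14, 7]

# Upsampling path (generator): each stride-2 transposed convolution doubles it.
print([7, conv_transpose_same_out(7, 2), conv_transpose_same_out(14, 2)])  # [7, 14, 28]

# Odd sizes round up: a 7-pixel side becomes 4, not 3.5.
print(conv_same_out(7, 2))  # 4
```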

Compared with the original GAN, DCGAN makes the following main changes:

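- The networks consist entirely of convolutional layers. In the discriminator, pooling layers are replaced by strided convolutions (the stride of the convolution is raised from 1 to 2), while the generator uses transposed convolutions.
- Fully connected hidden layers after the convolutions are removed; the discriminator's last convolutional output is flattened and fed directly to a single output unit.
- To improve training stability, batch normalization is applied after the convolutional layers (in the original DCGAN design, everywhere except the generator's output layer and the discriminator's input layer).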
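To make the batch normalization step concrete, here is a NumPy sketch of what a batch normalization layer computes in training mode on an NHWC batch. The function name batch_norm_train is hypothetical. It follows the moving-average rule documented for Keras, moving = moving × momentum + batch statistic × (1 - momentum), with momentum=0.8 as in the code later in this article, and an assumed epsilon of 1e-3 (the Keras default).

```python
import numpy as np


def batch_norm_train(x, gamma, beta, moving_mean, moving_var, momentum=0.8, eps=1e-3):
    """Training-mode batch normalization for an NHWC batch (illustrative sketch).

    Mean and variance are taken per channel over the batch, height and width
    axes; the moving averages are updated as
    moving = moving * momentum + batch_stat * (1 - momentum).
    """
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize each channel
    y = gamma * x_hat + beta                  # learned scale and shift
    new_mean = moving_mean * momentum + mean * (1 - momentum)
    new_var = moving_var * momentum + var * (1 - momentum)
    return y, new_mean, new_var


rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(16, 7, 7, 4))   # batch of 16 maps with 4 channels
y, m, v = batch_norm_train(x, np.ones(4), np.zeros(4), np.zeros(4), np.ones(4))
print(y.mean(axis=(0, 1, 2)).round(6))  # approximately 0 for every channel
print(y.std(axis=(0, 1, 2)).round(3))   # approximately 1 for every channel
```

The moving statistics are what the layer uses at inference time; during training each batch is normalized with its own statistics, which is what keeps activations in a stable range.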

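The basic idea of DCGAN is the same as that of the original GAN. The generator takes a 100-dimensional noise vector as input, passes it through a fully connected layer, reshapes the result and then processes it with convolutional layers. The generator architecture is shown below: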

Generator architecture
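The first generator step, turning a 100-dimensional noise vector into a stack of 7 × 7 feature maps, can be traced with NumPy. In this sketch the weight matrix W is a hypothetical random stand-in for the learned fully connected layer (drawn from N(0, 0.02²), an assumption rather than the Keras default initializer); only the shapes and the Dense, ReLU, Reshape sequence matter here.

```python
import numpy as np

latent_dim, batch_size = 100, 8
rng = np.random.default_rng(42)

# Standard-normal noise: one 100-d vector per image to generate.
z = rng.normal(0.0, 1.0, size=(batch_size, latent_dim))

# Hypothetical stand-in for Dense(128 * 7 * 7, activation="relu").
W = rng.normal(0.0, 0.02, size=(latent_dim, 128 * 7 * 7))
b = np.zeros(128 * 7 * 7)
h = np.maximum(z @ W + b, 0.0)      # shape (8, 6272)

# Reshape((7, 7, 128)): each 6272-vector becomes a 7x7 map with 128 channels.
feature_maps = h.reshape(batch_size, 7, 7, 128)
print(h.shape, feature_maps.shape)  # (8, 6272) (8, 7, 7, 128)
```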

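The discriminator receives an image (either one produced by the generator or a real image from the dataset), which passes through convolution and batch normalization. Each convolution downsamples its input through its stride. The output of the final convolutional layer is flattened and fed to a classification layer with a single neuron: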

Discriminator architecture

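Combining the generator and the discriminator yields the DCGAN. Training works as in the original GAN: first train the discriminator on one batch of data, then freeze the discriminator and train the generator, and repeat. In practice, the Adam optimizer with a learning rate of 0.0002 and β1 = 0.5 gives more stable results. Next, we use TensorFlow to implement a DCGAN that generates MNIST handwritten digit images.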
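For reference, the sketch below writes out one step of the textbook Adam update with the settings used in the DCGAN class later in this article, Adam(0.0002, 0.5), i.e. learning rate 0.0002 and β1 = 0.5 (β2 = 0.999 and epsilon = 1e-7 are assumed defaults). The function adam_step is hypothetical; the Keras implementation is equivalent up to where epsilon enters the formula.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.0002, beta_1=0.5, beta_2=0.999, eps=1e-7):
    """One textbook Adam update; returns new parameters and moment estimates."""
    m = beta_1 * m + (1 - beta_1) * grad         # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta_2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([1.0, -2.0])
grad = np.array([0.3, -5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
theta, m, v = adam_step(theta, grad, m, v, t=1)
print(theta)  # each entry moved by about 0.0002 against the sign of its gradient
```

On the first step, bias correction makes m_hat / sqrt(v_hat) equal to the sign of the gradient, so each parameter moves by about the learning rate regardless of the gradient's magnitude. A smaller β1 such as 0.5 averages over fewer past gradients than the default 0.9, so the update reacts faster as the adversarial objective shifts.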

2. Building a DCGAN to generate handwritten digit images

In this section, we build a DCGAN that generates MNIST handwritten digit images.

2.1 Generator and discriminator architectures

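The generator is built by adding layers sequentially. The first layer is a fully connected layer that takes the 100-dimensional noise as input and expands it into a one-dimensional vector of size 128 × 7 × 7 (6,272). This size is chosen so that, after two 2× upsampling steps, the output is 28 × 28, the standard size of an MNIST image. The vector is reshaped into a 7 × 7 × 128 tensor and then upsampled with TensorFlow's UpSampling2D layer. Note that this layer simply enlarges the image by repeating its rows and columns (doubling both); it has no trainable weights, so it is computationally cheap.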
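UpSampling2D with its default size=(2, 2) and nearest-neighbour interpolation just repeats every row and column, which is why it has no weights. The NumPy function below (upsample_nearest_2x, a hypothetical name) reproduces that behaviour on an NHWC array as an illustration:

```python
import numpy as np


def upsample_nearest_2x(x):
    """Nearest-neighbour 2x upsampling of an NHWC array (like UpSampling2D())."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


x = np.arange(4, dtype=np.float32).reshape(1, 2, 2, 1)
print(upsample_nearest_2x(x)[0, :, :, 0])
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]

# The generator's first upsampling step: 7x7x128 -> 14x14x128, no parameters involved.
print(upsample_nearest_2x(np.zeros((1, 7, 7, 128))).shape)  # (1, 14, 14, 128)
```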
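The UpSampling2D layer doubles the rows and columns of the 7 × 7 × 128 (rows × columns × channels) input, producing a 14 × 14 × 128 output. The upsampled image is passed to a convolutional layer, which learns to fill in the details of the upsampled image; the convolution output goes through a batch normalization layer and then a ReLU activation. This upsampling, convolution, batch normalization, ReLU block is used twice in the generator, with 128 filters in the first convolutional layer and 64 in the second. The final layer is a convolution with self.channels 3 × 3 filters (a single filter for grayscale MNIST) and a tanh activation, producing a 28 × 28 × 1 image: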

    def build_generator(self):
        model = Sequential()
        # Project the noise vector and reshape it into a 7x7x128 feature map
        model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
        model.add(Reshape((7, 7, 128)))
        # Block 1: upsample 7x7 -> 14x14, then convolve to fill in detail
        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        # Block 2: upsample 14x14 -> 28x28
        model.add(UpSampling2D())
        model.add(Conv2D(64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        # Output layer: one filter per image channel, tanh maps pixels to [-1, 1]
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        model.add(Activation("tanh"))
        model.summary()

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)
        return Model(noise, img)
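The tanh output layer is also why the training data gets rescaled: tanh produces values in [-1, 1], so train() maps MNIST pixels from [0, 255] into the same range, and save_imgs() maps generator outputs back to [0, 1] for plotting. A small NumPy check of that round trip, using made-up pixel values:

```python
import numpy as np

# A few raw MNIST-style pixel values (uint8, range 0..255).
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Training data is mapped to [-1, 1], the range of tanh (as in train()).
scaled = pixels / 127.5 - 1.0

# Generator outputs are mapped back to [0, 1] for display (as in save_imgs()).
restored = 0.5 * scaled + 0.5

# tanh squashes any pre-activation into [-1, 1], so the two ranges line up.
pre_activations = np.array([-20.0, -0.5, 0.0, 3.0])
outputs = np.tanh(pre_activations)
print(scaled.min(), scaled.max(), outputs)
```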

The generator model architecture is as follows:

Generator architecture

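A transposed convolution layer can be used instead: it not only upsamples the input image but also learns during training how to fill in the details. A single transposed convolution layer can therefore replace the upsampling and convolution pair. Transposed convolution is often called "deconvolution", although it does not invert a convolution; it applies the transpose of the linear map computed by a strided convolution.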
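To see how a transposed convolution both enlarges the input and fills in detail, here is a single-channel NumPy sketch of a stride-2 transposed convolution without padding (conv_transpose2d is a hypothetical helper). Each input value scatters a scaled copy of the kernel into the output, and overlapping copies are summed; in a real Conv2DTranspose layer the kernel values are learned.

```python
import numpy as np


def conv_transpose2d(x, k, stride=2):
    """Single-channel transposed convolution with no padding (illustrative)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Scatter a copy of the kernel, scaled by this input value.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out


x = np.ones((7, 7))
k = np.ones((3, 3))       # stand-in kernel; a real layer learns these weights
y = conv_transpose2d(x, k)
print(y.shape)            # (15, 15)
print(y[:4, :4])          # overlapping kernel copies add up where they meet
```

With a 7 × 7 input and a 3 × 3 kernel, the raw output is (7 - 1) × 2 + 3 = 15 pixels per side; with padding="same", Keras trims the output to 7 × 2 = 14 per side.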
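Next, we build the discriminator. It resembles a standard CNN, except that convolutional layers with stride 2 replace max pooling layers. Dropout layers are added to reduce overfitting, batch normalization (after every convolution except the first) helps stabilize training and speed up convergence, and the activation function is leaky ReLU. The discriminator uses four convolutional layers with 32, 64, 128 and 256 filters; the first three use stride 2 and the last uses stride 1, and a ZeroPadding2D layer after the second convolution pads its 7 × 7 output to 8 × 8. The output of the last convolutional layer is flattened and passed to a fully connected layer with a single sigmoid unit, whose output classifies the image as real or fake: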

    def build_discriminator(self):
        model = Sequential()
        # 28x28x1 -> 14x14x32 (no batch normalization on the input layer)
        model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        # 14x14x32 -> 7x7x64, then zero-pad the bottom/right edges to 8x8x64
        model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        # 8x8x64 -> 4x4x128
        model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        # 4x4x128 -> 4x4x256 (stride 1, no downsampling)
        model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        # Flatten and output the probability that the image is real
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))
        model.summary()

        img = Input(shape=self.img_shape)
        validity = model(img)
        return Model(img, validity)

The discriminator model architecture is as follows:

Discriminator architecture
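The discriminator's shapes and parameter count can also be traced by hand. The sketch below assumes a 28 × 28 × 1 MNIST input and Keras padding="same" (output side ceil(n / stride)). It also shows the effect of the ZeroPadding2D layer, which grows the 7 × 7 map to 8 × 8 before the third convolution. The blocks table is a hypothetical summary of build_discriminator, not part of the original code.

```python
import math


def conv_same(n: int, stride: int) -> int:
    """Side length after a Keras Conv2D with padding="same"."""
    return math.ceil(n / stride)


# (filters, stride, zero-pad after the conv?, batch norm?) for each block
blocks = [(32, 2, False, False), (64, 2, True, True), (128, 2, False, True), (256, 1, False, True)]

side, channels, params = 28, 1, 0
for filters, stride, pad, bn in blocks:
    side = conv_same(side, stride)
    params += 3 * 3 * channels * filters + filters   # 3x3 kernel + biases
    if pad:
        side += 1                                    # ZeroPadding2D(((0, 1), (0, 1)))
    if bn:
        params += 4 * filters                        # gamma, beta, moving mean/variance
    channels = filters
    print(f"after block with {filters} filters: {side}x{side}x{channels}")

flat = side * side * channels
params += flat + 1                                   # Dense(1, activation='sigmoid')
print("flattened features:", flat)                   # 4 * 4 * 256 = 4096
print("total parameters:", params)                   # 393729
```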

2.2 Building the DCGAN

Combining the generator and the discriminator gives the complete GAN:

    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Dropout
    from tensorflow.keras.layers import BatchNormalization, Activation, ZeroPadding2D
    from tensorflow.keras.layers import LeakyReLU
    from tensorflow.keras.layers import UpSampling2D, Conv2D
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.optimizers import Adam
    import matplotlib.pyplot as plt
    import numpy as np


    class DCGAN:
        def __init__(self, rows, cols, channels, z=100):
            # Input shape
            self.img_rows = rows
            self.img_cols = cols
            self.channels = channels
            self.img_shape = (self.img_rows, self.img_cols, self.channels)
            self.latent_dim = z  # length of the noise vector (100 in the text)

            # One Adam optimizer per trained model: lr = 0.0002, beta_1 = 0.5
            optimizer_1 = Adam(learning_rate=0.0002, beta_1=0.5)
            optimizer_2 = Adam(learning_rate=0.0002, beta_1=0.5)

            # Build and compile the discriminator
            self.discriminator = self.build_discriminator()
            self.discriminator.compile(loss='binary_crossentropy',
                                       optimizer=optimizer_1,
                                       metrics=['accuracy'])

            # Build the generator
            self.generator = self.build_generator()

            # The generator takes noise as input and generates imgs
            z = Input(shape=(self.latent_dim,))
            img = self.generator(z)

            # For the combined model we will only train the generator.
            # In tf.keras (Keras 2), train_on_batch respects the trainable state
            # captured at compile time, so the discriminator compiled above still
            # updates its weights when it is trained on its own.
            self.discriminator.trainable = False

            # The discriminator takes generated images as input and determines validity
            valid = self.discriminator(img)

            # The combined model (stacked generator and discriminator)
            # trains the generator to fool the discriminator
            self.combined = Model(z, valid)
            self.combined.compile(loss='binary_crossentropy', optimizer=optimizer_2)

        def build_generator(self):
            model = Sequential()
            model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
            model.add(Reshape((7, 7, 128)))
            model.add(UpSampling2D())                             # 7x7 -> 14x14
            model.add(Conv2D(128, kernel_size=3, padding="same"))
            model.add(BatchNormalization(momentum=0.8))
            model.add(Activation("relu"))
            model.add(UpSampling2D())                             # 14x14 -> 28x28
            model.add(Conv2D(64, kernel_size=3, padding="same"))
            model.add(BatchNormalization(momentum=0.8))
            model.add(Activation("relu"))
            model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
            model.add(Activation("tanh"))                         # pixels in [-1, 1]
            model.summary()

            noise = Input(shape=(self.latent_dim,))
            img = model(noise)
            return Model(noise, img)

        def build_discriminator(self):
            model = Sequential()
            model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dropout(0.25))
            model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
            model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))   # 7x7 -> 8x8
            model.add(BatchNormalization(momentum=0.8))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dropout(0.25))
            model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
            model.add(BatchNormalization(momentum=0.8))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dropout(0.25))
            model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
            model.add(BatchNormalization(momentum=0.8))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dropout(0.25))
            model.add(Flatten())
            model.add(Dense(1, activation='sigmoid'))             # P(image is real)
            model.summary()

            img = Input(shape=self.img_shape)
            validity = model(img)
            return Model(img, validity)

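Both the generator and the discriminator are trained with the binary_crossentropy loss, and their optimizers (Adam with learning rate 0.0002 and β1 = 0.5) are created in the __init__ method. The discriminator is compiled on its own first; it is then frozen and stacked on top of the generator to form the combined model, which is used to train only the generator.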
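binary_crossentropy scores each predicted probability p against a 0/1 label y as -(y·log p + (1 - y)·log(1 - p)) and averages over the batch. The NumPy sketch below (the function name is hypothetical, and the clipping epsilon of 1e-7 is an assumption mirroring the usual Keras backend epsilon) shows the three losses computed in one training step, using made-up discriminator outputs:

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))


# Made-up discriminator outputs: probability that each image is real.
d_on_real = np.array([0.9, 0.8])
d_on_fake = np.array([0.2, 0.3])

d_loss_real = binary_crossentropy(np.ones(2), d_on_real)    # real images labelled 1
d_loss_fake = binary_crossentropy(np.zeros(2), d_on_fake)   # generated images labelled 0
g_loss = binary_crossentropy(np.ones(2), d_on_fake)         # generator wants fakes judged real
print(d_loss_real, d_loss_fake, g_loss)
```

With these numbers the generator loss is high because the discriminator confidently rejects the fakes; training the combined model with label 1 for generated images pushes the generator to raise d_on_fake.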
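DCGAN is trained the same way as the original GAN. At each step, random noise is first fed to the generator. The generator's output, together with real images, is used to train the discriminator, and then the generator is trained to produce images that fool the discriminator. Training a GAN typically takes hundreds to thousands of epochs. Note that in the train() method, each "epoch" is really a single training step on one randomly sampled batch, not a full pass over the dataset: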

    def train(self, epochs, batch_size=256, save_interval=50):
        # Load the dataset (labels are not needed)
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale pixels from [0, 255] to [-1, 1] to match the tanh output
        X_train = X_train.astype("float32") / 127.5 - 1.0
        X_train = np.expand_dims(X_train, axis=3)   # (60000, 28, 28, 1)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        # Each iteration ("epoch") is one training step on a single batch
        for epoch in range(epochs):
            # ---------------------
            #  Train Discriminator
            # ---------------------
            # Select a random batch of real images (sampled with replacement)
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            # Sample noise and generate a batch of new images
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            gen_imgs = self.generator.predict(noise, verbose=0)

            # Train the discriminator (real classified as ones and generated as zeros)
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------
            # Train the generator (wants discriminator to mistake images as real)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100 * d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % save_interval == 0:
                self.save_imgs(epoch)
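For a model compiled with metrics=['accuracy'], train_on_batch returns [loss, accuracy], so 0.5 * np.add(...) averages the real-batch and fake-batch results element by element before they are printed. A tiny illustration with hypothetical values:

```python
import numpy as np

# Hypothetical train_on_batch results, each of the form [loss, accuracy].
d_loss_real = [0.35, 0.90]
d_loss_fake = [0.65, 0.70]

d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)   # element-wise average
print("[D loss: %f, acc.: %.2f%%]" % (d_loss[0], 100 * d_loss[1]))
# [D loss: 0.500000, acc.: 80.00%]
```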

Finally, define a helper function that saves sample images:

    def save_imgs(self, epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise, verbose=0)

        # Rescale images from [-1, 1] to [0, 1] for display
        gen_imgs = 0.5 * gen_imgs + 0.5

        # Lay out the samples in an r x c grid of subplots
        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
                cnt += 1
        # The images/ directory must already exist
        fig.savefig("images/dcgan_mnist_%d.png" % epoch)
        plt.close(fig)
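save_imgs() fills a 5 × 5 grid of subplots, placing image cnt = i × c + j in cell (i, j). The same layout can be produced as a single array with NumPy reshapes, which is handy for logging or for plt.imsave. make_grid is a hypothetical helper, not part of the original code, and random values stand in for generator output:

```python
import numpy as np


def make_grid(images, rows, cols):
    """Tile a (rows*cols, H, W, 1) batch into one (rows*H, cols*W) array,
    putting image i*cols + j in grid cell (i, j), like save_imgs()."""
    n, h, w, _ = images.shape
    assert n == rows * cols
    grid = images[..., 0].reshape(rows, cols, h, w)
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)


gen_imgs = np.random.default_rng(0).uniform(-1, 1, size=(25, 28, 28, 1))  # stand-in samples
mosaic = make_grid(0.5 * gen_imgs + 0.5, 5, 5)   # rescale to [0, 1] first
print(mosaic.shape)  # (140, 140)
```

A single array like this could then be written with plt.imsave(path, mosaic, cmap="gray") instead of building 25 subplots.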

Train the DCGAN model:

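    import os

    os.makedirs("images", exist_ok=True)   # save_imgs() writes its sample grids here
    dcgan = DCGAN(28, 28, 1, z=100)        # 28x28 grayscale images, 100-d noise
    dcgan.train(epochs=5000, batch_size=128, save_interval=50)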
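Because each "epoch" in train() processes a single batch, it can help to convert the call above into equivalent passes over the data. Assuming the standard 60,000-image MNIST training split, a quick calculation:

```python
import math

train_images = 60_000   # size of the MNIST training split
batch_size = 128        # batch_size passed to train()
iterations = 5_000      # the `epochs` argument, which actually counts batches

batches_per_pass = math.ceil(train_images / batch_size)
passes = iterations * batch_size / train_images
print(batches_per_pass)   # 469
print(round(passes, 2))   # 10.67
```

Since batches are drawn with np.random.randint (sampling with replacement), this is the number of images seen, not a guarantee that every image is used about 10 times.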

As training progresses, the GAN gets steadily better at generating handwritten digits:

Training progress

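At epoch 50 (that is, step 50 of train(), where each "epoch" is one batch), the quality of the generated digits has already improved noticeably: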

Resulting images

The figure below shows some results of applying DCGAN to a celebrity image dataset:

Generated results