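Preface
The wave of artificial intelligence has swept the world, and terms such as deep learning and artificial intelligence (AI) are now everywhere. The history of the field can be described as three rises and two falls, with a different paradigm dominating each time. In the 1990s the ranking was knowledge reasoning > neural networks > machine learning; around 2005 it was machine learning > knowledge (the Semantic Web) > neural networks; and since 2017 it has been deep-learning-based neural networks > knowledge (knowledge graphs) > machine learning.
The convolutional neural network (CNN) is the representative model of deep learning. Its earliest inspiration goes back to the neurobiologists Hubel and Wiesel, whose experiments on cells in the cat visual cortex (published between 1959 and 1962) showed that the visual cortex is organized in layers; the layered structure of a CNN mirrors this. Deep learning, a subfield of machine learning (ML), has seen a dramatic revival thanks to increased computing power and the availability of large amounts of data. Whether deep learning can be equated with, or stand for, artificial intelligence is in my view debatable; it is better regarded as an important technique at the current stage of AI development. Since this article is a hands-on introduction to deep learning, it does not go into conceptual details. Instead, the sections below use a practical example to walk through the general workflow of processing images with deep learning.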
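Outline:
Handwritten digit recognition serves as the introductory project, built on the Keras deep learning library. TensorFlow and the other modules used here must be installed beforehand. Also pay attention to the file paths used for saving and loading models and figures: when running the code on your own computer, create those directories or change the paths. The workflow covers loading the MNIST dataset with Keras, building the LeNet model, saving and loading the model with Keras, training and predicting on the handwritten digit dataset with Keras, and finally plotting the loss over the training epochs.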
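About the handwritten digit dataset:
Handwritten digit recognition is practically the standard introductory dataset for deep learning. Keras ships with the MNIST dataset: the training set contains 60,000 samples and the test set contains 10,000 samples. Each sample is a single-channel grayscale image of 28×28 pixels, and there are 10 classes, the digits 0 to 9.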
Import the required modules:
# import the necessary packages
import numpy as np
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras import backend as K
from keras.models import load_model
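Loading the MNIST dataset
Keras can implement many kinds of neural network models and can load a variety of datasets for evaluating them. The code below downloads and loads the MNIST dataset automatically.
# load MNIST data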
from keras.datasets import mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
Display the first six images of the MNIST training set:
# plot 6 images as gray scale
import matplotlib.pyplot as plt
plt.subplot(321)
plt.imshow(x_train[0],cmap=plt.get_cmap('gray'))
plt.subplot(322)
plt.imshow(x_train[1],cmap=plt.get_cmap('gray'))
plt.subplot(323)
plt.imshow(x_train[2],cmap=plt.get_cmap('gray'))
plt.subplot(324)
plt.imshow(x_train[3],cmap=plt.get_cmap('gray'))
plt.subplot(325)
plt.imshow(x_train[4],cmap=plt.get_cmap('gray'))
plt.subplot(326)
plt.imshow(x_train[5],cmap=plt.get_cmap('gray'))
# show
plt.show()
Data preprocessing
First, reshape the data into a 4-D array [samples][height][width][channels] so that it matches the input the model expects:
# reshape the data to four dimensions to match the model input
# reshape to be [samples][height][width][channels]
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32')
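The reshape above only appends a channel axis of size 1; the pixel values themselves are unchanged. A minimal NumPy sketch on a small random stand-in for the MNIST arrays (fake_images is a hypothetical name, not part of the original code):

```python
import numpy as np

# Hypothetical stand-in for a few MNIST images: 4 grayscale 28x28 uint8 arrays.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Append a trailing channel axis of size 1 and convert to float32.
batch = fake_images.reshape(fake_images.shape[0], 28, 28, 1).astype("float32")

print(batch.shape)  # shape now has four dimensions
print(batch.dtype)

# Dropping the channel axis again recovers the original pixel values.
same_values = np.array_equal(batch[..., 0], fake_images)
print(same_values)
```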
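Neural networks generally train better on small, consistently scaled inputs, so the pixel values are normalized from the range [0, 255] to [0, 1]: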
# normalization
x_train = x_train / 255.0
x_test = x_test / 255.0
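As a quick check of what the division does, the sketch below scales a few made-up 8-bit pixel values (the pixels array is illustrative only) and confirms that the result spans exactly [0, 1]:

```python
import numpy as np

# Hypothetical pixel values covering the full 8-bit range.
pixels = np.array([0, 51, 255], dtype=np.uint8).astype("float32")

# Same operation as in the preprocessing step: divide by the maximum value.
scaled = pixels / 255.0

print(scaled.min(), scaled.max())
```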
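Finally, the labels in the original MNIST dataset are the integers 0-9, which are usually converted to one-hot vectors. For example, the label 1 becomes the vector [0,1,0,0,0,0,0,0,0,0]: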
# one-hot
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
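To make the one-hot encoding concrete, here is a minimal NumPy sketch of the same idea. The helper one_hot is a hypothetical name written for illustration; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set column labels[i] of row i to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([1, 9])
print(example[0])

# argmax along each row recovers the original integer labels.
recovered = example.argmax(axis=1)
```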
Building and training the model
Training hyperparameters:
# parameters
EPOCHS = 10
INIT_LR = 1e-3
BS = 32
CLASS_NUM = 10
norm_size = 28
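These settings determine how many weight updates the training run performs. A small arithmetic sketch, assuming the standard MNIST training split of 60,000 images (NUM_TRAIN is a name introduced here for illustration):

```python
EPOCHS = 10
BS = 32
NUM_TRAIN = 60000  # assumption: size of the standard MNIST training split

steps_per_epoch = NUM_TRAIN // BS         # full batches in one pass over the data
leftover = NUM_TRAIN % BS                 # samples that would not fill a batch
total_updates = steps_per_epoch * EPOCHS  # weight updates over the whole run

print(steps_per_epoch, leftover, total_updates)
```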
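This article uses the LeNet architecture, defined below. To switch to a different architecture, such as VGGNet, GoogLeNet, Inception, ResNet, or a network of your own design, modify the body of this function.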
# define the LeNet model
def l_model(width, height, depth, NB_CLASS):
    model = Sequential()
    # default to "channels last" ordering (the TensorFlow default)
    inputShape = (height, width, depth)
    # if the backend uses "channels first", update the input shape
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
    # first set of CONV => RELU => POOL layers
    model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # second set of CONV => RELU => POOL layers
    model.add(Conv2D(50, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))
    # softmax classifier
    model.add(Dense(NB_CLASS))
    model.add(Activation("softmax"))
    # return the constructed network architecture
    return model
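To get a feel for the size of this network, the sketch below works out the feature-map sizes and trainable parameter counts by hand for a 28×28×1 input and 10 classes. It is plain arithmetic under those assumptions rather than a call into Keras, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    # One kernel x kernel filter per (input, output) channel pair, plus one bias per output.
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit.
    return in_units * out_units + out_units

size, channels = 28, 1
layers = []

# CONV(20 filters, 5x5, "same" padding keeps 28x28), then 2x2 pooling with stride 2
layers.append(("conv1", conv_params(5, channels, 20)))
channels = 20
size //= 2  # 28 -> 14

# CONV(50 filters, 5x5, "same"), then 2x2 pooling with stride 2
layers.append(("conv2", conv_params(5, channels, 50)))
channels = 50
size //= 2  # 14 -> 7

flat = size * size * channels  # length of the vector produced by Flatten
layers.append(("dense1", dense_params(flat, 500)))
layers.append(("dense2", dense_params(500, 10)))

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(name, count)
print("total", total_params)
```

Almost all of the parameters sit in the first fully connected layer, which is typical for this style of network.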
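Two more classic models are included for reference:
VGG16 (written against the TensorFlow 1.x API, with pretrained weights loaded from a vgg16.npy file):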
import inspect
import os
import numpy as np
import tensorflow as tf
import time

VGG_MEAN = [103.939, 116.779, 123.68]

class Vgg16:
    def __init__(self, vgg16_npy_path=None):
        if vgg16_npy_path is None:
            # default to a vgg16.npy file located next to this source file
            path = inspect.getfile(Vgg16)
            path = os.path.abspath(os.path.join(path, os.pardir))
            path = os.path.join(path, "vgg16.npy")
            vgg16_npy_path = path
            print(path)
        # the file stores a pickled dict, so recent NumPy versions need allow_pickle=True
        self.data_dict = np.load(vgg16_npy_path, encoding='latin1', allow_pickle=True).item()
        print("npy file loaded")

    def build(self, rgb):
        """
        load variable from npy to build the VGG
        :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1]
        """
        start_time = time.time()
        print("build model started")
        rgb_scaled = rgb * 255.0
        # Convert RGB to BGR
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
        assert red.get_shape().as_list()[1:] == [224, 224, 1]
        assert green.get_shape().as_list()[1:] == [224, 224, 1]
        assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        bgr = tf.concat(axis=3, values=[
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2],
        ])
        assert bgr.get_shape().as_list()[1:] == [224, 224, 3]
        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        self.pool1 = self.max_pool(self.conv1_2, 'pool1')
        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool(self.conv2_2, 'pool2')
        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool(self.conv3_3, 'pool3')
        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool(self.conv4_3, 'pool4')
        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool(self.conv5_3, 'pool5')
        self.fc6 = self.fc_layer(self.pool5, "fc6")
        assert self.fc6.get_shape().as_list()[1:] == [4096]
        self.relu6 = tf.nn.relu(self.fc6)
        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)
        self.fc8 = self.fc_layer(self.relu7, "fc8")
        self.prob = tf.nn.softmax(self.fc8, name="prob")
        # release the loaded weights once the graph has been built
        self.data_dict = None
        print(("build model finished: %ds" % (time.time() - start_time)))

    def avg_pool(self, bottom, name):
        return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def conv_layer(self, bottom, name):
        with tf.variable_scope(name):
            filt = self.get_conv_filter(name)
            conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
            conv_biases = self.get_bias(name)
            bias = tf.nn.bias_add(conv, conv_biases)
            relu = tf.nn.relu(bias)
            return relu

    def fc_layer(self, bottom, name):
        with tf.variable_scope(name):
            shape = bottom.get_shape().as_list()
            dim = 1
            for d in shape[1:]:
                dim *= d
            x = tf.reshape(bottom, [-1, dim])
            weights = self.get_fc_weight(name)
            biases = self.get_bias(name)
            # Fully connected layer. Note that bias_add automatically
            # broadcasts the biases over the batch dimension.
            fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
            return fc

    def get_conv_filter(self, name):
        return tf.constant(self.data_dict[name][0], name="filter")

    def get_bias(self, name):
        return tf.constant(self.data_dict[name][1], name="biases")

    def get_fc_weight(self, name):
        return tf.constant(self.data_dict[name][0], name="weights")
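The preprocessing at the start of build (scale to [0, 255], reorder RGB to BGR, subtract the per-channel means) can be reproduced in NumPy to see what the network actually receives. This is a sketch for illustration only; rgb_to_vgg_bgr is a hypothetical helper and not part of the listing above.

```python
import numpy as np

# Per-channel means in B, G, R order, as used in the listing.
VGG_MEAN = [103.939, 116.779, 123.68]

def rgb_to_vgg_bgr(rgb):
    """rgb: float array [batch, height, width, 3] in [0, 1]; returns mean-centered BGR."""
    scaled = rgb * 255.0
    bgr = scaled[..., ::-1]  # reverse the channel axis: RGB -> BGR
    return bgr - np.array(VGG_MEAN, dtype=scaled.dtype)

# A single pure-red pixel: R=1, G=0, B=0.
red_pixel = np.zeros((1, 1, 1, 3), dtype=np.float32)
red_pixel[..., 0] = 1.0

out = rgb_to_vgg_bgr(red_pixel)
print(out[0, 0, 0])  # blue and green go negative, red stays positive
```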
GoogLeNet:
from keras.models import Model
from keras.utils import plot_model
from keras import regularizers
from keras import backend as K
from keras.layers import Input, Flatten, Dense, Dropout, BatchNormalization, concatenate
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D

# Global Constants
NB_CLASS = 20
LEARNING_RATE = 0.01
MOMENTUM = 0.9
ALPHA = 0.0001
BETA = 0.75
GAMMA = 0.1
DROPOUT = 0.4
WEIGHT_DECAY = 0.0005
LRN2D_NORM = True
DATA_FORMAT = 'channels_last'  # Theano: 'channels_first'; TensorFlow: 'channels_last'
USE_BN = True
IM_WIDTH = 224
IM_HEIGHT = 224
EPOCH = 50
def conv2D_lrn2d(x,filters,kernel_size,strides=(1,1),padding='same',dilation_rate=(1,1),activation='relu',
                 use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',
                 kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,
                 kernel_constraint=None,bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=WEIGHT_DECAY):
    # L2 weight regularization (overrides any regularizer passed in)
    if weight_decay:
        kernel_regularizer=regularizers.l2(weight_decay)
        bias_regularizer=regularizers.l2(weight_decay)
    else:
        kernel_regularizer=None
        bias_regularizer=None
    x=Conv2D(filters=filters,kernel_size=kernel_size,strides=strides,padding=padding,dilation_rate=dilation_rate,
             activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
             bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
             activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    if lrn2d_norm:
        # batch normalization
        x=BatchNormalization()(x)
    return x
def inception_module(x,params,concat_axis,padding='same',dilation_rate=(1,1),activation='relu',
                     use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',
                     kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,kernel_constraint=None,
                     bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=None):
    (branch1,branch2,branch3,branch4)=params
    if weight_decay:
        kernel_regularizer=regularizers.l2(weight_decay)
        bias_regularizer=regularizers.l2(weight_decay)
    else:
        kernel_regularizer=None
        bias_regularizer=None
    #1x1
    pathway1=Conv2D(filters=branch1[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    #1x1->3x3
    pathway2=Conv2D(filters=branch2[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    pathway2=Conv2D(filters=branch2[1],kernel_size=(3,3),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway2)
    #1x1->5x5
    pathway3=Conv2D(filters=branch3[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    pathway3=Conv2D(filters=branch3[1],kernel_size=(5,5),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway3)
    #3x3 max pool->1x1
    pathway4=MaxPooling2D(pool_size=(3,3),strides=1,padding=padding,data_format=DATA_FORMAT)(x)
    pathway4=Conv2D(filters=branch4[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway4)
    return concatenate([pathway1,pathway2,pathway3,pathway4],axis=concat_axis)
class GoogleNet:
    @staticmethod
    def build(width, height, depth, NB_CLASS):
        INP_SHAPE = (height, width, depth)
        img_input = Input(shape=INP_SHAPE)
        CONCAT_AXIS = 3
        # Data format: tensorflow, channels_last; theano, channels_first
        if K.image_data_format() == 'channels_first':
            INP_SHAPE = (depth, height, width)
            img_input = Input(shape=INP_SHAPE)
            CONCAT_AXIS = 1
        x = conv2D_lrn2d(img_input, 64, (7, 7), 2, padding='same', lrn2d_norm=False)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
        x = BatchNormalization()(x)
        x = conv2D_lrn2d(x, 64, (1, 1), 1, padding='same', lrn2d_norm=False)
        x = conv2D_lrn2d(x, 192, (3, 3), 1, padding='same', lrn2d_norm=True)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
        x = inception_module(x, params=[(64,), (96, 128), (16, 32), (32,)], concat_axis=CONCAT_AXIS)  # 3a
        x = inception_module(x, params=[(128,), (128, 192), (32, 96), (64,)], concat_axis=CONCAT_AXIS)  # 3b
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
        x = inception_module(x, params=[(192,), (96, 208), (16, 48), (64,)], concat_axis=CONCAT_AXIS)  # 4a
        x = inception_module(x, params=[(160,), (112, 224), (24, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4b
        x = inception_module(x, params=[(128,), (128, 256), (24, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4c
        x = inception_module(x, params=[(112,), (144, 288), (32, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4d
        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)], concat_axis=CONCAT_AXIS)  # 4e
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)], concat_axis=CONCAT_AXIS)  # 5a
        x = inception_module(x, params=[(384,), (192, 384), (48, 128), (128,)], concat_axis=CONCAT_AXIS)  # 5b
        x = AveragePooling2D(pool_size=(1, 1), strides=1, padding='valid')(x)
        x = Flatten()(x)
        x = Dropout(DROPOUT)(x)
        # Keras 2 names these arguments units / inputs / outputs
        x = Dense(units=NB_CLASS, activation='linear')(x)
        x = Dense(units=NB_CLASS, activation='softmax')(x)
        # Create a Keras Model
        model = Model(inputs=img_input, outputs=[x])
        model.summary()
        # Save a PNG of the Model Build
        #plot_model(model, to_file='../imgs/GoogLeNet.png')
        # return the constructed network architecture
        return model
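Each inception module concatenates its four pathways along the channel axis, so its output depth is the sum of the last filter count in each branch. The sketch below applies that rule to some of the params tuples used in the listing; inception_out_channels is a hypothetical helper added for illustration.

```python
def inception_out_channels(params):
    """Output depth of one inception module: sum of the final filter count per branch."""
    branch1, branch2, branch3, branch4 = params
    # 1x1 | 1x1->3x3 | 1x1->5x5 | pool->1x1
    return branch1[0] + branch2[1] + branch3[1] + branch4[0]

# params copied from the 3a, 3b and 5b calls in the listing above
modules = {
    "3a": [(64,), (96, 128), (16, 32), (32,)],
    "3b": [(128,), (128, 192), (32, 96), (64,)],
    "5b": [(384,), (192, 384), (48, 128), (128,)],
}

widths = {name: inception_out_channels(p) for name, p in modules.items()}
for name, depth in widths.items():
    print(name, depth)
```

The first entry of the 3x3 and 5x5 branches is only the width of the 1x1 bottleneck that precedes the larger convolution, so it does not contribute to the output depth.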
Set the optimizer and loss function, then compile the model:
model = l_model(width=norm_size, height=norm_size, depth=1, NB_CLASS=CLASS_NUM)
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
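The decay argument passed to Adam above shrinks the learning rate as training proceeds. The sketch below assumes the time-based schedule used by the Keras 2.x optimizers, where the rate after a given number of weight updates is the initial rate divided by (1 + decay × updates); check the optimizer documentation for the Keras version you run. The figure of 18,750 updates is also an assumption (1,875 batches of 32 per epoch over 10 epochs).

```python
INIT_LR = 1e-3
EPOCHS = 10
DECAY = INIT_LR / EPOCHS  # same expression as in the compile step

def decayed_lr(iteration, init_lr=INIT_LR, decay=DECAY):
    """Learning rate after `iteration` weight updates under time-based decay (assumed form)."""
    return init_lr / (1.0 + decay * iteration)

TOTAL_UPDATES = 1875 * EPOCHS  # assumption: 60,000 images, batch size 32

lr_start = decayed_lr(0)
lr_end = decayed_lr(TOTAL_UPDATES)
print(lr_start, lr_end)
```

Under these assumptions the learning rate falls to roughly a third of its initial value by the end of training.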
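A generator feeds the training data to the model batch by batch, which saves memory. The ImageDataGenerator below also applies random augmentation (rotation, shifts, shear, zoom, and horizontal flips) to every batch: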
# Use generators to save memory
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=True, fill_mode="nearest")
H = model.fit_generator(aug.flow(x_train, y_train, batch_size=BS),
                        steps_per_epoch=len(x_train) // BS,
                        epochs=EPOCHS, verbose=2)
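The idea behind feeding the model from a generator is that batches are produced one at a time on demand, so augmented copies of the whole dataset never have to exist in memory at once. A minimal sketch of that pattern in plain NumPy, without any augmentation (batch_generator and the toy arrays are hypothetical, not the Keras implementation):

```python
import numpy as np

def batch_generator(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs forever, one batch at a time."""
    n = len(x)
    while True:
        order = rng.permutation(n)  # reshuffle at the start of every epoch
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

# Toy data: 10 "images" of shape 2x2x1 whose first pixel equals 4 * label.
x = np.arange(10 * 4, dtype=np.float32).reshape(10, 2, 2, 1)
y = np.arange(10)

rng = np.random.default_rng(0)
gen = batch_generator(x, y, batch_size=4, rng=rng)
steps_per_epoch = len(x) // 4  # number of full batches per epoch

first_x, first_y = next(gen)
print(first_x.shape, first_y.shape)
```

Because the generator never terminates, the caller has to say how many batches make up an epoch, which is the role steps_per_epoch plays in the training call above.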
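Results
The code below plots the loss and accuracy over the training epochs. With the number of epochs set to 10, the model already reaches an accuracy of 0.98 on the training data (the plotting code follows; it saves the figure to ../figure/Figure_2.png).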
# plot the iteration process
N = EPOCHS
plt.figure()
plt.plot(np.arange(0,N),H.history['loss'],label='loss')
plt.plot(np.arange(0,N),H.history['acc'],label='train_acc')
plt.title('Training Loss and Accuracy on mnist-img classifier')
plt.xlabel('Epoch')
plt.ylabel('Loss/Accuracy')
plt.legend(loc='lower left')
plt.savefig('../figure/Figure_2.png')
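The accuracy plotted above is the fraction of samples whose highest-probability class matches the label. A minimal NumPy sketch of that computation on made-up softmax outputs (the arrays are illustrative, not results from the model):

```python
import numpy as np

def accuracy(probabilities, one_hot_labels):
    """Fraction of rows where the predicted class equals the labelled class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

acc = accuracy(probs, labels)
print(acc)  # three of the four predictions match
```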
WeChat official account: 帕帕科技喵
You are welcome to follow and join the discussion.