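There is much less material on audio than on image recognition, so I wrote this blog post (it is not especially rigorous) in the hope that it helps.
We need to classify 10 kinds of sounds: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music.
Each recording is about 4 s long, and the recordings are stored in 10 fold folders.
We build the model with keras (you can simply think of keras as the front end and tensorflow as the back end: tensorflow is the library, and we call its API through keras) and process the audio with librosa (a Python package for audio and music analysis and processing).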
-
Importing these libraries is all we need:
import keras
from keras.layers import Activation, Dense, Dropout, Conv2D, Flatten, MaxPooling2D
from keras.models import Sequential
import librosa
import librosa.display
import numpy as np
import pandas as pd
import random
-
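Read the CSV file:
data = pd.read_csv('metadata/UrbanSound8K.csv')
# Keep only clips that are at least 3 seconds long; copy so a column can be added safely
valid_data = data.loc[data['end'] - data['start'] >= 3,
                      ['slice_file_name', 'fold', 'classID', 'class']].copy()
# Relative path of each clip: fold<N>/<slice_file_name>
valid_data['path'] = 'fold' + valid_data['fold'].astype('str') + '/' + valid_data['slice_file_name'].astype('str')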
-
Load the wav files:
from tqdm.auto import tqdm

D = []  # list of (mel spectrogram, classID) pairs
for row in tqdm(valid_data.itertuples(), total=len(valid_data)):
    # Load at most 2.97 s of audio (librosa resamples to 22050 Hz by default)
    y1, sr1 = librosa.load("audio/" + row.path, duration=2.97)
    ps = librosa.feature.melspectrogram(y=y1, sr=sr1)
    # Skip any clip whose spectrogram is not exactly 128 x 128
    if ps.shape != (128, 128):
        continue
    D.append((ps, row.classID))
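The value duration=2.97 looks arbitrary, but it appears to be chosen so that the spectrogram has exactly 128 time frames. The sketch below reproduces the arithmetic, assuming librosa's default settings (sr=22050, hop_length=512, centered frames); `mel_frame_count` is a helper name made up for this illustration and is not part of librosa.

```python
def mel_frame_count(duration_s, sr=22050, hop_length=512):
    """Approximate frame count of a centered STFT: 1 + n_samples // hop_length.

    sr and hop_length are assumed to be librosa's defaults.
    """
    n_samples = int(duration_s * sr)
    return 1 + n_samples // hop_length


frames_297 = mel_frame_count(2.97)  # the duration used in the loading loop
frames_4 = mel_frame_count(4.0)     # a full 4 s clip, for comparison
print(frames_297, frames_4)
```

With 128 mel bands (also the librosa default), a 2.97 s clip therefore gives a square 128 x 128 input, which is what the shape check in the loop expects.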
-
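Split the data into a training set and a test set: the first 7000 samples form the training set, and everything after the first 7000 forms the test set.
dataset = D
random.shuffle(dataset)

train = dataset[:7000]
test = dataset[7000:]

X_train, y_train = zip(*train)
X_test, y_test = zip(*test)

# Add a channel dimension so that each sample is 128 x 128 x 1
X_train = np.array([x.reshape((128, 128, 1)) for x in X_train])
X_test = np.array([x.reshape((128, 128, 1)) for x in X_test])

# One-hot encode the 10 class labels
y_train = np.array(keras.utils.to_categorical(y_train, 10))
y_test = np.array(keras.utils.to_categorical(y_test, 10))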
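To make the label step less of a black box, here is a minimal NumPy sketch of what one-hot encoding does to the class IDs. It is only an illustration of the idea behind keras.utils.to_categorical, not its actual implementation; `one_hot` is a hypothetical helper.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class IDs into rows with a single 1 at the class index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


example = one_hot([3, 0, 9], 10)
print(example.shape)
print(example[0])
```

Each row has the same length as the number of classes, which is why the last Dense layer of the model has 10 units.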
-
Build the model:
model = Sequential()
input_shape = (128, 128, 1)

# Block 1: convolution, then pooling, then ReLU
model.add(Conv2D(24, (5, 5), strides=(1, 1), input_shape=input_shape))
model.add(MaxPooling2D((4, 2), strides=(4, 2)))
model.add(Activation('relu'))

# Block 2
model.add(Conv2D(48, (5, 5), padding="valid"))
model.add(MaxPooling2D((4, 2), strides=(4, 2)))
model.add(Activation('relu'))

# Block 3: convolution without pooling
model.add(Conv2D(48, (5, 5), padding="valid"))
model.add(Activation('relu'))

# Classifier: flatten, dropout, and two fully connected layers
model.add(Flatten())
model.add(Dropout(rate=0.5))

model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(rate=0.5))

model.add(Dense(10))
model.add(Activation('softmax'))
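It can help to trace how the 128 x 128 input shrinks through the convolution and pooling layers. The sketch below does the arithmetic by hand, assuming "valid" padding everywhere (the Keras default for Conv2D and MaxPooling2D); `valid_output` and `trace_shapes` are names invented for this illustration.

```python
def valid_output(size, window, stride=1):
    """Output length along one axis for 'valid' padding."""
    return (size - window) // stride + 1


def trace_shapes(height=128, width=128):
    """Return (layer name, (height, width, channels)) for each conv/pool layer."""
    layers = [
        ("conv1", (5, 5), (1, 1), 24),
        ("pool1", (4, 2), (4, 2), None),
        ("conv2", (5, 5), (1, 1), 48),
        ("pool2", (4, 2), (4, 2), None),
        ("conv3", (5, 5), (1, 1), 48),
    ]
    channels = 1
    shapes = []
    for name, window, stride, filters in layers:
        height = valid_output(height, window[0], stride[0])
        width = valid_output(width, window[1], stride[1])
        if filters is not None:  # pooling keeps the channel count
            channels = filters
        shapes.append((name, (height, width, channels)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)

last_h, last_w, last_c = shapes[-1][1]
flat_size = last_h * last_w * last_c
print("flatten", flat_size)
```

Under these assumptions the last convolution outputs 2 x 25 x 48, so Flatten passes 2400 values to the Dense(64) layer. Calling model.summary() on the real model is the authoritative way to check this.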
-
Feed in the data to train and evaluate the model:
model.compile(
    optimizer="Adam",
    loss="categorical_crossentropy",
    metrics=['accuracy'])

model.fit(
    x=X_train,
    y=y_train,
    epochs=12,
    batch_size=128,
    validation_data=(X_test, y_test))

# evaluate() returns [loss, accuracy] because of metrics=['accuracy']
score = model.evaluate(
    x=X_test,
    y=y_test)

print('Test loss:', score[0])
print('Test accuracy:', score[1])
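Once the model is trained, each prediction is a vector of 10 softmax probabilities. The sketch below shows one way to turn such a vector back into a class name. The name list assumes that classID 0 to 9 follows the order of the classes listed at the top of this post; in practice it is safer to build the mapping from the classID and class columns of the CSV. The probabilities are made up for the example and do not come from a real model.

```python
# Assumed order of classID 0..9; verify against the CSV before relying on it
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def decode_prediction(probabilities):
    """Return (class name, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best], probabilities[best]


# Made-up softmax output for a single clip
fake_probs = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
label, confidence = decode_prediction(fake_probs)
print(label, confidence)
```

With the real model, one row of model.predict(X_test) would take the place of fake_probs.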
Source: https://blog.csdn.net/c2c2c2aa/article/details/81543549