Revision as of 05:07, 19 March 2017 (Sunday)

Speech recognition

Speech recognition, formally Automatic Speech Recognition (ASR)

Audio file preprocessing

SoX SOX_CODE (http://sox.sourceforge.net/)

ffmpeg FFMPEG (https://ffmpeg.org/)
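
As a rough sketch of how these two tools are commonly used before acoustic modelling, the snippet below builds command lines that convert an audio file to mono 16-bit PCM WAV. The 16 kHz sample rate, the file names, and the helper names `sox_cmd`, `ffmpeg_cmd`, and `convert` are assumptions made for illustration; this page does not prescribe a target format.

```python
import shutil
import subprocess


def sox_cmd(src, dst, rate=16000):
    # SoX output-format options go before the output file:
    # -r sample rate, -c channel count, -b bits per sample.
    return ["sox", src, "-r", str(rate), "-c", "1", "-b", "16", dst]


def ffmpeg_cmd(src, dst, rate=16000):
    # -y overwrites dst, -ac sets the channel count, -ar the sample rate,
    # and -acodec pcm_s16le selects 16-bit little-endian PCM.
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(rate),
            "-acodec", "pcm_s16le", dst]


def convert(src, dst, rate=16000):
    """Run whichever tool is installed and return its name."""
    for build in (sox_cmd, ffmpeg_cmd):
        cmd = build(src, dst, rate)
        if shutil.which(cmd[0]):
            subprocess.run(cmd, check=True)
            return cmd[0]
    raise RuntimeError("neither sox nor ffmpeg was found on PATH")


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(ffmpeg_cmd("input.mp3", "output.wav")))
```

Keeping command construction separate from execution makes the commands easy to inspect or log before a large batch of files is converted.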

Neural network building blocks

LSTM

Long short-term memory neural network (LSTM)

Long short-term memory, Neural Computation 9 (8), 1735-1780, 1997. LSTM
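
For orientation, one time step of an LSTM cell can be sketched as follows. This is the widely used variant with a forget gate, which was introduced after the 1997 paper cited above. The parameter layout (dictionaries `W` and `b` keyed by gate name) and the helper names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, v, b):
    # Matrix-vector product plus bias, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step on plain Python lists.

    For each gate name in "i", "f", "o", "g", W[name] has H rows of
    length D + H and b[name] has length H.
    """
    v = x + h_prev  # concatenate input and previous hidden state
    i = [sigmoid(a) for a in affine(W["i"], v, b["i"])]    # input gate
    f = [sigmoid(a) for a in affine(W["f"], v, b["f"])]    # forget gate
    o = [sigmoid(a) for a in affine(W["o"], v, b["o"])]    # output gate
    g = [math.tanh(a) for a in affine(W["g"], v, b["g"])]  # candidate update
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```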

CTC

Connectionist temporal classification (CTC)

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.
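
At the core of CTC is a many-to-one mapping from frame-level label paths to output sequences: repeated labels are merged first, then blanks are deleted. A minimal sketch, assuming one label per frame and using `-` as a hypothetical blank symbol:

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; real systems usually reserve an index


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to its label sequence."""
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(path)]
    # Step 2: drop blanks, which only separate genuine repeats.
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    print("".join(ctc_collapse("--hh-e-ll-lo--")))  # prints "hello"
```

The blank between the two runs of `l` is what lets a doubled letter survive the merge step.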

GRU

Gated Recurrent Unit (GRU)

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, SSST-8, 2014.
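
One time step of a gated recurrent unit can be sketched as below. The interpolation convention used here, where the update gate `z` weights the previous state, is one of two conventions found in the literature and is an assumption, as are the parameter layout and the helper names.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, v, b):
    # Matrix-vector product plus bias, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def gru_step(x, h_prev, W, b):
    """One GRU time step on plain Python lists.

    For each name in "z", "r", "h", W[name] has H rows of length D + H
    and b[name] has length H.
    """
    v = x + h_prev  # concatenate input and previous hidden state
    z = [sigmoid(a) for a in affine(W["z"], v, b["z"])]  # update gate
    r = [sigmoid(a) for a in affine(W["r"], v, b["r"])]  # reset gate
    # The reset gate scales the previous state before the candidate is computed.
    v_reset = x + [rk * hk for rk, hk in zip(r, h_prev)]
    h_tilde = [math.tanh(a) for a in affine(W["h"], v_reset, b["h"])]
    # Interpolate between the previous state and the candidate.
    return [zk * hk + (1.0 - zk) * tk
            for zk, hk, tk in zip(z, h_prev, h_tilde)]
```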

Research

Survey of traditional methods

S. Karpagavalli and E. Chandra. "A Review on Automatic Speech Recognition Architecture and Approaches." International Journal of Signal Processing, Image Processing and Pattern Recognition 9, No. 4 (2016): 393-404.

Google

Alex Graves

Alex Graves, research scientist at Google DeepMind and originator of several speech recognition techniques

Towards End-To-End Speech Recognition with Recurrent Neural Networks, ICML 2014.
Speech recognition with deep recurrent neural networks, ICASSP 2013.
Hybrid speech recognition with deep bidirectional LSTM, ASRU 2013.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.

Google Speech

Google Speech Processing from Mobile to Farfield, CHiME 2016. Google_Speech_Processing
Tara N. Sainath et al., "Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017).
Rubén Zazo Candil, Tara N. Sainath, Gabor Simko, and Carolina Parada. "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection." InterSpeech 2016.
Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." ICASSP 2016.
Sainath, Tara N., Oriol Vinyals, Andrew Senior, and Haşim Sak. "Convolutional, long short-term memory, fully connected deep neural networks." ICASSP 2015.
Context dependent phone models for LSTM RNN acoustic modelling, ICASSP 2015.
Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.

Baidu

Amodei, Dario, et al., Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016 (JMLR W&CP).
Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

JHU

Dan Povey

Parallel training of DNNs with natural gradient and parameter averaging, ICLR Workshop 2015.
Ko, Tom, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. "Audio augmentation for speech recognition." InterSpeech 2015.

CMU

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding, ASRU 2015.

Amazon Alexa

Cocktail party problem

Anchored Speech Detection, InterSpeech 2016.

@@ Line 5: / Line 5: @@
 ==Audio file preprocessing==
 SOX [http://sox.sourceforge.net/ SOX_CODE]
+ffmpeg [https://ffmpeg.org/ FFMPEG]
 == Neural network building blocks ==

Difference between revisions of "Speech recognition"