Revision as of 02:53, 6 March 2017 (Monday)

Speech Recognition

Automatic Speech Recognition, abbreviated ASR.

Basic Tools

LSTM

Long short-term memory neural network

Long short-term memory, Neural Computation 9 (8), 1735-1780, 1997.

CTC

Connectionist temporal classification

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.

GRU

Gated Recurrent Unit

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, SSST-8, 2014.

Research

Survey of Traditional Methods

S. Karpagavalli and E. Chandra. "A Review on Automatic Speech Recognition Architecture and Approaches." International Journal of Signal Processing, Image Processing and Pattern Recognition 9, No. 4 (2016): 393-404.

Google

Alex Graves, research scientist at Google DeepMind and a pioneer of several speech recognition techniques

Speech recognition with deep recurrent neural networks, ICASSP 2013.
Hybrid speech recognition with deep bidirectional LSTM, ASRU 2013.
Towards End-To-End Speech Recognition with Recurrent Neural Networks, ICML 2014.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.

Google Speech

Google Speech Processing from Mobile to Farfield, CHiME 2016. http://spandh.dcs.shef.ac.uk/chime_workshop/presentations/CHiME_2016_Bacchiani_keynote.pdf
Tara N. Sainath et al., "Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017).

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP 2016.
Convolutional, long short-term memory, fully connected deep neural networks, ICASSP 2015.
Context dependent phone models for LSTM RNN acoustic modelling, ICASSP 2015.
Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.

Baidu

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016 (JMLR Workshop and Conference Proceedings).
Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

JHU

Parallel training of DNNs with natural gradient and parameter averaging, ICLR Workshop 2015.

Audio augmentation for speech recognition, InterSpeech 2015.

CMU

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding, ASRU 2015.
