From iCenter Wiki
Revision as of 02:53, 6 March 2017 (Monday)
Speech Recognition
Speech recognition: Automatic Speech Recognition (ASR)
Basic Tools
LSTM
Long short-term memory neural network
- Long short-term memory, Neural Computation 9 (8), 1735-1780, 1997.
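As a rough illustration of what an LSTM cell computes at each time step, the sketch below implements one forward step in plain Python. It assumes the now-common variant with a forget gate (introduced by Gers et al. in 2000, after the 1997 paper above) and no peephole connections; the function names and the `params` layout are hypothetical choices for this example, not any library's API.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell with a forget gate and no peepholes.

    Assumed (hypothetical) layout: `params` maps each of "input", "forget",
    "output" and "cell" to a tuple (W, U, b), where W multiplies the input x,
    U multiplies the previous hidden state h_prev, and b is a bias vector.
    """
    def pre_activation(name):
        W, U, b = params[name]
        return [wx + uh + bj
                for wx, uh, bj in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(a) for a in pre_activation("input")]   # input gate
    f = [sigmoid(a) for a in pre_activation("forget")]  # forget gate
    o = [sigmoid(a) for a in pre_activation("output")]  # output gate
    g = [math.tanh(a) for a in pre_activation("cell")]  # candidate cell input
    # New cell state: keep part of the old memory, write part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # New hidden state: output-gated, tanh-squashed view of the cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so `lstm_step([0.0], [0.0], [1.0], params)` simply halves the cell state to `c = [0.5]`. A large positive forget-gate bias instead carries the old cell state through almost unchanged, which is the mechanism that lets LSTMs keep information over long spans.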
CTC
Connectionist temporal classification
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.
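The core CTC ideas from the ICML 2006 paper fit in a few functions. A frame-level path maps to a label sequence by merging repeated symbols and then deleting blanks. Greedy (best-path) decoding applies that mapping to the per-frame argmax, and the forward recursion sums the probability of every path that maps to a given label. This is a minimal sketch assuming the blank has index 0 and working with raw probabilities; practical implementations work in log space or rescale per frame to avoid underflow on long utterances. The function names are hypothetical.

```python
def ctc_collapse(path, blank=0):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    labels, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            labels.append(sym)
        prev = sym
    return labels


def best_path_decode(probs, blank=0):
    """Greedy decoding: most likely symbol per frame, then collapse."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return ctc_collapse(path, blank)


def ctc_forward(probs, label, blank=0):
    """p(label | x): total probability of all paths that collapse to `label`.

    probs[t][k] is the softmax output for symbol k at frame t (assumed to be
    plain probabilities); `label` is a list of symbol indices with no blanks.
    """
    # Extended label: a blank before, between and after every symbol.
    ext = [blank]
    for sym in label:
        ext += [sym, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]          # a path may start with a blank
    if S > 1:
        alpha[1] = probs[0][ext[1]]      # or with the first label
    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]             # stay on the same position
            if s >= 1:
                total += alpha[s - 1]    # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

For example, with two frames `probs = [[0.4, 0.6], [0.3, 0.7]]` (blank = 0, one label symbol = 1), `ctc_forward(probs, [1])` adds up the paths (1,1), (0,1) and (1,0): 0.42 + 0.28 + 0.18, i.e. about 0.88. The blank is what allows a doubled label such as [1, 1] to be emitted, since `ctc_collapse([1, 0, 1])` gives `[1, 1]` while `ctc_collapse([1, 1])` gives `[1]`.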
GRU
Gated Recurrent Unit
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, SSST-8, 2014.
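A minimal sketch of one GRU step, following the formulation of Cho et al. (2014): an update gate z interpolates between the previous state and a candidate state, and a reset gate r scales the previous state before it enters the candidate. Libraries differ in details (for example, some apply r after the recurrent matrix product, and some swap the roles of z and 1 - z), so treat this as one convention rather than the definitive one; the names and the `params` layout are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used by the GRU gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h_prev, params):
    """One GRU time step in the Cho et al. (2014) convention.

    Assumed (hypothetical) layout: `params` maps "update", "reset" and
    "candidate" to (W, U, b), where W multiplies the input, U the recurrent
    state, and b is a bias vector.
    """
    def affine(name, h):
        W, U, b = params[name]
        return [wx + uh + bj
                for wx, uh, bj in zip(matvec(W, x), matvec(U, h), b)]

    z = [sigmoid(a) for a in affine("update", h_prev)]  # update gate
    r = [sigmoid(a) for a in affine("reset", h_prev)]   # reset gate
    # The reset gate scales the previous state before the recurrent product.
    h_reset = [rj * hj for rj, hj in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in affine("candidate", h_reset)]
    # z near 1 keeps the old state; z near 0 takes the candidate.
    return [zj * hj + (1.0 - zj) * cj for zj, hj, cj in zip(z, h_prev, h_cand)]
```

Compared with the LSTM cell, the GRU has no separate memory cell and uses two gates instead of three, which makes it somewhat cheaper per step.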
Research
Survey of Traditional Methods
- S. Karpagavalli and E. Chandra. "A Review on Automatic Speech Recognition Architecture and Approaches." International Journal of Signal Processing, Image Processing and Pattern Recognition 9, No. 4 (2016): 393-404.
Alex Graves, researcher at Google DeepMind and pioneer of several speech recognition techniques
- Speech recognition with deep recurrent neural networks, ICASSP 2013.
- Hybrid speech recognition with deep bidirectional LSTM, ASRU 2013.
- Towards End-To-End Speech Recognition with Recurrent Neural Networks, ICML 2014.
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.
Google Speech
- Google Speech Processing from Mobile to Farfield, CHiME 2016. Slides: http://spandh.dcs.shef.ac.uk/chime_workshop/presentations/CHiME_2016_Bacchiani_keynote.pdf
- Tara N. Sainath et al., "Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017).
- Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP 2015.
- Convolutional, long short-term memory, fully connected deep neural networks, ICASSP 2015.
- Context dependent phone models for LSTM RNN acoustic modelling, ICASSP 2015.
- Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.
Baidu
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016.
- Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling
JHU
- Parallel training of DNNs with natural gradient and parameter averaging, ICLR Workshop 2015.
- Audio augmentation for speech recognition, InterSpeech 2015.
CMU
- EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding, ASRU 2015.