Difference between revisions of "Speech Recognition"
From iCenter Wiki
Revision as of 05:07, 19 March 2017 (Sunday)
Speech Recognition
Speech recognition, also called Automatic Speech Recognition (ASR)
Audio File Preprocessing
SoX: http://sox.sourceforge.net/
FFmpeg: https://ffmpeg.org/
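Both tools are command-line programs, so a preprocessing step usually amounts to building and running a command per file. The sketch below builds such commands to convert audio to 16-bit PCM WAV. The 16 kHz mono target, the function names, and the file names are assumptions made for illustration; this page does not specify a target format.

```python
import shlex


def ffmpeg_resample_cmd(src, dst, rate=16000, channels=1):
    """Build an ffmpeg command that writes 16-bit PCM WAV.

    The 16 kHz mono default is an assumption, not taken from this page.
    """
    return ["ffmpeg", "-y", "-i", src,
            "-ac", str(channels),   # number of output channels
            "-ar", str(rate),       # output sample rate in Hz
            "-c:a", "pcm_s16le",    # 16-bit little-endian PCM
            dst]


def sox_resample_cmd(src, dst, rate=16000, channels=1):
    """Build the roughly equivalent SoX command."""
    return ["sox", src,
            "-r", str(rate), "-c", str(channels), "-b", "16",
            dst]


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not executed.
    print(shlex.join(ffmpeg_resample_cmd("in.mp3", "out.wav")))
    print(shlex.join(sox_resample_cmd("in.wav", "out.wav")))
```

To actually run a command, pass the returned list to `subprocess.run(cmd, check=True)` on a machine where the tool is installed.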
Neural Network Architecture Units
LSTM
Long short-term memory (LSTM) neural network
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation 9 (8), 1735-1780, 1997.
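As a rough illustration of what an LSTM unit computes, here is a minimal single-time-step sketch in plain Python. It follows the now-common variant with a forget gate, which was added after the 1997 paper, so it is not the original formulation. The gate names and the `params` layout are this sketch's own assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps a gate name to (W, b); W acts on the concatenation
    [h_prev, x]. The names "input", "forget", "output" and "candidate"
    are hypothetical labels chosen for this sketch.
    """
    z = h_prev + x  # list concatenation

    def gate(name, act):
        W, b = params[name]
        return [act(a) for a in affine(W, z, b)]

    i = gate("input", sigmoid)         # how much new content to write
    f = gate("forget", sigmoid)        # how much old cell state to keep
    o = gate("output", sigmoid)        # how much of the cell to expose
    g = gate("candidate", math.tanh)   # proposed new content

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

The cell state `c` is updated additively, which is what lets information and gradients persist across many time steps.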
CTC
Connectionist temporal classification (CTC)
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.
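CTC scores a label sequence by summing over every frame-level path that collapses to it, where collapsing means merging repeated symbols and then dropping blanks. The following is a minimal plain-Python sketch of the collapse rule and the forward recursion, working in probability space for readability (practical implementations use log-probabilities). Using index 0 for the blank symbol and the function names are assumptions of this sketch.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path, blank=BLANK):
    """Merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_prob(probs, target, blank=BLANK):
    """Total probability of `target` given per-frame distributions.

    probs is a list of T rows (T >= 1), each a list of symbol probabilities.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # blank-interleaved target, length 2L + 1
    S = len(ext)

    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

For example, `collapse([1, 1, 0, 1, 2, 2, 0])` gives `[1, 1, 2]`: the blank between the two 1s is what keeps them from being merged.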
GRU
Gated Recurrent Unit (GRU)
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, SSST-8, 2014.
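For comparison with the LSTM, a GRU keeps a single hidden state and uses two gates. Below is a minimal single-time-step sketch in plain Python. It uses the convention in which the update gate weights the previous state; some libraries swap the roles of `z` and `1 - z`. The gate names and the `params` layout are assumptions of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(x, h_prev, params):
    """One GRU time step.

    params maps a gate name to (W, b); W acts on the concatenation
    [h, x]. The names "update", "reset" and "candidate" are
    hypothetical labels chosen for this sketch.
    """
    def gate(name, h_in, act):
        W, b = params[name]
        return [act(a) for a in affine(W, h_in + x, b)]

    z = gate("update", h_prev, sigmoid)   # how much old state to keep
    r = gate("reset", h_prev, sigmoid)    # how much old state feeds the candidate
    h_reset = [rk * hk for rk, hk in zip(r, h_prev)]
    h_cand = gate("candidate", h_reset, math.tanh)

    return [zk * hk + (1.0 - zk) * ck
            for zk, hk, ck in zip(z, h_prev, h_cand)]
```

Unlike the LSTM sketch, there is no separate cell state and no output gate, so the unit has fewer parameters for the same hidden size.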
Research
Survey of Traditional Methods
- S. Karpagavalli and E. Chandra. "A Review on Automatic Speech Recognition Architecture and Approaches." International Journal of Signal Processing, Image Processing and Pattern Recognition 9, No. 4 (2016): 393-404.
Alex Graves
Alex Graves, a research scientist at Google DeepMind and a pioneer of several speech recognition techniques
- Towards End-To-End Speech Recognition with Recurrent Neural Networks, ICML 2014.
- Speech recognition with deep recurrent neural networks, ICASSP 2013.
- Hybrid speech recognition with deep bidirectional LSTM, ASRU 2013.
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.
Google Speech
- Google Speech Processing from Mobile to Farfield, CHiME 2016.
- Tara N. Sainath et al., "Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017).
- Rubén Zazo, Tara N. Sainath, Gabor Simko, and Carolina Parada. "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection," InterSpeech 2016.
- Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," ICASSP 2016.
- Sainath, Tara N., Oriol Vinyals, Andrew Senior, and Haşim Sak. "Convolutional, long short-term memory, fully connected deep neural networks," ICASSP 2015.
- Context dependent phone models for LSTM RNN acoustic modelling, ICASSP 2015.
- Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.
Baidu
- Amodei, Dario, et al., Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016 (JMLR W&CP).
- Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling
JHU
Dan Povey
- Parallel training of DNNs with natural gradient and parameter averaging, ICLR Workshop 2015.
- Ko, Tom, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. "Audio augmentation for speech recognition," InterSpeech 2015.
CMU
- EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding, ASRU 2015.
Amazon Alexa
Cocktail party problem
- Anchored Speech Detection, InterSpeech 2016.