Difference between revisions of "Speech recognition"

From iCenter Wiki

Revision as of 02:14, 6 March 2017 (Monday)

Speech recognition

Speech recognition, formally Automatic Speech Recognition, abbreviated ASR.

Basic tools

LSTM

Long short-term memory neural network

  1. Long short-term memory, Neural Computation 9 (8), 1735-1780, 1997.

CTC

Connectionist temporal classification

  1. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.

GRU

Gated Recurrent Unit

  1. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, SSST-8, 2014.

Research

Survey of traditional methods

  1. S. Karpagavalli and E. Chandra. "A Review on Automatic Speech Recognition Architecture and Approaches." International Journal of Signal Processing, Image Processing and Pattern Recognition 9, No. 4 (2016): 393-404.

Google

Alex Graves (http://www.cs.toronto.edu/~graves/), research scientist at Google DeepMind and pioneer of several speech recognition techniques

  1. Speech recognition with deep recurrent neural networks, ICASSP 2013.
  2. Hybrid speech recognition with deep bidirectional LSTM, ASRU 2013.
  3. Towards End-To-End Speech Recognition with Recurrent Neural Networks, ICML 2014.
  4. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006.

Google Speech

  1. Google Speech Processing from Mobile to Farfield, CHiME 2016.
  2. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP 2016.
  3. Convolutional, long short-term memory, fully connected deep neural networks, ICASSP 2015.
  4. Context dependent phone models for LSTM RNN acoustic modelling, ICASSP 2015.
  5. Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.

Baidu

  1. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016 (JMLR W&CP 48).
  2. Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling