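Revision as of 15:34, Thursday, 27 October 2016

Copyright notice

CC BY-NC-SA

Teaching team

Internet+ Lab (iNetLab)

陈震, 马晓东, 章屹松, 王蓓蓓, 高英

Teaching assistants: 郑文勋, 王晗

Background

With the growing use of computer applications and the rapid development of computer networks, the "Internet+" wave is reshaping society and the economy. Physical hardware connected over the Internet, together with information systems built on big data and cloud computing, is making intelligent systems such as robots and autonomous drones the next wave of technology, and has given rise to a new "smart hardware" ecosystem.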

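Course content

Cloud + Device

Smart devices

Mobile devices: Android / iOS

Embedded devices: Raspberry Pi 2 / Arduino

Wearable hardware: fitness bands / Apple Watch

Embedded development platforms

NVIDIA Jetson TK1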


Raspberry Pi

Cloud computing and big data

Data centers

Google operates more than a dozen data centers around the world, with cluster computing spread across upwards of a million machines; the exact numbers are kept confidential.

L.A. Barroso, J. Clidaras, and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2nd ed., 2013.

Cloud computing platforms

iCenter-Cloud

Azure

Armando Fox et al., Above the Clouds: A Berkeley View of Cloud Computing, Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, 2009.

Big data platforms

Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 2006.
Fangjin Yang et al., Druid: A Real-time Analytical Data Store, ACM SIGMOD 2014.

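Machine learning

Jeffrey Dean et al., Large Scale Distributed Deep Networks, Advances in Neural Information Processing Systems (NIPS) 2012.

Intelligent systems

Definition of artificial intelligence

Artificial intelligence refers to the ability of a computer system to exhibit human-like intelligence, from listening, speaking, reading and writing to searching, reasoning, making decisions and answering questions.

Perception, understanding, decision-making

History of artificial intelligence

AI has so far gone through two booms and two winters.

Computing power supplied by networks and cloud computing

Advances in machine learning algorithms driven by big data

Machine perception

Speech recognition: Google_ASR

Computer vision

Natural language understanding

Deep neural networks

Deep Learning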

Project groups

Group 1

Leader: 许越

Members: 吴俣帅, 杨应人

Group 2

Leader: 王亦凡

Members: 刘梦旸, 张力

Group 3

Leader: 刘晓明

Members: 常昊男, 全光林, 朱泽宇

Group 4

Leader: 郑钰琦

Members: 郑安然, 郑钰琦, 高一川

Assignment 1

Install TensorFlow Mobile on a HUAWEI Kirin 930 device

Paper reading

Round 1

Group 1

Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks, ICASSP 2015.

Context Dependent Phone Models for LSTM RNN Acoustic Modelling, ICASSP 2015.

Group 2

Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition, ICASSP 2016.

Group 3

Audio Augmentation for Speech Recognition, InterSpeech 2015.

Group 4

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML 2016.

Round 2

Dan Povey et al., Parallel Training of DNNs with Natural Gradient and Parameter Averaging, ICLR Workshop 2015.
Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9(8): 1735-1780, 1997.
S. Karpagavalli and E. Chandra, A Review on Automatic Speech Recognition Architecture and Approaches, International Journal of Signal Processing, Image Processing and Pattern Recognition 9(4): 393-404, 2016.
EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding, ASRU 2015.
Learning the Speech Front-end With Raw Waveform CLDNNs, InterSpeech 2015.

Course projects

What to bring

Bring your own laptop and a smartphone with a camera.

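Course project: speech recognition

ASR Lab 1

Voice-controlled smart hardware: recording voice commands

The user speaks a command to the phone; the phone app recognizes which command it is and carries out the corresponding control action.

Voice commands to record (every phrase begins with 蓝牙, "Bluetooth"; each phrase is preceded by its recording ID, and alternative phrasings are separated by "/"):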

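     a蓝牙开机 (power on)
     b蓝牙拨打电话 / bb蓝牙打电话 (place a call)
     c蓝牙接听电话 / cc蓝牙接电话 (answer a call)
     d蓝牙拒接 (reject a call)
     e蓝牙播放音乐 / ee蓝牙开始音乐 (play music / start music)
     f蓝牙暂停音乐 / ff蓝牙停止音乐 (pause music / stop music)
     g蓝牙上一首   / gg蓝牙上一曲 (previous track)
     h蓝牙下一首   / hh蓝牙下一曲 (next track)
     i蓝牙音量增大 / ii蓝牙声音增大   / iii蓝牙音量增加 / iiii蓝牙声音增加 (volume up)
     j蓝牙音量减小 / jj蓝牙声音减小 (volume down)
     k蓝牙关机 (power off)
     l蓝牙电量提醒 / ll蓝牙还剩多少电 / lll蓝牙还剩多少电量 (battery reminder / how much battery is left)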
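The 24 phrasings above cover only 12 distinct actions, so an app sitting behind the 24-output classifier of ASR Lab 2 has to collapse recording IDs into actions. Below is a minimal Python sketch, assuming the softmax outputs are ordered as in the list above (a, b, bb, c, ...). The English action names are our own hypothetical labels, not part of the course materials. The sketch sums the probabilities of all phrasings of an action before picking the winner. This is one reasonable design choice; taking the single highest output would also work.

```python
# Recording IDs from the command list, grouped by the action they trigger.
# The English action names are hypothetical labels, not from the course page.
VARIANTS = {
    "power_on": ["a"],
    "place_call": ["b", "bb"],
    "answer_call": ["c", "cc"],
    "reject_call": ["d"],
    "play_music": ["e", "ee"],
    "pause_music": ["f", "ff"],
    "previous_track": ["g", "gg"],
    "next_track": ["h", "hh"],
    "volume_up": ["i", "ii", "iii", "iiii"],
    "volume_down": ["j", "jj"],
    "power_off": ["k"],
    "battery_level": ["l", "ll", "lll"],
}

# Assumed order of the 24 softmax outputs: the order of the list (a, b, bb, c, ...).
CLASS_LABELS = [label for labels in VARIANTS.values() for label in labels]
LABEL_TO_ACTION = {label: action for action, labels in VARIANTS.items() for label in labels}


def action_for(probs):
    """Map a 24-element probability vector to an action name.

    Probabilities of all phrasings of the same action are summed first,
    so several weak votes for one action can beat a single stronger one.
    """
    if len(probs) != len(CLASS_LABELS):
        raise ValueError(f"expected {len(CLASS_LABELS)} outputs, got {len(probs)}")
    totals = {}
    for label, p in zip(CLASS_LABELS, probs):
        action = LABEL_TO_ACTION[label]
        totals[action] = totals.get(action, 0.0) + p
    return max(totals, key=totals.get)
```

For example, if "a" gets 0.35 but each of the four volume-up phrasings gets 0.1, the summed vote picks volume_up.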

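Upload location: \\166.111.134.110\team-saturn\智能硬件录音

Create a folder named with your student ID; only *.wav files are accepted.

Each student submits 24 recordings, one per phrasing listed above, and the recordings must be of high quality.

Deadline: 12:00 noon, 7 October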
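Before uploading, a quick self-check can catch common mistakes. The following minimal sketch uses only the Python standard library and lists problems in one student's folder: non-WAV files, the wrong file count, and unreadable or empty WAV files. It assumes the 24-file rule above and relies on the stdlib `wave` module. That module reads only uncompressed PCM WAV, so a file in another encoding is reported as unreadable even if other tools can play it.

```python
import os
import wave

EXPECTED_COUNT = 24  # recordings per student, from the instructions above


def check_submission(folder):
    """Return a list of problems in one student's folder (an empty list means OK).

    Checks: only .wav files, exactly EXPECTED_COUNT of them, each readable
    by the stdlib wave module (uncompressed PCM) and not empty.
    """
    names = sorted(os.listdir(folder))
    wavs = [n for n in names if n.lower().endswith(".wav")]
    others = [n for n in names if not n.lower().endswith(".wav")]
    problems = []
    if others:
        problems.append(f"non-WAV files: {others}")
    if len(wavs) != EXPECTED_COUNT:
        problems.append(f"expected {EXPECTED_COUNT} .wav files, found {len(wavs)}")
    for name in wavs:
        try:
            with wave.open(os.path.join(folder, name), "rb") as w:
                if w.getnframes() == 0:
                    problems.append(f"{name}: empty recording")
        except (wave.Error, EOFError) as exc:
            problems.append(f"{name}: not a readable PCM WAV ({exc})")
    return problems
```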

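ASR Lab 2

Voice-controlled smart hardware: spectrogram generation

1. Generate the spectrogram [5] of each of the 24 recordings and name each spectrogram file *.spec.
2. Get familiar with the TensorFlow environment [6] and use TensorFlow to build a 3-layer fully connected neural network with a 24-way softmax output.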
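The first task above, spectrogram generation, can be sketched in Python with NumPy as a short-time Fourier transform. Several details are our own assumptions, because the course does not define them. The sketch uses a 25 ms Hann window with a 10 ms hop (400 and 160 samples at 16 kHz) and expects 16-bit mono input. It stores each log-magnitude array in NumPy's .npy format under the .spec name, since the .spec format itself is unspecified. `scipy.signal.spectrogram` is a ready-made alternative.

```python
import wave

import numpy as np


def read_wav_mono16(path):
    """Read 16-bit mono PCM WAV; returns (samples scaled to [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit mono PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    return np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0, rate


def spectrogram(x, frame_len=400, hop=160):
    """Log-magnitude STFT, shape (n_frames, frame_len // 2 + 1).

    400/160 samples = 25 ms / 10 ms at 16 kHz (assumed settings).
    """
    if len(x) < frame_len:  # pad very short clips to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)


def wav_to_spec(wav_path, spec_path):
    """Write the spectrogram as a NumPy array (npy format) under a .spec name."""
    x, _ = read_wav_mono16(wav_path)
    spec = spectrogram(x)
    with open(spec_path, "wb") as f:  # passing a file object keeps the .spec name
        np.save(f, spec)
    return spec.shape
```

A saved file can be read back with `np.load` on an open file object. With these settings a 1-second, 16 kHz clip gives 98 frames of 201 frequency bins each, 40 Hz apart.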

Deadline: 12:00 noon, 14 October 2016
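For the second task above, a 3-layer fully connected network with 24 softmax outputs, the shape of the network matters more than the framework. TensorFlow's API has changed a lot between versions, so instead of guessing at version-specific calls, this sketch writes the same architecture out in NumPy. It uses manual backpropagation for mean cross-entropy. Several choices are assumptions: the layer widths, ReLU activations, He-style initialization, the learning rate, and reading "3 layers" as three weight layers.

```python
import numpy as np


class ThreeLayerMLP:
    """input -> hidden1 (ReLU) -> hidden2 (ReLU) -> softmax over n_out classes."""

    def __init__(self, n_in, n_h1=256, n_h2=128, n_out=24, seed=0):
        rng = np.random.default_rng(seed)

        def layer(fan_in, fan_out):
            # He-style initialisation, a common choice for ReLU layers.
            return rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out)

        self.W1, self.b1 = layer(n_in, n_h1)
        self.W2, self.b2 = layer(n_h1, n_h2)
        self.W3, self.b3 = layer(n_h2, n_out)

    def forward(self, X):
        """Return class probabilities, shape (batch, n_out); caches hidden activations."""
        self.h1 = np.maximum(0.0, X @ self.W1 + self.b1)
        self.h2 = np.maximum(0.0, self.h1 @ self.W2 + self.b2)
        logits = self.h2 @ self.W3 + self.b3
        logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
        e = np.exp(logits)
        return e / e.sum(axis=1, keepdims=True)

    def train_step(self, X, y, lr=0.01):
        """One gradient-descent step on mean cross-entropy; y holds labels 0..n_out-1.
        Returns the loss measured before the update."""
        n = len(X)
        p = self.forward(X)
        loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
        d3 = p.copy()
        d3[np.arange(n), y] -= 1.0  # d(loss)/d(logits) for softmax + cross-entropy
        d3 /= n
        d2 = (d3 @ self.W3.T) * (self.h2 > 0)
        d1 = (d2 @ self.W2.T) * (self.h1 > 0)
        # All gradients are computed before any weight is changed.
        grads = [(self.W3, self.h2.T @ d3), (self.b3, d3.sum(axis=0)),
                 (self.W2, self.h1.T @ d2), (self.b2, d2.sum(axis=0)),
                 (self.W1, X.T @ d1), (self.b1, d1.sum(axis=0))]
        for param, grad in grads:
            param -= lr * grad  # in-place update of each weight array
        return loss
```

In practice, each input row would be one spectrogram, cropped or padded to a fixed number of frames and then flattened. Labels 0-23 would follow the recording IDs. In TensorFlow the same model is three dense layers trained with a softmax cross-entropy loss.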

Face recognition

System requirements

Ubuntu 15.04 or later

Install OpenCV 3

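$ workon cv3-python27    # activate the cv3-python27 virtualenv (virtualenvwrapper)

$ git clone https://github.com/bytefish/facerec.git

$ cd facerec/py/apps/videofacerec/

$ python simple_videofacerec.py mymodel.pkl    # run the video face recognition demo with model file mymodel.pkl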

Introduction

Fisherfaces
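Fisherfaces combines PCA with Fisher's linear discriminant analysis (LDA) in three steps:

1. PCA projects the face images to at most N - c dimensions (N training images, c people), so that the within-class scatter matrix is invertible.
2. LDA finds at most c - 1 directions that best separate the people.
3. A new face gets the label of its nearest training face in that space.

Below is a minimal NumPy sketch of that recipe. It is not facerec's implementation, and the function and class names are our own. Each row of X is assumed to be one flattened face image, with all images the same size.

```python
import numpy as np


def pca(X, k):
    """Project onto the top-k principal directions. Returns (mean, W), W of shape (d, k')."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions of the centred data, largest first.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # k' = min(k, number of rows of Vt)


def lda(X, y, k):
    """Fisher LDA: k directions maximising between-class over within-class scatter."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with Sw^{-1/2} so the generalised problem Sb w = lambda Sw w
    # becomes a symmetric eigenproblem (requires Sw positive definite).
    s, U = np.linalg.eigh(Sw)
    Sw_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    evals, V = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    top = np.argsort(evals)[::-1][:k]
    return Sw_inv_sqrt @ V[:, top]


class Fisherfaces:
    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n, c = len(X), len(np.unique(y))
        # PCA to (at most) N - c dimensions keeps Sw non-singular, then LDA to c - 1.
        self.mu, W_pca = pca(X, n - c)
        W_lda = lda((X - self.mu) @ W_pca, y, c - 1)
        self.W = W_pca @ W_lda
        self.train_proj = (X - self.mu) @ self.W
        self.labels = y
        return self

    def predict(self, x):
        """Label of the nearest training image (1-nearest neighbour) in Fisherface space."""
        q = (np.asarray(x, dtype=float) - self.mu) @ self.W
        return self.labels[np.argmin(np.linalg.norm(self.train_proj - q, axis=1))]
```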

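Course project: cloud + device integration

The Thrift protocol

Client side

Record audio through the recording API

Call the Thrift interface

Server side

Receive the recording file

Invoke the server-side program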
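Apache Thrift generates a processor from an IDL file and dispatches each remote call to a handler object that you write. This minimal Python sketch covers both sides. The service name `SpeechService`, its `recognize` method, the generated module name `speech` and the `recognizer` callback are all hypothetical, because the course does not specify the interface.

- The handler only unpacks the uploaded WAV and passes the PCM samples on.
- `serve` and `call_recognize` show the usual Thrift Python server and client wiring. Both need the `thrift` package plus the generated code, so their imports live inside the functions.

```python
import io
import wave

# Hypothetical IDL (speech.thrift), not taken from the course:
#   service SpeechService {
#     string recognize(1: binary wav_bytes)
#   }
# `thrift --gen py speech.thrift` would generate the `speech` package used below.


class SpeechServiceHandler:
    """Implements the hypothetical SpeechService.recognize call."""

    def __init__(self, recognizer):
        # recognizer(pcm_bytes, sample_rate) -> str stands in for the real ASR model.
        self.recognizer = recognizer

    def recognize(self, wav_bytes):
        # The client uploads a whole .wav file; unpack it in memory.
        with wave.open(io.BytesIO(wav_bytes), "rb") as w:
            if w.getnchannels() != 1 or w.getsampwidth() != 2:
                # A real IDL would declare a Thrift exception for this case.
                raise ValueError("expected 16-bit mono WAV")
            rate = w.getframerate()
            pcm = w.readframes(w.getnframes())
        return self.recognizer(pcm, rate)


def serve(handler, port=9090):
    """Server side: blocking Thrift server (needs `thrift` and the generated code)."""
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer
    from thrift.transport import TSocket, TTransport
    from speech import SpeechService  # hypothetical generated module

    server = TServer.TSimpleServer(
        SpeechService.Processor(handler),
        TSocket.TServerSocket(port=port),
        TTransport.TBufferedTransportFactory(),
        TBinaryProtocol.TBinaryProtocolFactory(),
    )
    server.serve()


def call_recognize(wav_path, host="localhost", port=9090):
    """Client side: send one recording and return the recognized text."""
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport
    from speech import SpeechService  # hypothetical generated module

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = SpeechService.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    try:
        with open(wav_path, "rb") as f:
            return client.recognize(f.read())
    finally:
        transport.close()
```

On Android the client would be written in Java against Thrift's Java bindings, but the call sequence is the same: open a socket transport, wrap it in a protocol, then call the generated client method.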

Further reading

Getting started with Android development

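Course project: deep learning

Basic task: handwritten digit recognition

Use the convolutional neural network (CNN) in the Matlab Deep Learning Toolbox provided as an attachment. Train it on the training set of the MNIST handwritten digit dataset and evaluate it on the test set.

We recommend reading the background material in the attachment and the toolbox's CNN source code first. Once you understand both the algorithm and its implementation, start from the example program CNN/test_example_CNN.m and change at least one key parameter or setting, for example the number of convolution and subsampling layers or the kernel size. Train with at least one setting that differs from the original program. Produce a chart or table comparing the test-set recognition rate (or error rate) across settings, and mark the setting with the highest recognition rate.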
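When you change the number of convolution/subsampling layers or the kernel size, the feature-map sizes must stay whole numbers. A "valid" convolution with a k x k kernel turns an n x n map into (n - k + 1) x (n - k + 1). A subsampling layer with scale s needs n to be divisible by s. A small Python helper can check a candidate configuration before you run it in Matlab. The tuple encoding of layers is our own, not the toolbox's. The example configuration is what we understand DeepLearnToolbox's test_example_CNN.m to use (6 maps with 5 x 5 kernels, scale 2, 12 maps with 5 x 5 kernels, scale 2), so check it against your copy.

```python
def cnn_map_sizes(input_size, layers):
    """Feature-map side length and map count after each layer.

    layers: ("c", outputmaps, kernelsize) for a 'valid' convolution or
            ("s", scale) for subsampling; a hypothetical encoding of the
            toolbox's cnn.layers structs (the input layer is implicit).
    Raises ValueError if a size stops being a positive whole number.
    """
    size, maps = input_size, 1
    sizes = []
    for layer in layers:
        if layer[0] == "c":
            _, maps, kernel = layer
            size = size - kernel + 1  # valid convolution shrinks the map by k - 1
        elif layer[0] == "s":
            scale = layer[1]
            if size % scale:
                raise ValueError(f"map size {size} is not divisible by scale {scale}")
            size //= scale
        else:
            raise ValueError(f"unknown layer type {layer[0]!r}")
        if size < 1:
            raise ValueError(f"layer {layer} leaves no output")
        sizes.append((layer[0], size, maps))
    return sizes


def n_features(sizes):
    """Length of the vector fed to the output layer: size * size * maps of the last layer."""
    _, size, maps = sizes[-1]
    return size * size * maps
```

For the example layers on 28 x 28 MNIST images this gives maps of 24, 12, 8 and 4 pixels per side, i.e. 4 x 4 x 12 = 192 features into the 10-way output. Using 3 x 3 kernels in both convolution layers fails at the second subsampling, because 11 is odd.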

Advanced task: object recognition

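Use the example file tests/test_example_SAE.m in the toolbox as your guide, in four steps:

1. Train an autoencoder on the unlabeled images of the STL-10 dataset.
2. Initialize a neural network classifier with the parameters of the autoencoder's hidden layer.
3. Train the classifier on the labeled training images.
4. Evaluate the classifier on the test set.

In the test phase, the return value "er" of nntest is the error rate on the test set.

During training, vary at least one key parameter or setting, for example the number of nodes in the autoencoder's hidden layer. Run experiments with at least one setting that differs from the original program, and produce a chart or table showing how the test results change with the parameter or setting.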

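In test_example_SAE.m, we suggest changing the value of opts.numepochs on lines 16 and 28, e.g. from the original 1 to 3. If you compare different numbers of hidden-layer nodes in the autoencoder, make sure the hidden-layer size (100 in both lines below) is changed consistently:

Line 12: sae = saesetup([784 100]);

Line 22: nn = nnsetup([784 100 10]);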
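NumPy can sketch the pretrain-then-initialize idea behind test_example_SAE.m, and why the hidden size must match in saesetup([784 h]) and nnsetup([784 h 10]). First, train a one-hidden-layer sigmoid autoencoder on unlabeled data. Then copy its encoder weights into the first layer of a classifier. This is a simplified stand-in, not the toolbox's code:

- It uses full-batch gradient descent on squared reconstruction error.
- It keeps weights and biases in separate arrays.
- It has no denoising or sparsity.
- It omits supervised fine-tuning, which would continue training all layers on the labeled images.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=100, lr=0.5, seed=0):
    """Sigmoid autoencoder n_in -> n_hidden -> n_in, full-batch gradient descent on
    mean squared reconstruction error. Returns encoder (W1, b1) and the loss history."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # hidden code (the part we keep)
        R = sigmoid(H @ W2 + b2)          # reconstruction of the input
        err = R - X
        losses.append(0.5 * (err ** 2).sum() / n)
        dR = err * R * (1.0 - R) / n      # gradient at the decoder pre-activation
        dH = (dR @ W2.T) * H * (1.0 - H)  # backpropagated to the encoder
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, losses


def init_classifier_from_encoder(W1, b1, n_classes, seed=0):
    """Classifier n_in -> n_hidden -> n_classes whose first layer is the trained encoder.
    n_hidden is read from W1, which is why it must agree in both setups."""
    rng = np.random.default_rng(seed)
    n_hidden = W1.shape[1]
    return {"W1": W1.copy(), "b1": b1.copy(),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_classes)), "b2": np.zeros(n_classes)}


def predict(net, X):
    """Class index with the highest output score for each row of X."""
    H = sigmoid(X @ net["W1"] + net["b1"])
    return np.argmax(H @ net["W2"] + net["b2"], axis=1)
```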

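Report requirements

The report should include:

1. A chart or table of the recognition rate (or error rate) on the MNIST handwritten digit test set as the parameter or setting varies.
2. (Optional) A chart or table of the recognition rate (or error rate) on the STL-10 test set as the parameter or setting varies.
3. A summary of what you learned.

Please submit the source code you wrote or modified together with the report.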

Attachments

For more help, read the lab guide.

Download the lab guide and toolbox:

http://166.111.6.122/RegionDownloadService/1511/0A6A33E76DFD6DFCD75E20F7C1226B7E3.html

(TA: 王晗, wang-han13@mails.tsinghua.edu.cn)

Course project: intelligent medical question answering

(常嘉辉)

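Acknowledgements

This course is supported by a Microsoft Azure cloud computing and machine learning donation.

We thank 杨滔 (manager), 章艳 (manager), 刘士君 (engineer) and 闫伟 (engineer) of Microsoft.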

References

[1] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2005.
[2] David A. Ferrucci, Introduction to "This is Watson", IBM Journal of Research and Development 56(3.4): 1-1, 2012.
[3] Gary Bradski and Adrian Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, 2008.
[4] Johann Hauswald et al., Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers, ASPLOS 2015.
[5] Spectrogram, https://en.wikipedia.org/wiki/Spectrogram
[6] TensorFlow, https://www.tensorflow.org


"Intelligent Hardware and Intelligent Systems" (智能硬件与智能系统): difference between revisions