Computer Seeing and Hearing: A Dream of AI (计算机视听觉－人工智能之梦)

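Zhang Bo (张钹) • School of Information Science and Technology, Tsinghua University • Department of Computer Science and Technology, Tsinghua University • Tsinghua National Laboratory for Information Science and Technology • State Key Laboratory of Intelligent Technology and Systems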

Computer Vision/Hearing: Is it possible? • Yes • No, it is just a daydream!

The Characteristics of Auditory Information (Data) • Ears, Earphones • A continuous wave • Digital data: 20K-100K bits/s • Sparseness (Redundancy) • Noisy

The Characteristics of Visual Information (Data) • Eyes, Digital Camera • Pixel-based (million, ten million bits) • Sparseness (Redundancy) • Noisy • Eyes: a sequence of images • 10^9 bits/sec

The Sparseness of Auditory Signal (sampling frequency, bit resolution) • Broadcast quality: 48 kHz • CD quality: 44 kHz, 16-bit • Radio quality: 22 kHz, 8-bit • Acceptable music: 11 kHz, 4-bit • Acceptable speech: 5 kHz
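The figures on this slide can be turned into raw data rates with a small calculation. The sketch below assumes uncompressed mono PCM (the slide does not state the channel count or encoding) and uses the rounded sampling rates exactly as listed. The speech entry is left out because the slide gives no bit resolution for it.

```python
# Raw PCM bit rate = sampling rate (Hz) x bits per sample x channels.
# Assumption: mono, uncompressed audio; the slide specifies neither.

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Rounded values copied from the slide (sampling rate in Hz, bits per sample).
QUALITY_LEVELS = {
    "CD quality": (44_000, 16),
    "radio quality": (22_000, 8),
    "acceptable music": (11_000, 4),
}

def bit_rate_table(levels=QUALITY_LEVELS):
    """Map each quality level to its raw bit rate in bits per second."""
    return {name: pcm_bit_rate(rate, bits) for name, (rate, bits) in levels.items()}

if __name__ == "__main__":
    for name, rate in bit_rate_table().items():
        print(f"{name}: {rate / 1000:.0f} kbit/s")
```

Under these assumptions, "acceptable music" comes out at 44 kbit/s, which falls inside the 20K-100K bits/s range quoted for digital auditory data, while CD quality is roughly sixteen times larger. That gap is one way to read the "sparseness" claim: most of the bits at the higher rates are not needed for acceptable perception.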

The Sparseness of Visual Signal: the relation between resolution and recognition rate (conceptual)

An Ill-posed Problem • Microphone (Ears), Camera (Eyes) • Sparse, redundant, noisy data (110000111100011100011000…) • Existence, Uniqueness, Stability • Speaker-invariant Vowel Representation • Vowel-invariant Speaker Representation (Object-invariant Representation)

1. Segmentation & Recognition

Image Segmentation vs. Recognition • Where is the object? • What is the object? • Which comes first, the chicken or the egg?

Speech Segmentation vs. Recognition • What? Where?

Technical Difficulties • Sparse, redundant, noisy data • A Robust Detector • An Invariant Descriptor • Speaker-invariant Vowel Representation • Vowel-invariant Speaker Representation

How Do Humans Solve It? • Top-down feedback • Local connection • Data-driven • High-level a priori knowledge • From egg to chicken

The Relation Between Activation Patterns and Early Stages of Sound Processing • Speech encoding occurs not only in specialized high-level regions but also in early stages of sound processing. • Early sound processing may exhibit complex spectrotemporal receptive fields and may participate in high-level encoding of auditory objects, e.g., via local feedback.

Multi-layer Neural Network with feedback connections • G. E. Hinton, The “wake-sleep” algorithm for unsupervised neural networks, SCIENCE vol.268, 26 May 1995, 1158-1161

Representation RBM: Restricted Boltzmann Machine

Experimental Results G. E. Hinton, Learning multiple layers of representation, TRENDS in Cognitive Sciences vol.11, no.10, 428-434, 2007

2. Feature Extraction

Computer Robustly Extractable Features • Sparse, redundant, noisy data • Statistical Approaches • Speech-based Invariant Statistics (Features) • Speaker-invariant Vowel Representation • Vowel-invariant Speaker Representation

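Statistical Method • Select a speech training corpus • Extract speech features • Unsupervised learning (Classification) • Classification criterion: Generalization • Which features should be extracted? Those the computer can detect robustly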

Representation at Different Granularities (of an image) • The coarsest: Global Features, one vector • The finest: Pixel-based, 1280×800×3 vectors

Pixel-based Representation: the finest representation • million×3-dimensional vectors • All the details

Global Features: the coarsest representation • Color moments (N: the number of pixels; P: the value of each color) • One 9-dimensional vector
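The formula on this slide did not survive in the transcript, so the following is only a sketch under a stated assumption: that "color moments" means the commonly used first three moments (mean, standard deviation, cube root of the third central moment) computed per color channel, which gives 3 × 3 = 9 values. The function name and the pixel format are hypothetical.

```python
import math

def _signed_cube_root(x):
    """Cube root that keeps the sign of x (the third central moment can be negative)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def color_moments(pixels):
    """Return a 9-dimensional global descriptor of an image.

    pixels: a non-empty list of 3-tuples, one value per color channel
            (this flat-list format is an assumption for illustration).
    Output order: [mean, std, skew] for channel 0, then channel 1, then channel 2.
    """
    n = len(pixels)  # N: the number of pixels
    if n == 0:
        raise ValueError("image has no pixels")
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]  # P: the value of each color
        mean = sum(values) / n
        second = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        features.extend([mean, math.sqrt(second), _signed_cube_root(third)])
    return features
```

Whatever the image size, the output has a fixed length of nine, which is what makes it the coarsest representation: all spatial layout is discarded and only per-channel statistics remain.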

Coarse vs. Fine Representation

Representation with Middle Grain-Size • Region-based Representation

Local (Spatial) Feature Region-01 Region-11 Region-12 Foreground vs. Background

Vector Representation A set of vectors (tens) (with different length) Similarity Measure Weighted

Region-adaptive Grid Partition Jinhui Yuan (2005…)

Hierarchical (Granular) Structure • Semantics (text, image) • (X, F, f): the finest space • ([X], [F], [f]): the coarse space • [X]: the quotient space of X (each element is an equivalence class) • [F]: the quotient structure of F • [f]: the quotient attributes of f • Semantic Gap • Primitive (words, pixels)
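As a rough illustration of moving from the finest space to a coarser one, the sketch below groups elements of X into equivalence classes and computes one aggregated attribute per class. Everything here is an assumption made for illustration: the function name, the choice of a 2×2 pixel block as the equivalence relation, and the mean as the quotient attribute. The slide does not say how [f] is derived from f, and the structure F is not modeled at all.

```python
from collections import defaultdict

def quotient_attributes(elements, class_of, attribute, aggregate):
    """Partition `elements` into equivalence classes and summarize each class.

    class_of:  maps an element to a label of its equivalence class (defines [X])
    attribute: maps an element to its fine-grained attribute value (f)
    aggregate: maps a list of attribute values to one value (a stand-in for [f])
    """
    classes = defaultdict(list)
    for x in elements:
        classes[class_of(x)].append(x)
    return {label: aggregate([attribute(x) for x in members])
            for label, members in classes.items()}

def mean(values):
    return sum(values) / len(values)

if __name__ == "__main__":
    # Hypothetical 2x4 grayscale image stored as (row, col, intensity) triples.
    pixels = [(r, c, 10 * r + c) for r in range(2) for c in range(4)]
    coarse = quotient_attributes(
        pixels,
        class_of=lambda p: (p[0] // 2, p[1] // 2),  # 2x2 blocks are the classes
        attribute=lambda p: p[2],
        aggregate=mean,
    )
    print(coarse)
```

Changing only `class_of` (for example to segmented regions instead of fixed blocks) yields a different granularity over the same underlying pixels, which is the sense in which several coarse spaces can sit above one finest space.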

PM: Pyramid Match (feature space-quantization level) SPM: Spatial Pyramid Match (physical space-grid) FESCO: Feature Spatial Covariant Kernel
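To make "feature space, quantization level" concrete, here is a minimal sketch of a pyramid match score, assuming the commonly cited formulation: histograms are built at successively coarser bin widths, matches are counted by histogram intersection, and matches that first appear at level i are weighted by 1/2^i. For brevity the features are scalars rather than the multi-dimensional descriptors used in practice, and the function names are hypothetical. SPM and FESCO are not covered by this sketch.

```python
from collections import Counter

def histogram(points, bin_width):
    """Count how many points fall into each bin of the given width."""
    return Counter(int(p // bin_width) for p in points)

def intersection(h1, h2):
    """Histogram intersection: the number of points that can be matched bin by bin."""
    return sum(min(count, h2[bin_id]) for bin_id, count in h1.items())

def pyramid_match(xs, ys, num_levels):
    """Score two sets of non-negative scalar features.

    Level i uses bins of width 2**i. Only matches that are new at level i
    (not already found at a finer level) are counted, weighted by 1 / 2**i.
    """
    score = 0.0
    previous_matches = 0
    for level in range(num_levels):
        width = 2 ** level
        matches = intersection(histogram(xs, width), histogram(ys, width))
        score += (matches - previous_matches) / (2 ** level)
        previous_matches = matches
    return score
```

For example, `pyramid_match([0.5, 5.5], [0.7, 6.5], 3)` finds one match at the finest level (0.5 with 0.7) and a second match only at bin width 4 (5.5 with 6.5), so the second pair contributes a quarter as much as the first.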

Concept Detection from Video Shots

Experiments • TRECVID 2005: 10 concepts, 170 hours of news (MSNBC, NBC Nightly News, CNN, LBC, CCTV, NTDTV) • TRECVID 2006: 20 concepts, 170+150 hours of news • Keypoint descriptor: 64-dimensional SURF feature (Speeded Up Robust Features) • AP: Non-interpolated Average Precision • MAP: Mean Average Precision (7 concepts)
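The two evaluation measures named on this slide can be sketched as follows. This is a generic illustration and not the official TRECVID scoring tool: it takes a ranked list of relevance flags per concept, averages the precision at each rank where a relevant shot appears, and then averages those AP values across concepts. The optional `num_relevant` argument is an assumption that lets the caller normalize by the total number of relevant shots in the ground truth instead of the number retrieved.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance: sequence of truthy/falsy flags, best-ranked result first.
    num_relevant: total relevant items in the ground truth; defaults to the
                  number of relevant items present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    total = hits if num_relevant is None else num_relevant
    return precision_sum / total if total else 0.0

def mean_average_precision(runs):
    """Mean of the per-concept AP values (one ranked list per concept)."""
    scores = [average_precision(run) for run in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

With seven concepts, as on the slide, `runs` would hold seven ranked lists, one per concept, and MAP is the plain average of their seven AP scores.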

TRECVID Data d: training data, t: testing data

Coarse vs. Fine Granulation MAP: 7 concepts: car, explosion-fire, flag-US, maps, mountain, sports, waterscape-waterfront

Multi-granulation MAP: 7 concepts: car, explosion-fire, flag-US, maps, mountain, sports, waterscape-waterfront

Multi-granulation (2) MAP: 7 concepts: car, explosion-fire, flag-US, maps, mountain, sports, waterscape-waterfront

Multi-Granular & Multi-modal • TRECVID 2005 (Video Retrieval Evaluation Conference) • 86.6 hours of news videos (45766 shots in 140 video clips) • Features: A: automatic speech recognition text; T: visual texture; R: color of segmented image regions

PMSRA Probabilistic Model Supported Rank Aggregation

Comparison between Uni-modal and Multi-granular, Multi-modal

TRECVID Text Retrieval Conference Video Retrieval Evaluation

Sound Waves and Spectrograms

Speech Information • The coarsest: Global Features, one vector • The finest: sampling

Speech Features at Different Granularities • Choice of speech unit (granularity): phoneme, syllable, word, … • Choice of speech parameters: • MFCC: Mel Frequency Cepstral Coefficients • LSP: Line Spectrum Pair • ICA (Independent Component Analysis) • Multi-granular feature fusion
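The "Mel" in MFCC refers to a perceptual frequency scale. As a small illustration, the sketch below uses one widely used Hz-to-mel conversion, mel = 2595 · log10(1 + f/700), and places filter centers at equal mel spacing. The slides do not say which mel formula or how many filters were used, so both the formula choice and the parameter values are assumptions. The remaining MFCC steps (framing, FFT, log filter-bank energies, DCT) are not shown.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (one common formula; an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(low_hz, high_hz, num_filters):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]

if __name__ == "__main__":
    # Hypothetical setting: 10 filters between 0 Hz and 4 kHz.
    for center in mel_filter_centers(0.0, 4000.0, 10):
        print(f"{center:.1f} Hz")
```

Because the spacing is uniform in mels, the centers crowd together at low frequencies and spread out at high frequencies, which is why the resulting features are a coarser description of the high end of the spectrum.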

3. Structural Model • Temporal Model (HMM) • Spatial Model

Temporal Structure of Speech: a multi-granular structure

Image Region Annotation -horse, sky, mountain, grass, tree

Region-adaptive Grid Partition (2)

Experiments • 4002 Corel images (384×256 or 256×384) • 11 basic (region) concepts • Features: color moment + wavelet • 5 models: 2 without structural knowledge (GMM, SVM) and 3 with structural knowledge (HMM*, MRF*, CRF*)

Image Region Annotation

Spatial Structural Representation • n images, each image has m_i = H×V grids • (a) i.i.d. generative model • (b) i.i.d. discriminative model • (c) 2-dimensional hidden Markov model (2D HMM) • (d) Markov Random Field (MRF) • (e) Conditional Random Field (CRF)

