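New Challenges: Content-Based Information Processing
Zhang Bo (张钹)
Tsinghua National Laboratory for Information Science and Technology
State Key Laboratory of Intelligent Technology and Systems
School of Information Science and Technology and Department of Computer Science, Tsinghua University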
I. Classical Information Theory
• Semantics vs. Information Processing
• "Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem."
• C. E. Shannon, A mathematical theory of communication, 1948
Classical Communication Model
• Communication model: sender, receiver
• (Markov) stochastic process (C. E. Shannon)
Turing Computation Theory
• Deterministic Turing Machine (1937)
• Computability, algorithms, computational complexity
• [Figure: finite-state controller Q, read-write head P, tape with cells numbered -4, -3, -2, -1, 0, +1]
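The components named on this slide (finite-state controller, read-write head, tape) can be sketched as a small simulator. This is only an illustration: the function name `run_dtm`, the transition-table format, and the bit-flipping example machine are assumptions introduced here, not part of the original slides.

```python
def run_dtm(transitions, tape_input, start, accept, blank="_", max_steps=1000):
    """Simulate a deterministic Turing machine.

    transitions maps (state, symbol) -> (next_state, symbol_to_write, "L" or "R").
    Returns the final state and the tape contents with outer blanks stripped.
    """
    tape = dict(enumerate(tape_input))  # sparse tape: cell index -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            break
        symbol = tape.get(head, blank)
        if (state, symbol) not in transitions:
            break  # no applicable rule: halt without accepting
        state, write, move = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    cells = range(min(tape), max(tape) + 1) if tape else []
    return state, "".join(tape.get(i, blank) for i in cells).strip(blank)


# Hypothetical example machine: invert every bit, then halt at the first blank.
FLIP = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("halt", "_", "R"),
}

final_state, output = run_dtm(FLIP, "0110", start="q0", accept="halt")
print(final_state, output)
```

The tape is stored as a dictionary so that it can grow in both directions, matching the negative and positive cell indices shown in the figure.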
Information Processing under Classical Information Theory
• "The conversion of latent information into manifest information" (C. E. Shannon)
• [Figure: input X ~ p(x), output Y; labels: dissipation, equivocation]
• Transformation = equivocation - dissipation
• Uncertainty (error, noise, deformation, …)
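In Shannon's 1948 paper, equivocation is the conditional entropy H(X|Y) of the input given the output. A minimal sketch of that quantity follows; the joint distributions are made-up examples, and the slide's "dissipation" term is not modelled here because its definition is not given in the text.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def equivocation(joint):
    """H(X|Y) = H(X, Y) - H(Y), where joint[x][y] = p(x, y)."""
    h_xy = entropy(p for row in joint for p in row)
    p_y = [sum(column) for column in zip(*joint)]
    return h_xy - entropy(p_y)


# Hypothetical channels with a uniform binary input.
noiseless = [[0.5, 0.0], [0.0, 0.5]]      # the output determines the input
noisy = [[0.45, 0.05], [0.05, 0.45]]      # 10% of symbols are flipped

print(equivocation(noiseless), equivocation(noisy))
```

With no noise the receiver has no residual uncertainty about X, so the equivocation is zero; with noise it is positive.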
Applications of Classical Information Theory to Text and Image Processing
• The central paradigm of classical information theory is the engineering problem of transmitting information over a noisy channel
• Communication: coding, data compression, noisy-channel coding, …
• Text: editing, compression, spelling and syntax correction, …
• Image: editing, lossless compression, noise suppression, enhancement, …
II. Content-Based (Semantic) Information Processing
• Information (Web) Age
• Complex (varied) information: text, image, speech, video, …
• A huge amount of data
• Man-machine interaction
• Content (semantics) based information processing: information retrieval, classification, summarization, recognition, understanding, …
From Processing Form to Processing Related to Content (Semantics)
• Natural man-machine interaction
• [Figure: users; input (encoding) and output (decoding); content, codes, form; computer, network, computer]
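III. Syntactic Analysis
• Holistic coding (Gestalt)
• [Figure: code (form) X, semantics S; what is the meaning of a given code?]
• Code: 11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
• Text: "The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches."
• Code: 00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111
• Text (translated from Chinese): "Digital video coding technology has a history of half a century and has made great progress: from differential predictive coding in the 1950s, to transform coding and block-based motion-predictive coding in the 1970s, to the distributed coding, stereoscopic coding, multi-view coding, visual coding, and so on that are emerging today."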
Rule-based, knowledge-based, top-down, parsing (text)
• Advantages: deliberative behaviors (AI expert systems: decision making, diagnosis, design, …)
• Specific domain, small size
• Disadvantages: perception, common sense, natural language, …
• Uncertainty (exceptions, ambiguity, vagueness, …)
• Detector: semantically meaningful primitives
• Text: word, sentence, chapter, …
• Image: subpart, part, object, …
• Segmentation problem: there is no clear boundary among parts
• Descriptor: structural uncertainty
• K. S. Fu, Syntactic pattern recognition, New York: Prentice-Hall, 1974
• D. Marr, Vision, New York: Freeman, 1982
Image Segmentation (Word Segmentation)
• Where is the object?
• What is the object?
• Chicken or egg?
Structural Analysis (rule-based, syntax)
• Code: 11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
• Text: "The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches."
• Image: part, object, …; text: words, sentences, …
• Uncertainty problem
• Scalability
• Syntactic analysis runs into difficulty!
IV. Probabilistic and Statistical Models
• Image (text) classification: categories
• Low-level, local features (words): easy for a computer to detect but less semantically meaningful
• Colors, textures, bag of words
Uncertainty Management
• How to deal with uncertainty?
• A probabilistic information processing model
• Regularity: probability distribution
• Examples: Caltech 101, 25 objects
• R. O. Duda & P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973
• [Figure: scatter plot with two kinds of point markers]
Bag of (Visual) Words
• Defined on image patches (2005-06)
• Descriptors extracted around interest points (2002-2004)
• Edge contours (2005-06)
• Regions (2005-06)
Detector & Descriptor
• Detector: Kadir salient regions (points)
• Descriptor: Histograms of Oriented Gradients (HOG), 72 dimensions
• Zuo Yuanyuan (2010-)
CB: Codebook (Histograms)
• n: the number of regions in an image
• r_i: image region i
• D(w, r_i): the distance between a codeword w and region r_i
• V: vocabulary
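The slide's formula did not survive extraction, so the following is only a sketch of one common way to build such a codebook histogram from the listed symbols: each of the n regions votes for its nearest codeword in V, and the counts are normalized by n. Hard assignment and the Euclidean distance used for D(w, r_i) are assumptions, as are all names and the toy data.

```python
import math


def euclidean(w, r):
    """Assumed form of D(w, r_i): Euclidean distance between descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, r)))


def codebook_histogram(regions, vocabulary, dist=euclidean):
    """Fraction of an image's regions whose nearest codeword is each w in V."""
    n = len(regions)
    if n == 0:
        raise ValueError("an image needs at least one region")
    counts = [0] * len(vocabulary)
    for r in regions:
        nearest = min(range(len(vocabulary)), key=lambda k: dist(vocabulary[k], r))
        counts[nearest] += 1
    return [c / n for c in counts]


# Toy 2-D descriptors; real descriptors (for example HOG) have many more dimensions.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
regions = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
hist = codebook_histogram(regions, vocabulary)
print(hist)
```

Passing `dist` as a parameter keeps the sketch independent of the particular distance the original slide used.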
Low-Dimensional Structure of High-Dimensional Space
• Data structure
Optimization
• Sparse representation in sample space
• L1 norm
Data-Driven Methods: Learning from Web Data
• Learning from data (probabilistic models)
• [Figure: German, English; annotation, speech recognition, speech synthesis, translation service; Text1, Text2; Microsoft Research Asia]
Fundamental Shortcomings of the Probabilistic Approach
• The semantic gap between low-level local features and high-level global concepts
• Less semantically meaningful features: colors or their distribution (histogram), gray values or their distribution, visual words (descriptors from interest points), image patches, image regions, edges, …
• Lack of structural knowledge
• Generalization capacity
• Information processing without understanding
V. New Research Directions
• [Figure: sender and reader; X, X, S; uncertainty; F(W, D)]
The Revival of Syntactic Analysis
• Holistic (Gestalt), probability, inference
• Information structure analysis
• Syntactic + probabilistic
• Data-driven + knowledge-driven
• Bottom-up + top-down
• Part-based (shape-based)
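Information Structure
• [Figure: code (holistic) and semantics; what is the meaning of a given code?]
• Code: 11001001010100100010101010011111000010101000001000111111011110010010001000100000010010101000100111110100
• Text: "The basic properties among different quotient spaces such as the falsity preserving, the truth preserving properties are discussed. There are three quotient-space model construction approaches."
• Code: 00001001010100111110101010000011101010110100010001110101000110010110111010010011001000100111100100000111
• Text (translated from Chinese): "Digital video coding technology has a history of half a century and has made great progress: from differential predictive coding in the 1950s, to transform coding and block-based motion-predictive coding in the 1970s, to the distributed coding, stereoscopic coding, multi-view coding, visual coding, and so on that are emerging today."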
Information Structure Analysis
• History (structural analysis)
• (1) Linguistics
• Information structure: syntactic representation
• Information packaging, building the semantics
• Focus, topic, background, comment, …
• M. A. K. Halliday (1967), Notes on transitivity and theme in English, Part II, Journal of Linguistics 3:199-244
• W. L. Chafe (1974), Language and consciousness, Language 50:111-133
(2) Psychology
• Structural Information Theory
• Coding: descriptive complexities vs. frequencies of occurrence (Shannon)
• Information structure: formalization of visual regularity (iteration, symmetry, and alternation)
• E. L. J. Leeuwenberg (1968), Structural information of visual patterns: an efficient coding system in perception, The Hague: Mouton
• E. L. J. Leeuwenberg (1969), Quantitative specification of information in sequential patterns, Psychological Review, 76, 216-220
(3) Information Science
• Algorithmic Information Theory
• Kolmogorov complexity
• Four 16-bit strings:
• 1111111111111111
• 0101101001011010
• 1001110111010111
• 1100010001001011
• R. J. Solomonoff (1964), A formal theory of inductive inference, Information & Control, Vol. 7, No. 1, pp. 1-22, No. 2, pp. 224-254
• A. N. Kolmogorov (1965), Three approaches to the definition of the quantity of information, Problems of Information Transmission, No. 1, pp. 3-11
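Kolmogorov complexity itself cannot be computed, but the intuition behind the bit-string examples (regular strings have short descriptions, irregular ones do not) can be illustrated with an ordinary compressor as a rough upper-bound proxy. This is a sketch under stated assumptions: zlib stands in for the shortest description, and the strings are lengthened to 4096 symbols because compressor overhead dominates at 16 bits. The periodic string repeats the slide's second example; the irregular string is generated pseudo-randomly for illustration.

```python
import random
import zlib


def compressed_length(bits: str) -> int:
    """Length in bytes of the zlib-compressed string: a crude proxy for K(x)."""
    return len(zlib.compress(bits.encode("ascii"), 9))


random.seed(0)  # fixed seed so the illustration is repeatable
n = 4096
regular = "1" * n                                    # constant string
periodic = "0101101001011010" * (n // 16)            # a 16-bit pattern repeated
irregular = "".join(random.choice("01") for _ in range(n))

for name, s in [("regular", regular), ("periodic", periodic), ("irregular", irregular)]:
    print(name, compressed_length(s))
```

The regular and periodic strings compress to a small fraction of the irregular one, mirroring the ordering that Kolmogorov complexity assigns to them.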
Mining and Exploiting Information Structure
• Kolmogorov complexity
• Related data structures: low-dimensional structure in high-dimensional data space
• Under the probabilistic (structure) framework
• Learning from human beings
(1) Kolmogorov Complexity
• The absolute measure of information content in an individual finite object
• Algorithmic information theory
• The minimum description length
Information Distance
• K(.) is non-computable
• C. Bennett et al., Information distance, IEEE Trans. on Information Theory, vol. 44, no. 4, 1998, pp. 1407-1423
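Bennett et al. define the information distance between x and y through max{K(x|y), K(y|x)}. Because K(.) is non-computable, practical work substitutes a real compressor. The sketch below uses the normalized compression distance of Cilibrasi and Vitányi, which is a swapped-in approximation and not necessarily the formula shown on the original slide; zlib and the sample inputs are assumptions.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes, standing in for K(.)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)  # fixed seed for a repeatable illustration
text = b"There are three quotient-space model construction approaches. " * 20
noise = bytes(rng.randrange(256) for _ in range(2000))

print(ncd(text, text), ncd(text, noise))
```

Values near 0 indicate that one input adds little new information given the other; values near 1 indicate that the inputs share almost nothing the compressor can exploit.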
Approximate Computation
• A statistical approach to Kolmogorov complexity (information distance)
• f(x): frequency
• N: total number
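The slide's formula is missing from the extracted text. One well-known statistical approximation, shown here only as an assumed illustration, estimates K(x) by -log(f(x)/N) and plugs that into the normalized information distance, as in the normalized Google distance of Cilibrasi and Vitányi. The joint frequency f(x, y) and all function names are introduced for this sketch.

```python
import math


def approx_complexity(freq: int, total: int) -> float:
    """Estimate K(x) in bits as -log2(f(x) / N)."""
    return math.log2(total) - math.log2(freq)


def frequency_distance(f_x: int, f_y: int, f_xy: int, total: int) -> float:
    """(max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))."""
    lx, ly, lxy = math.log2(f_x), math.log2(f_y), math.log2(f_xy)
    return (max(lx, ly) - lxy) / (math.log2(total) - min(lx, ly))


# Hypothetical counts: x and y each occur in 100 of 10,000 documents.
always_together = frequency_distance(100, 100, 100, 10_000)
sometimes_together = frequency_distance(100, 100, 10, 10_000)
print(always_together, sometimes_together)
```

Terms that always co-occur get distance 0, and the distance grows as their joint frequency drops relative to their individual frequencies.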
Semantic & Formal Distance
• CDM: Compressed Distance Measure (ASCII)
• NSD: Normalized Statistical Distance
• Xian Zhang, Yu Hao, et al., New information distance measure and its application in question answering system, J. Comput. Sci. & Tech. 23(4), 557-572, 2008
Information Distance between Two Texts
• Text representation: a bag of words
Text (Image) Retrieval
• Image-text information distance
• Web
(2) Data Structure: Low-Dimensional Structure of High-Dimensional Space
• Sparse representation
• A: m×n matrix
• x: sample space, n-dimensional
• y: pixel, m-dimensional
• z: feature space, d-dimensional
• Yi Ma, UIUC, USA
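The equations for this slide were lost in extraction. Under the symbol definitions above, the standard L1 formulation of sparse representation can be written as follows. This is a sketch of the usual textbook form, not a confirmed copy of the slide; the projection matrix R and the noise tolerance ε are assumptions introduced here.

```latex
% Observation model: a test sample y is a linear combination of training samples (columns of A)
y = A x, \qquad A \in \mathbb{R}^{m \times n},\; x \in \mathbb{R}^{n},\; y \in \mathbb{R}^{m}

% Sparse coefficients via L1 minimization (convex surrogate for counting non-zeros)
\hat{x} = \arg\min_{x} \lVert x \rVert_{1} \quad \text{subject to} \quad A x = y

% Assumed feature-space version, with a projection R \in \mathbb{R}^{d \times m}, d \ll m
z = R y = R A x, \qquad
\hat{x} = \arg\min_{x} \lVert x \rVert_{1} \quad \text{subject to} \quad \lVert R A x - z \rVert_{2} \le \varepsilon
```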
(3) Flat Structural Model: Image Region Annotation
• Labels: horse, sky, mountain, grass, tree
• Yuan Jinhui (2008-)
Region-Adaptive Grid Partition
• I: data
• x_i: image position
• z: state (feature)
• g(z_i, z_j): the probabilistic region-based constraints (co-occurrence)
Model Learning
• n images; each image has m_i = H×V grids
• (a) i.i.d. generative model
• (b) i.i.d. discriminative model
• (c) 2-dimensional hidden Markov model (2D HMM)
• (d) Markov Random Field (MRF)
• (e) Conditional Random Field (CRF)
Label Configuration
• Given training data, the label configuration is obtained by MAP (maximum a posteriori) estimation
• For 2D HMM, MRF, and CRF: a path-limited Viterbi algorithm
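The path-limited Viterbi algorithm for 2D models is not specified in the text. As a simplified stand-in, the sketch below shows ordinary Viterbi decoding on a one-dimensional HMM, which finds the MAP label sequence by dynamic programming. The labels, observations, and probabilities are invented for illustration, and the observation sequence is assumed to be non-empty.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence and its log-probability."""
    # best[t][s]: highest log-probability of any path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers from the end
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a nested dict of positive probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical two-label model: grid cells are "sky" or "grass", observed as a color.
states = ["sky", "grass"]
log_start = {"sky": math.log(0.5), "grass": math.log(0.5)}
log_trans = logs({"sky": {"sky": 0.8, "grass": 0.2}, "grass": {"sky": 0.2, "grass": 0.8}})
log_emit = logs({"sky": {"blue": 0.9, "green": 0.1}, "grass": {"blue": 0.2, "green": 0.8}})

path, log_p = viterbi(["blue", "blue", "green", "green"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_p))
```

Working in log space avoids numerical underflow when the number of grid cells is large.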
Markov Random Field (MRF) Model
• Probability distribution P
• C_s: labeling clique; C_0: labeling-and-feature clique
• y*: the optimal label configuration
Related Literature
[1] L. E. Baum and T. Petrie. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics, Vol. 37, No. 6, pp. 1554-1563, 1966
[2] J. Lafferty et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of International Conference on Machine Learning (ICML), 2001
[3] B. Taskar et al. Max-Margin Markov Networks. Advances in Neural Information Processing Systems (NIPS), 2003
[4] J. Zhu et al. Laplace Maximum Margin Markov Networks. In Proc. of International Conference on Machine Learning (ICML), 2008
Experimental Setup
• 4002 Corel images (384×256 or 256×384)
• 11 basic (region) concepts
• Features: color moment + wavelet
• 5 models: 2 without structural knowledge (GMM, SVM) and 3 with structural knowledge (HMM*, MRF*, CRF*)