An Analysis of the PLSA Modeling Approach

Zhang Xiaohong (张小洪)

Contents • What is modeling • The LSA approach • PLSA image modeling • Conditions and assumptions for applying PLSA • Applications and development of PLSA

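What is modeling • Modeling in software development • Business modeling • Requirements model • Design model • Implementation model • Database model • Lexical analysis → extract objects → characterize objects (attributes or methods) → object relationships • A model reflects the relationships between things or objects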

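What is a model: examples

What is a model: example. Mapping: building, car, telephone, portrait, bicycle, book, tree

What is a model: example. Mapping: human, hand, horse, turtle, elephant, dog, crocodile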

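What is modeling. [Figure: block diagram with blocks G, S, and LM, input x, and output y; labels: target function, machine learning, mapping or function]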

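What is modeling • Mathematical modeling • Model functions • Functionals • The process of finding a function that satisfies the objective and the constraints • Modeling from empirical data: the machine learning problem • A learning problem is the problem of selecting the desired dependency on the basis of empirical data • The learning process is the process of choosing a suitable function from a given set of functions • Pattern recognition • Function value Y: an index set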

What is modeling: pattern recognition. The process of selecting a function from a function set
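As a minimal sketch of learning as "choosing a suitable function from a given set of functions", the Python below picks, from a small set of threshold classifiers, the one with the lowest error on labelled data. The data, the function set, and the names `make_threshold` and `empirical_risk` are illustrative assumptions, not taken from the slides.

```python
# Learning as function selection: choose f from a finite function set
# by minimising the empirical risk on the experience data.
# The samples and the candidate functions are invented for illustration.

def make_threshold(t):
    """Return a classifier f(x) = 1 if x >= t else 0."""
    return lambda x: 1 if x >= t else 0

def empirical_risk(f, samples):
    """Fraction of (x, y) pairs that f misclassifies."""
    return sum(f(x) != y for x, y in samples) / len(samples)

# Experience data: (x, y) pairs, with y taken from the index set {0, 1}.
samples = [(0.1, 0), (0.4, 0), (0.5, 0), (0.7, 1), (0.9, 1), (1.2, 1)]

# The function set: one threshold classifier per candidate threshold.
thresholds = [round(0.2 * i, 1) for i in range(7)]  # 0.0, 0.2, ..., 1.2
function_set = {t: make_threshold(t) for t in thresholds}

# Selection: keep the function with the smallest empirical risk.
best_t = min(function_set, key=lambda t: empirical_risk(function_set[t], samples))
best_risk = empirical_risk(function_set[best_t], samples)
print(best_t, best_risk)
```

A real learning machine searches a much larger (often continuous) function set, but the selection criterion has the same shape.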

LSA method. Problem: how to classify documents. Technical memo titles:
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

LSA method. How to represent documents: Vector Space Model. 1. Build a vocabulary. 2. Count term frequencies. r(human.user) = -.378, r(human.minors) = -.286. What is the problem?
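The two steps of the vector space model (build a vocabulary, count term frequencies) can be sketched as follows. This is a hedged illustration: the 12-term vocabulary is an assumption (the index terms that occur in at least two of the memo titles), and `term_document_matrix` and `term_correlation` are hypothetical helper names. The correlation is a plain Pearson correlation between two term rows.

```python
import re
import numpy as np

# Memo titles from the example, with run-together words separated.
titles = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}

# Assumed vocabulary: terms that occur in at least two titles.
vocab = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]

def term_document_matrix(titles, vocab):
    """Rows are terms, columns are documents, entries are raw counts."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(titles)))
    for j, text in enumerate(titles.values()):
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in index:
                A[index[token], j] += 1
    return A

def term_correlation(A, vocab, t1, t2):
    """Pearson correlation between the count rows of two terms."""
    return np.corrcoef(A[vocab.index(t1)], A[vocab.index(t2)])[0, 1]

A = term_document_matrix(titles, vocab)
print(round(term_correlation(A, vocab, "human", "user"), 3))
print(round(term_correlation(A, vocab, "human", "minors"), 3))
```

Because "human" and "user" never occur in the same title, their raw-count correlation is negative even though the words are related, which is the weakness the slide asks about.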

LSA method: SVD • Singular Value Decomposition: $A = U S V^{T}$ • Dimension reduction: $A \approx \tilde{A} = \tilde{U}\tilde{S}\tilde{V}^{T}$
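A minimal sketch of the decomposition and the rank-2 reduction, using NumPy's `np.linalg.svd`. The term-document matrix is typed in by hand from the memo titles under an assumed 12-term vocabulary, and `truncate_svd` and `corr` are hypothetical helper names.

```python
import numpy as np

terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]

# Term-document counts; columns are c1..c5, m1..m4 (assumed vocabulary).
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)

def truncate_svd(A, k):
    """Return the k leading singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def corr(M, t1, t2):
    """Pearson correlation between the rows of two terms."""
    return np.corrcoef(M[terms.index(t1)], M[terms.index(t2)])[0, 1]

# Rank-2 approximation: keep only the two largest singular values.
U2, s2, Vt2 = truncate_svd(A, 2)
A2 = U2 @ np.diag(s2) @ Vt2

print(round(corr(A2, "human", "user"), 2))
print(round(corr(A2, "human", "minors"), 2))
```

Comparing `corr(A, ...)` with `corr(A2, ...)` shows how the reduction changes term similarities: terms that share contexts move together even when they never co-occur directly.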

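LSA method: SVD. {U} = [matrix not preserved], reduced to 2 dimensions

LSA method: SVD. {S} = [matrix not preserved], reduced to 2 dimensions

LSA method: SVD. {V} = [matrix not preserved], reduced to 2 dimensions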

LSA method: SVD. r(human.user) = .94, r(human.minors) = -.83. The synonymy problem

LSA method: SVD

LSA method: discussion • Why does the SVD approach work? What are its assumptions? • LSA does not define a properly normalized probability distribution • No obvious interpretation of the directions in the latent space • From a statistical point of view, the use of the L2 norm in LSA corresponds to a Gaussian error assumption, which is hard to justify for count variables • Polysemy problem • How can the SVD results be visualized?

PLSA: the problem. Building, car, telephone, portrait, bicycle, book, tree

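PLSA: the problem • Questions • How is an image represented as feature vectors? • How do feature vectors form "image words"? • How is the training image set represented as a co-occurrence matrix (term-frequency matrix)? • Model selection?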

PLSA: the problem. [Figure: histogram of codeword frequencies; axes: codewords, frequency]

PLSA: the problem. Object → Bag of ‘words’

[Figure: pipeline overview] 1. Feature detection & representation 2. Codewords dictionary 3. Image representation. Learning: category models (and/or) classifiers. Recognition: category decision

PLSA: Feature detection and representation

PLSA: Feature detection and representation • Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla ’02] [Sivic & Zisserman ’03] • Normalize patch • Compute SIFT descriptor [Lowe ’99]. Slide credit: Josef Sivic

PLSA: Feature detection and representation

PLSA: Codewords dictionary formation

PLSA: Codewords dictionary formation. Vector quantization

PLSA: Codewords dictionary formation
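One common way to carry out the vector quantization step is k-means clustering of the patch descriptors, with the cluster centres serving as codewords. The sketch below assumes that choice; the slides name only "vector quantization". The descriptors are synthetic 8-dimensional stand-ins for real SIFT vectors, and `build_codebook` is a hypothetical helper.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of codewords."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct descriptors.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].copy()
    for _ in range(n_iter):
        # Squared distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[c] = members.mean(axis=0)
    return centres

# Synthetic stand-in for patch descriptors: three blobs in 8 dimensions.
rng = np.random.default_rng(1)
blob_centres = np.array([[0.0] * 8, [5.0] * 8, [10.0] * 8])
descriptors = np.vstack(
    [c + 0.1 * rng.standard_normal((50, 8)) for c in blob_centres]
)

codebook = build_codebook(descriptors, k=3)
print(codebook.shape)
```

In practice the dictionary is learned from descriptors pooled over the whole training set, and k (the dictionary size) is a model-selection choice.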

PLSA: Image representation. [Figure: histogram of codeword frequencies; axes: codewords, frequency]
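The image representation can be sketched as a histogram of codeword counts: each patch descriptor is assigned to its nearest codeword, and the assignments are tallied. The tiny 2-D codebook and patches below are made up for illustration, and `quantize` and `bow_histogram` are hypothetical names.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codeword (Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """Bag-of-words vector: how often each codeword occurs in the image."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook))

# Toy dictionary with three codewords and one image with five patches.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
patches = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8], [0.2, 0.1]])

hist = bow_histogram(patches, codebook)
print(hist)  # [2 2 1]
```

Stacking one such histogram per training image as a column gives the codeword-by-image count matrix that plays the role of the term-frequency matrix.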

Representation: 1. Feature detection & representation 2. Codewords dictionary 3. Image representation

Learning and Recognition: codewords dictionary → category models (and/or) classifiers → category decision

PLSA: Learning and Recognition • Generative methods: graphical models • Discriminative methods: SVM. Category models (and/or) classifiers

Generative models • Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004 • Hierarchical Bayesian text models (pLSA and LDA) • Background: Hofmann 2001; Blei, Ng & Jordan, 2003 • Object categorization: Sivic et al. 2005, Sudderth et al. 2005 • Natural scene categorization: Fei-Fei et al. 2005

First, some notation • $w_n$: each patch in an image, $w_n = [0, 0, \ldots, 1, \ldots, 0, 0]^{T}$ • $\mathbf{w}$: the collection of all N patches in an image, $\mathbf{w} = [w_1, w_2, \ldots, w_N]$ • $d_j$: the jth image in an image collection • c: category of the image • z: theme or topic of the patch

Case #1: the Naïve Bayes model (Csurka et al. 2004). [Graphical model: class c → codeword w, with a plate over the N patches.] Object class decision: combines the prior prob. of the object classes with the image likelihood given the class
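The decision rule that this slide labels can be written out as below. This is the standard Naïve Bayes form, reconstructed under the notation introduced earlier (class c, patches $w_1, \ldots, w_N$) rather than copied from the slide. The product form is the "naïve" assumption that patches are independent given the class.

```latex
c^{*} = \arg\max_{c} \; p(c \mid \mathbf{w})
      = \arg\max_{c} \; p(c)\, p(\mathbf{w} \mid c)
      = \arg\max_{c} \; p(c) \prod_{n=1}^{N} p(w_n \mid c)
```

Here $p(c)$ is the prior probability of the object class and $p(\mathbf{w} \mid c)$ is the image likelihood given the class.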

Case #2: Hierarchical Bayesian text models. Probabilistic Latent Semantic Analysis (pLSA), Sivic et al. ICCV 2005. [Graphical model: d → z → w, with plates over the N patches of an image and the D images; example topic: “face”]

The pLSA model: observed codeword distributions = codeword distributions per theme (topic) × theme distributions per image. Slide credit: Josef Sivic
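In symbols, the model on this slide is usually written as the mixture below. The number of topics K and the index k are notation assumed here, since the slide's formula did not survive in the transcript.

```latex
P(w_i \mid d_j) = \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid d_j)
```

Read as a matrix product, the left side is the observed codeword distribution of each image, $P(w_i \mid z_k)$ holds the codeword distributions per theme, and $P(z_k \mid d_j)$ holds the theme distributions per image.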

Recognition using pLSA Slide credit: Josef Sivic
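One common reading of recognition with a fitted pLSA model, stated here as an assumption because the slide's formula is not preserved, is to assign an image the topic with the highest posterior:

```latex
z^{*} = \arg\max_{z} \; P(z \mid d)
```

For an image outside the training set, $P(z \mid d)$ is typically estimated by running EM on that image alone while keeping the learned $P(w \mid z)$ fixed.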

Learning the pLSA parameters • n(w_i, d_j): observed counts of word i in document j • Maximize the likelihood of the data using EM • M: number of codewords • N: number of images. Slide credit: Josef Sivic
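A minimal sketch of EM for pLSA, assuming the standard updates: the E-step computes the topic posterior $P(z \mid w, d)$ and the M-step re-estimates $P(w \mid z)$ and $P(z \mid d)$ from the expected counts. The function name `plsa_em`, the toy count matrix, and the small constant `eps` are illustrative assumptions.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM.

    counts: (M, N) array, counts[i, j] = occurrences of word i in document j.
    Returns p_w_z (M, K), p_z_d (K, N), and the log-likelihood per iteration.
    """
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    p_w_z = rng.random((M, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)   # each topic sums to 1 over words
    p_z_d = rng.random((n_topics, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)   # each document sums to 1 over topics
    eps = 1e-12                                  # guards against division by zero
    history = []
    for _ in range(n_iter):
        # E-step: P(z | w, d) is proportional to P(w | z) P(z | d); shape (M, K, N).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        p_w_d = joint.sum(axis=1)                # P(w | d), shape (M, N)
        history.append(float((counts * np.log(p_w_d + eps)).sum()))
        post = joint / (p_w_d[:, None, :] + eps)
        # M-step: re-estimate from expected counts n(w, d) * P(z | w, d).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = expected.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps
    return p_w_z, p_z_d, history

# Toy data: 4 codewords, 4 images, two obvious groups.
counts = np.array([[4, 3, 0, 0],
                   [3, 4, 0, 0],
                   [0, 0, 5, 2],
                   [0, 0, 2, 5]], dtype=float)

p_w_z, p_z_d, history = plsa_em(counts, n_topics=2)
print(np.round(p_z_d, 2))
```

The recorded log-likelihood should not decrease from one iteration to the next, which is a quick sanity check on any EM implementation. The dense (M, K, N) posterior array is fine for a toy example but would need a sparse formulation for realistic dictionary and collection sizes.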

PLSA: discussion • What are the characteristics of the data, and what are the conditions and assumptions for applying PLSA? • Not a well-defined generative model of documents; d is a dummy index into the list of documents in the training set (as many values as documents) • No natural way to assign probability to a previously unseen document • Number of parameters to be estimated grows with the size of the training set

Applications and development of PLSA

Thank you! iiec.cqu.edu.cn
