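An Analysis of the Ideas Behind PLSA Modeling. Zhang Xiaohong (张小洪)
Contents • What is modeling • The LSA approach • PLSA for image modeling • Conditions and assumptions for applying PLSA • Applications and development of PLSA
What is modeling • Modeling in software development • Business modeling • Requirements model • Design model • Implementation model • Database model • Lexical analysis → extract objects → characterize objects (attributes or methods) → object relationships • A model reflects the relationships between things or objects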
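What is a model: example. Mapping images to labels: building, car, telephone, portrait, bicycle, book, tree
What is a model: example. Mapping images to labels: person, hand, horse, turtle, elephant, dog, crocodile
What is a model: example. Mapping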
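What is modeling [Figure: machine learning viewed as a mapping or function from x to y, with an objective function; diagram blocks labeled G, S, and LM]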
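What is modeling • Mathematical modeling • Model: function • Functional • The process of finding a function that satisfies the objective and the constraints • Modeling from empirical data: the machine learning problem • A learning problem is the problem of choosing the desired dependency on the basis of empirical data • Learning is the process of selecting an appropriate function from a given set of functions • Pattern recognition • Function value Y: an index set
What is modeling: pattern recognition. The process of selecting a function from a function set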
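The idea of learning as "selecting a function from a function set" can be sketched in a few lines. This is a minimal illustration, not the slides' own method: the toy data, the family of threshold classifiers, and the 0-1 loss are all assumptions chosen to keep the example small.

```python
# Hypothetical toy data: (x, y) pairs with class index y in {0, 1}.
samples = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]


def make_threshold_classifier(t):
    """Return the function f_t(x) = 1 if x > t else 0."""
    return lambda x: 1 if x > t else 0


# The "function set": a small, finite family of candidate functions (assumed).
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
function_set = {t: make_threshold_classifier(t) for t in thresholds}


def empirical_risk(f, data):
    """Fraction of samples that f misclassifies (0-1 loss)."""
    return sum(f(x) != y for x, y in data) / len(data)


# Learning = picking the member of the set with the lowest empirical risk.
risks = {t: empirical_risk(f, samples) for t, f in function_set.items()}
best_t = min(risks, key=risks.get)
print("selected threshold:", best_t, "empirical risk:", risks[best_t])
```

Here the function value is a class index, which matches the pattern recognition case where Y is an index set.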
The LSA approach. Problem: how to classify documents. Technical Memo Titles
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey
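The LSA approach. How to represent a document: Vector Space Model • 1. Build a vocabulary • 2. Count term frequencies • Correlations between raw term rows: r(human, user) = -.378, r(human, minors) = -.29 • What is the problem?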
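The two steps of the vector space model (build a vocabulary, then count term frequencies) can be sketched as follows for the nine memo titles. The stop-word list and the rule "keep words that occur in at least two titles" are assumptions made here to arrive at a compact vocabulary; the slides do not state how the vocabulary was chosen.

```python
import math
from collections import Counter

titles = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}
STOP_WORDS = {"a", "and", "of", "the", "for", "in", "to"}  # assumed stop list


def tokenize(text):
    """Lowercase the words and strip trailing punctuation."""
    return [w.strip(",:").lower() for w in text.split()]


docs = list(titles)  # c1..c5, m1..m4 in insertion order
tokens = {d: tokenize(t) for d, t in titles.items()}

# Step 1: vocabulary = non-stop words that occur in at least two titles.
doc_freq = Counter(w for toks in tokens.values() for w in set(toks))
vocabulary = sorted(w for w, df in doc_freq.items()
                    if df >= 2 and w not in STOP_WORDS)

# Step 2: term-frequency row for every vocabulary word.
matrix = {w: [tokens[d].count(w) for d in docs] for w in vocabulary}


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


r_human_user = pearson(matrix["human"], matrix["user"])
r_human_minors = pearson(matrix["human"], matrix["minors"])
print(vocabulary)
print(round(r_human_user, 3), round(r_human_minors, 3))
```

Both raw correlations come out negative because "human" never shares a title with "user" or with "minors", even though "human" and "user" are clearly related in meaning. This is the weakness of raw term counts that the slide's closing question points at.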
The LSA approach: SVD • Singular Value Decomposition: $A = U S V^T$ • Dimension reduction: $A \approx \tilde{A} = \tilde{U} \tilde{S} \tilde{V}^T$
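The LSA approach: SVD. {U} = [matrix figure], reduced to 2 dimensions
The LSA approach: SVD. {S} = [matrix figure], reduced to 2 dimensions
The LSA approach: SVD. {V} = [matrix figure], reduced to 2 dimensions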
The LSA approach: SVD. After reduction to 2 dimensions: r(human, user) = .94, r(human, minors) = -.83 • The synonymy problem
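The effect of the rank-2 reduction on the term correlations can be checked with a small sketch. To stay dependency-free, it finds the two leading right singular vectors by power iteration with deflation, which is a substitution made here for illustration; in practice a library SVD routine would normally be used. The matrix is the term-by-title count matrix for the nine memo titles, with the 12-term vocabulary assumed to be the words that occur in at least two titles.

```python
import math

TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
# Rows follow TERMS; columns are the titles c1..c5, m1..m4.
A = [
    [1, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 2, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_right_singular_vectors(A, k, iters=500):
    """Leading k right singular vectors via power iteration on A^T A."""
    At = [list(col) for col in zip(*A)]
    found = []
    for idx in range(k):
        v = [1.0 + 0.1 * ((idx + j) % 3) for j in range(len(At))]
        for _ in range(iters):
            for u in found:  # deflation: remove directions already found
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            v = matvec(At, matvec(A, v))
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


def rank_k_approximation(A, k):
    """Sum of (A v_i) v_i^T, which equals the truncated SVD of rank k."""
    approx = [[0.0] * len(A[0]) for _ in A]
    for v in top_right_singular_vectors(A, k):
        Av = matvec(A, v)  # equals s_i * u_i
        for r in range(len(A)):
            for c in range(len(v)):
                approx[r][c] += Av[r] * v[c]
    return approx


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


A2 = rank_k_approximation(A, 2)
row = {t: A2[i] for i, t in enumerate(TERMS)}
r_human_user_lsa = pearson(row["human"], row["user"])
r_human_minors_lsa = pearson(row["human"], row["minors"])
print(round(r_human_user_lsa, 2), round(r_human_minors_lsa, 2))
```

In the reduced space "human" and "user" become strongly positively correlated although they never co-occur in a title, while "human" and "minors" become strongly negatively correlated.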
The LSA approach: discussion • Why does the SVD approach work? What are its assumptions? • LSA does not define a properly normalized probability distribution • There is no obvious interpretation of the directions in the latent space • Statistically, the L2 norm used in LSA corresponds to a Gaussian error assumption, which is hard to justify for count variables • The polysemy problem • How can the SVD result be visualized?
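PLSA: the problem. Image categories: building, car, telephone, portrait, bicycle, book, tree
PLSA: the problem • Questions • How is an image represented as feature vectors? • How do feature vectors form "image words"? • How is the training image set represented as a co-occurrence matrix (term-frequency matrix)? • Model selection?
PLSA: the problem [Figure: histogram of codeword frequencies; axes: codewords, frequency]
PLSA: the problem. Object → bag of 'words'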
[Figure: pipeline overview. Representation: 1. feature detection & representation → 2. codewords dictionary → 3. image representation. Learning: category models (and/or) classifiers. Recognition: category decision.]
PLSA: Feature detection and representation • Detect patches [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla, '02] [Sivic & Zisserman, '03] • Normalize patch • Compute SIFT descriptor [Lowe '99] Slide credit: Josef Sivic
PLSA: Feature detection and representation
PLSA: Codewords dictionary formation
PLSA: Codewords dictionary formation. Vector quantization
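Vector quantization for dictionary formation is commonly done with k-means clustering of the patch descriptors; the slide only names "vector quantization", so k-means is an assumption here. The sketch uses hypothetical 2-D descriptors and fixed initial centers to stay small and deterministic, whereas real SIFT descriptors are 128-dimensional.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, point):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], point))


def kmeans(points, initial_centers, iters=20):
    """Plain k-means; the returned cluster centers act as codewords."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Hypothetical descriptors pooled from all training images.
descriptors = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (4.9, 4.8), (5.2, 5.0),
    (0.1, 5.0), (-0.2, 4.9), (0.0, 5.2),
]
codebook = kmeans(descriptors,
                  [descriptors[0], descriptors[3], descriptors[6]])
print(codebook)
```

Each center in `codebook` plays the role of one codeword; a new patch is mapped to the index of its nearest center.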
PLSA: Image representation [Figure: histogram of codeword frequencies; axes: codewords, frequency]
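The histogram representation can be sketched as follows: every descriptor of an image is mapped to its nearest codeword, and the image becomes a vector of codeword counts. The codebook and descriptors are hypothetical values; stacking one histogram per image gives the co-occurrence (term-frequency) matrix.

```python
# Hypothetical codebook of 3 codewords in a 2-D descriptor space.
codebook = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]


def quantize(descriptor, codebook):
    """Index of the codeword nearest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2
                          for a, b in zip(codebook[i], descriptor)),
    )


def bag_of_words(descriptors, codebook):
    """Codeword-frequency histogram of one image."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[quantize(d, codebook)] += 1
    return histogram


# Hypothetical descriptors extracted from two images.
image_1 = [(0.1, 0.2), (4.8, 5.1), (5.3, 4.9), (0.2, 4.7), (4.9, 5.0)]
image_2 = [(0.0, 0.3), (-0.1, 0.1), (0.3, 5.2)]

histogram_1 = bag_of_words(image_1, codebook)
histogram_2 = bag_of_words(image_2, codebook)

# Co-occurrence matrix: one row per codeword, one column per image.
cooccurrence = [list(row) for row in zip(histogram_1, histogram_2)]
print(histogram_1, histogram_2)
```

The spatial layout of the patches is discarded; only the counts are kept, which is what makes this a "bag of words".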
Representation: 1. feature detection & representation 2. codewords dictionary 3. image representation
Learning and Recognition: codewords dictionary → category models (and/or) classifiers → category decision
PLSA Learning and Recognition: category models (and/or) classifiers • Generative method: graphical models • Discriminative method: SVM
Generative models • Naïve Bayes classifier • Csurka, Bray, Dance & Fan, 2004 • Hierarchical Bayesian text models (pLSA and LDA) • Background: Hofmann 2001; Blei, Ng & Jordan, 2003 • Object categorization: Sivic et al. 2005, Sudderth et al. 2005 • Natural scene categorization: Fei-Fei et al. 2005
First, some notation • $w_n$: each patch in an image • $w_n = [0, 0, \dots, 1, \dots, 0, 0]^T$ • $\mathbf{w}$: a collection of all N patches in an image • $\mathbf{w} = [w_1, w_2, \dots, w_N]$ • $d_j$: the jth image in an image collection • $c$: category of the image • $z$: theme or topic of the patch
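Case #1: the Naïve Bayes model (Csurka et al. 2004) [Graphical model: c → w, with a plate over the N patches] Object class decision: $c^* = \arg\max_c p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$, where $p(c)$ is the prior prob. of the object classes and $p(\mathbf{w} \mid c)$ is the image likelihood given the class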
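A minimal sketch of the Naïve Bayes decision rule, in which the class score is the prior times the product of per-patch codeword likelihoods, evaluated in log space. The toy images, the class names, and the Laplace smoothing constant `alpha` are assumptions for illustration and are not taken from Csurka et al.

```python
import math
from collections import Counter


def train_naive_bayes(images, labels, vocab_size, alpha=1.0):
    """images: lists of codeword indices; returns log p(c) and log p(w|c)."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_likelihood = {}
    for c in classes:
        counts = Counter()
        for image, label in zip(images, labels):
            if label == c:
                counts.update(image)
        total = sum(counts.values()) + alpha * vocab_size
        # Laplace smoothing (assumed) keeps unseen codewords from zeroing
        # out the whole product.
        log_likelihood[c] = [math.log((counts[w] + alpha) / total)
                             for w in range(vocab_size)]
    return log_prior, log_likelihood


def classify(image, log_prior, log_likelihood):
    """argmax_c [ log p(c) + sum_n log p(w_n | c) ]."""
    scores = {c: log_prior[c] + sum(log_likelihood[c][w] for w in image)
              for c in log_prior}
    return max(scores, key=scores.get)


# Hypothetical training set over a 4-codeword dictionary.
train_images = [[0, 0, 1, 0], [0, 1, 0, 0], [2, 3, 3, 2], [3, 2, 2, 3]]
train_labels = ["face", "face", "car", "car"]
log_prior, log_likelihood = train_naive_bayes(train_images, train_labels, 4)

prediction_a = classify([0, 1, 0], log_prior, log_likelihood)
prediction_b = classify([3, 3, 2], log_prior, log_likelihood)
print(prediction_a, prediction_b)
```

Working with sums of logarithms instead of products avoids numerical underflow when an image contains many patches.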
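Case #2: Hierarchical Bayesian text models: Probabilistic Latent Semantic Analysis (pLSA), Sivic et al. ICCV 2005 [Graphical model: d → z → w, with a plate over the N patches inside a plate over the D images; example topic: "face"]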
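The pLSA model: $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$, where $p(w_i \mid d_j)$ are the observed codeword distributions, $p(z_k \mid d_j)$ are the theme distributions per image, and $p(w_i \mid z_k)$ are the codeword distributions per theme (topic). Slide credit: Josef Sivic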
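The pLSA decomposition says that each image's codeword distribution is a mixture of per-topic codeword distributions, weighted by that image's topic distribution. In matrix form this is a product of two column-stochastic matrices. A minimal sketch with hypothetical numbers (3 codewords, 2 topics, 2 images; the topic count K is a symbol introduced here):

```python
# p(w | z): rows are codewords, columns are topics; each column sums to 1.
p_w_given_z = [
    [0.7, 0.1],
    [0.2, 0.3],
    [0.1, 0.6],
]
# p(z | d): rows are topics, columns are images; each column sums to 1.
p_z_given_d = [
    [0.9, 0.2],
    [0.1, 0.8],
]


def mix(p_w_given_z, p_z_given_d):
    """p(w_i | d_j) = sum_k p(w_i | z_k) * p(z_k | d_j)."""
    M, K, N = len(p_w_given_z), len(p_z_given_d), len(p_z_given_d[0])
    return [[sum(p_w_given_z[i][k] * p_z_given_d[k][j] for k in range(K))
             for j in range(N)]
            for i in range(M)]


p_w_given_d = mix(p_w_given_z, p_z_given_d)
column_sums = [sum(row[j] for row in p_w_given_d) for j in range(2)]
print(p_w_given_d)
print(column_sums)
```

Because both factors are properly normalized, every column of the product is again a probability distribution over codewords, which is the normalization that plain LSA lacks.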
Recognition using pLSA Slide credit: Josef Sivic
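Learning the pLSA parameters: maximize the likelihood of the data using EM, $L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}$, where $n(w_i, d_j)$ is the observed count of word i in document j, M is the number of codewords, and N is the number of images. Slide credit: Josef Sivic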
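The EM procedure can be sketched as follows. The E-step computes the posterior of each topic for every (word, image) pair, and the M-step re-estimates p(w|z) and p(z|d) from the counts weighted by those posteriors. The count matrix, the topic count K, the random initialization, and the iteration budget are assumptions for illustration; the last lines assign each image to its most probable topic, one simple way to use the result for recognition.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def plsa_em(counts, K, iters=200, seed=0):
    """counts[i][j] = n(w_i, d_j). Returns p(w|z), p(z|d), log-likelihoods."""
    rng = random.Random(seed)
    M, N = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(M)])
             for _ in range(K)]          # p_w_z[k][i] = p(w_i | z_k)
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(K)])
             for _ in range(N)]          # p_z_d[j][k] = p(z_k | d_j)
    history = []
    for _ in range(iters):
        new_w_z = [[0.0] * M for _ in range(K)]
        new_z_d = [[0.0] * K for _ in range(N)]
        log_likelihood = 0.0
        for i in range(M):
            for j in range(N):
                n = counts[i][j]
                if n == 0:
                    continue
                joint = [p_w_z[k][i] * p_z_d[j][k] for k in range(K)]
                p_w_d = sum(joint)
                log_likelihood += n * math.log(p_w_d)
                for k in range(K):
                    posterior = joint[k] / p_w_d      # E-step: p(z_k|w_i,d_j)
                    new_w_z[k][i] += n * posterior    # M-step accumulators
                    new_z_d[j][k] += n * posterior
        history.append(log_likelihood)
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d, history


# Hypothetical counts: images 0-1 mostly use codewords 0-1,
# images 2-3 mostly use codewords 2-3.
counts = [
    [5, 4, 0, 0],
    [3, 6, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 6, 3],
]
p_w_z, p_z_d, history = plsa_em(counts, K=2)
dominant_topic = [max(range(2), key=lambda k: row[k]) for row in p_z_d]
print(dominant_topic)
```

The recorded log-likelihood never decreases from one iteration to the next, which is the standard EM guarantee; EM can still stop at a local optimum, so results may depend on the initialization.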
PLSA: discussion • Given the characteristics of the data, what are the conditions and assumptions for applying PLSA? • Not a well-defined generative model of documents; d is a dummy index into the list of documents in the training set (it takes as many values as there are documents) • No natural way to assign a probability to a previously unseen document • The number of parameters to be estimated grows with the size of the training set
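The last point can be made concrete by counting free parameters. With M codewords, N training images, and K topics (K is a symbol introduced here), pLSA estimates one codeword distribution per topic and one topic distribution per training image:

$$
\underbrace{K\,(M-1)}_{p(w \mid z)} \;+\; \underbrace{N\,(K-1)}_{p(z \mid d)}
$$

The second term grows linearly with N, so every additional training image adds K - 1 parameters.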
Thank you! iiec.cqu.edu.cn