Human Action Recognition using Spatio-Temporal Classification

方競賢 Ching-Hsien Fang 98.10.14 func1115@csie.ncku.edu.tw

Outline • Introduction • Flowchart • Learning for Spatio-Temporal Classification 3.1 Spatial Subspace Creation using Locality Preserving Projection (LPP) 3.2 Learning for Classification in Temporal Subspace • Recognition Process • Experimental Results • Conclusions

1. Introduction • The major concept is that we would like to add temporal information to the action recognition process (p1,1,2). • Our “Temporal-Vector Trajectory Learning (TVTL)” is: - supervised (it uses labels) - linear (it finds a linear transformation matrix) • We use human silhouettes as features instead of feature points, because feature-point methods are limited by discarding global structural information (p1,3,8). • Silhouette-based methods are becoming more and more popular because silhouette features are easier to obtain and still contain detailed body shape information (p1,4,1). (Page No., Paragraph No., Line No.)

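2. Flowchart: Variable Introduction (1/2) • There are M training sequences, and together these M sequences contain N frames. • The input data: the images in this paper are 64*48, so each frame is a 64*48 = 3072-dimensional vector. • The transformation matrix obtained by applying LPP to the data. • The new space into which the data is mapped by the transformation matrix. • In this paper the images are reduced to 31 dimensions after LPP, so d = 31. • A matrix that we design; multiplying it with the LPP-projected data yields the temporal data, whose dimension is d*(2t+1), where t is the number of frames taken before and after the current frame as supporting temporal information. For example, with t=2 the data becomes 5*d = 5*31 = 155-dimensional. • The space obtained after metric learning, whose dimension is the same as that of the temporal data. • The test data.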
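The dimension bookkeeping above can be illustrated with a minimal sketch. It assumes the designed matrix acts as a frame-selection matrix that picks the 2t+1 frames around the current one. The names `selection_matrix`, `Y`, and `S`, the random stand-in data, and the clamping at sequence borders are all assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def selection_matrix(n_frames: int, center: int, t: int) -> np.ndarray:
    """Return an N x (2t+1) matrix whose columns pick frames center-t .. center+t."""
    S = np.zeros((n_frames, 2 * t + 1))
    for col, frame in enumerate(range(center - t, center + t + 1)):
        # Clamp at the sequence ends (an assumption; border handling is not
        # described on the slides).
        frame = min(max(frame, 0), n_frames - 1)
        S[frame, col] = 1.0
    return S

d, N, t = 31, 100, 2                    # d = 31 and t = 2 follow the slide; N is arbitrary
rng = np.random.default_rng(0)
Y = rng.standard_normal((d, N))         # stand-in for LPP-projected frames, one column per frame

S = selection_matrix(N, center=10, t=t)
window = Y @ S                          # d x (2t+1) block of neighboring frames
temporal_vector = window.flatten(order="F")  # stacked frame by frame

print(window.shape, temporal_vector.shape)
```

With d = 31 and t = 2 the stacked vector has 5*31 = 155 entries, which matches the figure given on the slide.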

2. Flowchart(2/2)

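3.1 Spatial Subspace Creation using Locality Preserving Projection (1/3) • We would like to obtain a low-dimensional space that reveals the intrinsically nonlinear structure of the spatial-motion information while preserving local spatial information (p4,1,2). • We choose LPP because it is: - linear (it yields a transformation matrix) - preserving of local structure (it keeps the local structure of the data) - unsupervised (we do not want to consider labels at this step) • The basic idea of LPP is that data points that are close (similar) in the original high-dimensional space should remain close to each other after dimensionality reduction, so that the global distribution is built up from local optimizations.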

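3.1 Spatial Subspace Creation using LPP: LPP Concept Diagram and Weight Matrix (2/3) • Suppose there are N sample points. After running k-NN, the LPP weight matrix is defined as follows: the entry for a pair (i, j) carries a weight if data i is a neighbor of data j or vice versa, and is 0 otherwise. An example (figure): points A, B, C.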
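A minimal sketch of building such a weight matrix from k-NN is given below. It assumes the simple 0/1 weighting; the slide's actual weight value did not survive, and LPP is also commonly used with a heat-kernel weight. The function name `knn_weight_matrix` and the toy points are hypothetical.

```python
import numpy as np

def knn_weight_matrix(X: np.ndarray, k: int) -> np.ndarray:
    """X has shape (N, D), one sample per row. Returns a symmetric N x N 0/1 matrix."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                      # a point is not its own neighbor
    nn = np.argsort(dist2, axis=1)[:, :k]                # indices of the k nearest neighbors
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    # "i is a neighbor of j, or vice versa": symmetrize with an elementwise OR.
    return np.maximum(W, W.T)

# Four points on a line; with k = 1, point 1 is the nearest neighbor of point 2,
# but point 2 is not the nearest neighbor of point 1.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
W = knn_weight_matrix(X, k=1)
print(W)
```

The symmetrization step is what implements the "or vice versa" clause: the pair (1, 2) is connected even though the neighbor relation holds in only one direction.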

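3.1 Spatial Subspace Creation using LPP: LPP Formulas (3/3) • After the weight matrix W is obtained, the LPP objective function is as follows: …………(1) • Using graph embedding, the formula can be derived into the following: …………(2) …………(3) Subject to …, where L = D - W is the “Laplacian matrix” and D is the diagonal matrix whose entries are the row sums of W.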
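For reference, the standard LPP formulation from He and Niyogi can be written as follows. This is the textbook form, not a recovery of the slide's own equations (1) to (3), and the symbols are assumptions: X = [x_1, ..., x_N] holds the samples as columns, a is a projection vector, and W is the symmetric weight matrix.

```latex
\begin{align*}
&\min_{a} \sum_{i,j} \left( a^{\top} x_i - a^{\top} x_j \right)^{2} W_{ij}
  \;=\; \min_{a}\; 2\, a^{\top} X L X^{\top} a \\
&\text{subject to } a^{\top} X D X^{\top} a = 1,
  \qquad L = D - W, \qquad D_{ii} = \sum_{j} W_{ij} \\
&\text{solved by the generalized eigenproblem } X L X^{\top} a = \lambda\, X D X^{\top} a
\end{align*}
```

The columns of the transformation matrix are the eigenvectors with the smallest eigenvalues, so points joined by a large weight W_ij are kept close after projection.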

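Q and A: • Q1: There are so many dimensionality reduction techniques; why choose LPP? - Answer: When we first started working on action recognition we read a number of papers, and it was LSTDE that introduced us to the idea of using time. We then wondered how much it would help if the data itself carried temporal information. Our first intuition was to combine each data point with the trajectory between it and its temporally neighboring data into a single data point, as shown in the figure below. Doing this directly in the high-dimensional space raises two major problems. First, the dimension is too high: expanding already high-dimensional data into temporal data makes the matrices too large and the computation very expensive. Second, and more importantly, the high-dimensional data involves no feature extraction; each dimension is just the black-or-white value of a pixel. We therefore need a feature extraction step, and a method that extracts features while also reducing the dimension. Many methods are available, such as PCA, LDA, LSDA, LLE, ISOMAP, LDE, and LPP. At this layer, however, we mainly want to preserve the structure among the data and obtain a linear matrix, and we do not want to use labels yet, since the goal here is only to reduce the computation while preserving the structure of the data. LPP fits these expectations well, so we chose LPP.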

3.2 Learning for Classification in Temporal Subspace _ Temporal Data (1/3) • After obtaining the spatial-motion subspace (LPP subspace), we would like to extend the data to temporal data. Here we propose three kinds of temporal data. 1. Locations’ Temporal Motion of Mahalanobis Distance (LTM) (figure: t = 2)

3.2 Learning for Classification in Temporal Subspace _ Temporal Data (2/3) 2. Difference’ Temporal Motion of Mahalanobis Distance (DTM) (figure: t = 2)

3.2 Learning for Classification in Temporal Subspace _ Temporal Data (3/3) 3. Trajectory Temporal Motion of Mahalanobis Distance (TTM) (figure: t = 2)
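The three kinds of temporal data can be sketched as below. The slides only state that each vector has dimension d*(2t+1) and that DTM and TTM use differences between data over time. The concrete definitions here are assumptions for illustration: LTM stacks the raw frame locations, DTM stores each neighbor's offset from the current frame, and TTM stores the steps between consecutive frames. The function names and the toy data are hypothetical.

```python
import numpy as np

def _window(Y: np.ndarray, i: int, t: int) -> np.ndarray:
    """d x (2t+1) block of frames i-t .. i+t from Y (one column per frame)."""
    assert t <= i < Y.shape[1] - t, "frame i needs t neighbors on each side"
    return Y[:, i - t : i + t + 1].astype(float)

def ltm(Y, i, t):
    # Assumed LTM: the locations of the 2t+1 frames, stacked frame by frame.
    return _window(Y, i, t).flatten(order="F")

def dtm(Y, i, t):
    # Assumed DTM: each neighbor is replaced by its difference from frame i;
    # frame i itself is kept so the dimension stays d*(2t+1).
    win = _window(Y, i, t)
    center = win[:, t].copy()
    out = win - center[:, None]
    out[:, t] = center
    return out.flatten(order="F")

def ttm(Y, i, t):
    # Assumed TTM: steps between consecutive frames along the trajectory,
    # with frame i kept in the middle.
    win = _window(Y, i, t)
    out = np.empty_like(win)
    out[:, t] = win[:, t]
    out[:, :t] = win[:, 1 : t + 1] - win[:, :t]
    out[:, t + 1 :] = win[:, t + 1 :] - win[:, t:-1]
    return out.flatten(order="F")

# Toy trajectory with d = 2: frame j sits at (j, j^2).
Y = np.vstack([np.arange(10.0), np.arange(10.0) ** 2])
print(ltm(Y, 5, 2))
print(dtm(Y, 5, 2))
print(ttm(Y, 5, 2))
```

All three variants produce vectors of the same length, so the same metric learning step can be applied to each of them unchanged.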

3.2 Metric Learning by LMNN (1/2) • Large Margin Nearest Neighbor (LMNN) is a metric learning method that tries to produce a new space with a better distance measure; the distance in this space is called the Mahalanobis distance. The objective function is shown below: Minimize … Subject to (i) … (ii) … (iii) M has to be positive semi-definite
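For reference, the standard LMNN semidefinite program from Weinberger and Saul has the same three-constraint structure. This is the published formulation, not a recovery of the slide's own equation, and the symbols are assumptions: η_ij = 1 when x_j is a target neighbor of x_i, y_il = 1 when x_i and x_l share a label, ξ_ijl are slack variables, and c is a trade-off constant.

```latex
\begin{align*}
\text{Minimize } \quad
  & \sum_{i,j} \eta_{ij}\, (x_i - x_j)^{\top} M (x_i - x_j)
    \;+\; c \sum_{i,j,l} \eta_{ij}\, (1 - y_{il})\, \xi_{ijl} \\
\text{Subject to } \quad
  & \text{(i)}\;\; (x_i - x_l)^{\top} M (x_i - x_l) - (x_i - x_j)^{\top} M (x_i - x_j) \;\ge\; 1 - \xi_{ijl} \\
  & \text{(ii)}\;\; \xi_{ijl} \ge 0 \\
  & \text{(iii)}\;\; M \succeq 0
\end{align*}
```

Writing M = L^T L gives the linear map L, and the squared Euclidean distance between L x_i and L x_j equals the Mahalanobis distance under M.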

3.2 Metric Learning by LMNN (2/2) 1. Neighbors with the same label are pulled closer. 2. Neighbors with different labels are pushed away beyond a certain distance. The figure shows the result after pulling and pushing.
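The pull and push terms can be made concrete with a minimal sketch that evaluates the standard LMNN loss for a given linear map. It only scores a map; it does not implement the LMNN optimizer. The function name `lmnn_loss`, the unit margin, the toy points, and the hand-picked map `L_learned` are assumptions for illustration.

```python
import numpy as np

def lmnn_loss(L, X, labels, targets, c=1.0):
    """Pull/push loss for linear map L; X is (N, D), targets maps i -> same-label neighbors."""
    Z = X @ L.T                                   # data in the transformed space
    pull, push = 0.0, 0.0
    for i, neighbors in targets.items():
        for j in neighbors:
            d_ij = np.sum((Z[i] - Z[j]) ** 2)
            pull += d_ij                          # pull target neighbors closer
            for l in range(len(Z)):
                if labels[l] != labels[i]:
                    d_il = np.sum((Z[i] - Z[l]) ** 2)
                    # Impostors closer than d_ij + 1 are penalized (push term).
                    push += max(0.0, 1.0 + d_ij - d_il)
    return pull + c * push

# Two points of class 0 that differ along axis 0, and one impostor of class 1
# that sits close to them along axis 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
labels = [0, 0, 1]
targets = {0: [1], 1: [0], 2: []}

loss_identity = lmnn_loss(np.eye(2), X, labels, targets)
L_learned = np.diag([0.5, 3.0])                   # hand-picked: shrink axis 0, stretch axis 1
loss_learned = lmnn_loss(L_learned, X, labels, targets)
print(loss_identity, loss_learned)
```

Shrinking the axis along which same-label points differ and stretching the axis that separates the impostor lowers the loss, which is the pulling and pushing effect described above.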

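Q and A: • Q2: Why place the LMNN metric learning method here? - Answer: Note that up to this point we have not used labels at all. After performing LPP and turning the data into temporal data, we want a mechanism that gathers together data with the same action (label) and similar trajectories, and conversely pushes impostors, that is, data with a different action (label) that lie very close, out beyond a certain distance. In other words, the pulling and pushing here does not rely only on the data themselves; the data also contain temporal information. LMNN here is therefore a distance learning method applied to spatio-temporal data that carries both the spatial information of the data themselves and their temporal information within the sequence. LMNN learns a transformation matrix L; after the data are transformed into the resulting space, the distance between data in that space is the Mahalanobis distance we have learned.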

4. Recognition Process (1/2) LPP -> Temporal data -> LMNN -> Mahalanobis Distance -> KNN -> Assign “Label” to each frame in test sequence

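4. Recognition Process (2/2) Example (figure): KNN with 5-NN on the test data over the classes Walk, Run, Jump. One class receives 3/5 of the votes and the other two receive 1/5 each. The winner takes all, so the test data belongs to (Run).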
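The voting step can be sketched as follows, assuming a learned linear map L so that Euclidean distance after the map plays the role of the Mahalanobis distance. The function name `knn_label` and the one-dimensional toy data are hypothetical and chosen only to reproduce a 3/5, 1/5, 1/5 vote.

```python
import numpy as np
from collections import Counter

def knn_label(L, train_X, train_labels, x, k=5):
    """Label x by majority vote among its k nearest training points under the map L."""
    diffs = (train_X - x) @ L.T                   # differences in the learned space
    dist2 = np.sum(diffs ** 2, axis=1)            # squared Mahalanobis distances
    nearest = np.argsort(dist2)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0], votes      # winner takes all

train_X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [6.0], [7.0], [8.0]])
train_labels = ["Run", "Run", "Run", "Walk", "Jump", "Walk", "Walk", "Jump", "Jump"]
L = np.eye(1)                                     # identity map for this toy example

label, votes = knn_label(L, train_X, train_labels, np.array([0.0]), k=5)
print(label, dict(votes))
```

In the recognition process this vote would be taken for each frame of the test sequence.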

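Q and A: • Q3: Is it appropriate for the recognition process to use only a k-NN classifier? - Answer: I agree there is room for improvement here. We use k-NN classification because LMNN itself moves the data in a k-NN manner, so the intuitive choice of classifier is also k-NN. An ACCV reviewer did raise this point, noting that using only a k-NN mechanism seems somewhat simple and asking whether there is a better classification method. I have thought about this, for example an SVM classifier or other classification schemes, and I have seen other methods in some papers. We have not yet explored this in depth, but I think it is a part that can be improved.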

5. Experimental Results (1/5) • Weizmann Dataset: - 9 people in total - 10 actions - 93 sequences - In this paper the images are normalized to 64*48 and centered • Human Behavior Database • … • … In this paper we use binarized images (image size 64*48, as labeled in the figure).

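5. Experimental Results (2/5) • Weizmann Dataset: - We test with cross validation: one person is selected as the test data and the data of the other eight people are the training data. This is repeated nine times so that every person serves as test data once, and the average of the nine results is the reported experimental result. - The variable t is how much temporal data is taken; for example, t=2 takes the two frames before and the two frames after as supporting temporal information. - Three dimensionality reduction methods are compared: LPP, Supervised LPP, and LSDA (Locality Sensitive Discriminant Analysis). - Five frameworks are compared: 1. SE (dimensionality reduction only) 2. SM (dimensionality reduction plus metric learning) 3. LTM (dimensionality reduction + LTM temporal information + metric learning) 4. DTM (dimensionality reduction + DTM temporal information + metric learning) 5. TTM (dimensionality reduction + TTM temporal information + metric learning)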
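The leave-one-person-out protocol can be sketched as below. The `evaluate` callback stands in for training on the eight remaining people and measuring accuracy on the held-out person; it is a hypothetical placeholder, as are the function names and the toy assignment of two sequences per person.

```python
from statistics import mean

def leave_one_person_out(person_ids):
    """Yield (held_out, train_idx, test_idx), holding out one person per fold."""
    for held_out in sorted(set(person_ids)):
        test_idx = [i for i, p in enumerate(person_ids) if p == held_out]
        train_idx = [i for i, p in enumerate(person_ids) if p != held_out]
        yield held_out, train_idx, test_idx

def cross_validate(person_ids, evaluate):
    """Average the per-fold scores returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train_idx, test_idx)
              for _, train_idx, test_idx in leave_one_person_out(person_ids)]
    return mean(scores)

# Toy setup: 9 people with 2 sequences each (the real dataset has 93 sequences).
person_ids = [p for p in range(9) for _ in range(2)]
folds = list(leave_one_person_out(person_ids))

# Placeholder scorer used only so the sketch runs; a real one would train and test a model.
dummy_evaluate = lambda train_idx, test_idx: len(train_idx) / len(person_ids)
average_score = cross_validate(person_ids, dummy_evaluate)
print(len(folds), average_score)
```

Splitting by person rather than by frame keeps frames of the test subject out of the training set in every fold.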

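5. Experimental Results (3/5) • Analysis (1): The first row shows that adding temporal information does help the results, especially for DTM and TTM, which perform even better. Our explanation is that these two use the differences between data over time as information, so they improve more than LTM. • Analysis (2): Using LPP for dimensionality reduction works better than SLPP and LSDA. Our explanation is that in the first reduction layer we want to preserve the original structure of the data; if a supervised method is used already in the first layer, the distribution of the data has in effect been altered. I think that using labels once here, then adding temporal information, and then using labels again is somewhat redundant and also distracts from the main point. The main point is the later spatio-temporal data, so I think it is more appropriate to apply the labels at that stage.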

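5. Experimental Results (4/5) • Analysis (3): Here I analyze the effect of the size of t (the length of the temporal window). Comparing the bold figures on the previous page with those on this page, increasing t has little effect on DTM and TTM, but LTM drops considerably. We discussed this and believe the problem lies in LMNN: when LMNN performs metric learning it has no notion of weights, that is, it has no weighting that makes points closer to the current frame in time more important and more influential. So as the temporal window grows, the results may change little while t is still small, but when t is too large I think not only does the amount of data become too large, the accuracy also drops. A larger t means using data that is far away in time and only weakly related, which blurs the focus and is superfluous. I therefore think a larger t is not always better. We have not explored the choice of t in depth; I think the best value of this parameter can be determined experimentally.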

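5. Experimental Results _ Noisy data (5/5) • Here we test whether our system is strongly affected by noisy data. We use MATLAB to generate noise of varying variance; the resulting images are shown in the figure on the left. The experimental results are in the table below: v=0.1 v=0.15 v=0.2 • Analysis (4): Noise has little effect on our system, provided that the noise is salt noise rather than noise that occludes a large block. I think the effect is small because after dimensionality reduction such noise is rarely retained as a feature. Although we did not test large occluding blocks, I believe they would affect the results: if the occluded region is too large or covers the characteristic part of an action, the experimental results will naturally be affected.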

6. Conclusion • We propose a novel framework, “TVTL”, for human action recognition. In this framework we try to find a proper way to measure similarity by taking into consideration not only the spatial information but also the temporal information. • We show that adding temporal information has a positive influence, and moreover our method is robust to noisy data. • In the future I think we can look for ways to improve our method; both speed and accuracy deserve further study. I think action recognition is becoming increasingly important, and I saw different approaches while attending ACCV. I believe the use of temporal information is very helpful; although our method adds temporal information to the data in a very intuitive way, the way it is added can still be explored further.

Reference • C. Fang, J. Chen, C. Tseng, and J. Lien, “Human Action Recognition using Spatio-Temporal Classification,” ACCV, 2009. • L. Jia and D. Yeung, “Human Action Recognition using Local Spatio-Temporal Discriminant Embedding,” CVPR, pp. 1-8, 2008. • S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph Embedding and Extensions: A General Framework for Dimensionality Reduction,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, No. 1, pp. 40-51, 2007. • X. He and P. Niyogi, “Locality Preserving Projections,” Advances in Neural Information Processing Systems, pp. 153-160, 2003. • K. Weinberger and L. Saul, “Distance Metric Learning for Large Margin Nearest Neighbor Classification,” Journal of Machine Learning Research, pp. 209-244, 2009.
