Week 6 Shelby Thompson
This week…
• Emailed Enrique; my kNN/threshold graphs were wrong
• Redid them and experimented with many k-values; results are still too noisy
• k-values ranged from 5 to 500
• The larger the k-value, the closer the threshold graph was to the p-distance graph
Best Threshold Graph (non-neighbors set to 0; threshold graph on the right)
• Best k-value was found to be 475
Best Threshold Graph (non-neighbors set to Inf; threshold graph on the right)
• Best k-value was found to be 475
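The slides do not say how the "best" k-value was judged. One possible way to make the search less manual is to score each kNN graph against the ground-truth identity matrix and keep the highest-scoring k. The sketch below runs on small synthetic data. The F1-style score, the toy IDs, and the names `truth`, `scores`, and `bestK` are all assumptions for illustration, not part of the original scripts.

```matlab
% Toy identities (illustrative only): 3 people, 2 test and 3 train images each.
testIds  = [1 1 2 2 3 3];
trainIds = [1 1 1 2 2 2 3 3 3];
nTest  = numel(testIds);
nTrain = numel(trainIds);

% Ground truth: 1 where the test and train identities match.
truth = zeros(nTest, nTrain);
for i = 1:nTest
    truth(i, :) = (trainIds == testIds(i));
end

% Synthetic distances: small for matching identities, large otherwise.
dist = 0.8 + 0.01 * repmat(1:nTrain, nTest, 1);
dist(truth == 1) = dist(truth == 1) - 0.7;

% Sweep k and score each kNN graph against the ground truth (F1 score, assumed).
kValues = 1:nTrain;
scores  = zeros(size(kValues));
for t = 1:numel(kValues)
    k = kValues(t);
    knnGraph = zeros(nTest, nTrain);
    for i = 1:nTest
        [~, ind] = sort(dist(i, :), 'ascend');
        knnGraph(i, ind(1:k)) = 1;
    end
    tp        = sum(knnGraph(:) & truth(:));   % edges present in both graphs
    precision = tp / sum(knnGraph(:));
    recall    = tp / sum(truth(:));
    scores(t) = 2 * precision * recall / (precision + recall);
end

[bestScore, bestIdx] = max(scores);
bestK = kValues(bestIdx);
```

On the real data, `dist` and the identity vectors would come from the loaded `.mat` file, and `kValues` could cover the 5 to 500 range tried this week.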
Clustering
• Next looked at clustering
• Used a paper Mahdi suggested: “PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs” by Leman Akoglu, Hanghang Tong, Brendan Meeder, and Christos Faloutsos
• The paper proposes the PICS method of clustering
PICS
• Method for mining attributed graphs
• Requires no user input or parameters
• Running time scales linearly with total graph and attribute size
• PICS can reveal useful insight into datasets such as Twitter and YouTube
• These datasets have tens of thousands of nodes
Images generated from PICS (Figure 1)
• Figure 1 shows all of the nodes, separated, before any operations are performed on them
Images generated from PICS (Figure 2)
• Figure 2 shows the node groups from Figure 1, divided by each group's average location and number of nodes, before the operations are performed
Images generated from PICS (Figure 3)
• Figure 3 shows the node groups from Figure 1, divided by each group's average location and number of nodes, after the operations are performed
Images generated from PICS (Figure 4)
• Figure 4 shows the major node groups after the clustering operations are performed
Other work this week…
• Worked with a number of scripts
• None have yielded good results yet
• Will continue to work on them this coming week
kNN Code:
Part 1:
% Load the features and compute pairwise cosine distances (rows: test, columns: train)
load('pf83_gabor_lbp_hog_2048.mat')
dist = pdist2(fbgTestImgs', fbgTrainImgs', 'cosine');
figure; imagesc(dist);
% Ground-truth matrix: 1 where the test and train identities match
zeroMatrix = zeros(numel(fbgTestIds), numel(fbgTrainIds));
for i = 1:numel(fbgTestIds)
    for j = 1:numel(fbgTrainIds)
        if fbgTestIds(i) == fbgTrainIds(j)
            zeroMatrix(i,j) = 1;
        end
    end
end
Part 2:
% kNN graph: mark the knn nearest train samples of each test sample (non-neighbors stay 0)
knn = 100;
knnIndZero = zeros(numel(fbgTestIds), numel(fbgTrainIds));
for i = 1:numel(fbgTestIds)
    [vals, ind] = sort(dist(i,:), 'ascend');
    knnIndZero(i, ind(1:knn)) = 1;
end
% Threshold graph: mark train samples no farther away than the row's threshold.
% Note: dist(i,i) is used as the threshold for row i, which is meaningful only if
% test sample i and train sample i are paired.
threshIndZero = zeros(numel(fbgTestIds), numel(fbgTrainIds));
for i = 1:numel(fbgTestIds)
    ind = dist(i,:) <= dist(i,i);
    threshIndZero(i,ind) = 1;
end
figure; imagesc(zeroMatrix)
figure; imagesc(knnIndZero)
figure; imagesc(threshIndZero)
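The code above builds only the variant where non-neighbors are 0. A minimal sketch of the "non neighbors Inf" variant named on the graph slides follows. It assumes that variant keeps the actual distance for each of the k nearest neighbors and sets every other entry to Inf. The tiny `dist` matrix and the name `knnDistInf` are made up for illustration.

```matlab
% Toy distance matrix (illustrative): 2 test samples by 3 train samples.
dist = [0.2 0.9 0.4;
        0.7 0.1 0.8];
knn = 2;

% Start with every entry marked as a non-neighbor (Inf).
knnDistInf = inf(size(dist));
for i = 1:size(dist, 1)
    [vals, ind] = sort(dist(i, :), 'ascend');
    % Keep the real distances of the knn nearest train samples.
    knnDistInf(i, ind(1:knn)) = vals(1:knn);
end
```

Here row 1 keeps the distances to train samples 1 and 3, and row 2 keeps the distances to train samples 2 and 1.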
Clustering Code: (Runs fine but no good output)
Part 1:
load('data/A_call.mat')
load('data/F_call.mat')
load('pf83_gabor_lbp_hog_2048.mat')
xlabels = {'prof','grad','grad-1','ugrad','ugrad-1','staff','sloan'};
% Ground-truth matrix: 1 where the test and train identities match
groundTruthLabel = zeros(numel(fbgTestIds), numel(fbgTrainIds));
for i = 1:numel(fbgTestIds)
    for j = 1:numel(fbgTrainIds)
        if fbgTestIds(i) == fbgTrainIds(j)
            groundTruthLabel(i,j) = 1;
        end
    end
end
Part 2:
clust = test_reality('call', 1, inf);
lengthClust = length(clust);
cHist = zeros(lengthClust, 83);
for c = 1:lengthClust
    ind = clust == c;
    for i = 1:83
        % Note: groundTruthLabel holds only 0s and 1s, so this count is always 0
        % for i > 1, which may explain the uninformative output.
        cHist(c,i) = sum(groundTruthLabel(ind) == i);
    end
end
figure; imagesc(cHist)
figure; imagesc(groundTruthLabel)
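K-Means: (Still in progress)
% Cluster the train and test features together (one row per image)
X = [fbgTrainImgs'; fbgTestImgs'];
k = 100;
opts = statset('MaxIter', 10);
[idx, ctrs] = kmeans(X, k, 'Replicates', 1, 'Options', opts);
classes = unique(fbgTrainIds);
nTrain = numel(fbgTrainIds);
% Cluster assignments come from idx (ctrs holds the k centroids)
trnCtrs = idx(1:nTrain);
% trainHist(c,i): number of train images of class classes(i) in cluster c
trainHist = zeros(k, numel(classes));
for c = 1:k
    ind = trnCtrs == c;
    for i = 1:numel(classes)
        trainHist(c,i) = sum(ind(:) & fbgTrainIds(:) == classes(i));
    end
end
tstCtrs = idx(nTrain+1:end);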
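The cluster-by-class histogram can also be built without nested loops. The sketch below uses `unique` and `accumarray` on toy cluster assignments, then computes a simple purity figure as one possible way to judge a clustering. The purity measure and the toy values of `idx` and `ids` are assumptions for illustration, not results from the face data.

```matlab
% Toy cluster assignments and identity labels (illustrative values only).
idx = [1 1 1 2 2 2 3 3 3 3]';   % cluster of each image
ids = [7 7 9 9 9 9 4 4 4 7]';   % identity of each image

% Map each identity to a column index: classes(classPos(n)) == ids(n).
[classes, ~, classPos] = unique(ids);
k = max(idx);

% clusterHist(c,i): number of images of class classes(i) assigned to cluster c.
clusterHist = accumarray([idx classPos(:)], 1, [k numel(classes)]);

% Purity: fraction of images that belong to the majority class of their cluster.
purity = sum(max(clusterHist, [], 2)) / numel(idx);
```

With the real data, `idx(1:nTrain)` and `fbgTrainIds(:)` would take the place of the toy vectors.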