Using Blind Search and Formal Concepts for Binary Factor Analysis
Aleš Keprt, ales.keprt@vsb.cz
Synopsis
• Binary Factor Analysis (BFA)
- introduction to BFA
- exact solution of BFA
- quality checking
• Possible optimizations
• Blind Search method
• Method based on Formal Concepts
• Tests and the comparison of methods
• Possible future work
Binary Factor Analysis (BFA)
• Factor analysis of binary data
• Using Boolean arithmetic
• Trying to express matrix X as a product of two matrices, F and A
• The product is binary matrix multiplication
• Initial conditions: we know X, the dimensions of all matrices, and the number of ones per row of A
Exact BFA Solution (i.e. reference factorizer)
• For checking other algorithms
• Searches for the best (optimal) solution
• Exact = the opposite of an approximate solution
The key: perform all possible optimizations to avoid checking all bit combinations
Boolean arithmetic
• Classical arithmetic:
• Boolean arithmetic:
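To illustrate the contrast named on this slide, here is a minimal sketch assuming the usual Boolean semantics: addition is logical OR (so 1 + 1 = 1) and multiplication is logical AND. The helper names `classical_matmul` and `bool_matmul` and the small matrices are illustrative and do not come from the talk.

```python
def classical_matmul(F, A):
    """Ordinary matrix product: sums of products."""
    return [[sum(f * a for f, a in zip(row, col)) for col in zip(*A)]
            for row in F]

def bool_matmul(F, A):
    """Boolean matrix product: x_ij = OR over k of (f_ik AND a_kj)."""
    return [[int(any(f and a for f, a in zip(row, col))) for col in zip(*A)]
            for row in F]

# Hypothetical example data (not from the slides).
F = [[1, 1],
     [0, 1]]
A = [[1, 1, 0],
     [0, 1, 1]]

classical = classical_matmul(F, A)  # entry (0, 1) becomes 2
boolean = bool_matmul(F, A)         # the same entry stays 1
```

The only entry that differs is the one where two factors overlap: the classical product counts the overlap, while the Boolean product saturates at 1.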
Quality check
• Discrepancy (Czech: „odchylka”)
• Our goal is to minimize discrepancy
• We distinguish between positive and negative discrepancy
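A minimal sketch of the quality check, under a labeled assumption about the sign convention: a positive discrepancy is counted where X has a 1 that the reconstruction misses, and a negative discrepancy where the reconstruction has a 1 that X lacks (the sign of x minus its reconstructed value). The slides do not spell this out, and the function name `discrepancy` is hypothetical.

```python
def discrepancy(X, X_hat):
    """Return (total, positive, negative) discrepancy between X and X_hat.

    Assumed convention (not stated in the slides):
      positive: x = 1 but reconstructed as 0
      negative: x = 0 but reconstructed as 1
    """
    pos = neg = 0
    for row, row_hat in zip(X, X_hat):
        for x, xh in zip(row, row_hat):
            if x == 1 and xh == 0:
                pos += 1
            elif x == 0 and xh == 1:
                neg += 1
    return pos + neg, pos, neg

# Hypothetical example data.
X     = [[1, 1, 0],
         [0, 1, 1]]
X_hat = [[1, 0, 0],
         [1, 1, 1]]
total, pos, neg = discrepancy(X, X_hat)
```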
Possible optimizations
• Empty rows and columns – skip
• Duplicate rows and columns – merge
• Order factor loadings (rows of A)
• Constant number of ones per row of A
• Use the knowledge of A to get F
• Parallel (distributed) computation using multiple computers
Quality check
• When merging duplicate rows/columns:
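One plausible reading of this slide, stated here as an assumption rather than as the author's formula: once duplicate rows are merged, each surviving row's discrepancy should be weighted by the number of original rows it stands for, so the total equals that of the unmerged matrix. The helper names below are hypothetical.

```python
from collections import Counter

def merge_duplicate_rows(X):
    """Collapse identical rows; return the unique rows and their multiplicities."""
    counts = Counter(tuple(row) for row in X)  # keeps first-seen order
    rows = list(counts)
    weights = [counts[r] for r in rows]
    return rows, weights

def weighted_discrepancy(rows, rows_hat, weights):
    """Discrepancy of the merged matrix, each row counted `weight` times."""
    return sum(w * sum(x != xh for x, xh in zip(r, rh))
               for r, rh, w in zip(rows, rows_hat, weights))

# Hypothetical example: the first and third rows are duplicates.
X = [[1, 0, 1],
     [0, 1, 1],
     [1, 0, 1]]
rows, weights = merge_duplicate_rows(X)
rows_hat = [(1, 0, 0), (0, 1, 1)]          # some reconstruction of the merged rows
d = weighted_discrepancy(rows, rows_hat, weights)
```

The same weighting would apply column-wise when duplicate columns are merged.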
Blind Search
• Blind Search (Czech: „slepé hledání”)
• The strategy:
1. Build up a particular candidate for A
2. Find the best F for this A
3. Compute the discrepancy
4. Remember the best (A, F) pair so far
5. Go back to step 1
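The five steps above can be sketched as a plain exhaustive loop. This is a minimal illustration under stated assumptions, not the author's implementation: candidates for A are built from all row patterns with a fixed number of ones, F is found row by row by trying every subset of factors, and the names `reconstruct_row`, `best_F` and `blind_search` are hypothetical.

```python
from itertools import combinations, product

def reconstruct_row(f, A):
    """Boolean product of one row f of F with A: OR of the selected rows of A."""
    return [int(any(fk and a_row[j] for fk, a_row in zip(f, A)))
            for j in range(len(A[0]))]

def best_F(X, A):
    """Steps 2 and 3: for each row of X try all 2^m rows of F, keep the best."""
    F, total = [], 0
    for x in X:
        def cost(f):
            return sum(a != b for a, b in zip(x, reconstruct_row(f, A)))
        f = min(product((0, 1), repeat=len(A)), key=cost)
        F.append(list(f))
        total += cost(f)
    return F, total

def blind_search(X, m, ones_per_row):
    p = len(X[0])
    patterns = [[int(j in cols) for j in range(p)]
                for cols in combinations(range(p), ones_per_row)]
    best = None
    for A in combinations(patterns, m):       # step 1: next candidate for A
        F, d = best_F(X, A)                   # steps 2-3: best F and discrepancy
        if best is None or d < best[0]:       # step 4: remember the best pair
            best = (d, [list(r) for r in A], F)
    return best                               # loop = step 5

# Hypothetical example data.
X = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 1, 1, 1]]
d, A, F = blind_search(X, m=2, ones_per_row=2)
```

Even on this tiny input the loop visits every candidate, which is why the optimizations listed earlier matter for realistic sizes.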
Blind Search (2.)
• The key tasks:
1. Building up the candidates for A
2. How to get F when knowing A
• Building up the candidates for A:
- row by row (one row, one factor)
- bit-coded matrices can be much faster
- factors cannot repeat on several rows
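A sketch of how bit-coded candidates for A might be enumerated, assuming each row of A is stored as an integer whose set bits mark the ones, and that every row has the same number of ones. The function names are hypothetical. Using `itertools.combinations` yields rows in a fixed order, so no factor appears twice and no row permutation of the same A is generated again.

```python
from itertools import combinations

def row_patterns(p, ones_per_row):
    """All p-bit integers with exactly `ones_per_row` bits set."""
    return [sum(1 << j for j in cols)
            for cols in combinations(range(p), ones_per_row)]

def candidates_for_A(p, m, ones_per_row):
    """Yield each candidate A as a tuple of m distinct bit-coded rows."""
    yield from combinations(row_patterns(p, ones_per_row), m)

patterns = row_patterns(4, 2)                 # 6 patterns over 4 columns
n_candidates = sum(1 for _ in candidates_for_A(4, 2, 2))
```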
Blind Search (3.)
• How to get F when knowing X and A:
- bit-coded matrices may help (yet again)
- going row by row (better than the classical „rows × columns” multiplication algorithm)
• How to get a particular row of F:
(a) blind search
(b) do some preprocessing, then blind search
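Option (a) for a single row of F can be sketched with bit-coded rows, where OR builds the reconstruction and XOR plus a bit count gives the discrepancy. This is an illustrative sketch under the assumption that rows of X and A are stored as integers; `best_f_row` is a hypothetical name, and the preprocessing of option (b) is not shown.

```python
def best_f_row(x, A_rows):
    """Blind search for one row of F.

    x      -- bit-coded row of X
    A_rows -- list of bit-coded rows of A (one per factor)
    Returns (f, d): f bit-codes which factors are switched on,
    d is the number of differing bits between x and its reconstruction.
    """
    m = len(A_rows)
    best_f, best_d = 0, bin(x).count("1")     # f = 0 reconstructs all zeros
    for f in range(1, 1 << m):
        recon = 0
        for k in range(m):
            if (f >> k) & 1:
                recon |= A_rows[k]            # Boolean sum of selected factors
        d = bin(x ^ recon).count("1")         # differing bits = discrepancy
        if d < best_d:
            best_f, best_d = f, d
    return best_f, best_d

A_rows = [0b0011, 0b1100]                     # hypothetical factors
exact = best_f_row(0b1111, A_rows)            # both factors, no error
approx = best_f_row(0b0111, A_rows)           # first factor, one bit missed
```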
Using Formal Concepts
• Concepts are our candidates for the rows of matrix A
• Example: set “p3”
- data matrix size 100 × 100
- 10 concepts (see picture)
• Searching for 5 factors:
- only 8 candidates
- 56 possible combinations
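A small sketch of where such candidates come from, assuming the standard definition of a formal concept: its intent is a set of columns obtained by intersecting some set of rows of X (the empty intersection gives all columns). The function name `concept_intents` and the example matrix are hypothetical. The last line checks the count quoted on the slide: choosing 5 factors out of 8 candidates gives 56 combinations.

```python
from math import comb

def concept_intents(rows, p):
    """Return all concept intents of a binary matrix with p columns.

    rows -- bit-coded rows of X
    Intents are closed under intersection, so it is enough to keep
    intersecting every known intent with each new row.
    """
    intents = {(1 << p) - 1}                  # intent of the empty object set
    for r in rows:
        intents |= {r & c for c in intents}
    return intents

# Hypothetical 3 x 3 example.
rows = [0b011, 0b110, 0b111]
intents = concept_intents(rows, 3)

n_combinations = comb(8, 5)                   # 5 factors chosen from 8 candidates
```

Compared with enumerating every bit pattern for a row of A, the candidate set here is limited to patterns that actually occur as shared attribute sets in the data.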
Details
• Concepts generate no negative discrepancy
- higher computation error
- higher semantic value (for us)
• We can get negative discrepancy when computing F (as discussed before)
• The tests performed gave promising results
Final Comparison
• The presented algorithms are very similar
• They give the same results in our tests
• Computation times are very different
- times for set “p2” (mentioned before):
  blind search: approx. 3 × 10^9 years
  concepts: 7 seconds
Possible Future Work
• The current implementation uses an independent application to compute concept lattices
• Integration into a single application may speed up the computation
• We don’t need to compute the whole concept lattice, or even to know all concepts
• We need to find a better algorithm for binary matrix pseudo-division F = X/A