Regression Analysis • DataSet • Data Preprocess • Normalize • LARS
DataSet • X: the SNP sequences of 163 subjects; each sequence has 5222888 SNPs • Y: the Wolbachia infection status table
Preprocess of X • As the email said, get a data array of 0, 1, 0.5 and N • Set the values: 0 -> 0; 0.5 -> 1; 1 -> 2; N -> 1 • Get the new X file (DataSet) at http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/x.rar
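The value mapping above can be sketched in a few lines of Python. This is only an illustration: the token strings, the `CODE` table name and the `encode_calls` helper are assumptions, and the layout of the real input file is not described in the slides.

```python
import numpy as np

# Mapping taken from the slide: 0 -> 0, 0.5 -> 1, 1 -> 2, N -> 1.
# The string form of each raw token is an assumption about the input file.
CODE = {"0": 0, "0.5": 1, "1": 2, "N": 1}

def encode_calls(tokens):
    """Map raw call strings to integer codes (uint8 keeps the matrix small)."""
    return np.array([CODE[t] for t in tokens], dtype=np.uint8)

# Made-up example row, not real data.
row = encode_calls(["0", "1", "0.5", "N", "1"])
print(row)  # [0 2 1 1 2]
```

Storing one byte per call keeps a 163 x 5222888 matrix at roughly 850 MB before normalization.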
Preprocess of Y • Choose the sheet of Wolbachia status • Set the values: y -> 1; n -> 0 (Y is normalized afterwards, so y -> 2; n -> 0 gives the same result) • Get Y here: http://gdm.fudan.edu.cn/attach/lasso_on_GU/y.txt
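The remark that y -> 2 gives the same result as y -> 1 can be checked with a small sketch. It assumes that "normalized" means centering to zero mean and scaling to unit Euclidean length, which is the usual convention for LARS but is not spelled out in the slides. The `normalize` helper and the labels are made up for illustration.

```python
import numpy as np

def normalize(v):
    """Center to zero mean, then scale to unit Euclidean length (assumed convention)."""
    v = np.asarray(v, dtype=np.float64)
    c = v - v.mean()
    return c / np.linalg.norm(c)

status = ["y", "n", "n", "y", "n"]  # made-up labels, not the real sheet
y1 = normalize([1.0 if s == "y" else 0.0 for s in status])  # y -> 1, n -> 0
y2 = normalize([2.0 if s == "y" else 0.0 for s in status])  # y -> 2, n -> 0
print(np.allclose(y1, y2))  # True
```

Doubling the code for "y" doubles both the centered vector and its length, so the ratio is unchanged.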
Normalize X and Y • Use a multithreaded algorithm (2048 threads) to get the normalized X (larger than 8 GB) • Normalize Y • The normalized X and Y are packaged here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/normalize.rar
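A column-wise normalization of X could look roughly like the sketch below. It is not the 2048-thread implementation from the slide: it swaps in plain NumPy and processes one block of columns at a time so the working set stays small. The centering and unit-length convention, the `normalize_columns` name, the block size and the synthetic matrix are all assumptions.

```python
import numpy as np

def normalize_columns(X, block=4096):
    """Center each column and scale it to unit length, one block of columns at a time."""
    n, p = X.shape
    out = np.empty((n, p), dtype=np.float32)
    for start in range(0, p, block):
        cols = X[:, start:start + block].astype(np.float32)
        cols -= cols.mean(axis=0)
        norms = np.linalg.norm(cols, axis=0)
        norms[norms == 0] = 1.0  # constant SNP: keep the centered column as all zeros
        out[:, start:start + block] = cols / norms
    return out

# Synthetic stand-in for the encoded genotype matrix (163 subjects, 10000 SNPs).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(163, 10_000)).astype(np.uint8)
Xn = normalize_columns(X)
```

Each column is independent of the others, which is why this step parallelizes well across many threads.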
LARS • Run LARS for 163 iterations • Each line of the result contains: the max angle between the remaining error (the residual) and the 5222888 SNP vectors, and the SNP at which that max angle is reached in that iteration • Here is the result: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/result.txt
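The per-iteration search can be sketched as follows. This covers only the selection step, not the full LARS coefficient update. In the standard formulation LARS picks the column with the largest absolute correlation with the residual, which is the column at the smallest angle. The sketch follows that convention, and whether the "max angle" in the result file is measured the same way is not stated in the slides. The `most_correlated` name, the block size and the synthetic data are assumptions.

```python
import numpy as np

def most_correlated(Xn, residual, block=4096):
    """Return (column index, angle in degrees) of the column best aligned with the residual.

    Assumes every column of Xn has unit length, so the cosine of the angle
    between column j and the residual is (Xn[:, j] @ residual) / ||residual||.
    """
    r_norm = np.linalg.norm(residual)
    best_j, best_abs_cos = -1, -1.0
    for start in range(0, Xn.shape[1], block):
        cos = (Xn[:, start:start + block].T @ residual) / r_norm
        k = int(np.argmax(np.abs(cos)))
        if abs(cos[k]) > best_abs_cos:
            best_j, best_abs_cos = start + k, float(abs(cos[k]))
    angle = float(np.degrees(np.arccos(min(best_abs_cos, 1.0))))
    return best_j, angle

# Synthetic check: y is built mostly from column 7, so column 7 should be found.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((163, 500))
Xs -= Xs.mean(axis=0)
Xs /= np.linalg.norm(Xs, axis=0)
y = 3.0 * Xs[:, 7] + 0.01 * rng.standard_normal(163)
y -= y.mean()
j, angle = most_correlated(Xs, y)  # first iteration: the residual is y itself
```

In a full run this search is repeated once per iteration against the updated residual, which is where scanning all 5222888 columns dominates the cost.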
Findings • Is a SNP's importance related to how many 0s it contains? • The result file http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/rstAnd0s.txt shows: NO! • This means the reference sequence is not reliable.
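One way to tabulate the number of 0s next to the selected SNPs is sketched below. The slides do not describe the format of the result file, so the `zeros_per_snp` helper, the tiny matrix and the selection order are all hypothetical.

```python
import numpy as np

def zeros_per_snp(X_codes):
    """Count, for each SNP column, how many subjects carry code 0."""
    return (X_codes == 0).sum(axis=0)

# Tiny made-up encoded matrix: 4 subjects x 3 SNPs.
X_codes = np.array([[0, 2, 1],
                    [0, 1, 1],
                    [2, 0, 1],
                    [0, 2, 0]], dtype=np.uint8)
selected = [2, 0]  # hypothetical LARS selection order
counts = zeros_per_snp(X_codes)
print(counts[selected])  # [1 3]
```

Listing the counts in selection order makes it easy to see whether early-selected SNPs tend to have many or few 0s.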