CSE891-002 Selected Topics in Bioinformatics Jin Chen 232 Plant Biology Bld. 2011 Spring
About me… • Jin Chen, Assistant Professor in CSE and PRL from 2009 • Office: 232 Plant Biology Lab. Tel: (517) 355-5015. Email: jinchen@msu.edu
Outline • Course Description • Introduction to Computational Network Biology
Course Description • Course objectives: study interesting computational network biology problems and their algorithms, with a focus on the principles used to design those algorithms. (3 credits) • Instructor: Jin Chen. Office: 232 Plant Biology Bld. Email: jinchen@msu.edu • Office hours: Thursday 2PM-3PM. If you cannot attend office hours, email me about scheduling a different time. • Web page: http://www.msu.edu/~jinchen/cse891a
Course Description • Course work: one 80-minute lecture and 80 minutes of discussion and student presentations each week • Grading policy: the course will be graded on attendance (10%), participation (20%), and presentation (70%). • No final exam
Course Description • Prerequisites: Graduate students in science or engineering. Note: an override is necessary for non-CSE graduate students; please send your PID & NetID to me. • No prior knowledge of biology is required. Computationally inclined biology graduate students are encouraged to take the class as well.
Suggested books • A.-L. Barabási, Linked: The New Science of Networks • U. Alon, An Introduction to Systems Biology • B. Palsson, Systems Biology: Properties of Reconstructed Networks • K. Kaneko, Life: An Introduction to Complex Systems Biology
Course Description • Network Biology • Graph Mining
Paper list
• Chua et al. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics 22(13): 1623–1630, 2006.
• Kashani et al. Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics 10: 318, 2009.
• Deng et al. Prediction of Protein Function Using Protein–Protein Interaction Data. Journal of Computational Biology 10(6): 947–960, December 2003.
• Hu et al. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21(Suppl. 1): i213–i221, 2005.
• Xu et al. Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles. ICDE, 2006.
• Xu et al. Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines. PLoS Computational Biology 5(4), 2009.
• Huang et al. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Systems 43: 1207–1225, 2007.
• Honkela et al. Model-based method for transcription factor target identification with limited data. PNAS 107(17): 7793–7798, 2010.
• Vermeirssen et al. Transcription factor modularity in a Gene-Centered C. elegans Protein-DNA interaction network. Genome Research 17: 1061–1071, 2007.
• Covert et al. Transcriptional Regulation in Constraints-Based Metabolic Models of Escherichia coli. Journal of Biological Chemistry 277(31): 28058–28064, 2002.
• Herrgard et al. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Research 16: 627–635, 2006.
• Barabási et al. Network Biology: Understanding the Cell's Functional Organization. Nature Reviews Genetics 5: 101–113, 2004.
• van Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.
• Huan et al. Mining Family Specific Residue Packing Patterns from Protein Structure Graphs. RECOMB, pp. 308–315, 2004.
Course Description • Select at least one paper from the paper list to present. Email me which paper you will present by next Monday (1/17/2011) • Each presentation is 45 minutes, including 15 minutes of Q&A, and is followed by a discussion • Your grade will be largely determined by the presentation (70%) • Presentations start next Tuesday (1/18/2011)
Important Days • Class begins: 1/10/2011 • Open adds end: 1/14/2011 • Last day to drop with refund: 2/3/2011 • Last day to drop with no grade reported: 3/2/2011 • Class ends: 5/6/2011
Introduction to Computational Network Biology • Network biology is a branch of systems biology, which in turn belongs to genomics • It focuses on the relations between entities rather than on the entities themselves http://bionet.bioapps.biozentrum.uni-wuerzburg.de/
Networks are everywhere • Internet, social networks, anti-terrorism networks • Biological networks • Protein-protein interaction (PPI) network • Protein-DNA interaction network • Gene correlation network • Gene regulatory network • Metabolic network • Signaling network… • Networks are a tool for understanding complex systems • Network models explain network properties and support the study of network behavior • Network measures provide quantitative analysis of complex systems
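Definition of network (graph) • A network is modeled as a graph G(V,E), with a set of nodes (vertices) V and a set of edges E • [Figure: a graph with a labeled node (vertex), edge, self-loop, and multi-set of edges] • Simple graph: has no loops (self-edges) and no multi-edges.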
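The simple-graph definition above can be checked mechanically. A minimal Python sketch, assuming an undirected graph given as a list of (u, v) node pairs; the function name `is_simple_graph` is hypothetical:

```python
def is_simple_graph(edges):
    """Return True if an undirected edge list has no self-loops and no multi-edges.

    edges: iterable of (u, v) pairs (assumed input format, for illustration only).
    """
    seen = set()
    for u, v in edges:
        if u == v:  # self-loop: an edge from a node back to itself
            return False
        key = frozenset((u, v))  # undirected: (u, v) and (v, u) are the same edge
        if key in seen:  # the same node pair appears again: a multi-edge
            return False
        seen.add(key)
    return True


print(is_simple_graph([("A", "B"), ("B", "C")]))  # True
print(is_simple_graph([("A", "A"), ("A", "B")]))  # False: self-loop
print(is_simple_graph([("A", "B"), ("B", "A")]))  # False: multi-edge
```

Storing each edge as a `frozenset` makes the check order-independent, which matches the undirected case; for a directed graph one would store the tuple `(u, v)` instead.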
Definition of network (graph) • Directed graph vs. undirected graph • Labeled graph vs. unlabeled graph • Symmetric graph vs. asymmetric graph
Webpage layout • [Figure: pages on a web site and the hyperlinks between them] M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113, 2004
Yeast protein-protein interaction network (figure: Hawoong Jeong)
Gene regulation network of sea urchin (figure: Eric Davidson)
Metabolic flux analysis of E. coli (figure: Abhishek Murarka)
Why study networks? • Complex systems cannot be described from a purely reductionist view • Studying the behavior of complex systems starts with understanding the network topology • Network-related questions: • How do we reconstruct a network? • How can we quantitatively describe large networks? • How did networks get to be the way they are?
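Simple measures • Node degree: the number of edges connected to the node • In-degree & out-degree (directed graphs): the number of incoming and outgoing edges of a node • In a directed graph, the total in-degree equals the total out-degree, since every edge adds exactly one to each • Average degree: the average of the node degrees over all nodes in the network, $\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N} k_i$ • Degree distribution: $P(k)$ gives the fraction of nodes that have $k$ edges, $P(k) = N_k / N$, where $N_k$ is the number of nodes with degree $k$. Here $N$ is the number of nodes in the network and $k_i$ is the degree of node $i$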
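These degree measures can be computed directly from an edge list. A minimal Python sketch, assuming nodes are given as a list and edges (or directed arcs u -> v) as lists of pairs; all function names are hypothetical:

```python
from collections import Counter


def degrees(nodes, edges):
    """Degree k_i of every node in an undirected graph."""
    k = {n: 0 for n in nodes}
    for u, v in edges:
        k[u] += 1  # each edge adds one to both endpoints
        k[v] += 1
    return k


def in_out_degrees(nodes, arcs):
    """In-degree and out-degree of every node in a directed graph (arc u -> v)."""
    k_in = {n: 0 for n in nodes}
    k_out = {n: 0 for n in nodes}
    for u, v in arcs:
        k_out[u] += 1
        k_in[v] += 1
    return k_in, k_out


def average_degree(k):
    """<k> = (1/N) * sum_i k_i."""
    return sum(k.values()) / len(k)


def degree_distribution(k):
    """P(k) = N_k / N: fraction of nodes whose degree equals k."""
    n = len(k)
    counts = Counter(k.values())
    return {deg: c / n for deg, c in sorted(counts.items())}


k = degrees(["A", "B", "C", "D"], [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(k)                       # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(average_degree(k))       # 2.0
print(degree_distribution(k))  # {1: 0.25, 2: 0.5, 3: 0.25}

k_in, k_out = in_out_degrees(["X", "Y", "Z"], [("X", "Y"), ("X", "Z"), ("Y", "Z")])
print(sum(k_in.values()) == sum(k_out.values()))  # True: both equal the number of arcs
```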
Simple measures • Shortest path: a path between two nodes that minimizes the sum of the weights of its edges (in an unweighted graph, the number of edges) • Graph diameter: the longest shortest path between any pair of nodes in the graph • Connected graph: any two vertices can be joined by a path • Bridge: an edge whose removal disconnects the graph
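A minimal Python sketch of these measures, assuming an unweighted, undirected simple graph stored as an adjacency dict, so every edge weight is 1 and breadth-first search finds shortest paths. For bridges it uses a depth-first search with discovery times and low-link values (Tarjan's bridge-finding idea), which the slide does not prescribe. All function names are hypothetical:

```python
from collections import deque


def build_adj(nodes, edges):
    """Adjacency dict for an undirected graph."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # in BFS the first visit is along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_connected(adj):
    """True if every node is reachable from an arbitrary start node."""
    start = next(iter(adj))
    return len(bfs_distances(adj, start)) == len(adj)


def diameter(adj):
    """Longest shortest path over all node pairs (connected graphs only)."""
    if not is_connected(adj):
        raise ValueError("diameter is undefined (infinite) for a disconnected graph")
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def bridges(adj):
    """Edges whose removal disconnects the graph, via DFS discovery times and low-links."""
    disc, low, found = {}, {}, []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for w in adj[u]:
            if w not in disc:  # tree edge u-w
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # nothing in w's subtree reaches u or above
                    found.append((u, w))
            elif w != parent:  # back edge (assumes no parallel edges)
                low[u] = min(low[u], disc[w])

    for n in adj:
        if n not in disc:
            dfs(n, None)
    return found


# Triangle A-B-C with a pendant node D attached to C.
adj = build_adj(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(bfs_distances(adj, "A"))           # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
print(is_connected(adj), diameter(adj))  # True 2
print(bridges(adj))                      # [('C', 'D')]
```

For weighted graphs, the BFS would be replaced by Dijkstra's algorithm; the recursive DFS is fine for small examples but may hit Python's recursion limit on very large networks.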
Simple measures • Betweenness centrality: for every node pair (i, j), find all the shortest paths between nodes i and j, whose number is denoted $C(i,j)$, and determine how many of these pass through node k, denoted $C_k(i,j)$. The betweenness centrality of node k is $B(k) = \sum_{i \neq k \neq j} \frac{C_k(i,j)}{C(i,j)}$ • Calculating betweenness involves calculating the shortest paths between all pairs of vertices in the graph: $O(V^2 \log V + VE)$ for sparse graphs with Johnson's algorithm. L. C. Freeman, Sociometry 40, 35 (1977); P. E. Black, Dictionary of Algorithms and Data Structures (2004)
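Instead of running an all-pairs shortest-path algorithm such as Johnson's, a common alternative for unweighted graphs is Brandes' algorithm, which accumulates the ratios Ck(i,j)/C(i,j) with one breadth-first search per source node in O(VE) time. The sketch below uses that swapped-in technique, not the slide's method. It assumes an unweighted, undirected graph stored as an adjacency dict and sums over unordered pairs {i, j}; summing over ordered pairs would double every value. The function name is hypothetical:

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph).

    adj: dict mapping each node to a list of its neighbours (undirected).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths sigma[v] and recording predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest s-w path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: visit nodes by decreasing distance, accumulating pair dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {i, j} was counted once from i and once from j.
    return {v: b / 2 for v, b in bc.items()}


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(path))   # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}

cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(betweenness(cycle))  # {'A': 0.5, 'B': 0.5, 'C': 0.5, 'D': 0.5}
```

In the 4-cycle, opposite nodes are joined by two shortest paths, so each intermediate node receives 1/2, which is exactly the Ck(i,j)/C(i,j) ratio from the definition.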
Complex measures • Frequent subgraph mining • Graph comparison & classification • Graph isomorphism testing
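Graph isomorphism testing asks whether two graphs are identical up to a relabeling of their nodes. A brute-force Python sketch for small undirected simple graphs (an illustrative assumption, not a method from the slides): it tries every bijection between the node sets, so it costs O(n!) time; practical work would rely on dedicated tools such as Nauty, named on the software slide. The function name is hypothetical:

```python
from itertools import permutations


def are_isomorphic(nodes1, edges1, nodes2, edges2):
    """Brute-force isomorphism test for small undirected simple graphs."""
    nodes1, nodes2 = list(nodes1), list(nodes2)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Cheap invariants first: node and edge counts must match.
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))  # candidate bijection nodes1 -> nodes2
        mapped = {frozenset(mapping[x] for x in edge) for edge in e1}
        if mapped == e2:  # every edge maps onto an edge, and vice versa
            return True
    return False


# A 3-node path, labeled two different ways: isomorphic.
print(are_isomorphic(["A", "B", "C"], [("A", "B"), ("B", "C")],
                     [1, 2, 3], [(1, 3), (3, 2)]))  # True
# A star and a path, both with 4 nodes and 3 edges: not isomorphic.
print(are_isomorphic([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)],
                     [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]))  # False
```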
Useful software • Visualization & Topological Analysis • Cytoscape (www.cytoscape.org) • Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek) • Graph related programming • LEDA (www.algorithmic-solutions.com) • Nauty (www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml)
[Figure: panels labeled 1960, 1999 and 2002]
Real networks are much more complex • The transcription regulatory networks of yeast and E. coli are an interesting example of mixed characteristics: • The distribution of how many genes a TF interacts with is scale-free • The distribution of how many TFs interact with a given gene is exponential
Modularity and network motifs • Cellular functions are likely to be carried out in a highly modular manner • Module: a group of genes/proteins that work together to achieve a distinct function • Biology is full of examples of modularity
Remaining challenges • Discovery of network motifs is closely related to the generation of random networks • The structure of a network motif does not necessarily determine its function • The relation between networks and higher-level organizational and functional states has not yet been studied. Voigt, W. et al. Genetics 2005; Ingram P.J. et al. BMC Genomics 2006; Eric Werner. Nature 2007
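Why motif discovery depends on random-network generation is easiest to see in code: a subgraph pattern counts as a motif only if it occurs more often than in suitably randomized networks. A minimal Python sketch, assuming a directed network given as a list of distinct (source, target) arcs without self-loops. It uses the feed-forward loop (X->Y, Y->Z, X->Z) as an example motif and degree-preserving arc swaps as the null model; both are assumptions chosen for illustration, not choices specified on the slides. All function names are hypothetical:

```python
import random
from statistics import mean, pstdev


def count_ffl(arcs):
    """Count feed-forward loops X->Y, Y->Z, X->Z over distinct nodes X, Y, Z."""
    out = {}
    for u, v in set(arcs):
        out.setdefault(u, set()).add(v)
    count = 0
    for x, targets in out.items():
        for y in targets:
            if y == x:
                continue
            for z in out.get(y, ()):
                if z not in (x, y) and z in targets:
                    count += 1
    return count


def degree_preserving_shuffle(arcs, n_swaps, rng):
    """Randomize a directed graph by arc swaps that keep every in- and out-degree."""
    arcs = list(arcs)
    arc_set = set(arcs)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(arcs)), 2)
        (a, b), (c, d) = arcs[i], arcs[j]
        # Proposed swap a->b, c->d  =>  a->d, c->b; skip if it makes a self-loop or duplicate.
        if a == d or c == b or (a, d) in arc_set or (c, b) in arc_set:
            continue
        arc_set -= {(a, b), (c, d)}
        arc_set |= {(a, d), (c, b)}
        arcs[i], arcs[j] = (a, d), (c, b)
    return arcs


def ffl_zscore(arcs, n_random=100, seed=0):
    """Compare the observed FFL count with counts in degree-preserving random networks."""
    rng = random.Random(seed)
    observed = count_ffl(arcs)
    null = [count_ffl(degree_preserving_shuffle(arcs, 10 * len(arcs), rng))
            for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else None  # None: null counts never varied
    return observed, mean(null), z


arcs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e"), ("e", "a")]
print(count_ffl(arcs))  # 2
print(ffl_zscore(arcs, n_random=50))
```

Changing the null model (for example, shuffling arcs without preserving degrees) can change the z-score substantially, which is why motif discovery and random-network generation are tied together.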
Next class • PPI network construction • False-positive detection