Enzyme Function Initiative Overview John A. Gerlt, PI Enzyme Function Initiative (EFI)

Enzyme Function Initiative Overview John A. Gerlt, PI Enzyme Function Initiative (EFI) Advisory Committee Meeting November 30, 2011

The number of protein sequences is “exploding”!

At least one-half have unknown/uncertain functions
Functional assignment: high-throughput computation?

U54 GM093342: “Enzyme Function Initiative” (EFI)
[Figure: sequence » structure » reaction » function, with the labels bioinformatics, x-ray / computation, enzymology, and biology. Panels: a diagram labeled Group 1 to Group 4 and Outliers, and a plot of initial velocity versus [substrate] marked with Vmax and KM.]

EFI: Deliverables
1. Develop a robust sequence/structure-based strategy for facilitating discovery of in vitro enzymatic and in vivo metabolic/physiological functions of unknown enzymes discovered in genome projects.
2. Disseminate to the community the intellectual, computational, and experimental tools, protocols, materials, and guidelines for determining in vitro and in vivo functions of unknown enzymes.
3. Collaborate with the community to facilitate sequence/superfamily analyses as well as homology modeling and in silico docking of ligand libraries to unknown members of other enzyme superfamilies.

EFI’s “funnel” for functional discovery

Scientific Cores
• Superfamily/Genome (Patsy Babbitt, UCSF): Sequences, genome context, operons
• Protein (Steve Almo, AECOM): Gene cloning/synthesis, protein purification, ligand binding
• Structure (Steve Almo, AECOM): Crystallization and structure determination (50 new structures/year)
• Computation (Matt Jacobson, Andrej Sali, Brian Shoichet; UCSF): Functional prediction by homology modeling and in silico ligand docking
• Microbiology (John Cronan and Jonathan Sweedler, UIUC): Genetics, transcriptomics, metabolomics
• Data/Dissemination (Heidi Imker, UIUC, Wladek Minor, UVa, and Patsy Babbitt, UCSF): EFI website, EFI-DB/LabDB, SFLD

Bridging Projects: targets from diverse superfamilies
• Amidohydrolase (Frank Raushel, TAMU): large/diverse superfamily, single substrate, single domain
• Enolase (John Gerlt, UIUC): small/“simple” (?) superfamily, single substrate, catalytic and specificity domains
• Glutathione Transferase (Richard Armstrong, Vanderbilt): large/diverse superfamily, bisubstrate (“always” glutathione), small molecule and protein substrates
• Haloalkanoic Acid Dehalogenase (Karen Allen, BU, and Debra Dunaway-Mariano, UNM): large/diverse superfamily, phosphomonoesterases, catalytic and specificity domains
• Isoprenoid Synthase (C. Dale Poulter, UU): one (cyclases) or two (isoprenyl transfer) substrates, limited number of substrates, product determined by active site shape

EFI pipeline: develop assignment strategy

EFI pipeline: if correct, functional assignment

EFI pipeline: if incorrect, inform and improve strategy

Criteria for Target Selection
• Specificity Boundaries: As sequence diverges within a superfamily, the substrate specificity (function) changes. An important test of substrate specificity predictions by the Computation Core is whether changes in the substrate specificity of homologous enzymes can be predicted.
• Sequence/Function Diversity: Sequence similarity networks allow facile identification of divergent families that have not been experimentally or structurally characterized, and such divergent families likely will have new substrate specificities. An important test of the Computation Core’s algorithms is whether novel specificities can be predicted for targets selected from divergent families.
• Structures with No Functions (SNFs): The goal of the Protein Structure Initiative (PSI‑1 and PSI‑2) was to explore sequence space in order to define “fold space.” To meet that goal, structures were determined for many functionally uncharacterized enzymes. A challenge is to “rescue” these targets by testing Computation Core generated predictions of substrate specificities.
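The Sequence/Function Diversity criterion can be illustrated with a minimal sketch of a sequence similarity network. It is not the EFI's actual tooling. It assumes that pairwise similarity scores are already available (higher means more similar), draws an edge wherever a score meets a chosen threshold, and treats each connected component as a cluster. The sequence names, scores, threshold, and the function names `build_clusters` and `uncharacterized_clusters` are all hypothetical.

```python
from collections import defaultdict


def build_clusters(nodes, pair_scores, threshold):
    """Return the connected components of the graph whose edges are the
    sequence pairs scoring at or above `threshold` (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort members and clusters so the output order is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def uncharacterized_clusters(clusters, characterized):
    """Keep only clusters with no experimentally characterized member."""
    return [c for c in clusters if not any(m in characterized for m in c)]


# Hypothetical toy data: six sequences and a few pairwise scores.
nodes = ["seqA", "seqB", "seqC", "seqD", "seqE", "seqF"]
pair_scores = {
    ("seqA", "seqB"): 80,
    ("seqB", "seqC"): 65,
    ("seqC", "seqD"): 20,
    ("seqD", "seqE"): 70,
    ("seqA", "seqF"): 10,
}
characterized = {"seqA"}  # assumed to have a known function

clusters = build_clusters(nodes, pair_scores, threshold=50)
targets = uncharacterized_clusters(clusters, characterized)
print(clusters)  # [['seqA', 'seqB', 'seqC'], ['seqD', 'seqE'], ['seqF']]
print(targets)   # [['seqD', 'seqE'], ['seqF']]
```

In this toy example, the clusters without a characterized member are the candidates for divergent families. Lowering the threshold merges clusters, while raising it splits them, so the choice of threshold determines how fine-grained the candidate families are.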

EFI: enzymefunction.org

EFI: Data Access

SFLD (“dry data”)

EFI-DB (“wet” data)

LabDB (internal LIMS)

Collaborations

Challenges
• Yr 1 budget was reduced 18% for the initial award
• Yr 2 budget was reduced 3.9% for the first noncompeting renewal
• Yr 2-Yr 5 budgets are projected to be flat, but the out-years may be subject to additional reductions
• Given the available resources, funds for synthetic genes are limited, restricting most of the targets to organisms for which gDNAs are available (currently 559 from ATCC)
• How does the EFI move forward: reduction in scope and/or reallocation (number of targets, number of Bridging Projects, Core activities)?

Is the EFI doing something important?

RFI from the White House Office of Science and Technology Policy

http://www.whitehouse.gov/blog/2011/10/12/building-bioeconomy

RFI for National Bioeconomy Blueprint
(4) The speed of DNA sequencing has outstripped advances in the ability to extract information from genomes given the large number of genes of unknown function in genomes; as many as 70% of genes in a genome have poorly or unknown functions. All areas of scientific inquiry that utilize genome information could benefit from advances in this area. What new multidisciplinary funding efforts could revolutionize predictions of protein function for genes?
What the EFI is doing is important!!

Perspective
• The EFI is a “once in a lifetime opportunity” to define the “new enzymology” that allows the potential provided by genome projects to be fully realized.
• The PIs cannot be “independent operators”, the hallmark of P01/R01 projects. Instead, with the support of the EFI, the PIs must be dedicated to collaborations that will ensure that the EFI will be greater than the sum of its parts.
• The EFI is receiving significant support from NIGMS to achieve its deliverables: let’s make it work and be a success!
