Carnegie Institution for Science, Department of Plant Biology
Putting TAIR to work for you: Tips and Techniques for Accessing Arabidopsis Data for Plant Biology Research Kate Dreher TAIR, AraCyc, PMN Carnegie Institution for Science
Overview • Part I: Presentation (with exercises) • Finding a specific gene of interest in TAIR • Looking at the data on the locus, gene model, and protein pages • Getting to know GBrowse • Creating and enhancing customized data sets • Tips for working with Arabidopsis • Part II: Practice problems and individual help • Handouts with practice problems to work on • Questions from participants • Individual help • All documents are available in electronic form: • Resource guide • Questions, answers, and practice data • The “Bienvenidos a TAIR” presentation and this presentation
What is TAIR? • The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model plant Arabidopsis • www.arabidopsis.org • Curators and programmers at TAIR: • Collect, store, and organize Arabidopsis data • Attach functional information to genes • Improve gene structures • Provide tools to analyze data • Work with the Arabidopsis Biological Resource Center (ABRC) to provide seeds and clones
Tips and Techniques for Accessing Arabidopsis Data • Finding the gene you want • Case 1: You have a non-Arabidopsis gene and want to find its homolog • http://www.ncbi.nlm.nih.gov/nuccore/148189857?report=genbank • Case 2: You know exactly what Arabidopsis gene you want • You know the AGI locus code (e.g. At2g46990) • You know the gene symbol (e.g. PhyA)
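AGI locus codes follow a fixed pattern, so a short script can check a code before you search with it. This is a minimal sketch that assumes the usual layout: "AT", a chromosome designator (1 to 5, C for chloroplast, M for mitochondrion), "G", five digits, and an optional gene model suffix such as ".1" or ".2". The helper `parse_agi` is hypothetical and not part of any TAIR tool.

```python
import re

# Assumed AGI layout: AT + chromosome (1-5, C, M) + G + five digits,
# optionally followed by a gene model suffix such as ".1" or ".2".
AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})(?:\.(\d+))?$", re.IGNORECASE)


def parse_agi(code: str) -> dict:
    """Split an AGI locus or gene model code into its parts (hypothetical helper).

    Raises ValueError if the string does not look like an AGI code.
    """
    match = AGI_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not an AGI code: {code!r}")
    chromosome, number, model = match.groups()
    chromosome = chromosome.upper()
    return {
        "locus": f"AT{chromosome}G{number}",  # normalized to upper case
        "chromosome": chromosome,
        "model": int(model) if model else None,  # None for a bare locus code
    }


if __name__ == "__main__":
    print(parse_agi("At2g46990"))    # a locus code, as in Case 2
    print(parse_agi("At1g11310.2"))  # a gene model (splice variant) code
```

Gene symbols such as PhyA do not match the pattern and raise `ValueError`, which is a reminder to search for them by name rather than by locus code.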
Finding a gene: practice problems • You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. • Find the AGI locus code of this gene • You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB304457 • (To find the sequence in GenBank, search the web for “NCBI” to reach the NCBI site) • Find the AGI locus codes of the top three hits in TAIR using BLAST • Is it the same if you BLAST with the transcript or the protein? • Based on the transcript • Based on the protein
Finding a gene: practice problems • You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. • Find the AGI locus code of this gene • AT1G01510 (a.k.a. ANGUSTIFOLIA) • You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB304457 • Find the AGI locus codes of the top three hits in TAIR using BLAST • Is it the same if you BLAST with the transcript or the protein? • Based on the transcript • AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 62 3e-08 • AT3G03450.1 | Symbols: RGL2 | RGL2 (RGA-LIKE 2); transcript... 44 0.007 • AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 44 0.007 • Based on the protein • AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 647 0.0 • AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 632 0.0 • AT1G66350.1 | Symbols: RGL1, RGL | RGL1 (RGA-LIKE 1)
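If you save the BLAST summary lines, comparing the top hits from the transcript and protein searches can be scripted. This is a minimal sketch that assumes the "ID | Symbols: ... | description score E-value" layout copied from the answer slide; it is not a documented TAIR output format. The truncated third protein hit is left out, and hits are ranked by E-value first, then by score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene_model: str  # e.g. "AT1G14920.1"
    symbols: str     # e.g. "GAI, RGA2"
    score: int       # score column of the summary line
    evalue: float    # E-value column of the summary line

    @property
    def locus(self) -> str:
        # Drop the gene model suffix (".1") to get the AGI locus code.
        return self.gene_model.split(".")[0]


def parse_summary_line(line: str) -> Hit:
    """Parse one 'ID | Symbols: ... | description score E-value' line."""
    gene_model, symbols, rest = (part.strip() for part in line.split(" | ", 2))
    # The score and E-value are the last two whitespace-separated fields.
    _description, score, evalue = rest.rsplit(None, 2)
    return Hit(gene_model, symbols.removeprefix("Symbols:").strip(),
               int(score), float(evalue))


def top_loci(lines, n=3):
    """Return the AGI codes of the n best hits (lowest E-value, then highest score)."""
    hits = sorted((parse_summary_line(line) for line in lines),
                  key=lambda hit: (hit.evalue, -hit.score))
    return [hit.locus for hit in hits[:n]]


# Summary lines copied from the answer slide (complete lines only).
TRANSCRIPT_HITS = [
    "AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 62 3e-08",
    "AT3G03450.1 | Symbols: RGL2 | RGL2 (RGA-LIKE 2); transcript... 44 0.007",
    "AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 44 0.007",
]
PROTEIN_HITS = [
    "AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 647 0.0",
    "AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 632 0.0",
]

if __name__ == "__main__":
    by_transcript = top_loci(TRANSCRIPT_HITS)
    by_protein = top_loci(PROTEIN_HITS)
    print(by_transcript)  # ['AT1G14920', 'AT3G03450', 'AT2G01570']
    print(sorted(set(by_transcript) & set(by_protein)))  # ['AT1G14920', 'AT2G01570']
```

RGL2 and RGA1 tie on both score and E-value in the transcript search, so the sort keeps them in slide order; a careful comparison should look further down the hit list than the top three.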
Choosing the proper search result • Gene Model • Protein • Locus
Looking at the locus page: practice problems 1 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • What is its AGI locus code? • How many splice variants does it have? • Which one has the shorter coding region? • What is another name for this gene? • What is the evidence for it being involved in the “defense response to fungus, incompatible interaction”? • How many total loci are annotated to this term? • Which paper provides experimental evidence that PMR2 is located in the plasma membrane? • What is the title of that paper?
Looking at the locus page: practice problems 1 • You’re interested in learning more about a gene called PMR2: • Powdery Mildew Resistant 2 • What is its AGI locus code? • At1g11310 • How many splice variants does it have? • 2 • Which one has the shorter coding region? • At1g11310.2 • What is another name for this gene? • Mildew Resistance Locus O 2 (MLO2) • What is the evidence for it being involved in the “defense response to fungus, incompatible interaction”? • Inferred from Mutant Phenotype; analysis of visible trait; Consonni 2005 • How many total loci are annotated to this term? • 44 • Which paper provides experimental evidence that PMR2 is located in the plasma membrane? • Benschop 2007 • What is the title of that paper? • Quantitative phospho-proteomics of early elicitor signalling in Arabidopsis.
Looking at the locus page: practice problems 2 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • How many cDNAs are associated with this locus? • Which are available to order from the ABRC? • What is the length of the full-length coding region? • What is the isoelectric point of the protein? • For the PERL0025782 polymorphism, what is the nucleotide difference between the Col-0 and Bor-4 ecotypes?
Looking at the locus page: practice problems 2 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • How many cDNAs are associated with this locus? • 3 • Which are available to order from the ABRC? • none • What is the length of the full-length coding region? • 1722 bp • What is the isoelectric point of the protein? • 9.8492 • For the PERL0025782 polymorphism, what is the nucleotide difference between the Col-0 and Bor-4 ecotypes? • Col
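TAIR reports a calculated isoelectric point (9.8492 for PMR2). As a rough illustration of how such a value can be computed, the sketch below searches by bisection for the pH at which a sequence's net charge is zero. The pKa values are an assumption (published tables differ, and the method TAIR uses is not described here), so expect results that differ somewhat from the value on the locus page.

```python
# Illustrative pKa values (an assumption; published tables differ).
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}            # positive when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # negative when deprotonated


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    # Basic groups: fraction protonated = 1 / (1 + 10**(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for residue, pka in PKA_BASIC.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Acidic groups: fraction deprotonated = 1 / (1 + 10**(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_ACIDIC.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    # Example peptide: SVENYPSSPSPR appears in the GBrowse answers.
    print(round(isoelectric_point("SVENYPSSPSPR"), 2))
```

Bisection works here because the net charge falls steadily as pH rises, so there is a single crossing point between pH 0 and 14.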
Looking at the locus page: practice problems 3 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • Does the pmr2-1 mutant form lesions in response to powdery mildew attack? • What is the putative location of the T-DNA insertion in mlo2-6? • What is the ecotype of SAIL_878_H12? • How many articles and how many abstracts are available for this gene for 2007? • Which paper also mentions the PMR3 gene? • How many papers mention the “mlo2” allele/mutant when you do a Textpresso search?
Looking at the locus page: practice problems 3 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • Does the pmr2-1 mutant form lesions in response to powdery mildew attack? • no • What is the putative location of the T-DNA insertion in mlo2-6? • intron • What is the ecotype of SAIL_878_H12? • Col-0 • How many articles and how many abstracts are available for this gene for 2007? • 2 abstracts, 1 article • Which paper also mentions the PMR3 gene? • Isolation and characterization of powdery mildew-resistant Arabidopsis mutants • PNAS 2000 • How many papers mention the “mlo2” allele/mutant when you do a Textpresso search? • 8
Locus page links: practice problems • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • According to the Genevestigator Gene Atlas, which organ has the highest level of expression? • According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the bacterial elicitor flg22? • According to the eFP Browser, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment? • In the SUBA database, where does the MS/MS data indicate that this protein is located? • According to InParanoid, how many poplar genes fall into the same group? • On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation?
Locus page links: practice problems • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • According to the Genevestigator Gene Atlas, which organ has the highest level of expression? • senescent rosette leaf • According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the bacterial elicitor flg22? • It is higher 1 hour after treatment • According to the eFP Browser, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment? • They are higher in the root • In the SUBA database, where does the MS/MS data indicate that this protein is located? • plasma membrane • According to InParanoid, how many poplar genes fall into the same group? • 2 • On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation? • 5, At2g44180 is the strongest
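Co-expression resources such as ATTED-II rank genes by how similarly their expression varies across many experiments. The sketch below shows the basic idea using plain Pearson correlation on toy numbers; ATTED-II's own scoring is different, and the gene names and values here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, profiles):
    """Rank every other gene by correlation with the query gene, highest first."""
    target = profiles[query]
    scores = {gene: pearson(target, values)
              for gene, values in profiles.items() if gene != query}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression values across five conditions (toy data, not real).
PROFILES = {
    "query": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 5.0, 4.0],
}

if __name__ == "__main__":
    for gene, r in rank_coexpressed("query", PROFILES):
        print(gene, round(r, 2))  # geneA 1.0, then geneC 0.8, then geneB -1.0
```

A gene at the top of such a ranking is the analogue of the "strongest correlation" partner on the ATTED-II page; a strongly negative value (geneB) indicates opposite, not similar, expression behavior.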
Do we need anything besides the locus, gene model, and protein pages?
How many papaya genes are found in the same cluster as PMR2 in Phytozome? • How many Vitis vinifera genes?
Basic navigation and tools in GBrowse • Use the controls to zoom and scroll along the chromosome • Enter a locus, marker, etc. • Get sequence • Note: many tracks now contain data from the TAIR9 release on Monday, June 22
GBrowse: practice problems • How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain? • Two regulatory regions are located upstream of this gene. Which one has been linked to a cis element in rice? • Which of the following has a longer transcript assembly aligning with PMR2? • Saccharum officinarum or Triticum aestivum? • Solanum tuberosum or Vitis vinifera? • Are there any experimentally supported phosphorylation sites? • What polymorphism appears to occur in the 5th intron? • Is there peptide support for the third exon? The fourth exon? The fifth exon? Which gene model is supported by peptide evidence? • Which exon structure seems to be better supported by the Brassica cDNA? By the radish clones?
GBrowse: practice problems • How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain? • 2; 350 amino acids • Two regulatory regions are located upstream of this gene. Which one has been linked to a cis element in rice? • AtREG417 • Which of the following has a longer transcript assembly aligning with PMR2? • Saccharum officinarum or Triticum aestivum? Triticum aestivum • Solanum tuberosum or Vitis vinifera? Solanum tuberosum • Are there any experimentally supported phosphorylation sites? • Yes, from the motif: SVENYPSSPSPR • What polymorphism appears to occur in the 5th intron? • PERL0025787 • Is there peptide support for the third exon? The fourth exon? The fifth exon? Which gene model is supported by peptide evidence? • third: yes; fourth: no; fifth: yes; the At1g11310.1 model is supported • Which exon structure seems to be better supported by the Brassica cDNA? By the radish clones? • the At1g11310.1 model is better supported by both types of transcripts
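The peptide-support questions come down to interval overlap: an exon counts as supported if at least one mapped peptide overlaps it. This sketch uses hypothetical protein-coordinate ranges, not the real At1g11310 exons; in practice you would read the coordinates off the GBrowse tracks.

```python
def supported_exons(exons, peptides):
    """Return names of exons overlapped by at least one peptide.

    exons:    (name, start, end) tuples, 1-based inclusive protein coordinates
    peptides: (start, end) tuples for mapped peptides, same coordinates
    """
    return [
        name
        for name, exon_start, exon_end in exons
        # Two closed intervals overlap when each starts before the other ends.
        if any(p_start <= exon_end and p_end >= exon_start
               for p_start, p_end in peptides)
    ]


# Hypothetical coordinates for illustration only.
EXONS = [("exon 3", 40, 70), ("exon 4", 71, 95), ("exon 5", 96, 130)]
PEPTIDES = [(45, 58), (100, 112)]

if __name__ == "__main__":
    print(supported_exons(EXONS, PEPTIDES))  # ['exon 3', 'exon 5']
```

The same overlap test, applied to the exons that differ between two gene models, indicates which model the peptide evidence favors.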
Sometimes, one gene isn’t enough . . . • Scientists often want to work with multiple genes or proteins that are related through some common feature • TAIR (and the PMN) offer some basic tools to create and/or enhance these customized data sets
Creating customized data sets • Data sets can be based on many different criteria: • Overall sequence alignment (DNA or protein) • Sequence motifs (DNA or protein) • Protein domains and biochemical properties • Gene/Protein “function” • Subcellular location • Molecular function • Biological process • Expression pattern • Biochemical pathway • Mapping region • Phenotype • Gene families • How do you generate these data sets?
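Once you have exported locus lists for several criteria, combining them is a set operation. The lists below are hypothetical stand-ins for search results; normalizing case first matters because the same locus can appear as At1g11310 or AT1G11310.

```python
def normalize(loci):
    """Upper-case and de-duplicate AGI codes so At1g11310 == AT1G11310."""
    return {locus.strip().upper() for locus in loci}


def combine(*datasets, mode="all"):
    """Intersect (mode='all') or merge (mode='any') several locus lists."""
    sets = [normalize(data) for data in datasets]
    if mode == "all":
        return set.intersection(*sets)
    if mode == "any":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical result lists from three separate searches (illustrative only).
go_term_hits = ["At1g11310", "AT1G14920", "AT2G01570"]
expression_hits = ["AT1G11310", "at3g03450"]
phenotype_hits = ["AT1G11310", "AT2G01570", "AT1G01510"]

if __name__ == "__main__":
    print(combine(go_term_hits, expression_hits, phenotype_hits))  # {'AT1G11310'}
    print(sorted(combine(go_term_hits, phenotype_hits, mode="any")))
```

Intersection narrows a data set to loci that meet every criterion; union collects candidates that meet any of them.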
Creating data sets: practice problems • How many DNA stocks are associated with NPR1? Do any of those available from the ABRC have full-length cDNAs? • How many keywords contain the term “oxalate”? How many of them have been used to annotate Arabidopsis genes? • How many germplasms are associated with a “reduced seed set” phenotype? • How many genes encode proteins that are found in the “chloroplast stroma” based on a “direct assay”? • Try to get the calculated pIs (isoelectric points) for all the “chloroplast stroma” proteins and find the highest and lowest values. • How many proteins have the following motif “Gly-Arg-Ala-Asn-hydrophobic residue” (GRAN[hydrophilic])?
Creating data sets: practice problems • How many DNA stocks are associated with NPR1? Do any of those available from the ABRC have full-length cDNAs? • 11; yes, the two stocks available from the ABRC have full-length cDNAs • How many keywords contain the term “oxalate”? How many of them have been used to annotate Arabidopsis genes? • 11 keywords; two have been used for Arabidopsis • How many germplasms are associated with a “reduced seed set” phenotype? • 68 • How many genes encode proteins that are found in the “chloroplast stroma” based on a “direct assay”? • 396 loci • Try to get the calculated pIs (isoelectric points) for all the “chloroplast stroma” proteins and find the highest and lowest values. • lowest: 4.25; highest: 12.66 • How many proteins have the following motif “Gly-Arg-Ala-Asn-hydrophobic residue” (GRAN[hydrophilic])? • 32
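For the pI question, one approach is to download the results as a tab-delimited file and scan it for the extremes. The column names `name` and `pI` and the sample rows below are hypothetical; adjust them to match the headers in your actual download.

```python
import csv
import io


def pi_range(tsv_text, name_column="name", pi_column="pI"):
    """Return ((lowest pI, name), (highest pI, name)) from tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = []
    for row in reader:
        raw = (row.get(pi_column) or "").strip()
        if raw:  # skip proteins with no calculated pI
            values.append((float(raw), row[name_column]))
    if not values:
        raise ValueError("no pI values found")
    return min(values), max(values)


# Hypothetical sample rows standing in for a real download.
SAMPLE = "name\tpI\nprotA\t9.85\nprotB\t5.1\nprotC\t\nprotD\t6.5\n"

if __name__ == "__main__":
    lowest, highest = pi_range(SAMPLE)
    print(lowest, highest)  # (5.1, 'protB') (9.85, 'protA')
```

For a real file, read it with `open(path, newline="")` and pass the contents (or the file object, via `csv.DictReader`) instead of the sample string.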
Putting TAIR to work for you • Use TAIR to find detailed information for a specific gene / protein • Locus page, gene model page, protein page • Many sections, many data types, many external links • GBrowse • Many tracks • Use TAIR to create and enhance customized data sets • Specific and Advanced Search pages • Motif analysis tools • FTP files with large data sets • Use TAIR for data visualization and “analysis” • GO categorization (TAIR) • OMICs viewer (PMN) • If you’re having trouble getting any information you want from TAIR . . .
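Motif searches can also be run locally on FASTA sequences, for example protein sequences downloaded from the TAIR FTP site. This sketch scans for G-R-A-N followed by a hydrophobic residue, following the spelled-out wording of the practice problem; which residues count as hydrophobic is an assumption here, so change `HYDROPHOBIC` (or substitute a hydrophilic set) to match the definition your search tool uses. The sequences are toy examples.

```python
import re

# Assumption: which residues count as "hydrophobic" differs between tools.
HYDROPHOBIC = "AVLIMFW"
MOTIF = re.compile(f"GRAN[{HYDROPHOBIC}]")


def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequence may wrap across several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def motif_hits(records, pattern=MOTIF):
    """Map each sequence containing the motif to its 1-based match positions."""
    hits = {}
    for name, seq in records.items():
        positions = [m.start() + 1 for m in pattern.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits


# Toy sequences (hypothetical); the first record wraps across two lines.
SAMPLE_FASTA = """>toy1 example
MKGRANLSTP
EEGRANV
>toy2
MKGRANDKK
>toy3
MSTGRAAL
"""

if __name__ == "__main__":
    print(motif_hits(read_fasta(SAMPLE_FASTA)))  # {'toy1': [3, 13]}
```

Joining wrapped lines before searching matters: a motif split across a line break would otherwise be missed.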
We are here to help: www.arabidopsis.org • Please use our data • Please use our tools • Please use TAIR to help improve your research on IMPORTANT plants! • Please contact us if we can be of any help! • Make an appointment to meet with me during my visit • (I can try to speak in Spanish) curator@arabidopsis.org www.arabidopsis.org
Thank you! TAIR, AraCyc, and the PMN Eva Huala (Director and Co-PI) Sue Rhee (PI and Co-PI) Current Curators: - Tanya Berardini (lead curator – functional annotation) - David Swarbreck (lead curator – structural annotation) - Peifen Zhang (Director and lead curator – metabolism) - A. S. Karthikeyan (curator) - Philippe Lamesch (curator) - Donghui Li (curator) - Rajkumar Sasidharan (curator) Recent Past Contributors: - Debbie Alexander (curator) - Christophe Tissier (curator) - Hartmut Foerster (curator) Tech Team Members: - Bob Muller (Manager) - Larry Ploetz (Sys. Administrator) - Raymond Chetty - Anjo Chi - Vanessa Kirkup - Cynthia Lee - Tom Meyer - Shanker Singh - Chris Wilks Metabolic Pathway Software: - Peter Karp and SRI group