
milkER – a milk informatics resource

milkER – a milk informatics resource. Stephen Edwards BSc. University of Edinburgh BioNLP meeting 6th June 2005. Overview. Aims of milkER milkER database Text-mining Potential targets. milkER aims.


Presentation Transcript


  1. milkER – a milk informatics resource Stephen Edwards BSc. University of Edinburgh BioNLP meeting 6th June 2005

  2. Overview • Aims of milkER • milkER database • Text-mining • Potential targets

  3. milkER aims • To amalgamate dispersed milk information into one resource, allowing more focused analysis of milk proteins in relation to dairy issues, health and disease.

  4. A milk database • Knowledge on milk affects many industries • UniProt, GenBank excellent resources • Marsupial genomics database (New Zealand) • Glasgow genomics data • Chinese database • Polish bioactive peptide database • Food property database (commercial)

  5. Milk components • Fat, carbohydrates, proteins, minerals • Growth factors, enzymes, enzyme inhibitors, immunoglobulins, allergens, disease factors, anti-bacterial proteins, opioids • Origin of components: 1. Deliberate 2. Leakage from blood 3. Result of disease conditions 4. Engineered 5. Bacterial origin

  6. milkER database • Database using BioSQL which allows incorporation of UniProt, EMBL, GenBank entries

  7. LOCUS       NM_173929    790 bp    mRNA    linear   MAM 27-OCT-2004
     DEFINITION  Bos taurus lactoglobulin, beta (LGB), mRNA.
     ACCESSION   NM_173929
     VERSION     NM_173929.2  GI:31343239
     KEYWORDS    .
     SOURCE      Bos taurus (cow)
       ORGANISM  Bos taurus
                 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                 Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae;
                 Bovinae; Bos.
     REFERENCE   1  (bases 1 to 790)
       AUTHORS   Jayat,D., Gaudin,J.C., Chobert,J.M., Burova,T.V., Holt,C., McNae,I.,
                 Sawyer,L. and Haertle,T.
       TITLE     A recombinant C121S mutant of bovine beta-lactoglobulin is more
                 susceptible to peptic digestion and to denaturation by reducing
                 agents and heating
       JOURNAL   Biochemistry 43 (20), 6312-6321 (2004)
       PUBMED    15147215
       REMARK    GeneRIF: Results suggest that the stability of beta-lactoglobulin
                 arising from the hydrophobic effect is reduced by the C121S mutation
                 so that unfolded or partially unfolded states are more favored.
     ORIGIN
             1 actccactcc ctgcagagct cagaagcgtg atcccggctg cagccatgaa gtgcctcctg
            61 cttgccctgg ccctcacctg tggcgcccag gccctcatcg tcacccagac catgaagggc
     …..

  8. [Diagram: populating and querying milkER. Labels: Information retrieval; Other Databases; EMBL; UniProt; Information extraction; milkER population; Other Sources (e.g. published tables); milkER; Web Query]

  9. milkER database • Database using BioSQL which allows incorporation of UniProt, EMBL, GenBank entries • Library of literature on milk • User interface (www.milker.org.uk)

  10. Text-mining • Machine ‘reading’ of text • Many techniques involved: • Tokenisation • Stemming (Activation → Activat) • POS tagging (Protein → noun) • Abbreviation expansion (CN → Casein) • Entity identification (Casein → protein) • Dictionary
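
The techniques listed on this slide can be chained into a toy pipeline. The sketch below is only an illustration and not the milkER implementation: the dictionaries, the suffix list and all function names are hypothetical, the stemmer is a crude suffix stripper rather than a standard algorithm, and POS tagging is left out because it needs a trained tagger.

```python
import re

# Hypothetical mini-dictionaries; a real system would use curated resources.
ABBREVIATIONS = {"CN": "casein", "B-LG": "beta-lactoglobulin"}
ENTITY_TYPES = {"casein": "protein", "beta-lactoglobulin": "protein", "iga": "antibody"}
SUFFIXES = ("ion", "ing", "ed", "s")


def tokenise(text):
    """Split text into word tokens, keeping internal hyphens (e.g. B-LG)."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def stem(token):
    """Crude stemming: lower-case and strip the first matching suffix."""
    lower = token.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def expand_abbreviation(token):
    """Replace a known abbreviation with its long form, else keep the token."""
    return ABBREVIATIONS.get(token, token)


def identify_entity(token):
    """Look the token up in the entity dictionary; None if it is unknown."""
    return ENTITY_TYPES.get(token.lower())


def annotate(text):
    """Return (token, expanded form, stem, entity type) for every token."""
    results = []
    for token in tokenise(text):
        expanded = expand_abbreviation(token)
        results.append((token, expanded, stem(expanded), identify_entity(expanded)))
    return results


for row in annotate("CN binds B-LG"):
    print(row)
```

Each stage is a plain function so that any one of them could be swapped for a stronger component without touching the others.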

  11. “Increased levels of IgA antibodies to B-LG were found and were shown to be an independent risk marker for type 1 diabetes.” • Tokeniser / POS tagger: Increased [past participle] levels [plural noun] of [preposition] • Entity identification: IgA [antibody], B-LG [protein], Diabetes [disease] • Parser: [IgA antibodies to B-LG] ‘MARKER’ [type 1 diabetes]

  12. Information extraction • Rule based • ‘interact’ ‘bind’ ‘activate’ • [protein] (0-5 words) [verbs] (0-5 words) [protein] (Blaschke and Valencia, 2002) • Machine-learning • Statistical methods, Hidden Markov Models • Learn interfillers, text lying between tagged entities (Bunescu et al, 2004)
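
The rule-based pattern on this slide, [protein] (0-5 words) [verb] (0-5 words) [protein], can be sketched over a list of tokens. This is a minimal illustration under stated assumptions, not the method of Blaschke and Valencia: the protein dictionary, the verb stems and the function name are hypothetical, and the sketch ignores negation and sentence boundaries.

```python
PROTEINS = {"casein", "b-lg", "lactoferrin"}   # hypothetical protein dictionary
VERB_STEMS = ("interact", "bind", "activat")   # stems of the trigger verbs
MAX_GAP = 5                                    # at most 5 words between elements


def find_interactions(tokens):
    """Return (protein, verb, protein) triples where each gap is 0-5 words."""
    lowered = [t.lower() for t in tokens]
    protein_idx = [i for i, t in enumerate(lowered) if t in PROTEINS]
    verb_idx = [i for i, t in enumerate(lowered) if t.startswith(VERB_STEMS)]
    triples = []
    for p1 in protein_idx:
        for v in verb_idx:
            # Number of words strictly between the first protein and the verb.
            if not 0 <= v - p1 - 1 <= MAX_GAP:
                continue
            for p2 in protein_idx:
                # Number of words strictly between the verb and the second protein.
                if 0 <= p2 - v - 1 <= MAX_GAP:
                    triples.append((tokens[p1], tokens[v], tokens[p2]))
    return triples


print(find_interactions("Lactoferrin strongly binds to casein in milk".split()))
```

Matching on verb stems means inflected forms such as "binds" or "activated" also trigger the rule, at the cost of occasional false matches.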

  13. Difficulties • Synonyms • Proteins and genes with same name • Funny names e.g. ERK-1/2, ‘and’ gene! • Variability of natural language • Compounded names • Co-ordination, negatives, speeling errors

  14. Evaluation • Precision (P) - how much of the output is correct • Recall (R) - how much of the relevant information is picked up • F-measure - combines P and R • IE systems can achieve high scores, but not high enough to populate databases automatically
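
The three measures can be computed from a set of extracted items and a hand-annotated gold standard. The sketch below uses the standard definitions, with the F-measure as the weighted harmonic mean of P and R; the function name and the example relation labels are hypothetical placeholders, not milkER results.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Return (precision, recall, F) for extracted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of the extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the gold-standard items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Placeholder labels standing in for extracted and annotated relations.
extracted = {"rel-a", "rel-b", "rel-c", "rel-d"}
annotated = {"rel-a", "rel-b", "rel-e"}
print(precision_recall_f(extracted, annotated))
```

With beta = 1 precision and recall are weighted equally; a larger beta favours recall.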

  15. Text-mining uses • Aim to extract interactions and diseases • Swanson (Fish oil) • Srinivasan (Turmeric)

  16. General model for discovering implicit links between topics Starting topic: Turmeric (inhibits) Intermediate topic: Nuclear factor-kappa B (involved in) Terminal topic: Crohn’s disease Diagram taken from Srinivasan et al, 2004
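
The starting, intermediate and terminal topics in this model can be pictured as a two-step walk over extracted relations. The sketch below is a simplified illustration, not the algorithm of Srinivasan et al: the relation list contains only the example from the slide, and the function name is hypothetical.

```python
from collections import defaultdict

# The example from the slide, written as (subject, relation, object) triples.
RELATIONS = [
    ("turmeric", "inhibits", "nuclear factor-kappa B"),
    ("nuclear factor-kappa B", "involved in", "Crohn's disease"),
]


def implicit_links(relations, start):
    """Find start -> intermediate -> terminal chains with no direct link."""
    outgoing = defaultdict(list)
    for subject, verb, obj in relations:
        outgoing[subject].append((verb, obj))
    # Topics already linked directly to the start are not new discoveries.
    direct = {obj for _, obj in outgoing[start]}
    links = []
    for verb_ab, intermediate in outgoing[start]:
        for verb_bc, terminal in outgoing[intermediate]:
            if terminal != start and terminal not in direct:
                links.append((start, verb_ab, intermediate, verb_bc, terminal))
    return links


for link in implicit_links(RELATIONS, "turmeric"):
    print(" -> ".join(link))
```

Each chain found this way is only a candidate hypothesis that still has to be checked against the literature and confirmed experimentally.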

  17. Targets for text mining • Many milk relationships still require further investigation • Positive reasons - nutritional benefits - neonatal growth - antimicrobial activity - bioactive peptides

  18. Targets for text mining (cont.) • Negative reasons - recent link with Alzheimer's - diabetes link - asthma - human reactions to cow hormones (e.g. Acne, Danby 2005) - drug transfer to milk and effects - allergic reactions/intolerance - toxic contaminants

  19. milkER process • 897 proteins, 772 DNA, 1232 RNA • Analyze references (1465 MEDLINE refs) • MeSH terms, GO terms etc • POS tag • UMLS standardisation • Gene/protein dictionary • Extract relations

  20. Milk literature

  21. milkER interactions • Table of interacting proteins • Store as queryable XML strings? • Discover links between proteins and disease • Create hypotheses • Confirm experimentally
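
One way to picture the "queryable XML strings" idea is a small record per interaction that is serialised to a string and parsed again when queried. The element and attribute names below are hypothetical, not a milkER schema, and the values are placeholders rather than real data.

```python
import xml.etree.ElementTree as ET


def interaction_to_xml(protein_a, protein_b, relation, pmid):
    """Serialise one interaction as an XML string (hypothetical layout)."""
    node = ET.Element("interaction", {"relation": relation, "pmid": pmid})
    ET.SubElement(node, "protein").text = protein_a
    ET.SubElement(node, "protein").text = protein_b
    return ET.tostring(node, encoding="unicode")


def proteins_in(xml_string):
    """Query a stored string for the names of the proteins it mentions."""
    return [p.text for p in ET.fromstring(xml_string).findall("protein")]


# Placeholder values only; no real interaction or PubMed ID is implied.
record = interaction_to_xml("protein A", "protein B", "binds", "00000000")
print(record)
print(proteins_in(record))
```

Keeping the source reference as an attribute lets every stored interaction be traced back to the text it was extracted from.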

  22. Diabetes • Pancreas secretes hormones • Glucagon: increases conversion of glycogen → glucose • Insulin: increases conversion of glucose → glycogen and allows glucose into cells • “Condition where the amount of glucose in the blood is abnormally high as the body cannot use it adequately as fuel”

  23. Diabetes • Affects 3-5% of industrialised populations • Type 1 (~10%) • Genetic and environmental factors (e.g. diet) • Decreased insulin production • Mostly develops < age 20 • Type 2 (~90%) • Resistance of body to insulin • Normally develops > age 40 • Often associated with high blood pressure, cholesterol and arterial disease

  24. Milk and diabetes

  25. Selected quotes • “More research is needed on all aspects of lactation in women with diabetes.” • Reader D. et al, Curr Diab Rep. 2004 • “The effect of high protein intakes from different sources on glucose-insulin metabolism needs further study” • Hoppe et al, European Journal of Clinical Nutrition 2005 • “American children also tend to be heavier than those from European countries, skewing the [growth] charts further.” • The Scotsman Sat 5 Feb 2005 • The government currently recommends that babies should be fed breast milk alone for the first six months - the WHO recommends two years.

  26. Conclusions • Knowledge of milk vital in many areas • milkER aims to bring disparate milk data together • Text-mining can wade through large amounts of data to retrieve and discover vital information

  27. Future work • Relation extraction of milk literature • Extend content of milkER to include interaction data • Create hypotheses for experimental work

  28. Acknowledgements • Prof. Lindsay Sawyer • Dr. Carl Holt (Hannah Research Institute, Ayr) • Prof. Bonnie Webber (Informatics) • Dr. Alistair Kerr and Dr. Douglas Armstrong for technical support

  29. References • Acne/milk • Acne and milk, the diet myth, and beyond (Danby, 2005) • Diabetes/milk • Milk and diabetes (Schrezenmeir et al, 2000) REVIEW • The role of β-casein variants in the induction of insulin-dependent diabetes (Elliott et al, 1997) • Text-mining • Natural language processing and systems biology (Cohen et al, 2004) REVIEW • Mining MEDLINE for implicit links between dietary substances and diseases (Srinivasan et al, 2004) • Learning to extract proteins and their interactions from MEDLINE abstracts (Bunescu et al, 2003)
