
UNAM, Universidad Nacional Autónoma de México

UNAM, Universidad Nacional Autónoma de México. Web Services with Applications in Bioinformatics, March 24, 2009. Introduction. Navigating genetics through time. The genomic era. Human genome. Challenges. Data explosion. Integrated analyses.


Presentation Transcript


  1. UNAM, Universidad Nacional Autónoma de México. Web Services with Applications in Bioinformatics. March 24, 2009

  2. Introduction. • Navigating genetics through time • The genomic era • Human genome • Challenges • Data explosion • Integrated analyses • Bioinformatics • What is it? • Consortia and groups • Tools • Web Services • Web services • Workflows

  3. Navigating genetics through time. 1865: Mendel's peas. Gregor Mendel describes his experiments with peas, showing that heredity is transmitted in discrete units. 1869: Friedrich Miescher isolates DNA for the first time. Miescher isolated a material rich in phosphorus from the cells and called it nuclein. 1879: Mitosis observed. Walter Flemming described chromosome behavior during animal cell division. http://www.genome.gov/25019887

  4. 1900s 1940's 1950's http://www.genome.gov/25019887

  5. 1960's 1970's 1980's http://www.genome.gov/25019887

  6. 1990's

  7. 2000 - 2001 The President and Prime Minister Blair issued a Joint Statement in an effort to ensure that the public derives the maximum possible benefit from the sequence of the human genome. http://www.genome.gov/25019887

  8. 2002 -2003 http://www.genome.gov/25019887

  9. 2004 - The Future http://www.genome.gov/25019887

  10. Challenges of genomics

  11. Data explosion. The human genome. The Human Genome Project is involved in determining the exact order of the DNA bases of the entire human genome. The human genome contains more than 3.2 billion base pairs and more than 30 000 genes. "If our strands of DNA were stretched out in a line, the 46 chromosomes making up the human genome would extend more than six feet [close to two metres]. If the ... length of the 100 trillion cells could be stretched out, it would be ... over 113 billion miles [182 billion kilometres]. That is enough material to reach to the sun and back 610 times." [Source: Centre for Integrated Genomics]
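The figures in the quotation on slide 11 can be checked with simple arithmetic. The sketch below is only an illustration: the six feet per cell and the 100 trillion cells come from the quotation, while the mean Earth-Sun distance is an added assumption (one astronomical unit, about 93 million miles).

```python
# Rough check of the DNA-length figures quoted on the slide.
FEET_PER_MILE = 5280
KM_PER_MILE = 1.609344
# Assumption (not on the slide): mean Earth-Sun distance, one astronomical unit.
SUN_DISTANCE_MILES = 92_955_807


def total_dna_length_miles(feet_per_cell: float = 6.0, n_cells: float = 100e12) -> float:
    """Total length of DNA if the DNA of every cell were laid end to end."""
    return feet_per_cell * n_cells / FEET_PER_MILE


def sun_round_trips(miles: float) -> float:
    """How many Earth-Sun round trips fit in the given distance."""
    return miles / (2 * SUN_DISTANCE_MILES)


if __name__ == "__main__":
    miles = total_dna_length_miles()
    print(f"{miles / 1e9:.1f} billion miles")
    print(f"{miles * KM_PER_MILE / 1e9:.1f} billion kilometres")
    print(f"{sun_round_trips(miles):.0f} round trips to the Sun")
```

Under these assumptions the result lands close to the quoted values, so the quotation is internally consistent.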

  12. How much information is there? NCBI - National Center for Biotechnology Information Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease. http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=search&term=

  13. Genome: genome size and number of genes. Human genome: 3 billion DNA base pairs, with a data size of approximately 750 megabytes
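The 750-megabyte figure on slide 13 follows if each base is stored in 2 bits, since there are only four possible bases. The slide does not state the encoding, so the 2-bit packing below is an assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope storage estimate for a genome,
# assuming each base (A, C, G, T) is packed into 2 bits.
BITS_PER_BASE = 2                       # 4 possible bases -> log2(4) = 2 bits
BASES_IN_HUMAN_GENOME = 3_000_000_000   # figure quoted on the slide


def genome_size_megabytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the storage size in megabytes (1 MB = 10**6 bytes)."""
    total_bits = n_bases * bits_per_base
    total_bytes = total_bits / 8
    return total_bytes / 1_000_000


if __name__ == "__main__":
    print(genome_size_megabytes(BASES_IN_HUMAN_GENOME), "MB")
```

A plain-text file with one byte per base would be four times larger, which is why the encoding assumption matters.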

  14. More specialized databases.

  15. The future. Integrated and applied analyses. Pillars. Challenges.

  16. I. Genomics to Biology. Elucidating the structure of genomes and identifying the function of the myriad encoded elements will allow connections to be made between genomics and biology and will, in turn, accelerate the exploration of all realms of the biological sciences. II. Genomics to Health. Genomics holds the promise of individualized medicine, managed according to each genetic profile.

  17. Recent advances in biological sciences research are producing enormous growth in the volume and complexity of the available biological information. Information and communication technologies are crucial for storing and interpreting these data at research centers efficiently and robustly. Bioinformatics

  18. But what is bioinformatics?

  19. A definition of bioinformatics • The application of information technologies to molecular biology. This includes the compilation, maintenance, distribution, analysis, and use of the immense quantities of biological information available

  20. Main areas of application • Major research areas • Sequence analysis • Genome annotation • Computational evolutionary biology • Measuring biodiversity • Analysis of gene expression • Analysis of regulation • Analysis of protein expression • Analysis of mutations in cancer • Prediction of protein structure • Comparative genomics • Modeling biological systems • High-throughput image analysis • Protein-protein docking

  21. Major Organizations • Bioinformatics Organization (Bioinformatics.Org): The Open-Access Institute • EMBnet • European Bioinformatics Institute • European Molecular Biology Laboratory • The International Society for Computational Biology • National Center for Biotechnology Information • National Institutes of Health homepage • Open Bioinformatics Foundation: umbrella non-profit organization supporting certain open-source projects in bioinformatics • Swiss Institute of Bioinformatics • Wellcome Trust Sanger Institute • Major Journals • Algorithms in Molecular Biology • Bioinformatics • BMC Bioinformatics • Briefings in Bioinformatics • Evolutionary Bioinformatics • Genome Research • The International Journal of Biostatistics • Journal of Computational Biology • Cancer Informatics • Journal of the Royal Society Interface • Molecular Systems Biology • PLoS Computational Biology • Statistical Applications in Genetic and Molecular Biology • Transactions on Computational Biology and Bioinformatics - IEEE/ACM • International Journal of Bioinformatics Research and Applications • List of Bioinformatics journals at Bioinformatics.fr • EMBnet.News at EMBnet.org EMBnet is the organisation world-wide bringing bioinformatics professionals to work together to serve the expanding fields of genetics and molecular ...

  22. Bioinformatics software. Software tools for bioinformatics: simple command-line tools, complex graphical programs, CGI. Best-known algorithms: BLAST, an algorithm for determining the similarity of arbitrary sequences against other sequences, possibly from curated databases of protein or DNA sequences. EMBOSS: software analysis package. RSAT: Regulatory Sequence Analysis Tools.

  23. A bioinformatics « world » for humans http://tux.crystalxp.net/en.id.10838-brunocb-leonard-de-vinci----tux-de-vitruve.html

  24. My sweet home-made bioinformatics platform Complete datasets BLAST BLAT Download Download and install RSAT Filtered datasets Clustalw Download MEME SQL queries … Parsing HTML Perl script Web page only resources Do your analysis: scripts

  25. My nightmare (home-made) platform DEPENDENCIES Complete datasets UPDATES BLAST BLAT Download Download and install UPDATES RSAT NEW ANNOTATION Filtered datasets Clustalw Download MEME LIBRARIES SQL queries … NEW DATABASE SCHEMA Parsing HTML Perl script Web page only resources Do your analysis: scripts

  26. Bye bye home-made platform… http://www.genomequest.com/landing-pages/ODI-webinar-web.html

  27. Problems: • Massive data that must be processed and integrated. • The data live on different servers, in different databases, and in different formats: a data exchange problem. • There are many tools, hosted on different servers, with different access methods (CGI forms, HTML), different input and output formats, and written in different languages: an interoperability problem (communication between tools)

  28. Solution to the data exchange problem. Exchange data through a format defined in XML. XML structures data and documents as trees of tags with attributes. The XML data model is a tree that does not distinguish between objects and relations and has no notion of a class hierarchy. If we want semantics (meaning): languages for defining ontologies and metadata on the web. RDF Schema Query Language. OWL: Web Ontology Language.
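As a rough illustration of "trees of tags with attributes", the sketch below builds a tiny record, serializes it to text for exchange, and reads it back. The tag names, attribute names, and values are all hypothetical; no real database schema is implied.

```python
import xml.etree.ElementTree as ET


def build_record() -> ET.Element:
    """Build a tiny tree of tags with attributes (hypothetical schema)."""
    gene = ET.Element("gene", {"id": "gene_001", "organism": "Escherichia_coli_K12"})
    name = ET.SubElement(gene, "name")
    name.text = "exampleA"
    seq = ET.SubElement(gene, "sequence", {"type": "dna"})
    seq.text = "ATGACCGGT"
    return gene


def to_text(element: ET.Element) -> str:
    """Serialize the tree to an XML string that programs can exchange."""
    return ET.tostring(element, encoding="unicode")


def read_record(xml_text: str) -> dict:
    """Parse the XML string back into plain Python values."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "organism": root.get("organism"),
        "name": root.findtext("name"),
        "sequence": root.findtext("sequence"),
    }


if __name__ == "__main__":
    text = to_text(build_record())
    print(text)
    print(read_record(text))
```

The tree says nothing about what a "gene" means, which is the gap that ontology languages such as OWL are meant to fill.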

  29. Solution to the interoperability problem. A web service is a set of protocols and standards used to exchange data between applications. Software applications written in different programming languages and running on any platform can use web services to exchange data over computer networks such as the Internet. Interoperability is achieved by adopting open standards. The OASIS and W3C organizations are the committees responsible for the architecture and standardization of web services.

  30. Programs « talking » to programs click retrieve-seq -org Saccharomyces_cerevisiae -feattype CDS -type upstream -format fasta … login ssh Anonymous access RSAT server in Bruxelles #!/usr/bin/perl -w anywhere

  31. A future bioinformatics « world » for computers ? I have a dream…

  32. A future bioinformatics « world » for computers ? I have a dream… • Only retrieve necessary data • Run analysis remotely • No need for local installation • A unified way to access data and programs • Data always up-to-date • Programs interacting with programs over the internet

  33. Web Services to the rescue ? « Although this proposal may seem a far cry from what happens now, the technology exists to make it reality. The World Wide Web consortium, with industry heavy-weights such as IBM and Microsoft, are providing an alphabet soup of standards: SOAP/XML, WSDL, UDDI and XSDL. » Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20

  34. What are Web Services (WS) ? network => internet Service provider (server) client call run_BLAST() run_BLAST () blastall PERL script #!/usr/bin/perl -w send back the results • Definition: • A Web service is a software system designed to support interoperable machine-to-machine interaction over a network Source: W3C: http://www.w3.org/TR/ws-gloss/
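The client/provider picture on slide 34 can be sketched with Python's standard library. This example uses XML-RPC rather than SOAP, because it ships with Python and shows the same idea: a client calls run_BLAST() on a remote server and gets the result back over HTTP. The run_blast function is a hypothetical stand-in that does not run BLAST, and the server runs on the local machine only.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def run_blast(sequence: str) -> str:
    """Hypothetical stand-in for a real BLAST run: it only reports the query length."""
    return f"query of {len(sequence)} letters received"


def start_server() -> SimpleXMLRPCServer:
    """Service provider: expose run_BLAST on a free local port, in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(run_blast, "run_BLAST")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port: int, sequence: str) -> str:
    """Client: call run_BLAST() remotely and return what the server sends back."""
    with ServerProxy(f"http://127.0.0.1:{port}") as client:
        return client.run_BLAST(sequence)


if __name__ == "__main__":
    server = start_server()
    print(call_service(server.server_address[1], "MHLEGRDGRR"))
    server.shutdown()
    server.server_close()
```

The client never sees how the server computes the answer; it only needs the service address and the method name, which is the point of machine-to-machine interaction.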

  35. Various types of Web services : SOAP XML XML XML HTTP run_BLAST () blastall PERL script #!/usr/bin/perl -w BLAST result XML BLAST parameters $sequence $subst_matrix $threshold XML XML $result • SOAP-based Web Services: • SOAP: Simple Object Access Protocol • Standard of the W3C with specifications: messaging with XML, HTTP for transport

  36. Various types of Web services : SOAP <soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:blas="http://tempuri.org/Blast"> <soapenv:Body> <blas:searchParam soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> <program xsi:type="xsd:string">blastp</program> <database xsi:type="xsd:string">SWISS</database> <query xsi:type="xsd:string">MHLEGRDGRR YPGAPAVELL QTSVPSGLAE LVAGKRRLPR GAGGADPSHS</query> <param xsi:type="xsd:string"></param> </blas:searchParam> </soapenv:Body> </soapenv:Envelope> Request envelope XML run_BLAST () blastall PERL script #!/usr/bin/perl -w <soap:Envelope soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soap:Body> <n:searchParamResponse xmlns:n="http://tempuri.org/Blast"> <Result xsi:type="xsd:string">BLASTP 2.2.18 [Mar-02-2008] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. 
Query= query (50 letters) Database: SWISS: SWISS sequence taken from the header [Last update Mar/02/2009] 405,506 sequences; 146,168,000 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value sp|Q04671|P_HUMAN RecName: Full=P protein; AltName: Full=Melanoc... 104 1e-22 >sp|Q04671|P_HUMAN RecName: Full=P protein; AltName: Full=Melanocyte-specific transporter protein; AltName: Full=Pink-eyed dilution protein homolog; Length = 838 Score = 104 bits (260), Expect = 1e-22, Method: Compositional matrix adjust. Identities = 50/50 (100%), Positives = 50/50 (100%) Query: 1 MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS 50 MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS Sbjct: 1 MHLEGRDGRRYPGAPAVELLQTSVPSGLAELVAGKRRLPRGAGGADPSHS 50 Database: SWISS: SWISS sequence taken from the header [Last update Mar/02/2009] Posted date: Mar 2, 2009 5:30 AM Number of letters in database: 146,168,000 Number of sequences in database: 405,506 Lambda K H 0.314 0.136 0.403 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Sequences: 405506 Number of Hits to DB: 17,615,102 Number of extensions: 565364 Number of successful extensions: 858 Number of sequences better than 10.0: 2 Number of HSP's gapped: 858 Number of HSP's successfully gapped: 2 Length of query: 50 Length of database: 146,168,000 Length adjustment: 23 Effective length of query: 27 Effective length of database: 136,841,362 Effective search space: 3694716774 Effective search space used: 3694716774 Neighboring words threshold: 11 Window for multiple hits: 40 X1: 16 ( 7.2 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 42 (21.9 bits) S2: 62 (28.5 bits)</Result> </n:searchParamResponse> </soap:Body> </soap:Envelope> XML Response envelope
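To make the request and response envelopes on slide 36 more concrete, here is a minimal sketch of the two steps a SOAP client library performs: serializing parameters into a request envelope, and pulling the result out of a response envelope. The namespaces and element names are taken from the slide. The xsi:type attributes are omitted for brevity, and real clients would normally rely on a SOAP library rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
BLAST_NS = "http://tempuri.org/Blast"  # namespace used in the slide's example
NS = {"soapenv": SOAP_ENV, "blas": BLAST_NS}

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("blas", BLAST_NS)


def build_request(program: str, database: str, query: str) -> str:
    """Serialize BLAST parameters into a SOAP request envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{BLAST_NS}}}searchParam")
    for tag, value in (("program", program), ("database", database), ("query", query)):
        ET.SubElement(call, tag).text = value
    return ET.tostring(envelope, encoding="unicode")


def extract_result(response_xml: str) -> str:
    """Deserialize a SOAP response envelope and return the text of <Result>."""
    root = ET.fromstring(response_xml)
    result = root.find("soapenv:Body/blas:searchParamResponse/Result", NS)
    if result is None:
        raise ValueError("no <Result> element found in the SOAP body")
    return result.text or ""


if __name__ == "__main__":
    print(build_request("blastp", "SWISS", "MHLEGRDGRR"))
```

Elements are matched by namespace URI, not by prefix, so the response is found even though the server writes `soap:` and `n:` where the request used `soapenv:` and `blas:`.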

  37. Various types of Web services : SOAP SOAP::Lite SOAP::WSDL XML::Compile::WSDL11 PERL ZSI SOAPpy AXIS METRO XML PHP-SOAP BLAST parameters serialization run_BLAST () blastall Client XML result deserialization

  38. Various types of Web services : SOAP SOAP::Lite/Apache PERL ? AXIS / Tomcat XML PHP-SOAP/ Apache BLAST result XML deserialization run_BLAST () blastall Client serialization

  39. Various types of Web services : SOAP PERL PERL XML XML BLAST parameters BLAST result serialization XML XML deserialization run_BLAST () blastall Client Client XML XML result serialization deserialization

  40. Various types of Web services : SOAP-WSDL <?xml version="1.0" encoding ='UTF-8' ?> <?xml-stylesheet type="text/xsl" href="RSATWS.xsl"?> <definitions name="RSATWS" targetNamespace="urn:RSATWS" xmlns:tns="urn:RSATWS" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <types> <xsd:schema targetNamespace="urn:RSATWS" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <!-- RSA TOOLS REQUESTS --> <xsd:complexType name="RetrieveSequenceRequest"> <xsd:annotation> <xsd:documentation>Parameters for the operation retrieve_seq.</xsd:documentation> </xsd:annotation> <xsd:sequence> <xsd:element name="output" type="xsd:string" minOccurs="0"> <xsd:annotation> <xsd:documentation>Return type. Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client), 'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client). Default is 'both'.</xsd:documentation> </xsd:annotation> </xsd:element> <xsd:element name="organism" type="xsd:string" minOccurs="1"> <xsd:annotation> <xsd:documentation>Organism. Words need to be underscore separated (example: Escherichia_coli_K12).</xsd:documentation> </xsd:annotation> </xsd:element> <xsd:element name="query" type="xsd:string" minOccurs="0" maxOccurs="unbounded"> <xsd:annotation> <xsd:documentation>A list of query genes.</xsd:documentation> </xsd:annotation> </xsd:element> <xsd:element name="all" type="xsd:int" minOccurs="0"> <xsd:annotation> <xsd:documentation>Return sequences for all the genes of the organism if value = 1. 
Incompatible with query.</xsd:documentation> </xsd:annotation> </xsd:element> <xsd:element name="noorf" type="xsd:int" minOccurs="0"> • WSDL: • Web Services Description Language: XML • « a machine-readable description of the operations offered by the service » • The server « introduces itself » to the clients • Names of the available services (=methods) • Parameters of each service (name + type) • Result of each service (type)
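Because a WSDL is machine-readable, a program can discover the parameters of a service without human help. The sketch below reads a trimmed fragment modeled on the RetrieveSequenceRequest type from slide 40 (documentation elements removed and tags closed so that it parses on its own) and lists each parameter with its type and whether it is required. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Trimmed fragment modeled on the RSATWS.wsdl excerpt shown on the slide.
WSDL_FRAGMENT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:RSATWS">
  <xsd:complexType name="RetrieveSequenceRequest">
    <xsd:sequence>
      <xsd:element name="output" type="xsd:string" minOccurs="0"/>
      <xsd:element name="organism" type="xsd:string" minOccurs="1"/>
      <xsd:element name="query" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="all" type="xsd:int" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def list_parameters(schema_xml: str, type_name: str) -> list:
    """Return (name, type, required) for each parameter of a complex type."""
    root = ET.fromstring(schema_xml.strip())
    ns = {"xsd": XSD}
    params = []
    for ctype in root.findall("xsd:complexType", ns):
        if ctype.get("name") != type_name:
            continue
        for elem in ctype.findall("xsd:sequence/xsd:element", ns):
            # XML Schema defaults minOccurs to 1 when the attribute is absent.
            required = int(elem.get("minOccurs", "1")) >= 1
            params.append((elem.get("name"), elem.get("type"), required))
    return params


if __name__ == "__main__":
    for name, xsd_type, required in list_parameters(WSDL_FRAGMENT, "RetrieveSequenceRequest"):
        print(name, xsd_type, "required" if required else "optional")
```

Client-stub generators do essentially this, then emit code with one method per operation and one argument per parameter.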

  41. Various types of Web services : SOAP-WSDL • Example: to write a client for RSAT Web Services in PERL • - SOAP::WSDL installed • http://rsat.ulb.ac.be/rsat/web_services/RSATWS.wsdl • PERL library « RSATWS  » downloadable on RSAT Website, generated from the WSDL XML parameters serialization Client XML • WSDL: • The URL of the WSDL is necessary to « consume » a SOAP/WSDL Web Service (=write a client) • Allows for automatic generation of client-side libraries «  client stub » => Reduce the amount of code you have to write result deserialization

  42. Various types of Web services : SOAP-WSDL • Example of code for RSAT PERL Client:

      #!/usr/bin/perl -w
      use SOAP::WSDL;
      use lib 'RSATWS';
      use MyInterfaces::RSATWebServices::RSATWSPortType;

      ## New SOAP object (client stub generated from the WSDL)
      my $soap = MyInterfaces::RSATWebServices::RSATWSPortType->new();

      ## Parameters
      my %args = ('format' => 'text');

      ## Send the request to the server
      my $som = $soap->supported_organisms({'request' => \%args});

      ## Get the result (in SOAP::WSDL, a fault object is false in boolean context)
      unless ($som) {
          printf "A fault (%s) occurred: %s\n", $som->get_faultcode(), $som->get_faultstring();
      } else {
          my $results = $som->get_response();
          my $result  = $results->get_client();
          print "Supported organism(s): \n" . $result;
      }

  43. Various types of Web services : REST >gi|540023|gb|U12345.1|AMU12345 Aepyceros melampus isolate am5 D-loop, partial sequence; mitochondrial ACTACCGCTATCAATATACTCCCACAAATATCAAGAGCCTTCCCAGTATTAAATTTGCTAAAATTTTAAA AATTCAATACGAACTTCACACTCCACAGCCTCACGCGAAATTAATAATACGTATTTAAATTCTAGAGTAC ATACCATGAACTATCGTTTAGTACATGAATTTACACACGTCAGCCCGATCAAATGTTTATGTACATAACA CATTATATATGTACATTTCAGTTTGTGTATATAGACATAACATTAATGTAATAAAGACATAATATGTATA TAGTACATTAATTGATTGTCCTCAAGCATATAAGCAAGTACTAGACATTCACTAGCGGTACATAGTACAT TTCATTGTTCATCGTACATAGCGCATGTCAGNCAAATCCGTTCTTGTCAACATGCATATCCCGTCCACTA GATCAC • RESTful Web services: • HTTP transport but no messaging system • Can be seen as a way to retrieve resources via their URLs • Most often used for databases • Often not really considered as « Web Services » • Example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=U12345&rettype=fasta
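A REST call of the kind shown on slide 43 amounts to building a URL and reading what comes back. The sketch below assembles the efetch URL from the slide and parses a single FASTA record. The helper names are made up for illustration, and the fetch step needs network access. The slide's URL uses plain HTTP; NCBI has since moved to HTTPS, so the scheme may need updating.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as written on the slide.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db: str, record_id: str, rettype: str = "fasta") -> str:
    """The resource is identified entirely by its URL and query parameters."""
    query = urlencode({"db": db, "id": record_id, "rettype": rettype})
    return f"{EFETCH_URL}?{query}"


def parse_fasta(text: str) -> tuple:
    """Split a single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def fetch_fasta(db: str, record_id: str) -> tuple:
    """Retrieve the record over HTTP (requires network access)."""
    with urlopen(build_efetch_url(db, record_id)) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_efetch_url("nucleotide", "U12345"))
```

There is no envelope and no WSDL here: the client has to know the URL pattern and the output format in advance, which is the trade-off against SOAP.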

  44. Web Services: pros and cons • Advantages • Language independence => interoperability • Standard for accessing and describing the services • Improved connectivity between the programs • Possibility of constructing workflows • Drawbacks • Language independence • not that straightforward to make a “universal” server • Each language has its own “implementation” of the standard • Heavy system (SOAP/WSDL), needs maintenance by service providers • Efficiency => heavy network traffic + serializing/deserializing

  45. WS everywhere • Amazon • Google • http://seekda.com/ • Extensive search engine for Web Services (currently 27 813 services) • http://demo.service-finder.eu (alpha version, promising)

  46. WS in Bioinformatics http://xml.ddbj.nig.ac.jp/index.html http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html http://www.ebi.ac.uk/Tools/webservices/ http://www.genome.jp/kegg/soap/ http://rsat.bigre.ulb.ac.be/rsat/ http://api.bioinfo.no/wsdl/JasparDB.wsdl

  47. Adding meaning… Semantic web services propose extending these technologies, which are still being consolidated, with ontologies and semantics that allow the dynamic selection, integration, and invocation of services, and that also give them the ability to reconfigure themselves dynamically to adapt to changes (e.g., interruption of services or the appearance of more suitable ones) without human intervention.

  48. What are semantic web services? Semantic web services are a new technology resulting from the combination of the Semantic Web and web services. Semantic web services = web services + Semantic Web
