‘MIAME in practice’ loading microarray expression data with maxd http://www.bioinf.man.ac.uk/microarray/

What is MIAME?
• A common language for describing things is required in order to make repositories useful.
• MIAME is the Minimum Information About a Microarray Experiment.
• It is the result of an MGED-driven effort to codify the description of a microarray experiment.
• MIAME aims to define the core that is common to most experiments.

How does MIAME work?
• Semi-formal textual description of what information should be provided for each type of data.
• The main topics are:
  • The array design description
    • Features, reporters and composite sequences
  • The experiment description
    • Experimental design
    • Samples used, extract preparation and labeling
    • Hybridisation procedures and parameters
    • Measurement data and specifications of data processing

How is data represented?
• Since few controlled vocabularies have been fully developed, MIAME encourages users, where necessary, to provide their own qualifiers and values and to identify the source of the terminology.
• This is achieved through the use of (qualifier, value, source) triplets, for instance:
  (qualifier: ‘cell type’, value: ‘epithelial’, source: ‘Gray’s anatomy, 38th ed.’)
• There is considerable interest in using ontologies rather than controlled vocabularies, but they are in the early stages of development.
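
The triplet idea can be sketched as a small record type. This is only an illustration: the class name `Annotation` and the helper `format_triplet` are hypothetical and are not part of MIAME or maxd.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One MIAME-style (qualifier, value, source) triplet (hypothetical type)."""
    qualifier: str
    value: str
    source: str  # where the terminology comes from


def format_triplet(a: Annotation) -> str:
    """Render a triplet in the notation used on the slide."""
    return f"(qualifier: '{a.qualifier}', value: '{a.value}', source: '{a.source}')"


cell_type = Annotation("cell type", "epithelial", "Gray's anatomy, 38th ed.")
print(format_triplet(cell_type))
```

Keeping the source next to every value is what lets two sites compare annotations even when they use different vocabularies.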

MIAME Components
• Array design:
  • An array is composed of features.
  • Each feature contains a reporter.
  • Reporters identify composite sequences.
• Experimental design:
  • Each sample comes from a bio-source.
  • Biomaterial manipulations represent laboratory protocols (including the extract preparation, labeling and hybridisation protocols).
  • Hybridisations result in one or more images.
  • Images are analysed to generate (normalised) expression data.

What is maxd?
• ‘maxd’ refers to the Manchester Array-Express Database.
• There is a database schema (maxdSQL), a data loading component (maxdLoad) and an analysis/visualisation package (maxdView).
• The current version (a.k.a. the ‘old’ version) is based on a very early EBI design, although the schema is quite recognisable. The problem is that most of the useful information is stored in free-text descriptions.
• The second generation of the software adds explicit support for MIAME: there are now fields for entering the information required by MIAME.

Introduction to maxdLoad2
• Integrated data loading, browsing, editing and searching.
• Enables MIAME data capture.
• Supports any SQL92 database: Oracle, MySQL, Postgres, Sybase, Firebird.
• Customisable attributes for each table.

Some maxd terminology
ID   Name      Size       Date
012  img003/a  1600x1200  12/03/2002
013  img003/b  4800x3600  12/03/2002
014  img004/a  1600x1200  17/03/2002
015  img004/b  4800x3600  17/03/2002
016  img005/a  1600x1200  22/03/2002
017  img005/b  4800x3600  22/03/2002
018  img006/a  1600x1200  29/03/2002
• Instance: one ‘item’ of data, for example a sample or an image.
• Table: the collection of similar data items, for example all images or all arrays.
• Attribute: a data value that can be associated with an instance.
• Schema: the definition of the tables, their attributes and the links between tables.

Overview of new features
• Schema extended to fit MIAME and MAGE concepts more closely.
  • LabelledExtract links Hybridisation to Extract.
  • Experiment is now the ‘root’ table.
• Fully configurable attributes for each table.
  • The description of the attributes is stored in the database and can refer to external data via a URL.
  • Attribute descriptions can be altered without breaking the database.

Database Schema
• The schema is fixed: the set of tables and the links between them cannot be changed (at the moment…)

Configurable Attributes
• Fixed set of types:
  • INTEGER
  • DOUBLE
  • STRING
  • TEXT
  • CVLIST (controlled vocabulary)
  • TOGGLE
• Attributes can be grouped hierarchically.

Attributes
[Screenshot of an attribute form. Callouts: free text string (multiple lines); free text string (single line); CV list (multiple selection); toggle; grouping of fields; integer (value is type-checked); horizontal layout.]

Attribute Descriptions
• The attributes to present for each table are defined using an XML-based syntax:
<Group name="SamplingProtocol">
  <Text name="Description" hint.lines="10" hint.cols="50" />
  <CVList name="Treatment Type" alternative="Other" completion="REQUIRED" >
    <CVOption name="Heat shock" />
    <CVOption name="Radiation" />
    <CVOption name="Drug Injection" />
    <CVOption name="Plasmid Overexpression" />
  </CVList>
  <CVList name="Separation Technique" alternative="Other" >
    <CVOption name="None" />
    <CVOption name="Trimming" />
    <CVOption name="Microdissection" />
    <CVOption name="FACS" />
  </CVList>
  <Group name="Time Elapsed" hint.layout="HORIZONTAL" >
    <String name="Value" completion="OPTIONAL" />
    <String name="Unit" completion="OPTIONAL" />
  </Group>
</Group>
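
As a rough illustration of how such a description could be read by a program, the sketch below walks a shortened copy of the XML with Python's standard `xml.etree.ElementTree` and lists every attribute with its type. The function `walk` and the dotted path naming are assumptions made for this example, not how maxdLoad itself processes the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the description shown on the slide.
DESCRIPTION = """
<Group name="SamplingProtocol">
  <Text name="Description" hint.lines="10" hint.cols="50"/>
  <CVList name="Separation Technique" alternative="Other">
    <CVOption name="None"/>
    <CVOption name="Trimming"/>
    <CVOption name="Microdissection"/>
    <CVOption name="FACS"/>
  </CVList>
  <Group name="Time Elapsed" hint.layout="HORIZONTAL">
    <String name="Value" completion="OPTIONAL"/>
    <String name="Unit" completion="OPTIONAL"/>
  </Group>
</Group>
"""


def walk(element, path=()):
    """Yield (dotted path, element type, CV options) for every non-group attribute."""
    here = path + (element.get("name"),)
    if element.tag == "Group":
        # Groups only provide structure, so descend into their children.
        for child in element:
            yield from walk(child, here)
    else:
        options = [o.get("name") for o in element.findall("CVOption")]
        yield ".".join(here), element.tag, options


attributes = list(walk(ET.fromstring(DESCRIPTION)))
for name, kind, options in attributes:
    print(name, kind, options)
```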

Attribute Descriptions
• Attributes can be tagged as OPTIONAL or REQUIRED.
• A default value can be provided.
• Integer and double attributes are type checked, and illegal values are indicated.
• Comment text can be displayed alongside the attributes.
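
The checks listed above could look roughly like this. It is a minimal sketch under stated assumptions: `AttributeSpec` and `check` are hypothetical names, and the real maxdLoad2 rules (for example how defaults interact with REQUIRED) may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSpec:
    """Hypothetical in-memory form of one attribute description."""
    name: str
    kind: str                      # "INTEGER", "DOUBLE" or "STRING"
    required: bool = False         # REQUIRED vs OPTIONAL
    default: Optional[str] = None


def check(spec: AttributeSpec, raw: Optional[str]) -> Optional[str]:
    """Return an error message, or None when the value is acceptable."""
    # An empty entry falls back to the default value, if one is defined.
    value = raw if raw not in (None, "") else spec.default
    if value is None:
        return f"{spec.name}: required value missing" if spec.required else None
    try:
        if spec.kind == "INTEGER":
            int(value)
        elif spec.kind == "DOUBLE":
            float(value)
    except ValueError:
        return f"{spec.name}: '{value}' is not a valid {spec.kind}"
    return None


row = AttributeSpec("Row", "INTEGER", required=True)
print(check(row, "12"))    # acceptable value
print(check(row, "1.5"))   # illegal integer
print(check(row, ""))      # required but missing
```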

External References
• Attribute descriptions can refer to external elements using an HTTP URL.
• This is the mechanism through which updates to the MIAME specification and to controlled vocabularies will be tracked.
<Group name="Spot">
  <Group name="Location">
    <Integer name="Row" completion="OPTIONAL" comment="position on array" />
    <Integer name="Col" completion="OPTIONAL" comment="position on array" />
  </Group>
  <CVListRef name="FeatureShape" url="http://www.maxd.org/xml/miame/cv_lists.xml" />
  <Group name="Physical">
    <GroupRef name="Spatial.Location" url="http://www.maxd.org/xml/util/basic.xml" />
    <GroupRef name="Spatial.Size" url="http://www.maxd.org/xml/util/basic.xml" />
  </Group>
</Group>
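
The slides do not say how references are resolved, so the following is only one plausible sketch. It assumes that an `XRef` element is replaced by the element of type `X` with the same name found in the referenced file. The URL `http://example.org/cv_lists.xml`, the `<Library>` wrapper and the option names are invented for illustration, and a dictionary stands in for the HTTP fetch so that the example runs without a network.

```python
import xml.etree.ElementTree as ET

# Stand-in for remote files; the URL and content are invented for this example.
FAKE_REMOTE = {
    "http://example.org/cv_lists.xml": """
    <Library>
      <CVList name="FeatureShape">
        <CVOption name="Circular"/>
        <CVOption name="Square"/>
      </CVList>
    </Library>""",
}


def resolve(element, fetch):
    """Replace each <XRef name=... url=...> child with the <X name=...> found at url."""
    for index, child in enumerate(list(element)):
        if child.tag.endswith("Ref"):
            library = ET.fromstring(fetch(child.get("url")))
            target_tag = child.tag[:-len("Ref")]
            for candidate in library.iter(target_tag):
                if candidate.get("name") == child.get("name"):
                    # Swap the reference for the referenced element, in place.
                    element.remove(child)
                    element.insert(index, candidate)
                    break
        else:
            resolve(child, fetch)
    return element


spot = ET.fromstring("""
<Group name="Spot">
  <Integer name="Row" completion="OPTIONAL"/>
  <CVListRef name="FeatureShape" url="http://example.org/cv_lists.xml"/>
</Group>""")
resolved = resolve(spot, FAKE_REMOTE.__getitem__)
print([child.tag for child in resolved])
```

Passing the fetch function in as a parameter keeps the resolver independent of how the remote file is obtained.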

Support for MIAME (q-v-s)
• (qualifier: ‘organ type’, value: ‘spleen’, source: ‘NCBI tissue ontology v1.0’)
• The qualifier and source components are encoded in the attribute descriptions.
• The allowable values are restricted using a controlled vocabulary.
• At the moment there is no support for the more advanced features that can be specified via an ontology, for example exclusion or inheritance. However, the way that maxdLoad stores its data (see later) will enable this support to be added incrementally.

Creating a new instance

Create Mode
• Required fields are coloured differently to optional fields.
• All of the required fields must be completed before the new instance can be created.
• Links to other instances are chosen from pull-down lists, or by picking them in ‘select mode’.

Navigator Tree
• The schema can be viewed as a tree.
• The red line shows the path taken to the current form.
• Instances are ticked off as they are specified, and their names are shown.
• Instances which have not yet been specified are tagged with yellow dots.

Loading data from files
• All of the different types of data can be loaded from plain-text files.
• Data is extracted from columns.
• Value and type checking is applied in the same way as in ‘Create Mode’.
• Header, footer and comment lines can be indicated and will be ignored.

Loading data from files
• Column Specification
  • Columns can be concatenated.
  • Default and missing values can be specified on a per-column basis.
  • Values can be converted to upper- or lower-case.
• Presets
  • The column specifications and header/footer ignore rules can be saved and recalled.
• Preview
  • A preview of how the file will be interpreted can be displayed prior to loading the data.
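
The column handling described on these two slides can be sketched as follows. `ColumnSpec`, `load` and all of their parameters are hypothetical names chosen for this example, and the tab separator and `#` comment marker are assumptions. The sketch shows concatenation, per-column defaults, case conversion and the skipping of header, footer and comment lines.

```python
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnSpec:
    """Hypothetical description of how one attribute is filled from columns."""
    target: str                    # attribute to fill
    sources: List[int]             # column indices, concatenated in order
    default: str = ""              # used when the joined value is empty
    case: Optional[str] = None     # "upper", "lower" or None


def load(lines, specs, header_lines=0, footer_lines=0, comment="#", sep="\t"):
    """Turn delimited text rows into dictionaries according to the specs."""
    rows = [ln.rstrip("\n") for ln in lines]
    rows = rows[header_lines:len(rows) - footer_lines]
    records = []
    for row in rows:
        if not row.strip() or row.startswith(comment):
            continue  # blank and comment lines are ignored
        cells = row.split(sep)
        record = {}
        for spec in specs:
            value = "".join(cells[i] for i in spec.sources if i < len(cells)) or spec.default
            if spec.case == "upper":
                value = value.upper()
            elif spec.case == "lower":
                value = value.lower()
            record[spec.target] = value
        records.append(record)
    return records


sample = io.StringIO(
    "Image list exported 2002\n"      # header line
    "# id\tname\tsuffix\tsize\n"      # comment line
    "012\timg003\t/a\t1600x1200\n"
    "013\timg003\t/b\t\n"             # missing size
    "end of file\n"                   # footer line
)
specs = [
    ColumnSpec("Name", [1, 2], case="upper"),
    ColumnSpec("Size", [3], default="unknown"),
]
records = load(sample, specs, header_lines=1, footer_lines=1)
for record in records:
    print(record)
```

Running `load` on only the first few lines of a file would give something like the preview feature, since the rows are interpreted without being written anywhere.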

Browse Mode
• Instances can be browsed by selecting them from a list.
• The list can be ordered chronologically or alphabetically.
• The links between any pair of tables can be followed.

Browse Mode: Find Linked
• The “Find linked” feature can traverse the schema to locate the instance(s) linked to the selected instance.
• Example: find the Scanning Protocol(s) linked to Submitter “Fred”:
  1. Find the Experiment(s) which were submitted by “Fred”.
  2. Find the Measurement(s) in those Experiments.
  3. Find the Image(s) used by those Measurements.
  4. Find the Scanning Protocol(s) used by those Images.
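
The four steps above amount to finding a path between two tables and then following instance links along it. The sketch below shows one way to do that with a breadth-first search. The `SCHEMA` and `LINKS` dictionaries are toy data invented for the example; the real maxd schema has more tables and its links may run in other directions.

```python
from collections import deque

# Hypothetical link table: for each table, the tables that can be reached from it.
SCHEMA = {
    "Submitter": ["Experiment"],
    "Experiment": ["Measurement"],
    "Measurement": ["Image"],
    "Image": ["ScanningProtocol"],
    "ScanningProtocol": [],
}

# Hypothetical instances: (table, name) -> linked (table, name) pairs.
LINKS = {
    ("Submitter", "Fred"): [("Experiment", "exp1")],
    ("Experiment", "exp1"): [("Measurement", "m1"), ("Measurement", "m2")],
    ("Measurement", "m1"): [("Image", "img003/a")],
    ("Measurement", "m2"): [("Image", "img003/b")],
    ("Image", "img003/a"): [("ScanningProtocol", "scan-default")],
    ("Image", "img003/b"): [("ScanningProtocol", "scan-default")],
}


def table_path(start, goal):
    """Breadth-first search for the shortest chain of tables from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in SCHEMA.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def find_linked(instance, goal_table):
    """Follow instance links table by table along the path found in the schema."""
    path = table_path(instance[0], goal_table)
    if path is None:
        return set()
    current = {instance}
    for table in path[1:]:
        current = {
            target
            for source in current
            for target in LINKS.get(source, [])
            if target[0] == table
        }
    return current


print(table_path("Submitter", "ScanningProtocol"))
print(find_linked(("Submitter", "Fred"), "ScanningProtocol"))
```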

Find Mode
• Instances can be located by specifying:
  • One or more of the linked instances
  • One or more attribute values (partial values are allowed)
  • Part of the name
• The list of matching instances is displayed in browse mode.

Edit Mode
• Edit mode is accessed by selecting an instance in browse mode and pressing the “Edit” button.
• Existing instances are edited using the same interface as create mode.

Deleting instances
• Deletion is safe. Database integrity is preserved because deletion is not allowed if the instance is currently ‘in use’, i.e. linked to by some other instance.
• Deletion can be ‘cascaded’. If an Experiment is deleted, all instances in every other table linked to the Experiment can also be deleted automatically.
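
The two rules can be illustrated with a toy in-memory model. The `Store` class is hypothetical and is not maxd's implementation. It also makes one extra assumption that the slide does not state: during a cascade, an instance that is still linked to by something else is left in place.

```python
class Store:
    """Tiny in-memory stand-in for a database of linked instances."""

    def __init__(self):
        self.links = {}   # instance -> set of instances it links to

    def add(self, instance, *linked):
        self.links[instance] = set(linked)

    def users_of(self, instance):
        """Instances that currently link to the given instance."""
        return {src for src, targets in self.links.items() if instance in targets}

    def delete(self, instance, cascade=False):
        if self.users_of(instance):
            raise ValueError(f"{instance} is in use and cannot be deleted")
        children = self.links.pop(instance)
        if cascade:
            for child in children:
                # Assumption: only remove a child once nothing else links to it.
                if child in self.links and not self.users_of(child):
                    self.delete(child, cascade=True)


store = Store()
store.add("protocol")
store.add("image", "protocol")
store.add("experiment", "image")

try:
    store.delete("image")           # refused: the experiment still uses it
except ValueError as err:
    print(err)

store.delete("experiment", cascade=True)
print(store.links)                  # everything reachable has been removed
```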

Import & Export
• Direct import from ‘old’ maxd databases.
• Import and export via a native file format.
  • Useful for sharing data between sites.
• Export in MAGE-ML format.
  • Suitable for submission to public repositories.
  • The attribute description information controls how the various MAGE-ML elements are constructed.
  • [NOT IMPLEMENTED YET]

Availability
• maxdLoad 2 will be released as an open-source product.
• Beta-quality version expected to be available by the end of April 2003.
• Mailing list for announcements: maxd_info@ecartis.cs.man.ac.uk

Comfort Break Ahoy…
• The remaining material is for “advanced users”, i.e. people who actually care how it works under the hood.
• All others can disconnect their brains as of this point.

How Measurement data is stored
[Diagram labels: Measurement, Spot, Property, Type, TypeList; IntegerProperty, DoubleProperty, StringProperty, CharProperty.]
• An arbitrary number of columns of data can be attached to each Measurement.
• Data is stored in one of four Property tables (depending on the data type).
• Properties are linked to Measurements via a TypeList, which is an ordered list of Types.

How attributes are stored
• Attributes are not stored as normal fields within the tables because this would make it hard to add or remove attributes as the data model evolves.
• Instead, they are encoded as chunks of text, and these chunks of text are stored.
• This method has the disadvantage that extracting attributes from the database cannot be done using normal SQL queries.
• An API is provided for external programs to make it easier for them to interact with the database.

How attributes are stored
Attributes are combined into a single string, then this string is split into fixed-length chunks for storage.
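
A minimal sketch of the chunking idea, using Python's built-in `sqlite3` and `json` modules. Everything specific here is an assumption: the slides give neither the encoding nor the chunk length, so JSON, the 16-character `CHUNK_SIZE` and the `attribute_chunk` table are stand-ins chosen only to make the example runnable.

```python
import json
import sqlite3

CHUNK_SIZE = 16  # assumed; the real chunk length is not given on the slides


def store(conn, instance_id, attributes):
    """Encode the attributes as one string and save it as ordered fixed-length chunks."""
    text = json.dumps(attributes, sort_keys=True)  # JSON is a stand-in encoding
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    conn.executemany(
        "INSERT INTO attribute_chunk (instance_id, seq, chunk) VALUES (?, ?, ?)",
        [(instance_id, seq, chunk) for seq, chunk in enumerate(chunks)],
    )
    return len(chunks)


def fetch(conn, instance_id):
    """Read the chunks back in order, join them and decode the attributes."""
    rows = conn.execute(
        "SELECT chunk FROM attribute_chunk WHERE instance_id = ? ORDER BY seq",
        (instance_id,),
    )
    return json.loads("".join(chunk for (chunk,) in rows))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_chunk (instance_id INTEGER, seq INTEGER, chunk TEXT)"
)
original = {
    "Treatment Type": "Heat shock",
    "Time Elapsed.Value": "30",
    "Time Elapsed.Unit": "min",
}
n_chunks = store(conn, 12, original)
restored = fetch(conn, 12)
print(n_chunks, restored == original)
```

Because a value can straddle a chunk boundary, a plain SQL text match on the `chunk` column is not guaranteed to find it, which is the searching drawback this storage method carries.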

How attributes are stored
• Pros:
  • Flexibility: attributes can be added or removed easily
  • Supports arbitrary-length strings
  • All data types behave exactly the same irrespective of the underlying database server
• Cons:
  • Hard to search text (cannot use the built-in searching provided by the database server)
  • Hard for non-maxd applications to read and write data