Regions of Interest Alternative Storage
Overview • What’s in an ROI? • Use cases • Requirements • Current Storage System • Problems • Alternative Storage
What’s in an ROI? • ROI • Geometry • Measurements • ROI on Channel • Annotations • ROI • Measurement • Links
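The parts listed above can be pictured as a small data model. This is only a sketch under assumptions: the class and field names (`Tag`, `Link`, `Measurement`, `Shape`, `Roi`, `image_id`) are hypothetical and are not taken from any existing schema. It shows geometry and measurements hanging off the ROI, a measurement optionally bound to a channel, and annotations attachable to ROIs, measurements and links.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tag:
    """An annotation; the namespace/value split is an assumption."""
    namespace: str
    value: str


@dataclass
class Link:
    """A link from one ROI to another, which can itself be annotated."""
    target_roi_id: int
    tags: List[Tag] = field(default_factory=list)


@dataclass
class Measurement:
    name: str                      # e.g. "area"
    value: float
    channel: Optional[int] = None  # set when the measurement is per channel
    tags: List[Tag] = field(default_factory=list)


@dataclass
class Shape:
    kind: str                      # e.g. "Ellipse"
    z: int
    t: int
    params: Dict[str, float]       # geometry parameters such as cx, cy, rx, ry
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class Roi:
    id: int
    shapes: List[Shape] = field(default_factory=list)
    image_id: Optional[int] = None  # None models an ROI not attached to an image
    tags: List[Tag] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)


# Usage: one ellipse with an area measurement, tagged and linked to another ROI.
ellipse = Shape("Ellipse", z=0, t=0, params={"cx": 3, "cy": 75, "rx": 17, "ry": 17})
ellipse.measurements.append(Measurement("area", math.pi * 17 * 17, channel=0))
roi = Roi(id=565, shapes=[ellipse], tags=[Tag("bob", "foo1")])
roi.links.append(Link(target_roi_id=566))
```

Leaving `image_id` unset is one possible way to represent an ROI that exists without an image.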
Use Cases • User created ROI • Measurement tools • HCS generated ROI • Automatic • External • External analysis • Particle Tracking • Other • Templates • ROIs without images
Use Cases – Human Generated • Human generated • More interactions • Merge, Propagate, Split, Delete • Measurements • Geometry • Intensity • Path • ROI/ROI Links • Tags mostly on ROI • Write Many/Read Many
Use Cases - HCS • HCS Generated ROI • Lots of ROI • Attached to Channel • Measurements Attached • Multiple measurements • Tags on ROI, Measurements • Analysis, results and meta. • Write Once, Read Many
Use Cases – External Tools • External Tool can Generate ROI (+ scripts) • Can be tagged • Links (ROI/ROI, ROI/Image) • Results can be in any format
Use Cases - Templates • ROI need not be attached to image • Template to define other ROI
ROI from the Nth Dimension • N-Dimensional Data • Storage of Image data simple • ROI more complex • Database entry, file format • We don’t just want to store in HDF
Current Storage Solutions • Database • ROI • ROI Annotations • PyTables • Mask ROI • Measurements
Current Status • PyTables • ROI are heterogeneous • Concurrency • Python behind a core service call • Measurements are optimal • Tagging is an issue • Inside file • Multiple annotations reported to be slow
Database • ROI can be stored in database • Mask data can be an issue • Tagging in an RDBMS is not ideal • Many more annotations than we’d like • Link to external source for measurements
Alternative Storage • Key-Value Pair Stores • Berkeley DB • Project Voldemort • Tokyo Cabinet • Document DB • MongoDB • CouchDB • Graph DB • Neo4J • InfoGrid • Table DB • Cassandra • Hypertable • HBase
Where others have gone before • Other opinions on the storage solutions • MongoDB vs CouchDB, Cassandra, .. • CouchDB vs MongoDB • Pros and cons of MongoDB • Digg on Cassandra • What is a supercolumn • Cassandra talk • Indexing nodes in Neo4J
MongoDB • Document Database • NoSQL movement • Schemaless • No Tables • Collections of like data • No Joins • A document is the equivalent of a row of data • Distributed file system (GridFS)
MongoDB – Pros and Cons Pros • Bindings to numerous languages (C++, C#, Java, Python, ...) • Allows storage, indexing and linking of any user data • Annotations become very easy and efficient • Has mechanisms for schema upgrade • Dynamic queries • Replication • Sharding • Map-Reduce framework • Fast • GridFS is a distributed file storage mechanism within Mongo • Easy to install Cons • Schemaless; data integrity will need to be worked on • Graph structures not inherently supported
MongoDB - Deployments DEPLOYMENTS • SourceForge http://sourceforge.net/ • BusinessInsider http://www.businessinsider.com/ • New York Times http://www.nytimes.com/ • Disqus http://www.disqus.com/
MongoDB – Example insert

    from pymongo import MongoClient

    # MongoClient replaces the older Connection class in current pymongo.
    connection = MongoClient()
    db = connection['databaseName']
    collection = db['collectionName']

    # One ROI document holding two ellipse shapes, each with its own tags.
    collection.insert_one({
        "tags": [],
        "label": "MyROI",
        "shapes": [
            {"tags": [{"tag": "foo1", "namespace": "bob"}],
             "rx": 17, "ry": 17, "label": None, "cy": 75, "cx": 3,
             "t": 0, "z": 0, "type": "Ellipse", "id": 3},
            {"tags": [{"tag": "foo2", "namespace": "bob"}],
             "rx": 10, "ry": 16, "label": None, "cy": 82, "cx": 45,
             "t": 0, "z": 0, "type": "Ellipse", "id": 5},
        ],
        "type": "Roi",
        "id": 565,
    })
MongoDB – Example query

    import re
    from pymongo import MongoClient

    connection = MongoClient()
    db = connection['databaseName']
    collection = db['collectionName']

    # Find ROIs with tag foofoo that have a shape with tag foo1.
    collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})

    # Find ROIs with a shape tag containing "mitosis" (case-insensitive).
    # pymongo takes a compiled pattern; a '/.../i' string would be matched literally.
    collection.find({"shapes.tags.tag": re.compile("mitosis", re.IGNORECASE)})
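To make the dotted-path queries less opaque, here is a rough pure-Python imitation of how a path such as `shapes.tags.tag` reaches into nested lists of sub-documents. It is a simplified sketch, not MongoDB's implementation: the helper names `values_at` and `matches` are hypothetical, and only equality and compiled-regex conditions are handled (no query operators or numeric array indexes).

```python
import re


def values_at(doc, path):
    """Yield every value reachable via a dotted path, descending into lists."""
    key, _, rest = path.partition(".")
    if isinstance(doc, list):
        # A list is transparent: apply the same path to each element.
        for item in doc:
            yield from values_at(item, path)
    elif isinstance(doc, dict) and key in doc:
        child = doc[key]
        if rest:
            yield from values_at(child, rest)
        elif isinstance(child, list):
            yield from child
        else:
            yield child


def matches(doc, query):
    """True if, for every path in the query, at least one value satisfies it."""
    def ok(value, cond):
        if isinstance(cond, re.Pattern):
            return isinstance(value, str) and cond.search(value) is not None
        return value == cond

    return all(
        any(ok(value, cond) for value in values_at(doc, path))
        for path, cond in query.items()
    )


roi = {
    "tags": [{"tag": "foofoo"}],
    "shapes": [
        {"tags": [{"tag": "foo1"}], "type": "Ellipse"},
        {"tags": [{"tag": "early mitosis"}], "type": "Ellipse"},
    ],
}

both = matches(roi, {"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})
regex = matches(roi, {"shapes.tags.tag": re.compile("MITOSIS", re.IGNORECASE)})
missing = matches(roi, {"shapes.tags.tag": "foo3"})
```

The point of the sketch is that each condition is satisfied if any element along the path matches, which is why tags on any shape of an ROI can select the whole ROI document.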
Neo4J • Graph Database • use nodes to represent objects • User specifies relationship between nodes • Allows complex traversal of node structures
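The node/relationship idea can be illustrated without a database. The following is a minimal in-memory sketch, assuming a hypothetical `Graph` class and illustrative node names; it is not the Neo4J API. It stores properties on nodes, typed relationships between them, and a breadth-first traversal that follows only one relationship type.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.props = {}   # node id -> property dict
        self.edges = {}   # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def traverse(self, start, rel_type):
        """Breadth-first: nodes reachable from start via rel_type edges only."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            for kind, target in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
g.add_node("image", name="raw")
g.add_node("crop", name="cropped")
g.add_node("crop2", name="cropped again")
g.add_node("roi", name="crop region")
g.relate("image", "DERIVE", "crop")
g.relate("crop", "DERIVE", "crop2")
g.relate("image", "ASSOCIATE", "roi")

derived = g.traverse("image", "DERIVE")
```

Following a chain of derived images like this is cheap in a graph store, whereas selecting nodes by their property contents requires a separate index.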
Neo4J – Pros and Cons PROS • Handles graph structures nicely • Transactional • Supported by Gremlin • Native RDF http://components.neo4j.org/neo-rdf-sail/ • Easy to install CONS • No C++ language binding • Not distributed • Tables are not so easily modeled • Difficult to query on node contents
Neo4J - Deployments DEPLOYMENTS • The Swedish Defence forces http://www.mil.se • Windh Technologies http://www.windh.com • Flextoll http://www.flextoll.se
Neo4J – Example

    // Relationship types available between nodes.
    public enum OMERORelations implements RelationshipType {
        ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
    }

    // One node per image, carrying the image object, id and name.
    Node image = neo.createNode();
    image.setProperty("IObject", imageI);
    image.setProperty("id", imageI.getId().getValue());
    image.setProperty("name", imageI.getName().getValue());

    Node derivedImage = neo.createNode();
    derivedImage.setProperty("IObject", derivedImageI);
    derivedImage.setProperty("id", derivedImageI.getId().getValue());
    derivedImage.setProperty("name", derivedImageI.getName().getValue());

    // Record that derivedImage was produced by cropping image with an ROI.
    Relationship relationship = image.createRelationshipTo(
        derivedImage, OMERORelations.DERIVE);
    relationship.setProperty("type", "ROI");
    relationship.setProperty("operation", "crop");
    relationship.setProperty("roi", cropRoiI);
Cassandra An implementation modelled on Google’s Bigtable: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google created Sawzall to query its own system, and Digg has released Lazyboy, a Python library for working with Cassandra. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).
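The column-family layout described above can be pictured with nested dictionaries. This is a rough in-memory sketch and not Cassandra's API: the `put` and `get` helpers, the row-key format and the `label` value are all assumptions made for illustration. It shows like data grouped under one column family while individual rows carry different columns.

```python
from collections import defaultdict

# column family name -> row key -> {column name: value}
store = defaultdict(dict)


def put(family, row_key, **columns):
    """Insert or update columns of one row; rows need not share columns."""
    store[family].setdefault(row_key, {}).update(columns)


def get(family, row_key):
    """Return the columns of one row, or an empty dict if it does not exist."""
    return store[family].get(row_key, {})


# All ellipse shapes live in the same column family.
put("EllipseROI", "roi:565:shape:3", cx=3, cy=75, rx=17, ry=17)
# A second row adds a column the first row does not have (hypothetical label).
put("EllipseROI", "roi:565:shape:5", cx=45, cy=82, rx=10, ry=16, label="nucleus")
# Updating a row only touches the columns that are written.
put("EllipseROI", "roi:565:shape:3", rx=18)

first = get("EllipseROI", "roi:565:shape:3")
second = get("EllipseROI", "roi:565:shape:5")
```

Lookups here are always by row key, which is the access pattern such stores are built around.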
Cassandra – Pros and Cons Pros • Quick • Handles heterogeneous data well • Different rows can have different columns • Can manage distributed data • Map/Reduce • Focus on writes not reads • Scales nicely • Easy to install Cons • Not simple to work with • Building hierarchical structures • Sorting • Querying • Ad hoc queries are poorly supported; Digg still uses MySQL for certain queries • Secondary indexes (K/V) have to be managed by hand • Version 0.5
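The point about managing secondary indexes by hand can be sketched as follows. This is an illustrative in-memory model, not Cassandra code: `rows`, `by_tag`, `put_roi` and `find_by_tag` are hypothetical names. Every write to the primary key/value store must also update a second key/value mapping, including removing stale entries when a row is overwritten.

```python
from collections import defaultdict

rows = {}                   # primary store: row key -> columns
by_tag = defaultdict(set)   # hand-maintained index: tag -> row keys


def put_roi(row_key, columns, tags):
    """Write a row and keep the tag index consistent with it."""
    # Drop index entries that belong to the version being overwritten.
    for old_tag in rows.get(row_key, {}).get("tags", ()):
        by_tag[old_tag].discard(row_key)
    rows[row_key] = {**columns, "tags": tuple(tags)}
    for tag in tags:
        by_tag[tag].add(row_key)


def find_by_tag(tag):
    """Look up row keys through the index instead of scanning every row."""
    return sorted(by_tag.get(tag, ()))


put_roi("roi:1", {"type": "Ellipse"}, ["mitosis"])
put_roi("roi:2", {"type": "Ellipse"}, ["mitosis", "foo1"])
put_roi("roi:1", {"type": "Ellipse"}, ["foo1"])   # retag roi:1

mitosis_rows = find_by_tag("mitosis")
foo1_rows = find_by_tag("foo1")
```

In a distributed store the two writes are not atomic, so the application also has to decide what happens when one of them fails.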
Cassandra - Deployments Deployments • Facebook (MAYBE!!) http://www.facebook.com • Digg http://www.digg.com
Hypertable An implementation of Google’s Bigtable: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google created Sawzall to query its own system. Hypertable has a query language called HQL. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).
Hypertable – Pros and Cons Pros • Quick • Handles heterogeneous data well • Different rows can have different columns • Can manage distributed data • Map/Reduce • Scales nicely • Easy to install Cons • GPL License • Building hierarchical structures • Docs are weak • HQL works for simple queries only • Map/Reduce for other work • Limit of 255 column families • Secondary keys
Hypertable - Deployments Deployments • Rediff http://www.rediff.com • Zvents http://www.zvents.com/
Are we Normal? • Why do we have an RDBMS? • We don’t normalise the data • Each import will normalise on: • Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings • Object Penalty • Difference between normalisation and view