Big Data Storage and Access Issues for Phenotyping of Agricultural Data
Stephen George, Susan Urban, Eric Hequet, and Hamed Sari-Sarraf
Texas Tech 2013 NSF Research Experiences for Undergraduates Site Program
The Phenotyping Project
• Plant phenotyping is the comprehensive assessment of complex plant traits such as growth, development, tolerance, resistance, architecture, physiology, ecology, and yield, together with the basic measurement of the individual quantitative parameters that form the basis for the more complex traits (LemnaTec).
• Robotics is used to monitor and capture each plant's growth over time and to keep track of the plant's environment.
• The navigation component of the project provides location information for each cotton plant in the fields.
• Each individual plant will have multiple images that capture growth attributes over time.

Goals of the Texas Tech Phenotyping Project
• Determine which cross-breed would survive in the harsh climate of West Texas.
• Store and analyze massive amounts of plant data over time.
• The F1 cross contains 430,000 plants on the site of one cotton field.
• Close to 4 million images/attributes could be stored over a 10-week span for one generation.
• 20 crosses × 200 lines × 2 reps × 5 environments = 40,000 combinations; at 430,000 plants each, that comes to 17.2 billion plant data records spanning a year. Massive amounts of data are being produced.

NoSQL
• NoSQL is an umbrella term for the data stores created to handle data that does not fit a table/column/row structure.
• Many NoSQL systems offer better write performance than traditional relational databases.
• NoSQL systems can handle high volumes of data faster than relational databases.
• They provide greater flexibility when storing different data types such as images, documents, and other objects.
• Key-Value Store: MongoDB,
• Document Store: MongoDB, Couch, Raven
• Column Store: HBase, Cassandra

Figure 8: Write speeds for the wicking experiment.
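The data-volume figures quoted in the project goals can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate: the constant and function names are illustrative, and the rate of one image per plant per week is an assumption chosen because it lands near the "close to 4 million" figure, not a number stated on the poster.

```python
# Back-of-the-envelope estimate of phenotyping data volume.
# The counts come from the poster; all names here are illustrative.
PLANTS_PER_CROSS = 430_000   # plants in the F1 cross on one cotton field
CROSSES = 20
LINES = 200
REPS = 2
ENVIRONMENTS = 5


def combinations() -> int:
    """Number of cross/line/rep/environment combinations."""
    return CROSSES * LINES * REPS * ENVIRONMENTS


def total_plant_records() -> int:
    """Plant data records across all combinations over a year."""
    return combinations() * PLANTS_PER_CROSS


def images_per_generation(images_per_plant_per_week: int = 1,
                          weeks: int = 10) -> int:
    """Images for one generation of one cross.

    ASSUMPTION: one image per plant per week; the poster does not
    state the capture rate.
    """
    return PLANTS_PER_CROSS * images_per_plant_per_week * weeks


if __name__ == "__main__":
    print(f"combinations:          {combinations():,}")
    print(f"plant records (year):  {total_plant_records():,}")
    print(f"images per generation: {images_per_generation():,}")
```

Under these assumptions the product of the four design factors is 40,000, and multiplying by the 430,000 plants of one cross reproduces the 17.2 billion figure.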
Abstract
Plant phenotyping involves the assessment of plant traits such as growth, tolerance, resistance, and yield. The Texas Tech Phenotyping Project is studying which cross-breeds of cotton plants will better survive the harsh climate of West Texas. Using robotics, images of individual plants in a field are collected and analyzed over time to support the study, generating massive amounts of plant data. This research project investigates the big data storage and organizational issues for phenotyping data. A conceptual design of the phenotyping data requirements has been produced to illustrate the large scope of the data required. NoSQL database technology has been investigated as an alternative to relational databases to provide more efficient storage and retrieval. In particular, the NoSQL-based Couchbase system has been investigated for its high scalability and cost-effective storage of massive data. Temporal data management in NoSQL databases has also been explored because phenotyping data collection and analysis are time-oriented. This research provides a prototype implementation of image data storage using Couchbase, together with examples of temporal queries and a performance analysis.

Figure 9: Read speeds for the wicking experiment.

Couchbase Database
• The primary unit of storage on the server is the JSON document.
• JSON documents offer a flexible structure that allows a document to be modeled as an object.
• Couchbase Server 2.0 uses a JavaScript-based query system that operates on field values within JSON documents.
• Views make it possible to query on a combination of attributes and retrieve the documents that match a given specification.
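The idea of a view over JSON documents can be sketched in plain Python. This is an emulation for illustration only, not the Couchbase SDK and not the poster's query code: real Couchbase Server 2.0 views are written as JavaScript map functions, and the document fields (`type`, `experiment`, `frame`, `timestamp`, `area`, `temperature`) and sample values below are hypothetical.

```python
# Emulation of a map-style view over JSON-like documents.
# All field names and values are hypothetical sample data.
DOCS = [
    {"type": "experiment", "experiment": 1, "fabric": "active wear"},
    {"type": "frame", "experiment": 1, "frame": 1,
     "timestamp": "2013-07-01T10:00:00", "area": 12.5, "temperature": 24.1},
    {"type": "frame", "experiment": 1, "frame": 2,
     "timestamp": "2013-07-01T10:00:30", "area": 11.0, "temperature": 24.3},
    {"type": "frame", "experiment": 1, "frame": 3,
     "timestamp": "2013-07-01T10:01:00", "area": 9.2, "temperature": 24.4},
    {"type": "frame", "experiment": 2, "frame": 1,
     "timestamp": "2013-07-02T09:00:00", "area": 14.0, "temperature": 23.8},
]


def map_area_by_experiment(doc):
    """Yield ([experiment, timestamp], area) for every frame document."""
    if doc.get("type") == "frame" and "area" in doc:
        yield [doc["experiment"], doc["timestamp"]], doc["area"]


def build_view(docs, map_fn):
    """Apply the map function to each document and sort rows by key."""
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def query_range(view, start_key, end_key):
    """Return rows whose key lies in [start_key, end_key], inclusive."""
    return [(key, value) for key, value in view
            if start_key <= key <= end_key]


if __name__ == "__main__":
    view = build_view(DOCS, map_area_by_experiment)
    # Temporal query: areas for experiment 1 in the first 30 seconds.
    for key, area in query_range(view,
                                 [1, "2013-07-01T10:00:00"],
                                 [1, "2013-07-01T10:00:30"]):
        print(key, area)
```

Putting the experiment number first and an ISO 8601 timestamp second in the key keeps the rows of one experiment adjacent and in time order, so a temporal query reduces to a key-range scan.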
Summary
• After looking at various NoSQL databases, it was determined that Couchbase, a document-store database, would not only satisfy the project requirements but also provide in-system crash prevention, bringing the system's durability close to that of relational databases.
• A data model for the phenotyping project has been created and is ready for implementation. It supports not only the physical attributes of the plant but also the environment variables that affect plant growth.
• This work also experimented with another form of data (wicking data) to see whether a similar data model, based on the phenotyping project, could be implemented.

Objectives
• Compare different types of NoSQL databases to determine which is appropriate for the phenotyping project requirements.
• Model the entity and attribute data requirements of the phenotyping project.
• Capture the temporal aspects of the data and apply them as a data organization method.
• Support retrieval and querying of data over time.
• Build a prototype using the wicking data application.

Wicking Data
• Because plant data was unavailable at this stage of the project, the experiment was conducted on wicking data.
• What is wicking?
• The ability of a fabric to absorb moisture from a surface (skin).
• Used in active wear and performance fabrics.

Future Work
• Implement the phenotyping database in Couchbase in order to store and handle the attributes captured by the robot.
• Create different views to fit the specifications for querying plant data based on physical attributes, spatio-temporal data, and environment.

References
• Chen, S. (2010). Multimedia Databases and Data Management: A Survey. International Journal of Multimedia Data Engineering and Management (IJMDEM), 1(1), 1-11. doi:10.4018/jmdem.2010111201
• Monger, M. D., Mata-Toledo, R. A., & Gupta, P. (2012). Temporal Data Management in NoSQL Databases. Journal of Information Systems & Operations Management, 6(2), 237-243.
Figure 1: Image of a cotton farm with respect to the time aspects of cotton growth.
Figure 2: Data model for the phenotyping project.
Figure 3: Data model for the wicking experiment.
Figure 4: Sequence of frames of the drying cycle of active wear fabric.
Figure 5: Query code for displaying area based on Experiment 1.
Figure 6: Query code for displaying temperature based on Experiment 1.
Figure 7: Graph for Figure 5 query results (plot title: Area of Frames; axes: Frames, Area/cm).

DISCLAIMER: This material is based upon work supported by the National Science Foundation and the Department of Defense under Grant No. CNS-1263183. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Defense.