Challenges facing data-enabled interdisciplinary training
What is DESE? • If your science and engineering are not data-enabled… • …you’re not doing it right.
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Big Data in Agriculture (Today) • Syngenta Challenge: Which seed varieties should a farmer plant? • Consider expected weather conditions, knowledge about the soil on the farm, and performance studies of candidate soybean varieties from numerous sources.
Tomorrow • Problem becomes Gene (60K) × Environment (?) × Phenotype (thousands) • G × E = P • Visualize results so that a farmer can understand them and act on them (actionable intelligence)
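To give a feel for the scale implied by Gene × Environment × Phenotype, here is a minimal back-of-the-envelope sketch. Only the 60K gene count comes from the slide; the phenotype count (2,000, standing in for "thousands"), the environment counts, and the function name `gxe_cells` are assumptions chosen purely for illustration.

```python
def gxe_cells(n_genes: int, n_environments: int, n_phenotypes: int) -> int:
    """Number of (gene, environment, phenotype) combinations in a full table."""
    return n_genes * n_environments * n_phenotypes


N_GENES = 60_000      # from the slide ("60K")
N_PHENOTYPES = 2_000  # ASSUMPTION: the slide only says "thousands"

# ASSUMPTION: the slide leaves the environment count open ("?"),
# so a few hypothetical values are tried here.
for n_env in (10, 100, 1_000):
    cells = gxe_cells(N_GENES, n_env, N_PHENOTYPES)
    gigabytes = cells * 8 / 1e9  # 8 bytes per value if stored densely as float64
    print(f"{n_env:>5} environments: {cells:.2e} cells, ~{gigabytes:,.0f} GB dense")
```

Even under these modest assumptions the full table grows quickly with the number of environments, which is one reason the results have to be summarized and visualized before a farmer can use them.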
VELOCITY • Up to thousands of respondents at one time • With each choice, the back end must update the embedding and deliver a new query without noticeable delay for the user • Serious data-handling and infrastructure design challenge • Try it! http://nextml.org/chemistry
Leveling the playing field • Everyone comes in with different skills and tool sets. • How do we get each discipline “up to speed” on critical data-science skills… • …without requiring extensive additional coursework / time to degree?
One-way street problem • Students in computer science and engineering have data-science skills that apply broadly… • …but “apply skillset A to dataset B” != cross-disciplinary science. • What will engage interest from both computational and applied sides to promote true interactions?
Tower of Babel 1 • Data science tools and standards vary considerably across disciplines and even across labs… • …yet for students to interact, a common set of tools is required. • How do such standards get set, and what should they be?
Tower of Babel 2 • Each discipline has its own jargon, which can be efficient within a discipline but a barrier across disciplines. • Talks are hard to follow when (a) you can’t understand the terms or (b) the speaker has to stop to explain every third word. • How do we promote shared language for data science?
Data science infrastructure • Means of collecting, sharing, and documenting data are proliferating. • Especially with big data, issues arise: privacy, data sharing, large data sets, documentation of data, etc. • What are the right tools and infrastructure for managing data storage, documentation, access, etc.? • Open science? Amazon? Wiki? GitHub? Slack? WordPress?
Plan • Small group breakout 1 (15 min): Elaborate and rank order list of challenges • What are we missing? Add any additional challenges to the Google Doc • What is most pressing? Rank order the listed challenges (last 2 min) • Report back (15 min): Which challenge was your table’s top ranked and why? • Small group breakout 2 (15 min): Top n challenges assigned to tables; regroup at a table that interests you and discuss solutions • Note solutions on the Google Doc • Report back: What are your table’s solutions?
These are questions for you! • Teaching basic data science to students who are not in quantitative areas. • What basic skills should scientists have to at least get started? • How should these skills be taught? • How do we promote true interdisciplinary collaboration, rather than partitioning tasks by discipline? • How do we balance the utility of jargon versus its alienating effects? How do we best promote good communication from data science to discipline? • How do we manage variety and promote standards in software use and development? • How do we build infrastructure for big data sharing, security, and documentation?