SilverLining is a project focused on scaling hardware infrastructure for biodiversity web apps, with a goal of improving reliability and user experience. Funding is allocated for infrastructure and core application development. The project explores Platform as a Service (PaaS) solutions, such as Google App Engine, to optimize resource efficiency and search performance. Total operating costs are projected to be significantly reduced with the implementation of PaaS.
Stuff we're covering • Hardware infrastructure and scaling • Cloud platform as a service • The SilverLining Project
Some context • We work at a university • Funding based on projects • Biodiversity web apps and APIs • Focus on software (not hardware)
Infrastructure • Applications depend on infrastructure • Infrastructure that "just works" is expensive • More money for infrastructure means less money for application development • Degenerates without long-term funding • Unreliability is bad for applications • Increasingly bad user experience over time
$1.6M USD total budget to 17 institutions • $245k USD (30.6% of direct costs) for infrastructure • $100k USD (12.6% of direct costs) for core application development • DiGIR provider, DiGIR portal
MaNIS, ORNIS, HerpNet, FishNet • $7.6M USD combined budgets, 71 institutions • $196k USD annual operating cost • $179k USD (92%) for infrastructure
Infrastructure as a Problem (IaaP) • Unsustainable • Creates a barrier to innovation • And this is all before scaling comes into play!
Scalability "The ability for infrastructure to reliably handle heavy request loads in a high performance way."
Scaling up • Scale up vertically with a server upgrade • Scale out horizontally with more servers
Scaling DiGIR networks: MaNIS, ORNIS, HerpNet, FishNet • ~85 million records • ~100 servers
"Scaling is hard." - Alex Payne al3x.net/2010/07/27/node.html
Scaling in the small • Handling dozens of requests per second • Scaling up vertically is sufficient • Performance improvements are software related al3x.net/2010/07/27/node.html
Scaling in the large • Billions of requests per week (Google) • Millions of active users (Facebook) • Data centers worldwide with millions of servers al3x.net/2010/07/27/node.html
Are we scaling large or small? • GBIF ~220 million records • eBird ~2 million new records per month • Undigitized collections ~2.5 billion records
Scaling in the "small-ish" • We're at the brink! • IaaP is in the way, scaling is making it worse • Where's the silver lining in all of this?
Platform as a Service (PaaS) en.wikipedia.org/wiki/Platform_as_a_service Conceptually quite simple: • Computing power over the Internet • No servers to maintain • Pay for use • Scales large (even if your application is small) • Provided by companies such as Amazon, Microsoft, Google
SilverLining silver-lining.googlecode.com • Experiments, metrics, prototypes (not products) • Picked Google App Engine • PaaS with biodiversity data • Simple Darwin Core • Bulk loading, storage • MapReduce - indexes, validation, statistics • Optimize for resource efficiency, search performance
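To make the "MapReduce - indexes, validation, statistics" bullet concrete, here is a minimal sketch of the map/reduce pattern in plain Python. It is not the App Engine MapReduce library and not the project's code: the function names (`map_record`, `reduce_counts`, `run_map_reduce`) and the sample records are hypothetical, and `country` is used only as an example Simple Darwin Core term.

```python
from collections import defaultdict

def map_record(record):
    """Map step: emit one (key, 1) pair per record, keyed by country."""
    yield (record.get("country", "unknown"), 1)

def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)

def run_map_reduce(records):
    """Group mapped pairs by key, then reduce each group (single process)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())

# Hypothetical sample records for illustration only.
sample_records = [
    {"scientificName": "Puma concolor", "country": "Mexico"},
    {"scientificName": "Puma concolor", "country": "Peru"},
    {"scientificName": "Lynx rufus", "country": "Mexico"},
    {"scientificName": "Lynx rufus"},  # missing country, counted as "unknown"
]

counts = run_map_reduce(sample_records)
print(counts)
```

The same shape could serve validation (emit a key per failed check) or index building (emit a key per searchable term); on a PaaS the grouping and reduction would be distributed rather than run in one process.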
Cost comparison Total annual operating costs of vertebrate networks: • Current architecture: USD $195,600 • Projected App Engine: USD $19,540 Total cost for SilverLining work to date: • 50 cents
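The size of the projected saving follows directly from the two figures on this slide. A small worked calculation, using only those two numbers (the variable names are ours):

```python
# Annual operating costs quoted on the slide (USD).
current_cost = 195_600
projected_app_engine_cost = 19_540

savings = current_cost - projected_app_engine_cost
reduction = savings / current_cost
ratio = current_cost / projected_app_engine_cost

print(f"Annual savings: ${savings:,}")                 # Annual savings: $176,060
print(f"Reduction: {reduction:.1%}")                   # Reduction: 90.0%
print(f"Current cost is {ratio:.1f}x the projection")  # Current cost is 10.0x the projection
```

In other words, the projection is roughly a tenfold reduction in annual operating cost.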
App Engine code.google.com/appengine • Develop scalable web apps on Google's infrastructure • No servers or hardware to maintain, plus free quotas • Standards-based Java and Python SDKs • IDE support for Eclipse, NetBeans, IntelliJ • Local development server • Integrated support for unit testing
App Engine constraints • Practical constraints for performance and scalability • The datastore is not a relational database • A query can use inequality filters on only one property • Fails: year >= 1980 and year <= 1982 and elevation > 10 (inequalities on both year and elevation) • Solution: Set membership queries
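Set membership queries • Before: year >= 1980 and year <= 1982 and elevation > 10 • After: year "within 1 year" of 1981 and elevation > 10 • Each record stores a list of the years within 1 year of its own year • List for "within 1 year" of 1980: [1979, 1980, 1981], which contains 1981, so a record from 1980 matches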
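The set membership trick can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code and not the App Engine datastore API: the helper names (`years_within`, `query`), the `year_within_1` property name, and the sample records are all hypothetical. The idea is that the year range becomes an equality test against a precomputed list property, leaving elevation as the only inequality filter.

```python
def years_within(year, radius=1):
    """Years within `radius` of `year`; computed once when a record is loaded."""
    return list(range(year - radius, year + radius + 1))

# Hypothetical records for illustration only.
records = [
    {"year": 1979, "elevation": 50},
    {"year": 1980, "elevation": 25},
    {"year": 1981, "elevation": 5},
    {"year": 1982, "elevation": 300},
    {"year": 1983, "elevation": 40},
]

# Precompute the list property on every record at load time.
for record in records:
    record["year_within_1"] = years_within(record["year"])

def query(records, center_year, min_elevation):
    """Membership test on the year list plus a single inequality on elevation."""
    return [
        r for r in records
        if center_year in r["year_within_1"] and r["elevation"] > min_elevation
    ]

matches = query(records, center_year=1981, min_elevation=10)
print([r["year"] for r in matches])  # [1980, 1982]
```

The record from 1981 also falls in the year range but is excluded by its elevation of 5. The trade-off is extra storage per record and a fixed set of supported radii, in exchange for a query the datastore can serve.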
Aggregation and synchronization code.google.com/p/pubsubhubbub code.google.com/apis/feed/push • Fast aggregation via API • Subscribe to changes at the source • Changes pushed automatically
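"Subscribe to changes at the source" can be sketched as the form-encoded body a subscriber would POST to a hub. This is a rough sketch, assuming the parameter names from the PubSubHubbub 0.3 core specification (`hub.mode`, `hub.topic`, `hub.callback`, `hub.verify`). The URLs and the `build_subscription_body` helper are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs

def build_subscription_body(topic_url, callback_url, mode="subscribe"):
    """Build the form-encoded body of a subscription request to a hub."""
    params = {
        "hub.mode": mode,              # "subscribe" or "unsubscribe"
        "hub.topic": topic_url,        # the feed published by the data source
        "hub.callback": callback_url,  # where the hub pushes changes
        "hub.verify": "async",         # verification mode (0.3 spec: sync or async)
    }
    return urlencode(params)

# Hypothetical URLs for illustration only.
body = build_subscription_body(
    topic_url="http://example.org/specimens/feed",
    callback_url="http://example.org/aggregator/callback",
)
print(body)
```

Once a hub accepts such a subscription, it delivers new feed content to the callback URL, so the aggregator does not need to poll each source.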
What's the end game? • PaaS instead of IaaP • SaaS (software as a solution) • BaaS (biodiversity applications at scale) Any QaaC? (Questions as a challenge) Aaron Steele asteele@berkeley.edu John Wieczorek tuco@berkeley.edu