Big data in education. Bureau of Education, Tainan City Government. Director of the Information Center: 高誌健
• Technology trends • Data analytics trends • Cloud computing for big data in education • Big data analytics in the cloud • Educational practices
Top Technology Trends for 2014 • Next-generation mobile networks: Mobile infrastructure must catch up with user needs. • Balancing Identity and Privacy: Growing risks and concerns about social networks. • Smart and Connected Healthcare: Intelligent systems, assistive devices will improve health. • E-Government: Interoperability a big challenge to delivering information. • Scientific Cloud Computing: Key to solving grand challenges, pursuing breakthroughs. • Emergence of the Mobile Cloud: Mobile distributed computing paradigm will lead to explosion of new services. • From Internet of Things to Web of Things: Need connectivity, internetworking to link physical and digital. • From Big Data to Extreme Data: Simpler analytics tools needed to leverage the data deluge. • The Revolution Will Be 3D: New tools, techniques bring 3D printing power to masses. • Supporting New Learning Styles: Online courses demand seamless, ubiquitous approach. http://www.computer.org/portal/web/membership/Top-10-Tech-Trends-in-2014
Big data: Why now? • 90% of the data in the world was created in the last 2 years. • The average person today processes more data in a day than a person in the 1500s did in an entire lifetime. • The LAPD is piloting a big data scheme to predict crime. An algorithm predicts where crime is likely to take place; for police teams in Foothill, LA, the scheme brought a 12% decrease in property crime and a 26% decrease in burglary. Predictive policing is now being rolled out in 150 cities across America. • The algorithm was initially developed to predict earthquakes. • 43% of data gathered on people comes from social media. • Twitter: 100,000 tweets every minute. Facebook: 650,000 shares every minute. That is 144,000,000 tweets and 936,000,000 Facebook shares every day. • Netflix records 30 million user "plays" a day; it analyses when users pause, rewind, fast-forward, and search, and it also knows what users like. • But we're just getting started: augmented reality, the quantified self, and the internet of things will all become ubiquitous. • Data production will be 44 times greater in 2020 than it was in 2009. • Every day, the data mountain grows by 2.5 billion gigabytes. • In 2013, all human knowledge is estimated to be 12 exabytes. • 1 exabyte × 1,000 = 1 zettabyte = a hard drive • "Information is the oil of the 21st century, and analytics is the combustion engine." (Peter Sondergaard, senior vice president at Gartner) https://www.youtube.com/watch?v=2D8oji5EKbM
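The per-minute and per-day figures on this slide can be cross-checked with simple arithmetic. The sketch below is only a sanity check, assuming decimal (SI) units for gigabytes and exabytes; the helper name `per_day` is made up for illustration.

```python
# Cross-check the rates quoted on the slide.
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute event rate to a full day."""
    return rate_per_minute * MINUTES_PER_DAY


tweets_per_day = per_day(100_000)   # tweets per minute -> per day
shares_per_day = per_day(650_000)   # Facebook shares per minute -> per day

# Daily data growth: 2.5 billion gigabytes expressed in exabytes.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes).
GB = 10**9
EB = 10**18
daily_growth_eb = 2.5e9 * GB / EB

print(f"{tweets_per_day:,} tweets per day")
print(f"{shares_per_day:,} shares per day")
print(f"{daily_growth_eb} exabytes of new data per day")
```

Under these assumptions the daily totals match the slide, and the daily growth works out to 2.5 exabytes.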
Big data characteristics http://www.fico.com/en/Communities/Pages/BigData.aspx & https://www.youtube.com/watch?v=7D1CQ_LOizA
Top 10 Most Funded Big Data Startups Last update: March 31, 2014 http://www.forbes.com/sites/gilpress/2013/10/30/top-10-most-funded-big-data-startups-updated/
Opportunity in types of big data • Sentiment: understand how your students feel about your teaching and feedback, right now • Clickstream: capture and analyze website visitors' data trails and optimize your website • Sensor/machine: discover patterns in data streaming automatically from remote sensors and machines • Geographic: analyze location-based data to manage operations where they occur • Server logs: research logs to diagnose process failures and prevent security breaches • Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents
Data Analytics Challenges • Data capture at the user interaction level, in contrast to the client transaction level in the enterprise context • Summative to formative analysis • As a consequence, the amount of data increases significantly • Greater need to analyze such data to understand user behaviors (EDBT 2011 Tutorial)
Customer (consumer) analytics • Propensity and Best Next Action • Sentiment analysis https://www.youtube.com/watch?v=Ga2jMY5nzzY&feature=player_embedded • Behavior scoring models http://www.statsoft.com/Solutions/Cross-Industry/Customer-Analytics
Paradigm Shift in Computing
The NIST Definition of Cloud Computing • Essential characteristics: on-demand self-service; broad network access; resource pooling; rapid elasticity; measured service. • Service models: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS). • Deployment models: private cloud; community cloud; public cloud; hybrid cloud.
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service); three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)); and four deployment models (private cloud, community cloud, public cloud, hybrid cloud). Key enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server computers, and (3) high-performance virtualization for commodity hardware. http://www.nist.gov/itl/cloud/
Cloud Computing: Why Now? • Experience with very large datacenters: unprecedented economies of scale; transfer of risk • Technology factors: pervasive broadband Internet; maturity in virtualization technology • Business factors: minimal capital expenditure; pay-as-you-go billing model
Economics of Cloud Users • Pay by use instead of provisioning for peak [Figure: resources vs. time for a static data center and for a data center in the cloud, showing capacity, demand, and unused resources] Slide credits: Berkeley RAD Lab
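The contrast between provisioning for peak and paying by use can be made concrete with a small calculation. This is a minimal sketch, assuming a made-up hourly demand series (not data from the slides) and ignoring any price difference between an owned server-hour and a rented one.

```python
# Hypothetical demand, in servers needed per hour (illustrative numbers only).
demand = [20, 15, 10, 10, 30, 60, 100, 80, 40, 25]


def static_capacity_hours(demand):
    """Server-hours paid for when capacity is fixed at the peak demand."""
    return max(demand) * len(demand)


def pay_per_use_hours(demand):
    """Server-hours paid for when capacity tracks demand exactly."""
    return sum(demand)


static_hours = static_capacity_hours(demand)
elastic_hours = pay_per_use_hours(demand)
unused_hours = static_hours - elastic_hours   # the "unused resources" area
utilization = elastic_hours / static_hours    # share of static capacity used

print(f"static: {static_hours}, pay-by-use: {elastic_hours}")
print(f"unused: {unused_hours}, utilization: {utilization:.0%}")
```

With this series, the statically provisioned data center pays for 1,000 server-hours while only 390 are used.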
Economics of Cloud Users • Heavy penalty for under-provisioning [Figure: three panels of resources vs. time (days), each showing capacity and demand, with lost revenue and lost users marked] Slide credits: Berkeley RAD Lab
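The penalty for under-provisioning can be sketched the same way: whatever demand exceeds the fixed capacity goes unserved. The numbers and the helper `served_and_lost` below are hypothetical, and the sketch does not model the follow-on effect of users leaving for good.

```python
def served_and_lost(demand, capacity):
    """Split each day's demand into the part served and the part turned away."""
    served = [min(d, capacity) for d in demand]
    lost = [max(0, d - capacity) for d in demand]
    return served, lost


# Hypothetical demand over three days against a fixed capacity.
demand_by_day = [40, 70, 90]
capacity = 60

served, lost = served_and_lost(demand_by_day, capacity)
lost_share = sum(lost) / sum(demand_by_day)

print(f"served per day: {served}")
print(f"lost per day:   {lost}")
print(f"share of demand lost: {lost_share:.0%}")
```

Here a fifth of total demand is turned away, which corresponds to the lost-revenue area in the figure.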
Cloud Computing Modalities • "Can we outsource our IT software and hardware infrastructure?" Hosted applications and services; pay-as-you-go model; scalability, fault-tolerance, elasticity, and self-manageability • "We have terabytes of click-stream data – what can we do with it?" Very large data repositories; complex analysis; distributed and parallel data processing
Challenges • Scalability to large data volumes: scanning 100 TB on 1 node @ 50 MB/sec takes about 23 days; scanning on a 1000-node cluster takes about 33 minutes; hence divide-and-conquer (i.e., data partitioning) • Cost-efficiency: commodity nodes (cheap, but unreliable); commodity network; automatic fault-tolerance (fewer administrators); easy to use (fewer programmers)
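The scan-time figures on this slide follow from dividing the data size by the aggregate read throughput. The sketch below reproduces them under idealized assumptions: decimal units, data partitioned evenly across nodes, and no coordination overhead. The function name `scan_seconds` is illustrative.

```python
TB = 10**12  # bytes, decimal units (assumption)
MB = 10**6


def scan_seconds(data_bytes, bytes_per_sec_per_node, nodes=1):
    """Idealized full-scan time with data split evenly across parallel nodes."""
    return data_bytes / (bytes_per_sec_per_node * nodes)


one_node = scan_seconds(100 * TB, 50 * MB)
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)

print(f"1 node:     {one_node / 86_400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

A real cluster would fall short of this linear speedup, but the calculation shows why partitioning the data is the starting point.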
Platforms for Big Data Analysis • Parallel DBMS technologies: proposed in the late eighties; matured over the last two decades; a multi-billion dollar industry of proprietary DBMS engines intended as data warehousing solutions for very large enterprises • MapReduce: pioneered by Google; popularized by Yahoo! (Hadoop)
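To give a feel for the MapReduce programming model, here is a minimal in-memory sketch that counts page views in a click-stream. It is plain Python, not the Hadoop API; the records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and a real framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict

# Hypothetical click-stream records: (user, page).
clicks = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/quiz"),
    ("u3", "/home"), ("u2", "/quiz"), ("u1", "/video"),
]


def map_phase(record):
    """Emit one (key, value) pair per record: a single view of the page."""
    _user, page = record
    yield page, 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


page_views = run_job(clicks)
print(page_views)
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread across nodes, which is what makes the model fit the divide-and-conquer approach.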
Data Architecture Example 1 http://hortonworks.com/hadoop-modern-data-architecture/
Enterprise Predictive Analytics Platforms • FICO http://www.fico.com/ • IBM SPSS http://www-01.ibm.com/software/analytics/spss/ • KXEN http://www.kxen.com/ • Oracle Advanced Analytics http://www.oracle.com/us/products/database/options/advanced-analytics/overview/index.html • Revolution Analytics http://www.revolutionanalytics.com/ • Salford Systems http://www.salford-systems.com/ • SAP https://www54.sap.com/pc/analytics/business-intelligence/software/predictive-analysis/index.html • SAS http://www.sas.com/ • StatSoft http://www.statsoft.com/ • TIBCO http://www.tibco.com/
Excel Data Mining Add-Ins • 11Ants Model Builder http://www.11antsanalytics.com/ • Alyuda ForecasterXL http://www.alyuda.com/forecasting-excel-software-with-neural-network.htm • DataMinerXL http://www.dataminerxl.com/ • Predixion Enterprise Insight http://www.predixionsoftware.com/predixion/ • XLMiner http://www.solver.com/xlminer-data-mining
Open source and free data mining tools • KNIME http://www.knime.org/ • R http://www.r-project.org/ • Orange http://orange.biolab.si/ • RapidMiner http://rapid-i.com/ • WEKA http://www.cs.waikato.ac.nz/~ml/ https://weka.waikato.ac.nz/ (Course) http://www.youtube.com/watch?v=wCvnO96d8h4
Learning R • Chinese R Software Society (中華R軟體學會) https://sites.google.com/site/zhonghuarruantixuehui/home • Introducing R http://data.princeton.edu/R/default.html • Try R http://tryr.codeschool.com/levels/1/challenges/2 • Data mining with R http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/ • UCLA IDRE http://www.ats.ucla.edu/stat/r/
4 machine learning startups • Alpine Data Labs http://www.alpinedatalabs.com/ • BigML https://bigml.com/ • SkyTree http://www.skytree.net/ • Wise.io http://about.wise.io/
Cloud computing in Educational Practices • Two issues: educational resources and necessary applications • Examples: • Providing lower-level cloud services (such as data storage) • Producing, researching, collecting, and sharing open educational resources • Hosting learning management systems (LMSs) in the cloud • Providing individual bundled applications in the cloud (e.g., Google Apps for Education or Microsoft Live@edu with Office 365) that combine tools for communication and collaboration, office tools for working with documents, and space to store and synchronize data on demand
Cloud service needs and uses. Source: "Cloud Computing in Education and Student's Needs" by E. Krelja Kurelović, S. Rako, and J. Tomljanović
About cloud service & computing in Tainan • Single sign-on for 150,000 teachers & students -> completed • IaaS, PaaS & SaaS -> completed • Over 168 applications & data (resources) in all