The Trends in Computer Science for the Near Future
Liu, Jie (刘杰)
Professor and Chair, Department of Computer Science
Computer Science Division, Western Oregon University
Monmouth, OR 97361
Liuj@wou.edu
Disclaimer
• The core parts of this presentation are based on my personal experience and the article “Your Strategic Guide to IT 2012: The New Cornerstones” (received from Dell).
• The article is sponsored by the organizations shown on the right.
Big data
• In the never-ending quest for a competitive advantage, organizations are turning to large repositories of corporate and external data to uncover trends, statistics and other actionable information to help them determine their next move.
• Those data sets, along with their associated applications, platforms and analytics tools, are often referred to as “big data,” so the term means more than just a lot of data.
Examples of using big data
• Amazon
  • Predicts what you will buy and when you will buy it
• The New York Times
  • Uses big data tools for text analysis and Web mining
• Disney
  • Uses them to correlate and understand customer behavior across its stores, theme parks and Web properties
• Casinos hire data analysts
• Retailers track customers’ buying patterns through membership cards, in both China and the US
Problems with Big Data
• Store the data: we need a lot of fast disk space
  • We may have a lot of data, hundreds of GB or even hundreds of TB per database or per table. Where and how do we store it?
  • Remember that we have to store not only the data but also other database objects such as transaction logs, indexes, and backups
• Move the data: we need fast networks
  • How do you move the data around?
  • True story: we tried to copy data from a data warehouse and killed the warehouse
• Manage the data: we need highly skilled people and good tools
  • It takes several hours just to make a backup
  • Creating an index takes hours
Problems with Big Data (2)
• Using data
  • A routine that took a few seconds when first introduced later took minutes, then hours, because the data grew from several thousand records to several hundred thousand to several million, doubling every 10 months.
  • True story: we had a routine that took more than an hour to run, longer than the interval between database updates. This calls for new algorithms.
  • We have a table with 56 million records. To get information that is useful to our bosses, we have to join it with other tables that each have several million records. The bosses just have to wait and buy more and better hardware while we add indexes.
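The “doubling every 10 months” growth on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function names and the starting row counts are made up for the example, and it assumes smooth exponential growth, which real tables rarely follow exactly.

```python
import math


def projected_rows(initial_rows, months, doubling_months=10):
    """Estimate the row count after `months`, assuming the table doubles
    every `doubling_months` months (the rate quoted on the slide)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return initial_rows * 2 ** (months / doubling_months)


def months_to_reach(initial_rows, target_rows, doubling_months=10):
    """Estimate how many months it takes to grow from initial_rows to target_rows."""
    if initial_rows <= 0 or target_rows <= 0:
        raise ValueError("row counts must be positive")
    return doubling_months * math.log2(target_rows / initial_rows)


if __name__ == "__main__":
    # Hypothetical starting point: a 5,000-row table.
    print(round(projected_rows(5_000, 60)))            # rows after five years
    print(round(months_to_reach(5_000, 5_000_000), 1))  # months to reach 5 million rows
```

Under these assumptions a thousandfold increase takes about ten doublings, or roughly 100 months, which is why a routine tuned for thousands of records can quietly become unusable a few years later.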
What can we do
• Distribute the data store
  • Store data in more places (servers or data centers) so we can scale; the cost is retrieval latency
• Faster and better hardware
  • We can use faster networks (from 1 Gb to 10 Gb), more memory, and faster disks such as SSDs
• Parallel processing
  • Use more processors so we can do several things at the same time; however, this often calls for new algorithms
• Use indexes
  • This tool gives us a break in the big-O game, from O(n^k) to O(log n), so the performance becomes manageable and acceptable
• Others
  • In-memory databases (Oracle’s new product)
    • Oracle’s new technology allows its DBMS to put 10 TB of data in memory. Remember that RAM is about 1,000 times faster than disk
  • Develop better algorithms and approaches for the same task
    • Real story: we used temp tables rather than joins to cut a task from a couple of hours to 20+ seconds
  • Better DBMS
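To see why an index changes the big-O picture, the sketch below compares a full scan with a binary search over sorted keys and counts the comparisons each one makes. This is a simplified stand-in, not how any particular DBMS works: real indexes are usually B-trees on disk, the function names are invented for the example, and it only contrasts an O(n) scan with an O(log n) lookup on a single table.

```python
def scan_lookup(keys, target):
    """Full scan: check each key in turn. Returns (position, comparisons)."""
    for steps, key in enumerate(keys, start=1):
        if key == target:
            return steps - 1, steps
    return -1, len(keys)


def index_lookup(sorted_keys, target):
    """Binary search over sorted keys, standing in for an index lookup.
    Returns (position, comparisons); position is -1 if the key is absent."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return (lo if found else -1), steps


if __name__ == "__main__":
    keys = list(range(1_000_000))          # one million sorted record keys
    print(scan_lookup(keys, 999_999))      # worst case: touches every key
    print(index_lookup(keys, 999_999))     # needs at most about 20 comparisons
```

The gap widens as the table grows: doubling the number of records doubles the scan cost but adds only one more comparison to the indexed lookup.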
Who are the players
• Hadoop, an open source tool
  • Consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS), and high-performance parallel data processing using a technique called MapReduce
  • eBay and JP Morgan use it
    • eBay currently has more than 97 million active buyers and sellers and over 200 million items for sale across 50,000 categories. The auction site handles close to 2 billion page views, 250 million search queries and tens of billions of database calls each day (according to Williams).
• The other big players are
  • IBM: DB2
  • Microsoft: SQL Server 2012
  • Oracle: Oracle 11g
• They all interact with Hadoop
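The MapReduce technique mentioned on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example; it does not use the Hadoop API, and the function names are chosen only for this sketch. In Hadoop the same three steps run in parallel across many machines, with the input read from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cloud data"]))
```

Because each call to map_phase looks at only one record and each call to reduce_phase looks at only one key, the work can be split across machines without the pieces needing to talk to each other, which is what makes the approach scale.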
Big Data and BI
• BI: Business Intelligence
  • Data mining
    • Finding knowledge hidden in large amounts of data
    • Selecting “interesting” (nontrivial, previously unknown, and potentially useful) patterns
  • Result presentation
• BI is the inevitable destination of big data because it helps predict the future and is a must-have for any organization that wants to stay in the race with its competitors
The word of the year
• The Cloud is “a migration certainty” (a sure thing), according to Gartner analyst Ken McGee at the Gartner Symposium and ITxpo
Definition
• Cloud computing is Web-based processing, whereby shared resources, software, and information are provided to clients (such as computers and smartphones) on demand over the Internet.
What is it? What’s new?
• Old idea: Software as a Service (SaaS)
  • Basic idea predates timesharing in the 1960s
  • Software hosted in the infrastructure vs. installed on local servers or desktops; dumb terminals
• Recently: “[HW, Infrastructure, Platform] as a service”?
  • HaaS, IaaS, PaaS
  • HP paper: Everything as a service
• New: pay-as-you-go utility computing
  • Illusion of infinite resources on demand
• Public (utility) vs. private clouds
Why Now (not then)?
• “The Web Space Race”: build-out of extremely large datacenters (10,000s of commodity PCs)
  • Build-out driven by growth in demand (more users)
    => Infrastructure software: e.g., Google File System
    => Operational expertise: failover, firewalls...
  • Discovered economy of scale: 5-7x cheaper than provisioning a medium-sized (100s of machines) facility
• More pervasive broadband Internet
  • The QoS is higher, sort of
• Commoditization of HW & SW
  • Fast virtualization
  • Standardized software stacks
Utility Computing Arrives
• Amazon Elastic Compute Cloud (EC2)
• “Compute unit” rental: $0.03, or $0.08-0.64/hr.
  • 1 CU ≈ 1.0-1.2 GHz 2007 AMD Opteron/Xeon core
• No up-front cost, no contract, no minimum
• Billing rounded to nearest hour; pay-as-you-go storage also available
• A new paradigm for deploying services?
Classifying Clouds
• Instruction set VM (Amazon EC2)
• Managed runtime VM (Microsoft Azure)
• Framework VM (Google AppEngine)
• Tradeoff: flexibility/portability vs. “built-in” functionality
[Figure: spectrum from lower-level, less managed (Amazon EC2) through Microsoft Azure to higher-level, more managed (Google AppEngine)]
Cloud Economics 101
• Not cloud computing: static provisioning for peak is wasteful, but necessary for the SLA
[Figure: two panels, a “statically provisioned” data center and a “virtual” data center in the cloud; axes: machines or $ vs. time; labels: capacity, demand, unused resources]
Risk of Under Utilization
• Underutilization results if “peak” predictions are far too high
[Figure: static data center; resources vs. time; labels: capacity, demand, unused resources]
Risks of Under Provisioning
[Figure: three panels of resources vs. time (days 1-3), each showing capacity and demand; annotations: lost revenue, lost users]
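The two provisioning risks on the last few slides can be put into numbers with a small calculation. The sketch below is illustrative only: the function name, the demand figures, and the capacity values are invented for the example, and it assumes demand and capacity are measured in the same unit (say, servers needed per time period).

```python
def provisioning_report(demand, capacity):
    """Compare a fixed capacity against per-period demand (same units).

    Returns the total unused capacity (over-provisioning), the total unmet
    demand (under-provisioning), and the average utilization of the capacity.
    """
    if capacity <= 0 or not demand:
        raise ValueError("need a positive capacity and at least one period")
    unused = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    served = sum(min(d, capacity) for d in demand)
    utilization = served / (capacity * len(demand))
    return {"unused": unused, "unmet": unmet, "utilization": utilization}


if __name__ == "__main__":
    demand = [20, 40, 100, 40]                # hypothetical load over four periods
    print(provisioning_report(demand, 100))   # provisioned for the peak
    print(provisioning_report(demand, 50))    # provisioned below the peak
```

Provisioning for the peak leaves half the capacity idle in this made-up example, while provisioning below the peak raises utilization but leaves some demand unserved, which corresponds to the lost revenue and lost users on the slide. Pay-as-you-go capacity aims to avoid both by tracking demand.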
Why mobile
• Five million smartphones were in use worldwide in 2010
• That number is projected to exceed 6.7 billion by 2015, according to data Gartner published in June in its report “Magic Quadrant for Mobile Consumer Application Platforms.”
• At the same time, companies are increasingly using social networking to connect with their customers. Facebook says brands on its site get 100 million “likes” per day (not counting mine, because I still do not have an account).
Other odds and ends
• In the US, you cannot get a new GOOD phone without a data plan
• Laptop sales are down quarter after quarter while desktop sales are mostly flat. The message?
  • Laptops are being replaced by tablets and smartphones
• In August 2011, more than 72.2 million people in the U.S. (that is 25% of the population) accessed social networking sites or blogs from their mobile devices, an increase of 37% from the previous year, according to a study by ComScore
• Take a look at Seabird; it is designed to replace computers
The message
• Phones are getting more and more powerful, to the point where they are good enough, or more than good enough, for non-CS people’s “computation” needs
• For example, an auto repair business uses a smartphone to take pictures and send the images to its customers. You could not do that with a laptop.
Social Media == customers
• Levi Strauss links to its e-commerce site from its Facebook fan page.
• Facebook users visit the Levi’s fan page 1 million times a month, and in 2010, 4.4 million of them clicked the “like” button, up from 180,000 in 2009.
• During the big shopping days after Thanksgiving in 2010, 50% of the traffic to Levi.com came from Facebook.
• Now, on a typical day, 30% comes from Facebook.
The Relationships of the big three
• Mobile and social media are using the Cloud more and more for their applications
• The Cloud makes developing applications for smartphones much easier
• Because of the Cloud, we are getting more and more data
• So the three trends are supporting each other and pushing each other