Cloud Computing and its Technologies and Applications
Ming-Syan Chen 陳銘憲
Director, Research Center for Information Technology Innovation, Academia Sinica
March 19, 2010
Received a talk request… What to talk about?
• In an hour
• Be relevant
Outline
• Introduction to Cloud Computing
• Taiwan’s opportunity
• Cloud applications
• Technology issues
• Work in NetDB
• Conclusion
M.-S. Chen
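Definition of Cloud Computing
(1) Cloud computing is a business model and a set of technologies for obtaining remote computing services over the network.
  1. Data and software services move onto the Internet, into large (typically more than 10,000 hosts), scalable, shared data centers.
  2. Cloud data centers provide computation, storage, and applications that scale without limit; with any terminal device that has an Internet connection, a user in effect owns a virtual supercomputer.
(2) In the future, computing will be like water and electricity: anyone connected to the network can use it, with no need for each party to invest in building its own.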
The Future Cloud Data Center: Form and Benefits
[Figure: thin clients served by a cloud operation system. Sources: ISI International Strategy & Investment; www.spectrum.ieee.org, 02/2009]
Why Cloud Computing
• Better resource sharing
• Smaller overall IT cost
• Better management of computing resources
• Easy resource access for users
• Better SW/AP dissemination
• Facilitating ubiquitous computing/LBS
• Promising in view of technology trends
  • HW cost, network availability, value of SW, etc.
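Cloud Computing Creates an Opportunity for Taiwan’s IT Industry to Upgrade Toward High-Value, Service-Oriented Business
• Cloud computing fundamentally changes the IT industry’s supply chain and the way information technology is used, with major impact on the IT industry:
  (1) End users need not install software on their terminals; with cloud software services (SaaS, Software-as-a-Service), they use applications simply by going online.
  (2) Programmers need not install application development software; with cloud application platform services (PaaS, Platform-as-a-Service), they can complete software system development online.
  (3) Enterprises need not build their own machine rooms; with cloud data center services (IaaS, Infrastructure-as-a-Service), they can buy host computing and storage space online and develop the information systems they need.
• Taiwan’s ICT industry excels at manufacturing. Facing this era of cloud computing competition led by software services, Taiwan’s IT industry must transform and upgrade toward high value-added system manufacturing and application service development.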
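Cloud Computing Brings Opportunity to Taiwan’s Hardware Industry: Cloud Container Computers
• 1980s: “A personal computer on every desk”
• 2010s: “A supercomputer for everyone on the move”
[Figure: chart in billion US$ (axis ticks 100, 300, 1000, 3000) by year (1939, 1980, 1995, 2005, 2015) across computing eras: mainframe (centralized computing; traditional enterprise ERP/MRP and automation applications), personal computer (personal computing; personal and office applications), mobile devices over the Internet (mobile computing; mobile services), and container computers (cloud computing)]
“Cloud container-computer data centers” will make future computing like water and electricity: once connected to the network, it can be used without limit.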
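Cloud Computing Brings Opportunity to Taiwan’s Software Industry: Innovative Smart-Living Application Services
[Figure: analogy between the semiconductor industry and the software service industry. Foundries (TSMC, UMC) enable fabless chip design (e.g. nVidia), that is, chip design without building one’s own production line; cloud computing enables datacenter-less SaaS providers, that is, software services without building one’s own data center. IT devices correspond to cloud devices. Source: “Above the Clouds: A Berkeley View of Cloud Computing”, Feb. 4, 2009 & revision]
• In the future, Taiwan’s software companies need not worry about building and maintaining host computers; they can develop software directly on cloud data center facilities and serve users worldwide, creating unlimited business opportunities.
• Example: Zynga.com built FarmVille (開心農場) on Facebook’s cloud computing platform and generated US$100 million in revenue within just two years, breaking Google’s record.
Opportunities in Smart Living (Source: IBM)
• Smart food systems
• Intelligent oil field technologies
• Smart energy grids
• Smart healthcare
• Smart retail
• Smart traffic systems
• Smart supply chains
• Smart countries
• Smart weather
• Smart regions
• Smart cities
• Smart water management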
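Taiwan’s Niche Markets and Business Opportunities in Cloud Computing
(1) Container computer systems: Building a cloud data center of tens of thousands of computers the traditional way involves complex integration of software and hardware systems with water, power, and cooling facilities. Container computer systems can speed up cloud data center construction. With Taiwan’s server industry ranked first in the world by shipments, Taiwan has a niche here.
(2) Open, secure cloud operating system: The servers, storage, and network equipment of more than 100,000 computers need an operating system that supports integrated virtualized and clustered operation. The industry currently has no technical standard, and vendors need an open architecture to avoid lock-in. With Taiwan’s international strength in information security software and the software engineering talent of its research institutes, Taiwan can build an open, secure cloud operating system on open source, providing software that is essential for building cloud data centers.
(3) “Cloud plus device” innovative application services: Building on its diverse and strong knowledge base in service industries, together with the government’s promotion of Intelligent Taiwan and the six emerging industries, Taiwan can develop smart-living application services that raise people’s standard of living. Combining cloud services with Taiwan’s first-in-the-world shipments of terminal devices, it can bring them quickly to international markets.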
Software as a Service
• All of Google’s application services
• Online video sharing platforms
• Online photo sharing
• Yahoo’s application services
• Others
Software as a Service (cont’d)
• Social networks
• Microblogs: Twitter, Plurk
• Blogs
• Enterprise application services such as CRM and ERP
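Platform as a Service
• Mainly provides APIs that users build on to develop cloud versions of their applications and offer services, for example:
  • Google AppEngine
    • Lets you run your web applications on Google’s infrastructure
  • Facebook f8 platform
    • Facebook exposes APIs mainly so that third-party vendors can use Facebook’s SNS platform, saving development cost and concentrating on creative application development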
Platform as a Service (cont’d)
• Microsoft Azure
  • Microsoft’s new-generation cloud computing platform, comprising Azure, SQL Azure, and AppFabric
• IBM Pangoo Platform (盤古雲端服務平臺)
  • A Platform-as-a-Service (PaaS) cloud architecture built on IBM products such as WAS, DB2, and Tivoli
• Amazon Web Services
  • AWS is PaaS, EC2 is IaaS
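Infrastructure as a Service
• Amazon’s EC2 (Elastic Compute Cloud) is the best-known example
• Pay as you go: pricing is based on running time and number of instances
• IT infrastructure is integrated and delivered as a service through a virtualization platform; users need not purchase servers or rent physical space or networks, and instead obtain the resources they need by renting IaaS from an outside provider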
Amazon Business Model
• EC2 On-Demand Instances
• Fees differ by CPU compute capacity and memory requirements
• Default is 1.7 GB of memory, 1 EC2 Compute Unit, 160 GB of local instance storage, 32-bit platform
Virtualization Technology (supports sharing of computing resources)
• Virtualization uses a mapping layer (virtual machine monitor, hypervisor, or virtualization layer) to turn a pool of hardware, such as servers and storage, into virtual devices, so that different operating systems can share that hardware.
Source: Mendel Rosenblum, Stanford U., 1998
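Cluster Computing Technology (supports high-performance, highly scalable computing)
• Many physical computers (usually of identical specification) are connected by a network to achieve highly scalable, high-performance distributed computing (e.g. Google Search)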
Software-as-a-Service Delivery Platform Technology
• Allocates computing resources for rented software and manages service quality according to the user’s payment plan
• Provides online subscription, authentication, authorization, and usage logging/settlement/billing for rented software
[Figure: a multi-tenancy service delivery platform serving on-demand applications to Enterprises A, B, C, and D. Component labels include call center support system, runtime access control, order management, management agent, metering, security log, management log, usage tracking, identity management, SLA monitoring, CRM, availability management, alerts, security, billing, performance, and provisioning. Figure sources: Forrester Research; Microsoft]
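Virtual Desktop and Web Desktop Technology
• Client software and data can all move to physical or virtual hosts in the cloud data center. Through streaming and Rich Internet Application (RIA) technologies, an inexpensive, simple terminal can give users an experience equal to or better than today’s PC desktop.
• Virtual Desktop
• Web Desktop (Webtop): http://g.ho.st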
Cloud Computing Related Work in NetDB
• Content search
• 2D bar code, watch (with CR Kung and H. Chi)
• Social network enhanced search/queries
• Mining for SaaS activities
• Group buying (with YH Hung)
• Sequential pattern mining on Cloud (with JW Huang)
[Figure: example sequence database with sequences S01 to S06 (SID axis) over timestamps t1 to t10 (time axis), and the sliding sub-databases Db1,5, Db2,6, Db3,7, Db4,8, Db5,9, and Db6,10; POI = 5, min_supp = 0.5]
Motivation for Mining on Cloud
• With the increasing amount of data, single processors struggle to scale up.
• Mining progressive sequential patterns intrinsically suffers from a scalability problem.
• As the number of sequences grows and the POI becomes larger, the time and space needed to mine progressive sequential patterns increase dramatically.
Distributed Algorithms on the Hadoop Platform
• Hadoop is an open-source project that aims to build a cloud infrastructure running on large clusters for huge amounts of data.
• The Hadoop platform implements the Map/Reduce paradigm.
• Scaling up on Hadoop is easy: add more machines to the cluster.
• The distributed algorithms devised improve the performance of sequential pattern mining (PAKDD 2010).
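The Institute for Information Industry (資策會) Establishes a Cloud Service Technology Center
• Develops a “cloud plus device” service platform and undertakes commissioned design of high value-added cloud application services (Cloud Device-and-Service ODM), helping Taiwan’s IT vendors upgrade their technology and enter the international market for cloud application services.
[Figure: the cloud (plus device) service platform stack. Devices: Phone, E-Book, NetBook, Display, running a Device OS (Android, WinCE, BIOS, ...). Applications: small and medium enterprise, cultural and creative, and medical care applications. Platform layers: cloud security, virtual desktop and web desktop, Software-as-a-Service delivery platform, Java/.NET application platform, open cloud operating system, and commodity commercial servers, storage, and network equipment, hosted in an experimental cloud data center for R&D (Kaohsiung Software Park). A legend marks the division of labor among local vendors, the Institute for Information Industry, and foreign vendors.]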
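The Industrial Technology Research Institute (工研院) Establishes a Cloud Computing Center
• Develops the “Container Computer” and the “Cloud OS”, and fosters Taiwanese vendors in launching turnkey cloud data center products to enter the global cloud computing market.
[Figure: a Cloud Data Center composed of the Cloud OS and the Container Computer (Server, Storage, Network). A legend marks the division of labor among local vendors, the Industrial Technology Research Institute, and foreign vendors.]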
Conclusion
• The era of Elastic IT is about to arrive
• Cloud computing will become prevalent in light of the following:
  • Drop in hardware cost
  • Improvement of network/hardware speed
  • Increase in storage (disk/memory) size
  • Demand for ubiquitous applications
  • Explosion of information and data (data mining)
Thank you!
Amazon Business Model
• EC2 Reserved Instances
• Reserving an instance is priced under a different scheme
Amazon Business Model
• EC2 Data Transfer
• Data transfer is charged separately
Simple Storage Service (Amazon S3)
• S3 is Amazon’s online storage service, aimed at enterprises and users who need storage space on the network; it rents out online space for storing files.
Amazon EC2 (Elastic Compute Cloud)
• Create an Amazon Machine Image (AMI) containing your applications, libraries, data, and associated configuration settings, or use Amazon’s pre-configured, templated images to get up and running immediately.
• Upload the AMI into Amazon S3. Amazon EC2 provides tools that make storing the AMI simple. Amazon S3 provides a safe, reliable, and fast repository to store your images.
• Use the Amazon EC2 web service to configure security and network access.
• Choose the type(s) of instance you want to run.
• Start, terminate, and monitor as many instances of your AMI as needed, using the web service APIs.
• Pay for the instance-hours and bandwidth that you actually consume.
Amazon EC2 (Elastic Compute Cloud)
• Instances (per instance-hour)
  • $0.10: Small Instance (Default). 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
  • $0.40: Large Instance. 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
  • $0.80: Extra Large Instance. 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform
• Data Transfer
  • $0.10 per GB: all data transfer in
  • $0.18 per GB: first 10 TB / month data transfer out
  • $0.16 per GB: next 40 TB / month data transfer out
  • $0.13 per GB: data transfer out / month over 50 TB
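To make the pay-as-you-go arithmetic concrete, here is a minimal sketch that adds up a monthly bill from the prices listed on this slide. The function names and the sample workload are hypothetical, and the conversion 1 TB = 1024 GB is an assumption, since the slide does not say which convention applies. The prices are the 2010 figures quoted above and serve only as an illustration.

```python
# Prices copied from the slide (2010 figures, for illustration only).
INSTANCE_PRICE_PER_HOUR = {"small": 0.10, "large": 0.40, "xlarge": 0.80}
TRANSFER_IN_PER_GB = 0.10

# (tier size in GB, price per GB); None means "everything above".
# Assumption: 1 TB = 1024 GB.
TRANSFER_OUT_TIERS = [(10 * 1024, 0.18), (40 * 1024, 0.16), (None, 0.13)]


def transfer_out_cost(gb):
    """Charge outbound traffic tier by tier, cheapest rate last."""
    cost = 0.0
    remaining = gb
    for size, price in TRANSFER_OUT_TIERS:
        if remaining <= 0:
            break
        used = remaining if size is None else min(remaining, size)
        cost += used * price
        remaining -= used
    return cost


def monthly_cost(instance_type, instances, hours, gb_in, gb_out):
    """Instance-hours plus inbound and outbound data transfer."""
    compute = INSTANCE_PRICE_PER_HOUR[instance_type] * instances * hours
    return compute + gb_in * TRANSFER_IN_PER_GB + transfer_out_cost(gb_out)


if __name__ == "__main__":
    # Hypothetical workload: 2 small instances for 720 hours,
    # 100 GB in, 500 GB out.
    print(round(monthly_cost("small", 2, 720, 100, 500), 2))
```

For this hypothetical workload the bill is 144 for compute, 10 for inbound traffic, and 90 for outbound traffic, or about US$244 in total.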
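Cloud Computing Related Work in NetDB (cont’d)
• Infrastructure as a Service (IaaS)
• Currently using Eucalyptus to set up an environment in which users can run and control virtual machine instances, directly compatible with the EC2/S3 API tools
• About 100 machines are set up for now; more may be added in the future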
Mining on Cloud: Sequential Pattern Mining
• The Period of Interest (POI) is a sliding window whose length is a user-specified time interval and which advances continuously as time goes by.
• The sequences that have elements with timestamps falling into the current POI contribute to |Db|, the size of the database used for the current sequential patterns.
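The sliding-window definition above can be illustrated with a small sketch. The function name `sequences_in_poi` and the toy data are hypothetical and are not taken from the talk. The window bounds follow one reading of the sub-database labels used in this deck (Db1,5 with POI = 5), namely that a window starting at timestamp p covers p through p + POI - 1.

```python
def sequences_in_poi(db, start, poi):
    """Return the sequences that have at least one element inside the window.

    db maps a sequence ID to a list of (timestamp, itemset) pairs.
    The window covers timestamps start .. start + poi - 1 (an assumption
    based on labels such as Db1,5 for POI = 5).
    """
    end = start + poi - 1
    result = {}
    for sid, elements in db.items():
        inside = [(t, items) for t, items in elements if start <= t <= end]
        if inside:
            result[sid] = inside
    return result


# Hypothetical toy data, not the example database from the slides.
toy_db = {
    "S01": [(1, {"A"}), (3, {"B", "D"})],
    "S02": [(7, {"C"})],
    "S03": [(5, {"A"}), (6, {"B"})],
}

if __name__ == "__main__":
    for start in (1, 3):
        window = sequences_in_poi(toy_db, start, poi=5)
        print(f"Db{start},{start + 4}: |Db| = {len(window)}")
```

As the window advances, elements that fall out of the period stop contributing, so |Db| has to be recomputed for each position of the POI.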
Distributed Mining Algorithm on Cloud
• We design a distributed mining algorithm on the cloud to address the scalability problem.
• The Distributed Progressive Sequential Pattern (DPSP) mining algorithm is implemented on top of the Hadoop platform, which realizes the cloud computing environment.
Hadoop Platform
• Hadoop is an open-source project that aims to build a cloud infrastructure running on large clusters to handle huge amounts of data.
• The Hadoop platform implements Google’s Map/Reduce paradigm.
• Scaling up on Hadoop is easy: add more machines to the cluster.
Map/Reduce Paradigm
• The map function divides the application into several fractions.
• Each fraction is assigned to a single node in a large cluster and executed by that node.
• After execution, the reduce function merges the partial results to form the final output.
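The map and reduce steps described above can be sketched in a few lines. This is a single-process simulation for illustration only and does not use the Hadoop API; the names `run_map_reduce`, `map_fn`, and `reduce_fn` are hypothetical. On a real cluster, each input record or split would be handled by a different node, and the grouping step would be performed by the framework.

```python
from collections import defaultdict


def map_fn(record):
    """Map step: emit a (key, value) pair for every word in one record."""
    for word in record.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: merge all partial values that share a key."""
    return key, sum(values)


def run_map_reduce(records, map_fn, reduce_fn):
    """Simulate map, group-by-key, and reduce in one process."""
    groups = defaultdict(list)
    for record in records:  # each record could run on a separate node
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_map_reduce(["a b a", "b c"], map_fn, reduce_fn))
```

Because each map call sees only its own record and each reduce call sees only one key, the work can be spread over many machines without the functions themselves changing.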
Cloud Computing Environment
• The cloud computing environment lets developers focus on designing distributed algorithms and offers great scalability.
• Routine issues are handled by the cloud computing framework itself.
  • E.g., data allocation, job scheduling, load balancing, failure recovery
• This leads to performance improvement on sequential pattern mining (PAKDD 2010).
Designs of DPSP
• We propose two Map/Reduce jobs in DPSP.
• CandidateComputingJob
  • computes the current candidate sequential patterns
  • deletes obsolete itemsets
  • updates the summary of each sequence
• SupportAssemblingJob
  • accumulates the occurrence frequencies of candidate sequential patterns
  • reports up-to-date frequent sequential patterns within each POI
Algorithm DPSP
Input: itemsets of all sequences arriving at the current timestamp
1. while (there is new data arriving at timestamp t) {
2.   CandidateComputingJob;
3.   SupportAssemblingJob;
4.   t = t + 1;
5.   output frequent sequential patterns;
6. } end while
Output: frequent sequential patterns at each POI with their supports
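The control flow of the loop above can be sketched as a single-process program. This is a heavily simplified illustration, not the DPSP implementation presented in the talk: it does not run on Hadoop, it only enumerates candidate patterns of length one and two over single items, and the window rule (keep timestamps later than t - POI) is an assumption. All function names and the toy stream are hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_computing_job(summaries, arrivals, t, poi):
    """Update per-sequence summaries and list each sequence's candidates."""
    for sid, itemset in arrivals.items():
        summaries.setdefault(sid, []).append((t, frozenset(itemset)))
    candidates = {}
    for sid in list(summaries):
        # Delete obsolete itemsets (assumed rule: keep t - poi < ts <= t).
        kept = [(ts, items) for ts, items in summaries[sid] if ts > t - poi]
        if not kept:
            del summaries[sid]
            continue
        summaries[sid] = kept
        found = set()
        for _, items in kept:
            found.update((a,) for a in items)
        # kept is in time order, so the first element of each pair is earlier.
        for (_, first), (_, second) in combinations(kept, 2):
            found.update((a, b) for a in first for b in second)
        candidates[sid] = found
    return candidates


def support_assembling_job(candidates, min_supp):
    """Count in how many sequences each candidate occurs; keep frequent ones."""
    counts = Counter()
    for found in candidates.values():
        counts.update(found)  # a sequence counts each pattern once
    db_size = len(candidates)
    return {p: c / db_size for p, c in counts.items() if c / db_size >= min_supp}


def dpsp(stream, poi, min_supp):
    """Run both jobs once per timestamp and collect the frequent patterns."""
    summaries, results = {}, []
    for t, arrivals in enumerate(stream, start=1):
        candidates = candidate_computing_job(summaries, arrivals, t, poi)
        results.append(support_assembling_job(candidates, min_supp))
    return results


if __name__ == "__main__":
    # Hypothetical stream: one dict of {sequence ID: itemset} per timestamp.
    stream = [{"S01": {"A"}, "S02": {"A"}}, {"S01": {"B"}}, {"S02": {"B"}}]
    for t, frequent in enumerate(dpsp(stream, poi=3, min_supp=1.0), start=1):
        print(t, frequent)
```

In the distributed setting described in the talk, the two functions correspond to separate Map/Reduce jobs, so the per-sequence work of the first job and the per-pattern counting of the second can each be spread across the cluster.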
[Figure: data flow of the Candidate Calculating Job at the start of DPSP. Inputs are the input data at time t and the summaries at time t-1. The CCMappers emit many <SeqNo, itemset> or <SeqNo, itemset + timestamp> pairs; the CCReducers produce <candidate itemset, null> pairs and the summaries at time t.]
Example of CCJ
[Figure: the example sequence database (sequences S01 to S06 over timestamps t1 to t10, with sub-databases Db1,5 to Db6,10), followed by the timeline of the itemsets of sequence S01]