Deploying Hybris on Windows Azure for Cloud Data Management

Hybris – cloud - bigdata V1.0 19/11/2014 Yassine MEJRI

Agenda
- Cloud
- Windows Azure
- Deploying Hybris on Windows Azure
- Elasticsearch
- Kibana
- Use cases: analytics, machine learning

Cloud: Cloud Computing
- A standardised IT capability (services, software or infrastructure) delivered via internet technologies in a pay-per-use, self-service way.
- Cloud services are shared services, under virtualised management, accessible over the internet.
- A style of computing where massively scalable IT-related capabilities are provided "as a service" using internet technologies to multiple external customers.

Cloud: History
- 1960: John McCarthy's concept: "Computation may someday be organized as a public utility."
- 1999: Salesforce.com pioneered the concept of delivering enterprise applications via a simple website.
- 2000: Microsoft, 2001: IBM expanded the SaaS concept through web services.
- 2005: Amazon: launch of Amazon Web Services.
- 2007: Google and IBM start researching cloud computing.
- 2008: Gartner Research: cloud computing starts being used in many organizations.

Cloud: Cloud computing providers
http://www.cloudscreener.com/

CLOUD WINDOWS AZURE

WINDOWS AZURE WINDOWS AZURE LAYERS

WINDOWS AZURE Cloud service model

Windows Azure: Geo-location (datacenters)
- US: South Central US, North Central US, West US, East US
- Europe: Western Europe, Northern Europe
- Asia: South East Asia, East Asia

WINDOWS AZURE Building and running apps

Windows Azure Blob Storage

Windows Azure Blob Storage
Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS.
Common uses of Blob storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Performing secure backup and disaster recovery
Architecture
Services: PutBlob, GetBlob, DeleteBlob, CopyBlob, SnapshotBlob, LeaseBlob…

Windows Azure Blob Storage: Java API
Connection string:
    public static final String storageConnectionString =
        "DefaultEndpointsProtocol=http;" +
        "AccountName=your_storage_account;" +
        "AccountKey=your_storage_account_key";
Create container:
    CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConnectionString);
    CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
    CloudBlobContainer container = blobClient.getContainerReference("images");
    container.createIfNotExists();
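The connection string on this slide is a semicolon-separated list of key=value settings. As a rough illustration of that layout only, the plain-Java sketch below assembles and parses such a string. `ConnectionStringSketch`, `build` and `parse` are hypothetical helper names, not part of the Azure SDK; in real code `CloudStorageAccount.parse` does the parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical helper, NOT part of the Azure storage SDK.
final class ConnectionStringSketch {

    // Assemble "DefaultEndpointsProtocol=...;AccountName=...;AccountKey=...".
    static String build(String protocol, String accountName, String accountKey) {
        return "DefaultEndpointsProtocol=" + protocol + ";"
                + "AccountName=" + accountName + ";"
                + "AccountKey=" + accountKey;
    }

    // Split a connection string into its settings, keeping their order.
    static Map<String, String> parse(String connectionString) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String part : connectionString.split(";")) {
            if (part.isEmpty()) {
                continue;
            }
            // Split on the first '=' only: Base64 account keys may end with '='.
            int eq = part.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed setting: " + part);
            }
            settings.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return settings;
    }

    public static void main(String[] args) {
        String cs = build("http", "your_storage_account", "your_storage_account_key");
        System.out.println(cs);
        System.out.println(parse(cs));
    }
}
```

Splitting on the first `=` rather than on every `=` matters because the account key value can itself contain `=` padding.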

Windows Azure Blob Storage: Java API
Change permissions:
    BlobContainerPermissions containerPermissions = new BlobContainerPermissions();
    // Make the container and its blobs publicly readable
    containerPermissions.setPublicAccess(BlobContainerPublicAccessType.CONTAINER);
    container.uploadPermissions(containerPermissions);
Upload blob:
    final String filePath = "C:\\myimages\\myimage.jpg";
    CloudBlockBlob blob = container.getBlockBlobReference("myimage.jpg");
    File source = new File(filePath);
    blob.upload(new FileInputStream(source), source.length());
Download blob:
    for (ListBlobItem blobItem : container.listBlobs()) {
        // Only download actual blobs, not virtual directories
        if (blobItem instanceof CloudBlob) {
            CloudBlob blob = (CloudBlob) blobItem;
            blob.download(new FileOutputStream("C:\\mydownloads\\" + blob.getName()));
        }
    }

Windows Azure Storage: Tables (NoSQL)
http://<account>.table.core.windows.net/<table>
Services: Insert, Update, Delete, Query, Entity Group Transaction…

Windows Azure Storage: Queue
http://<account>.queue.core.windows.net/<queue>/messages
Services: Put, Get, Peek, Delete, Update…
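The table and queue endpoints follow the same addressing scheme: the storage account name becomes the host prefix and the resource name becomes the path. A minimal sketch of that scheme, using plain string building (the class and method names are hypothetical, and the account, table and queue names in `main` are made-up examples):

```java
// Sketch only: hypothetical helper that mirrors the URL patterns shown on the slides.
final class StorageEndpointsSketch {

    // http://<account>.table.core.windows.net/<table>
    static String table(String account, String table) {
        return "http://" + account + ".table.core.windows.net/" + table;
    }

    // http://<account>.queue.core.windows.net/<queue>/messages
    static String queueMessages(String account, String queue) {
        return "http://" + account + ".queue.core.windows.net/" + queue + "/messages";
    }

    public static void main(String[] args) {
        // "myaccount", "orders" and "jobs" are illustrative names only.
        System.out.println(table("myaccount", "orders"));
        System.out.println(queueMessages("myaccount", "jobs"));
    }
}
```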

CLOUD Windows Azure Management Console

Windows Azure SDK: PowerShell, Node.js, Java …
    Import-AzurePublishSettingsFile -PublishSettingsFile "full path to downloaded file"
    New-AzureAffinityGroup -Name pslab-group -Location "East US"
    New-AzureQuickVM -ImageName $VMImage -Windows -Name $myVMName -ServiceName $myVMName -AdminUsername $myAdminName -Password $myAdminPwd -AffinityGroup pslab-group
    Stop-AzureVM -Name $myVMName -ServiceName $myVMName
    Start-AzureVM -Name $myVMName -ServiceName $myVMName
    Restart-AzureVM -Name $myVMName -ServiceName $myVMName

HYBRIS Use Case : Deploying Hybris on Windows Azure Deploy Hybris

Deploy Hybris: Architecture (auto-scalable, horizontal and vertical)
[Architecture diagram]
- CDN, HTTP/HTTPS traffic
- VIP: Windows Azure Load Balancer (Failover, Round Robin, Performance)
- Cloud Service F.O: nodes N1, N2, … Ni
- Cloud Service B.O: nodes N1, N2, … Ni
- Azure SQL Server
- Azure Blob Storage: medias, files, attachments, orders.pdf…
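Of the balancing methods named on this slide, round robin is the simplest to picture: requests are handed to the nodes N1, N2, … Ni in turn. The sketch below only illustrates that idea in plain Java. It is not how the Azure load balancer is implemented, and `RoundRobinSketch` and the node labels are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: illustrates round-robin selection, not Azure's implementation.
final class RoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Return the next node in turn; floorMod keeps the index valid after int overflow.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch balancer = new RoundRobinSketch(List.of("N1", "N2", "N3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.pick()); // N1, N2, N3, then N1 again
        }
    }
}
```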

Hybris-Cloud: Azure cloud extension
Windows Azure Blob provides a simple web services interface that can be used to store and retrieve any amount of data. You can configure a specific MediaFolder to store the binary data of a Media item directly in Windows Azure Blob.
To configure your folder to use Windows Azure Blob you need:
- A Windows Azure account
- Properly created access keys
For more details read http://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/.

Hybris-cloud Azure cloud Extension

Hybris-Cloud: Azure cloud extension
Reference: https://wiki.hybris.com/display/release5/Using+Windows+Azure+Blob+Media+Storage+Strategy
1. Import extension: azurecloud
2. Configure blob storage in local.properties (global settings):
    media.globalSettings.accountKey=
    media.globalSettings.accountName=
    media.globalSettings.connection=UseDevelopmentStorage\=True
    media.globalSettings.endPointProtocol=http
    media.globalSettings.local.cache=true
    media.globalSettings.public.base.url=http://127.0.0.1:10000/devstoreaccount1
    media.globalSettings.secured=true
    media.globalSettings.storage.strategy=windowsAzureBlobStorageStrategy
    media.globalSettings.url.strategy=windowsAzureBlobURLStrategy

Hybris-Cloud: Azure cloud extension
3. How to create a new blob storage folder:
    ……..
    media.folder.invoices.accountKey=
    media.folder.invoices.accountName=
    media.folder.invoices.connection=UseDevelopmentStorage\=True
    media.folder.invoices.endPointProtocol=http
    media.folder.invoices.local.cache=true
    media.folder.invoices.public.base.url=http://127.0.0.1:10000/devstoreaccount1
    media.folder.invoices.secured=true
    media.folder.invoices.storage.strategy=windowsAzureBlobStorageStrategy
    media.folder.invoices.url.strategy=windowsAzureBlobURLStrategy
    ……..
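The per-folder settings reuse the global keys with the prefix `media.folder.<folderName>.` in place of `media.globalSettings.`. As a hedged sketch of that naming rule, the plain-Java helper below generates the entries for a given folder. The class and method names are hypothetical, and the two development-storage keys (`connection`, `public.base.url`) are deliberately left out as an assumption that a real storage account is targeted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical generator for local.properties entries.
final class MediaFolderPropertiesSketch {

    // Build the media.folder.<folderName>.* entries for one media folder.
    static Map<String, String> forFolder(String folderName, String accountName, String accountKey) {
        String prefix = "media.folder." + folderName + ".";
        Map<String, String> props = new LinkedHashMap<>();
        props.put(prefix + "accountKey", accountKey);
        props.put(prefix + "accountName", accountName);
        props.put(prefix + "endPointProtocol", "http");
        props.put(prefix + "local.cache", "true");
        props.put(prefix + "secured", "true");
        props.put(prefix + "storage.strategy", "windowsAzureBlobStorageStrategy");
        props.put(prefix + "url.strategy", "windowsAzureBlobURLStrategy");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder credentials, printed in key=value form.
        Map<String, String> props = forFolder("invoices", "your_storage_account", "your_storage_account_key");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```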

Hybris-Cloud: Azure cloud extension
4. Storing media files:
    final MediaModel media = modelService.create(MediaModel.class);
    media.setCatalogVersion(catalogVersionService.getCatalogVersion("productCatalog", "Staged"));
    // Assign the media to the folder configured for Azure Blob storage
    final MediaFolderModel folder = mediaService.getFolder("invoices");
    media.setFolder(folder);
    // Persist the model through the ModelService
    modelService.save(media);

Hybris-cloud Secure media access

Hybris-Cloud: Secure media access
You can enable secure media access for a specific Media folder by setting the following property to true in your local.properties file:
    media.folder.<folderName>.secured=true
This means that only secure URLs are rendered for the Media items stored in that folder, and that access to these medias is filtered only by the SecureMediaFilter.
Managing permissions:
- Use the MediaPermissionService.
- Using hMC: you can grant or deny access to a Media item for a given principal by opening the specific Media item and going to the Security tab.
- Using ImpEx: below is an example ImpEx import script that grants access to the Media item with code 1017895.jpg to the editor principal:
    INSERT_UPDATE media; code[unique=true]; catalogVersion(catalog(id),version)[unique=true]; permittedPrincipals(uid)
    ;1017895.jpg; clothescatalog:Staged;editor;

Hybris-Cloud: Azure cloud extension
Example media URL: http://hybrisazure.blob.core.windows.net/hybris/sys_master/root/h3e/hd7/8796157378590.jpg
Initialize or update Hybris:
Keep in mind that even if the name of the custom container is myContainer, a prefix with the tenantId is added automatically, so the final container name is sys-master-myContainer. The pattern is sys-<tenantID>-<containerName>.
To control the cleaning of Windows Azure storage on a fresh initialization, use the following global property:
    media.globalSettings.windowsAzureBlobStorageStrategy.cleanOnInit={true or false}
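The naming pattern sys-<tenantID>-<containerName> can be written out as a one-line helper. This is only an illustration of the pattern stated on the slide; `ContainerNameSketch` is a hypothetical name and not a hybris class.

```java
// Sketch only: applies the sys-<tenantID>-<containerName> pattern from the slide.
final class ContainerNameSketch {

    static String effectiveContainerName(String tenantId, String customContainer) {
        return "sys-" + tenantId + "-" + customContainer;
    }

    public static void main(String[] args) {
        // Tenant "master" with custom container "myContainer", as in the slide.
        System.out.println(effectiveContainerName("master", "myContainer"));
    }
}
```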

Deploy Hybris: Azure Cloud Service?
[Architecture diagram: CDN and HTTP/HTTPS traffic; VIP: Windows Azure Load Balancer (Failover, Round Robin, Performance); Cloud Service F.O and Cloud Service B.O, each with nodes N1, N2, … Ni; Azure SQL Server; Azure Blob Storage: medias, files, attachments, orders.pdf…]

Deployhybris AzureRunMe

Deploy Hybris: Packaging and deploying Hybris
Windows Azure services are described by two important artifacts:
- Service Definition (*.csdef)
- Service Configuration (*.cscfg)
Your code is zipped and packaged with the definition (*.cspkg):
    Encrypted(Zipped(Code + *.csdef)) == *.cspkg
Windows Azure consumes just (*.cspkg + *.cscfg).

Deploy Hybris: DevOps with Azure PowerShell cmdlets
    # Import the Azure module
    $env:PSModulePath = $env:PSModulePath + ";" + "C:\Program Files (x86)\Microsoft SDKs\Windows Azure\PowerShell"
    Import-Module Azure
    # Connection
    Import-AzurePublishSettingsFile $pubsettings
    Select-AzureSubscription -SubscriptionName $selectedsubscription
    Set-AzureSubscription -CurrentStorageAccount $storageAccountName -SubscriptionName $selectedsubscription
    # Create a new deployment
    $opstat = New-AzureDeployment -Slot $slot -Package $packageLocation -Configuration $cloudConfigLocation -Label $deploymentLabel -ServiceName $serviceName
    # Upgrade an existing deployment
    $setdeployment = Set-AzureDeployment -Upgrade -Slot $slot -Package $packageLocation -Configuration $cloudConfigLocation -Label $deploymentLabel -ServiceName $serviceName -Force
    # Swap deployments between staging and production
    Move-AzureDeployment -ServiceName $serviceName

Hybris-CLOUD Demo : AzureRunMe and Windows Azure Emulator AzureRunMe

Elasticsearch Elasticsearch

Elasticsearch
https://github.com/elasticsearch
- Java
- Apache Lucene
- Plug and play
- Document oriented
- Scalable
- Clustering
- Lucene sharding and replication
- REST/JSON client
- Apache 2 license

Elasticsearch: SQL vs ES

Elasticsearch Architecture

Elasticsearch: Mapping field types
Core types: String, Integer, Long, Double, Boolean, Date, Binary …
IP type:
    "address" : { "type" : "ip", "store" : "yes" }
    { "name" : "Tom PC", "address" : "192.168.2.123" }
Geo point type:
    "location" : { "type" : "geo_point" }
Attachment type:
    "my_attachment" : { "type" : "attachment" }
Token count type: the token_count field type allows us to store index information about how many words the given field has, instead of storing and indexing the text provided to the field.
    "address_count" : { "type" : "token_count", "store" : "yes" }

Elasticsearch: Mapping field types
Object types: JSON documents are hierarchical in nature, allowing them to define inner "objects" within the actual JSON.
    "tweet" : {
        "properties" : {
            "person" : {
                "type" : "object",
                "properties" : {
                    "name" : {
                        "type" : "object",
                        "properties" : {
                            "first_name" : { "type" : "string" },
                            "last_name" : { "type" : "string" }
                        }
                    },
                    "sid" : { "type" : "string", "index" : "not_analyzed" }
                }
            },
            "message" : { "type" : "string" }
        }
    }

Elasticsearch: Mapping field types
Nested types: the nested type works like the object type, except that an array of objects is flattened, while an array of nested objects allows each object to be queried independently. Consider this mapping:
    {
        "type1" : {
            "properties" : {
                "users" : {
                    "type" : "nested",
                    "properties" : {
                        "first" : { "type" : "string" },
                        "last" : { "type" : "string" }
                    }
                }
            }
        }
    }
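To make the difference between a flattened array of objects and nested objects concrete, the following plain-Java sketch simulates both behaviours in memory. It does not call Elasticsearch; the class name, the method names and the two sample users (John Smith, Alice White) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: in-memory simulation, no Elasticsearch client involved.
final class NestedVsObjectSketch {

    // Object type: the array is flattened into one multi-valued field per property,
    // e.g. users.first = [John, Alice] and users.last = [Smith, White].
    static Map<String, List<String>> flatten(String field, List<Map<String, String>> objects) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        for (Map<String, String> object : objects) {
            for (Map.Entry<String, String> entry : object.entrySet()) {
                flat.computeIfAbsent(field + "." + entry.getKey(), k -> new ArrayList<>())
                    .add(entry.getValue());
            }
        }
        return flat;
    }

    // Flattened match: each condition is checked against the pooled values,
    // so values coming from different objects can satisfy the query together.
    static boolean matchesFlattened(Map<String, List<String>> flat, String field,
                                    String first, String last) {
        return flat.getOrDefault(field + ".first", List.of()).contains(first)
                && flat.getOrDefault(field + ".last", List.of()).contains(last);
    }

    // Nested match: both conditions must hold inside the same object.
    static boolean matchesNested(List<Map<String, String>> objects, String first, String last) {
        for (Map<String, String> object : objects) {
            if (first.equals(object.get("first")) && last.equals(object.get("last"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Map<String, String>> users = List.of(
                Map.of("first", "John", "last", "Smith"),
                Map.of("first", "Alice", "last", "White"));
        Map<String, List<String>> flat = flatten("users", users);
        // No single user is called "John White".
        System.out.println(matchesFlattened(flat, "users", "John", "White")); // true
        System.out.println(matchesNested(users, "John", "White"));            // false
    }
}
```

The flattened form reports a match for "John White" although no such user exists, which is the cross-object match the nested type is meant to avoid.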

Elasticsearch: Mapping field types
Array types: JSON documents allow you to define an array (list) of fields or objects.
    "Product" : [
        {
            "id" : 12,
            "title" : "iphone",
            "categories" : [1, 3, 5, 7],
            "tag" : ["iphone4", "iphone5", "iphone6"],
            "author" : [
                { "firstname" : "Francois", "lastname" : "francoisg", "id" : 18 },
                { "firstname" : "Gregory", "lastname" : "gregquat", "id" : "2" }
            ]
        }
    ]

Elasticsearch: Relational vs denormalized

Elasticsearch: Relational vs denormalized
    "translation" : {
        "_routing" : { "required" : true, "path" : "project_id" },
        "_id" : { "path" : "id" },
        "_all" : { "enabled" : "false" },
        "dynamic" : "strict",
        "properties" : {
            "id" : { "type" : "string", "index" : "not_analyzed" },
            "public_id" : { "type" : "integer", "index" : "not_analyzed" },
            "project_id" : { "type" : "string", "index" : "not_analyzed" },
            "title_na" : { "type" : "string", "index" : "not_analyzed" },
            "title" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
            "title_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
            "description" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
            "description_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
            "resource_file_id" : { "type" : "integer", "index" : "not_analyzed" },
            "created_at" : { "type" : "long", "index" : "not_analyzed" },
            "updated_at" : { "type" : "long", "index" : "not_analyzed" },
            "any_empty" : { "type" : "boolean", "index" : "not_analyzed" },
            "all_empty" : { "type" : "boolean", "index" : "not_analyzed" },
            "status" : { "type" : "string", "index" : "not_analyzed" },
            "phrases" : {
                "_id" : { "path" : "id" },
                "type" : "nested",
                "properties" : {
                    "id" : { "type" : "string", "index" : "not_analyzed" },
                    "iso2_lang" : { "type" : "string", "index" : "not_analyzed" },
                    "content" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
                    "content_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
                    "created_at" : { "type" : "long", "index" : "not_analyzed" },
                    "updated_at" : { "type" : "long", "index" : "not_analyzed" },
                    "status" : { "type" : "string", "index" : "not_analyzed" }
                }
            }
        }
    }

Elasticsearch: CRUD
Insert data (data.json uses the bulk format, an action line followed by the document, so it is posted to _bulk):
    $ cat data.json
    { "index" : { "_index" : "requests", "_type" : "request", "_id" : 33 } }
    { "client" : "client1", "country" : "FR", "id" : 1, "ip" : "100.1.1.3", "password" : "test", "sensor" : "test", "session" : "EFRFR34344", "success" : "OK", "timestamp" : "1414183085848", "username" : "test" }
    $ curl -XPOST http://localhost:9200/_bulk --data-binary @data.json
Update:
    $ curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{"doc":{"name":"new_name"}}'
Delete:
    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

Elasticsearch: Query DSL
    $ curl -XPOST 'http://localhost:9200/_search?<YOUR_QUERY>'

Elasticsearch: Query DSL
    $ curl -XPOST 'http://localhost:9200/requests/_search?pretty' -d '{
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "should": [
                            { "query_string": { "query": "marketing.cars > 100" } },
                            { "query_string": { "query": "marketing.music > 100" } },
                            { "query_string": { "query": "marketing.electronics > 100" } },
                            { "query_string": { "query": "marketing.fashion > 100" } }
                        ]
                    }
                },
                "filter": {
                    "bool": {
                        "must": [
                            { "match_all": {} },
                            { "exists": { "field": "location" } }
                        ]
                    }
                }
            }
        },
        "fields": [ "location", "remoteAddr" ],
        "size": 1000
    }'

Elasticsearch: Multi Search API
    SearchRequestBuilder requestOne = node.client().prepareSearch()
        .setQuery(QueryBuilders.matchQuery("name", "test1")).setSize(1);
    SearchRequestBuilder requestTwo = node.client().prepareSearch()
        .setQuery(QueryBuilders.matchQuery("name", "test2")).setSize(1);
    // Send both searches in a single round trip
    MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(requestOne)
        .add(requestTwo)
        .execute().actionGet();
    // You will get all individual responses from MultiSearchResponse#getResponses()
    long nbHits = 0;
    for (MultiSearchResponse.Item item : sr.getResponses()) {
        SearchResponse response = item.getResponse();
        nbHits += response.getHits().getTotalHits();
    }

Elasticsearch: Bulk API
The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.
Example:
    $ cat requests
    {"index":{"_index":"test","_type":"type1","_id":"1"}}
    {"field1":"value1"}
    $ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
    {"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1}}]}
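A bulk request body is newline-delimited: each index operation contributes an action line followed by the document source, and the body ends with a newline. The sketch below builds such a body as a plain string. `BulkBodySketch` is a hypothetical helper, not part of the Elasticsearch client, and it assumes that index, type and id values contain no characters that need JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the text body for POST /_bulk, without any client library.
final class BulkBodySketch {

    // sourceById maps a document id to its JSON source (already serialized on one line).
    // Assumption: index, type and ids need no JSON escaping.
    static String indexBody(String index, String type, Map<String, String> sourceById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : sourceById.entrySet()) {
            // Action line: what to do and where.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type)
                .append("\",\"_id\":\"").append(doc.getKey())
                .append("\"}}\n");
            // Source line: the document itself, terminated by a newline.
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"field1\":\"value1\"}");
        docs.put("2", "{\"field1\":\"value2\"}");
        System.out.print(indexBody("test", "type1", docs));
    }
}
```

The resulting text can be saved to a file and sent with `curl --data-binary`, which preserves the newlines that the bulk format depends on.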

Elasticsearch: Aggregations
The following snippet captures the basic structure of aggregations:
    "aggregations" : {
        "<aggregation_name>" : {
            "<aggregation_type>" : {
                <aggregation_body>
            }
            [,"aggregations" : { [<sub_aggregation>]+ } ]?
        }
        [,"<aggregation_name_2>" : { ... } ]*
    }
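As one possible instance of this structure, the sketch below builds a request body with a `terms` aggregation and one `terms` sub-aggregation. It is plain string building, not the Elasticsearch Java client. The aggregation names `by_country` and `by_client` are made up, and the `country` and `client` fields are assumed to be those of the `requests` documents used in the CRUD example.

```java
// Sketch only: assembles an aggregation request body as a string.
final class AggregationBodySketch {

    // "<name>":{"terms":{"field":"<field>"}[,"aggregations":{<subAggregations>}]}
    // Pass null for subAggregations when there are none.
    static String terms(String name, String field, String subAggregations) {
        StringBuilder agg = new StringBuilder();
        agg.append('"').append(name).append("\":{\"terms\":{\"field\":\"")
           .append(field).append("\"}");
        if (subAggregations != null) {
            agg.append(",\"aggregations\":{").append(subAggregations).append('}');
        }
        return agg.append('}').toString();
    }

    // Wrap one or more comma-separated aggregations into a search request body.
    static String request(String aggregations) {
        return "{\"aggregations\":{" + aggregations + "}}";
    }

    public static void main(String[] args) {
        // Count requests per country, then per client inside each country.
        String body = request(terms("by_country", "country",
                terms("by_client", "client", null)));
        System.out.println(body);
    }
}
```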
