Everything you wanted to know about Velocity (but were afraid to cache) Scott Colestock scott@marcatopartners.com Marcato Partners, LLC
About Scott • Scott Colestock • scott@colestock.net • Twitter: scolestock • Marcato Partners (MarcatoPartners.com) • One of three partners • Focused on agile coaching • Focused on helping early-stage startup ventures in the mobile space
What is it? Velocity is a distributed key/value cache that provides .NET developers with a way to increase performance and scalability when writing data-centric applications.
What is it? (2) • The combined RAM available to all servers in a Velocity cluster is presented to Velocity clients as a unified whole • Any serializable CLR object can be stored • Actual location within cluster is transparent • Client is a simple key/value API at heart • Run as a service accessed across the network • Additional servers can be added on demand
What we’ll cover • What motivates this product/technology • Terms / Pictures / Concepts • Deploy / Install Process • A lap around the API & Admin model • Demos • Gotchyas
Motivation • Data-centric applications have been the norm for a long while • Relational data • More recently, “service-obtained” data • Velocity is about increasing performance by bringing the data physically closer to the consumer • Reduce pressure on underlying data stores/services • Velocity can be about storing data in value-added form (logically closer to the consumer) • Object graphs • Output caching (not explicit in V1) • Aggregated data in XML or other transformed formats
Motivation (2) • Databases are always a point of high contention as you scale out, and tuning is expensive • Are your data retrieval sprocs getting harder to maintain, with excessive SQL chops required? • Service calls for reference data (internal/external) are often slow or intentionally throttled • Caching has always been considered a solution for these issues…
Motivation (3) • Machine-local caching solutions (like Microsoft’s “Enterprise Library Caching Application Block”) can provide a partial answer • Easy key/value API • Flexible store (memory, disk-backed, etc.) • Flexible expiration and eviction policy • Limitations: • Limited by the memory available to a single node… • Application recycles typically mean you lose the cache • In a load-balanced environment, a large data set means you will frequently “miss” when attempting to load from cache…
Motivation (4) • Machine-local caches wind up being sparsely populated when used with a load balancer (if the data set has many keys) • [Diagram: a load balancer in front of three servers whose local caches hold keys 3, 5, 23; keys 7, 11, 47; and keys 12, 16, 33]
Motivation (5) • With machine-local caches, you have no central place to update/delete cached items • This means you can only cache data that can afford to be stale by some time period • If the time period is short, you need a low TTL (time-to-live, aka expiration) which means more cache misses • You can’t cache data that must have changes visible to the system in (near) real time • With a single logical cache, you have one cache to shoot in the event of an update/delete • Might be able to live with no expiration
What we’ll cover • What motivates this product/technology • Terms / Pictures / Concepts • Deploy / Install Process • A lap around the API & Admin model • Demos • Gotchyas
Windows Server AppFabric Caching • History: AppFabric caching was a separate component • Public debut at TechEd 2008 (earlier?) • Codename: Velocity • “Dublin” was a separate effort, focused on providing a hosting and management environment around WCF/WF • November 2009: Technologies grouped under heading of “Windows Server AppFabric” • RTW in June 2010…
Relationship to Windows Azure AppFabric • Service bus: Handle communication and authentication for accessing applications • Expose apps through firewalls, NAT gateways, etc. • Assist cloud-based apps talking to on-premise apps • Other composite app scenarios; pub/sub • Access Control Service: Allow you to avoid setting up federated identity agreements just to grant partner/customer access to your cloud-based or on-premise apps. • Today: Only common marketing/branding with Windows Server AppFabric. • Later: Common services for both
Cache-Aside Pattern • In the current version, the out-of-box support is for the “cache-aside” pattern. • Check cache • If miss, retrieve data, then populate the cache • Lots of other patterns you might contemplate (and simulate) with what is provided • Read-through/Write-through • Refresh-ahead/Write-behind
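The check / miss / populate steps above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the AppFabric client: `InMemoryCache`, `get_with_cache_aside`, and `load_product` are hypothetical names, and a plain dictionary stands in for the cache cluster.

```python
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in for a distributed cache client (not the AppFabric API)."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._items.get(key)

    def put(self, key: str, value: object) -> None:
        self._items[key] = value


def get_with_cache_aside(cache: InMemoryCache, key: str,
                         load_from_source: Callable[[str], object]) -> object:
    """Check the cache; on a miss, load from the source and populate the cache."""
    value = cache.get(key)
    if value is None:                      # miss
        value = load_from_source(key)      # go to the database or service
        cache.put(key, value)              # populate for the next caller
    return value


source_calls: List[str] = []


def load_product(key: str) -> dict:
    """Pretend data-store call that records how often it is hit."""
    source_calls.append(key)
    return {"id": key, "name": "Product " + key}


cache = InMemoryCache()
first = get_with_cache_aside(cache, "42", load_product)   # miss: hits the source
second = get_with_cache_aside(cache, "42", load_product)  # hit: served from cache
```

The calling application owns the load-and-populate logic here, which is what distinguishes cache-aside from read-through, where the cache itself would call the loader.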
Logical Hierarchy • Cache Cluster: client apps work with a single logical unit of cache • Cache hosts (Cache Host A, B, C on Server A, B, C): server process is DistributedCacheService.exe • Named caches (e.g. Product Catalog, alongside the Default Cache): explicitly created with TTL, expiration, HA policy • Regions (e.g. Sports, Region 1, Region 3): represent a partition of data (subset of key/value pairs); live on one node; unit of replication/failover • Regions can be implicit or explicit. Use explicit only for bulk gets or searching.
Physical Layout • [Diagram: a load balancer in front of Web Servers A, B, C (IIS 7.x), alongside a cache cluster of Cache Servers A, B, C, each running a cache host]
Combined Deployment • [Diagram: a load balancer in front of Web Servers A, B, C, each running both IIS 7.x and a cache host]
Physical Layout • Configuration store contains cache policies and global partition map (how keys divide into regions, which servers have which regions) • If SqlConfig store, servers will send heartbeat to Sql. Otherwise, heartbeat goes to one or more “lead hosts” • Partition map used by “Global Partition Manager” (one node in the cluster, but auto failover) to communicate routing information to Velocity clients • [Diagram: the same web servers and cache cluster as before, with a config store (file share or Sql Server) attached to the cache cluster]
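To make the partition map idea concrete, here is a rough sketch of routing a key to a region and then to a host, plus a reassignment step when a host disappears. Everything in it is an assumption for illustration: the slides do not say how Velocity hashes keys or picks new owners, so the SHA-256 based `region_for_key`, the six-region layout, and the round-robin `fail_over` are hypothetical.

```python
import hashlib
from typing import Dict, List

REGION_COUNT = 6  # assumed number of regions, for illustration only

# Assumed partition map: region number -> cache host that owns it.
partition_map: Dict[int, str] = {
    0: "CacheHostA", 1: "CacheHostB", 2: "CacheHostC",
    3: "CacheHostA", 4: "CacheHostB", 5: "CacheHostC",
}


def region_for_key(key: str) -> int:
    """Map a key to a region with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % REGION_COUNT


def host_for_key(key: str, pmap: Dict[int, str]) -> str:
    """What a client does with routing information: key -> region -> host."""
    return pmap[region_for_key(key)]


def fail_over(pmap: Dict[int, str], failed_host: str,
              surviving_hosts: List[str]) -> Dict[int, str]:
    """Return a new map with the failed host's regions spread round-robin."""
    new_map = dict(pmap)
    orphaned = sorted(r for r, h in pmap.items() if h == failed_host)
    for i, region in enumerate(orphaned):
        new_map[region] = surviving_hosts[i % len(surviving_hosts)]
    return new_map


owner_before = host_for_key("product:1001", partition_map)
after_map = fail_over(partition_map, "CacheHostB", ["CacheHostA", "CacheHostC"])
owner_after = host_for_key("product:1001", after_map)
```

The point of the sketch is that a key's region never changes; only the region-to-host assignment does, which is why the region is the natural unit to move or replicate.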
Regions as unit of replication/failover (Global Partition Manager in action) • [Diagram: cache cluster of Cache Hosts A, B, C on Servers A, B, C, holding region Sports (named cache Product Catalog) and Region 1 (Default Cache)]
Regions as unit of replication/failover (When using Secondaries) • [Diagram: the same cluster, now also holding a Sports secondary and a Region 1 secondary] • Updates done synchronously
Local Cache • Local cache is an option that can be enabled when creating the cache client (DataCacheFactory) • Allows a local cache to be populated that will prevent network hop (and serialization) if request can be satisfied locally • Best when data set is (relatively) small, changes infrequently, and stale data is acceptable • Can expire via TTL or notifications (which might be late/lost) • Can specify max object count before evicting LRU • [Diagram: Web Servers A, B, C (IIS 7.x) each hold a local cache in front of the cache cluster of Cache Servers A, B, C]
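The TTL expiry and "max object count before evicting LRU" behaviour can be sketched as follows. This is an illustrative Python model, not the AppFabric local cache: the `LocalCache` class, its parameters, and the injectable clock are all hypothetical, and notification-based invalidation is left out.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LocalCache:
    """Hypothetical in-process cache with a TTL and an LRU object-count limit."""

    def __init__(self, max_objects: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_objects = max_objects
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value); least recently used entries come first
        self._items: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:    # TTL elapsed: drop and report a miss
            del self._items[key]
            return None
        self._items.move_to_end(key)       # mark as most recently used
        return value

    def put(self, key: str, value: object) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_objects:
            self._items.popitem(last=False)  # evict the least recently used


now = [0.0]                                  # fake clock so the demo is deterministic
local = LocalCache(max_objects=2, ttl_seconds=30, clock=lambda: now[0])
local.put("a", 1)
local.put("b", 2)
hit_a = local.get("a")                       # "a" becomes most recently used
local.put("c", 3)                            # over the limit: "b" is evicted
evicted_b = local.get("b")
now[0] = 31.0                                # past the 30-second TTL
expired_a = local.get("a")
```

A miss from this layer would fall through to the cache cluster, so stale reads are bounded by the TTL but not eliminated, which matches the "stale data is acceptable" caveat above.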
Data Types and Caching Considerations • Reference Data: Product catalogs, “lookup” tables, other slow-moving content • Safe to cache for a defined period of time because you probably live with staleness already • “Local” cache option might be desirable for small data sets • Activity Data: Shopping carts or other transient transaction state • Accessed for read and write operations, but not shared. Low/No concurrency considerations – exclusive write. • Safe to cache for reads and keep in cache for writes • Resource Data: Inventory, Orders, and other core transactional data • Accessed concurrently for read and write • Caching will require a concurrency model to be chosen and managed
What we’ll cover • What motivates this product/technology • Terms / Pictures / Concepts • Deploy / Install Process • A lap around the API & Admin model • Demos • Gotchyas
Deploy/Install Considerations • Windows “Application Server” Role required • A few critical updates (see install guide) • .NET 3.5 SP1 for cache clients; .NET 4 for servers • You’ll need PowerShell 2 (already in Win7/Win2k8R2) • Windows XP cannot be a client… • “Install” and “Configure” for AppFabric are two distinct steps
Deploy/Install Considerations • Primary screen of interest is choosing your configuration store: • XML/File share • Sql-Based • File share avoids the need for Sql Server, but requires that some nodes in the cache cluster be special (“Lead Hosts”) • Using Sql as the configuration store is the better engineering choice for production – you may have other reasons to avoid it.
Deploy/Install Considerations • As you build out your AppFabric Cache Cluster, you will do “New Cluster” on the first node, and “Join Cluster” on subsequent nodes • Ultimately, all of Windows Server AppFabric is a set of features underneath the Application Server Role – so standard command line installations work. • Setup.exe /install /icachingservice,cacheclient,cacheadmin /l:c:\temp\setup.log
Deploy/Install Considerations • Can do a “Cache client” install for clients, or for internal apps, just incorporate client assemblies in your own build/deploy process: • Microsoft.ApplicationServer.Caching.Core.dll • Microsoft.ApplicationServer.Caching.Client.dll • Microsoft.WindowsFabric.Common.dll • Microsoft.WindowsFabric.Data.Common.dll
What we’ll cover • What motivates this product/technology • Terms / Pictures / Concepts • Deploy / Install Process • A lap around the API & Admin model • Demos • Gotchyas
Caching Classes • DataCacheFactory: DataCacheFactory(); DataCacheFactory(configuration); DataCache GetCache(string cache); GetDefaultCache() • DataCache • DataCacheFactoryConfiguration (can set these via configuration): LocalCacheProperties; NotificationProperties; SecurityProperties; DataCacheServerEndpoint[] Servers
DataCache with DataCacheItemVersion • GetCacheItem: returns tags and version info • GetIfNewer: lets you use that version info! • Put and Remove have overloads that take version info • Allows for an optimistic concurrency model • Will only succeed if version information matches what is current for the cached item
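The version-checked write is easier to see in a small model. The sketch below is Python with hypothetical names (`VersionedCache`, `get_item`, `get_if_newer`, `add_stock`); it mimics the idea of the versioned overloads rather than their real signatures, and it reports a version mismatch by returning `None`, which is a simplification of however the real client signals the failure.

```python
from typing import Dict, Optional, Tuple


class VersionedCache:
    """Hypothetical cache where every write bumps a per-item version number."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[int, object]] = {}  # key -> (version, value)

    def get_item(self, key: str) -> Optional[Tuple[int, object]]:
        return self._items.get(key)

    def get_if_newer(self, key: str, known_version: int) -> Optional[Tuple[int, object]]:
        """Return the item only if the cache holds a newer version than the caller's."""
        current = self._items.get(key)
        if current is None or current[0] <= known_version:
            return None
        return current

    def put(self, key: str, value: object,
            expected_version: Optional[int] = None) -> Optional[int]:
        """Write the value; with expected_version set, fail (None) on a mismatch."""
        current = self._items.get(key)
        current_version = current[0] if current else 0
        if expected_version is not None and expected_version != current_version:
            return None                      # someone else wrote first
        self._items[key] = (current_version + 1, value)
        return current_version + 1


def add_stock(cache: VersionedCache, key: str, delta: int, max_attempts: int = 5) -> bool:
    """Optimistic read-modify-write: re-read and retry when the version moved."""
    for _ in range(max_attempts):
        version, quantity = cache.get_item(key)
        if cache.put(key, quantity + delta, expected_version=version) is not None:
            return True
    return False


inventory = VersionedCache()
inventory.put("sku-1", 10)                                  # version 1
version_seen, quantity_seen = inventory.get_item("sku-1")   # reader remembers version 1
inventory.put("sku-1", 8)                                   # another writer: version 2
stale_write = inventory.put("sku-1", quantity_seen - 1,
                            expected_version=version_seen)  # rejected: version moved
retried = add_stock(inventory, "sku-1", -1)                 # re-reads 8, writes 7
```

No lock is held between the read and the write; the cost of a conflict is a retry, which suits data where contention is rare.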
DataCache and Locking • GetAndLock: Allows you to lock a cache item for a specified time period, even if not present • (Will fail if already locked) • public Object GetAndLock (string key, TimeSpan timeout, out DataCacheLockHandle lockHandle, bool forceLock) • Useful when attempting to get multiple servers to coordinate “cache pre-load” activity • PutAndUnlock: Unlock an item, with given key and lock handle • Unlock: Explicitly unlock, optionally extend TTL
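A rough model of the "cache pre-load" coordination: whichever server takes the lock first does the load, and the others see the lock and back off. The Python below uses hypothetical names (`LockingCache`, `LockError`, `get_and_lock`, `put_and_unlock`) and a fake clock. It always allows locking a key that is not yet present, which corresponds to the forceLock case in the signature above, and it is not the real client.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple


class LockError(Exception):
    """Raised when an item is already locked or the lock handle does not match."""


class LockingCache:
    """Hypothetical cache with timed, handle-based item locks."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._values: Dict[str, object] = {}
        self._locks: Dict[str, Tuple[int, float]] = {}   # key -> (handle, expires_at)
        self._handles = itertools.count(1)

    def _is_locked(self, key: str) -> bool:
        lock = self._locks.get(key)
        return lock is not None and self._clock() < lock[1]

    def get_and_lock(self, key: str, timeout_seconds: float) -> Tuple[Optional[object], int]:
        """Lock the key (present or not) and return (current value, lock handle)."""
        if self._is_locked(key):
            raise LockError("already locked: " + key)
        handle = next(self._handles)
        self._locks[key] = (handle, self._clock() + timeout_seconds)
        return self._values.get(key), handle

    def put_and_unlock(self, key: str, value: object, handle: int) -> None:
        """Store the value and release the lock, if the handle is the current one."""
        if not self._is_locked(key) or self._locks[key][0] != handle:
            raise LockError("lock not held with this handle: " + key)
        self._values[key] = value
        del self._locks[key]

    def get(self, key: str) -> Optional[object]:
        return self._values.get(key)


now = [0.0]
shared = LockingCache(clock=lambda: now[0])

# Server A arrives first and takes the lock on the not-yet-present item.
_, handle_a = shared.get_and_lock("catalog", timeout_seconds=60)

# Server B arrives while A is still loading and is turned away.
try:
    shared.get_and_lock("catalog", timeout_seconds=60)
    server_b_locked = True
except LockError:
    server_b_locked = False

# Server A finishes the expensive load, stores the result, and releases the lock.
shared.put_and_unlock("catalog", ["bikes", "balls"], handle_a)
```

The lock timeout matters: if server A crashes mid-load, the lock lapses after 60 seconds in this model and another server can take over.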
DataCache and Tags/Regions • Explicitly created regions live on a single node…can create a hot spot for both call volume and memory growth • But they offer bulk retrieval and flexible tag-based retrieves • For secondary indexes, instead of regions: simulate secondary indexes with your own secondary-to-primary mapping cache
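One way to read "your own secondary-to-primary mapping cache": keep a second cache whose keys are the secondary attribute and whose values are sets of primary keys. The Python sketch below uses two plain dictionaries as stand-ins for two named caches; `put_product` and `find_by_category` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List, Set

primary: Dict[str, dict] = {}            # stand-in for the main cache: product id -> product
by_category: Dict[str, Set[str]] = {}    # stand-in for the mapping cache: category -> product ids


def put_product(product: dict) -> None:
    """Write the product and keep the category -> ids mapping in step."""
    old = primary.get(product["id"])
    if old is not None:
        # Remove the id from its previous category before re-indexing.
        by_category.get(old["category"], set()).discard(old["id"])
    primary[product["id"]] = product
    by_category.setdefault(product["category"], set()).add(product["id"])


def find_by_category(category: str) -> List[dict]:
    """Look up ids through the mapping, then fetch each product by primary key."""
    ids = sorted(by_category.get(category, ()))
    # Skip ids whose primary entry is gone (for example, evicted or expired).
    return [primary[i] for i in ids if i in primary]


put_product({"id": "p1", "category": "sports", "name": "Ball"})
put_product({"id": "p2", "category": "sports", "name": "Racket"})
put_product({"id": "p3", "category": "garden", "name": "Hose"})
put_product({"id": "p2", "category": "garden", "name": "Racket"})   # re-categorised

sports_ids = [p["id"] for p in find_by_category("sports")]
garden_ids = [p["id"] for p in find_by_category("garden")]
```

Because keys are spread across the cluster, this avoids the single-node hot spot of an explicit region. The trade-off is that the two writes are separate cache operations, so readers need to tolerate a mapping entry that briefly points at a missing or outdated item.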
Administrative Model • Administration for AppFabric Caching done purely through PowerShell • Can administrate entire Cache Cluster from wherever administrative portion of install has been done – all nodes addressable from single command line location • Use-CacheCluster points the shell at a particular cluster to administrate • Get-Command -module DistributedCacheAdministration
What we’ll cover • What motivates this product/technology • Terms / Pictures / Concepts • Deploy / Install Process • A lap around the API & Admin model • Demos • Gotchyas
Gotchyas • Balance number of nodes in cluster with memory per node. • Too many nodes = cluster overhead, too much memory per node = GC overhead • If you don’t use SqlConfig Store, you need to manually run Start-CacheHost after reboot • Consider the nature of data stored in cache, and secure appropriately (don’t let cache be weakest link) • SqlConfig Store requires high Sql privileges right now at point of install • Currently service runs as network service account • Consider what you will do when cache is down • You can go after source of truth • How do you avoid leaving stale data in the cache?
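The last two questions (what to do when the cache is down, and how to avoid leaving stale data behind) can be sketched together. This is one possible approach in Python, under stated assumptions: `FlakyCache`, `CacheUnavailable`, `read_price`, and `update_price` are hypothetical, a dictionary plays the source of truth, and invalidate-on-write is a design choice of this sketch rather than something the slides prescribe.

```python
from typing import Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the stand-in cache when the cluster cannot be reached."""


class FlakyCache:
    """Hypothetical cache client that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.available = True
        self._items: Dict[str, object] = {}

    def _check(self) -> None:
        if not self.available:
            raise CacheUnavailable("cache cluster unreachable")

    def get(self, key: str) -> Optional[object]:
        self._check()
        return self._items.get(key)

    def put(self, key: str, value: object) -> None:
        self._check()
        self._items[key] = value

    def remove(self, key: str) -> None:
        self._check()
        self._items.pop(key, None)


database: Dict[str, int] = {"price:sku-1": 100}   # stand-in source of truth


def read_price(cache: FlakyCache, key: str) -> int:
    """Cache-aside read that falls back to the source of truth on an outage."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return database[key]               # cache down: go straight to the source
    value = database[key]
    try:
        cache.put(key, value)
    except CacheUnavailable:
        pass                               # serving the request matters more than caching
    return value


def update_price(cache: FlakyCache, key: str, new_price: int) -> None:
    """Write to the source first, then remove the cached copy so it cannot go stale."""
    database[key] = new_price
    try:
        cache.remove(key)
    except CacheUnavailable:
        pass   # the old entry may survive the outage; a TTL would bound that staleness


prices = FlakyCache()
price_first = read_price(prices, "price:sku-1")          # populates the cache
update_price(prices, "price:sku-1", 120)                 # invalidates the cached 100
price_after_update = read_price(prices, "price:sku-1")
prices.available = False                                 # simulate an outage
price_when_down = read_price(prices, "price:sku-1")
```

Falling back to the source keeps requests working, but it also sends the full load to the store the cache was protecting, so it is worth checking that the store can absorb it.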
Resources • AppFabric Caching and Deployment Guide • http://bit.ly/AppFabMgmt • AppFabric Development Center • http://bit.ly/AppFabDevCtr • AppFabric Forums • http://bit.ly/AppFabForum • NHibernate integration • http://sourceforge.net/projects/nhcontrib/files/NHibernate.Caches/ • Entity Framework integration (basis for) • http://code.msdn.microsoft.com/EFProviderWrappers • Recent MSDN: http://msdn.microsoft.com/en-us/magazine/ff714581.aspx
Thank you - Questions?