Maximizing IT Savings from Effective Capacity Planning Ron Kaminski
Foreign speaker rules • Please feel free to stop me to ask any questions • Raise your hand or clap if I am going too fast or if my Mississippi accent becomes impossible for y’all to understand • This is not rude, and I will not take it that way • The paper and all slides will be furnished to my hosts
Reducing Total IT Costs is the goal • IT management has a really tough job • They have to manage organizations that deal with so much complexity that subdivision by specialty, like DBAs, or VMware support, or indeed, Capacity Planning is the norm • They cannot hope to master all of the complexity below them in the organization, and it would be irrational to try • Still, even though the firm’s demand for IT support seems to be ever increasing, their management and the business folks have to have a yardstick to measure their effectiveness, and a number that they cannot escape is total IT costs
The Capacity Planning Expert’s Trap • We capacity planners delight in learning and applying the picky minutiae that make for accurate predictions of future usage needs, and in matching those predictions to available hardware • This complex task, done well, is a source of great pride • However, with current tools, it can also be immensely time consuming • Our methods often require expensive vendor tools, and are often difficult to explain to others in IT, much less management • So, we often don’t get credit for the savings that we can create
The value of Capacity Planning • When management views Capacity Planning (hereafter CP) as a partner in • decreasing risk, • increasing customer satisfaction and • cost reductions, • we will be valued members of that IT organization for a long time to come
As Adam says… • “Case studies where capacity planning actually saved money are hugely exceeded by case studies where unnecessary spending was avoided. • The latter is less news worthy; the former is hard to admit – “I used to be stupid but I’m better now” • Capacity Planning is major performance insurance and, like taking exercise, often only shows its merit when you stop doing it. • ‘Slow time’ is almost as bad as ‘down time’ for the business.” 1
Introduction: • What are the ways that effective capacity planning can save firms money? • A survey of peers found a strong demand for a list of ideas, but not a lot of ideas themselves • The goal of this paper then becomes clear • Categorize and list many of the ways that we can apply capacity planning techniques to lower both IT and total firm costs.
Build Trust • Build trust in management and key, politically powerful technical specialists and user communities in the value of capacity planning • via a continuing commitment to accuracy and innovation, and • share the credit for the success! • “Love all, trust a few, do wrong to none.” ― William Shakespeare, All's Well That Ends Well2
Build Trust • While folks can hear stories like: “Capacity Planning saved their firm $28 million dollars” they need to realize that great teams working well often result in great savings • These great teams require visionary leadership that develops talent into skilled professionals that they can rely on for both accuracy and fairness • Truly great management also helps their geeky IT staff navigate the often tricky political waters to be effective • The best coach their staff to keep the end game in sight
Build Trust • For example, the IT folks may discover some clearly (to them) wasteful resource consumption due to a politically powerful application • Our goal becomes getting application expert Bob to look at and repair his program logic
Build Trust • The smart manager might know from experience that this group does not react well to being shown as less than perfect publicly, and instead helps the IT folks change their initial message from • “Look at the incredibly wasteful way you are wasting CPU in your daily processing Bob!” • to a much more effective • “Hey Bob, I am having trouble understanding this application’s resource consumption. Could you please help?”
Build Trust • Simply by ceding the expertise high ground to Bob, you will get his assistance quicker • He looked at the code on the morning of the 22nd
Build Trust • Key tool: The occasional “Thank you letter” to their boss • To get Bob’s, or any other expert’s assistance for the long term, it often helps to make sure that they get the credit for the savings that result from their “fixes” • Send them, and their management, a “thank you letter”, • ideally with before and after graphics that highlight the savings that result from their fixes
Build Trust • Used judiciously and not every time, this can be particularly powerful when dealing with contract staff, as proof of their value is particularly treasured • The next time that you ask them to look into something, you won’t believe how much faster and easier it happens • Often the difference between a bunch of angry squabbling IT groups and smooth professionals is a few coaching sessions like that from smart management
Reduce Production Outages • Investigate IT outages, and use your tools to reduce their frequency or remove them • If it weren’t for the importance of the political in getting IT done well, this one would have been first on my list • Reducing IT outage frequencies, ideally to zero, reduces machine shutdowns and the risks to employee safety that might result from a large production machine suddenly stopping due to IT problems
Reduce Production Outages • Nothing is more important than your co-workers in the plant having a safe day and going home to help their kids with their homework • The business often has another way to value it • An IT outage that shuts down a diaper mill was calculated to cost the firm $50,000 an hour in lost production
Reduce Production Outages • When you discover that 16 mill outages last year were due to disks filling up, • and you use your capacity planning data to take a few hours to write a “full disk forecasting” system,
Reduce Production Outages • and take those outages down to zero the next year, the business will notice, valuing the improvement at close to half a million dollars per year • The diaper mills are great examples, but you don’t have to be an old line smokestack firm to take advantage of this idea • A peer at an internet payment processor says that they value down time at a half million per hour3 • Think of the payback for looking into outage reduction there
Reduce Production Outages • Key tool: Think about production outages that your IT systems might cause, and ways that you could see them coming, and act in advance to reduce outage length or possibly avoid the outages entirely • Stories of these real savings will be heard at the highest levels of IT management, and that isn’t bad for your careers
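The slides do not say how the “full disk forecasting” system was built. As a minimal sketch of the idea, assuming you already collect one disk usage sample per day, a straight-line trend can be fitted to the samples and extended out to the disk's capacity. The names `forecast_disk_full`, `needs_alert` and the 30 day warning window are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DiskForecast:
    growth_per_day: float             # fitted slope, in the same units as the samples
    days_until_full: Optional[float]  # None when usage is flat or shrinking


def forecast_disk_full(days: Sequence[float], used: Sequence[float],
                       capacity: float) -> DiskForecast:
    """Fit a least-squares line to (day, used) samples and extend it to capacity."""
    n = len(days)
    if n < 2 or n != len(used):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(used) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        # No growth trend, so the fitted line never reaches capacity.
        return DiskForecast(slope, None)
    # Day on which the fitted line hits capacity, measured from the latest sample.
    full_day = (capacity - intercept) / slope
    return DiskForecast(slope, max(0.0, full_day - days[-1]))


def needs_alert(forecast: DiskForecast, warn_days: float = 30.0) -> bool:
    """True when the disk is projected to fill within the warning window (assumed 30 days)."""
    return forecast.days_until_full is not None and forecast.days_until_full <= warn_days


if __name__ == "__main__":
    # Hypothetical samples: GB used on five consecutive days, on a 100 GB disk.
    result = forecast_disk_full([0, 1, 2, 3, 4], [50, 52, 54, 56, 58], capacity=100)
    print(result, needs_alert(result))
```

A straight line is the simplest possible trend. Disks that fill in bursts (month-end jobs, log storms) would need something smarter, but even this much gives you days of warning instead of a mill outage.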
Use Advanced Tools • Effectively use network queuing theory modeling tools and/or statistical profiles to reduce business risks and reduce hardware costs • “Prediction is very difficult, especially if it's about the future.” ~ Niels Bohr4
Use Advanced Tools • There are so many ways to do this. Here is a short list of savings opportunities: • Right size hardware purchases by modeling production hardware needs from functioning development machines • From experience, I can tell you that the first time that you introduce this idea in a firm, they can be quite skeptical • After a few times when you predict the exact hardware needed to give great application performance and happy users at minimal cost, and on time, smart management will quickly see the light • Soon they will be asking you to resize all sorts of legacy applications
Use Advanced Tools • Buy only development and QA machines needed to develop functioning code and test functions • It is astonishing how many people think that only full size hardware and full production volumes can make valid tests for IT systems • They will drone on for hours about various labor intensive and expensive load creation software tools, declaring that using them is the only possible way to know that software will work at scale
Use Advanced Tools • This is simply incorrect • There are many experts at CMG conferences who can show you that a sample on a small development machine with a realistic transaction mix that accurately depicts a mere fraction of the intended production load can be used to accurately size full size production hardware with (Ron’s opinion) … high accuracy
Adam Weighs In… • …adequate accuracy. It requires no more skill than the skill required to ensure that a workload generator environment is not being skewed by cached files accesses in a load test • Ron tells some stories…
Adam Weighs In… • He is right to mention the legion of errors that the “load test” crowd are prone to • I always found it amazing that I have to justify modeling software with books of research, but some clown testing software that the firm will rely on every day can’t come up with more than a few hundred test data records • Then, the tests run like lightning because that tiny amount of data gets cached into memory in any semi-modern database • but the application goes much more slowly when millions of records in normal usage render the cache ineffective
Use Advanced Tools • When you add up all the hardware, software license and labor costs saved by avoiding full scale load tests and add on other benefits, like • speedy analysis of alternative assumptions’ effects on application performance, • the savings on even smaller projects quickly get into digits that make management smile
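The sizing idea described above (measure a realistic transaction mix at a fraction of production load on a small development machine, then size production from it) can be sketched with two textbook operational laws: the Service Demand Law, D = B / C, and the Utilization Law, U = X × D. This is only a back-of-the-envelope illustration, not what a commercial queuing theory tool does. The function names, the `speed_ratio` between development and production CPUs, and the 60% target utilization are all assumptions made for the example.

```python
import math


def service_demand(busy_seconds: float, completions: int) -> float:
    """Service Demand Law: CPU seconds consumed per completed transaction (D = B / C)."""
    if completions <= 0:
        raise ValueError("completions must be positive")
    return busy_seconds / completions


def cpus_needed(demand_dev: float, prod_tps: float, speed_ratio: float = 1.0,
                target_utilization: float = 0.6) -> int:
    """Estimate production CPUs from a demand measured on a development machine.

    speed_ratio is an assumed figure for how much faster one production CPU is
    than one development CPU (1.0 means identical hardware).
    """
    if speed_ratio <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("speed_ratio must be > 0 and target_utilization in (0, 1]")
    demand_prod = demand_dev / speed_ratio
    offered_load = prod_tps * demand_prod  # Utilization Law, expressed in busy CPUs
    return max(1, math.ceil(offered_load / target_utilization))


if __name__ == "__main__":
    # Hypothetical measurement: 120 CPU seconds spent on 6,000 transactions in dev.
    d = service_demand(busy_seconds=120.0, completions=6000)
    print(cpus_needed(d, prod_tps=200.0, speed_ratio=2.0))
```

The answer is only as good as the transaction mix behind the measurement. If the sample's data set is small enough to sit entirely in cache, the measured demand will be too low, which is the same trap the load test crowd falls into.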
Use Advanced Tools • When sizing production systems, test not only what hardware will meet planned production needs, but also what the next few bottlenecks will be and when • Great network queuing theory based tools have to create rather complete views of what each workload consumes • Once you have that, analyzing what would happen if the situation changes is rather straightforward • Imagine the happy times when your products are introduced to new overseas markets and demand grows quickly
Use Advanced Tools • Network queuing theory based tools will always be less risky than guessing, and • building your firm’s growth plans based on scientifically accurate analysis will be more defensible to the board of directors too • Don’t you see all the smiling managers in your mind’s eye?
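To illustrate the “what will the next few bottlenecks be, and when” question, here is a minimal sketch that uses only the Utilization Law (U = X × D / m) with compound business growth. Real network queuing theory tools model far more than this (queuing delay, workload interactions, and so on). The resource names, service demands, growth rate and 70% utilization limit below are hypothetical values picked for the example.

```python
import math
from typing import Dict, List, Tuple


def months_until_saturation(demand_s: float, servers: int, tps_now: float,
                            monthly_growth: float, limit: float = 0.7) -> float:
    """Months until utilization X * D / m reaches `limit` under compound growth."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    utilization_now = tps_now * demand_s / servers
    if utilization_now >= limit:
        return 0.0  # already at or past the limit today
    return math.log(limit / utilization_now) / math.log(1.0 + monthly_growth)


def rank_bottlenecks(resources: Dict[str, Tuple[float, int]], tps_now: float,
                     monthly_growth: float, limit: float = 0.7) -> List[Tuple[str, float]]:
    """Order resources by how soon each is projected to hit the utilization limit."""
    ranked = [(name, months_until_saturation(d, m, tps_now, monthly_growth, limit))
              for name, (d, m) in resources.items()]
    return sorted(ranked, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: (seconds of demand per transaction, number of servers).
    resources = {"cpu": (0.004, 4), "disk": (0.006, 2), "network": (0.002, 1)}
    for name, months in rank_bottlenecks(resources, tps_now=100.0, monthly_growth=0.05):
        print(f"{name}: about {months:.1f} months")
```

Changing `monthly_growth` to reflect a new overseas market launch and rerunning takes seconds, which is the kind of what-if speed the slides are praising.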
Use Advanced Tools • When debugging production system performance problems (what some of my vendor pals refer to as “triage”), use the timing of production slowdowns and workload characterized views to quickly pinpoint the real causes of user issues • It is only after you gain organizational credibility from successfully sizing some applications that management will be more supportive of this process • Ron tells some stories
Use Advanced Tools • When that happens, be prepared for a deluge of requests asking you to look at performance issues on their systems • With some good workloads and a good queuing theory based tool set, there is nothing more rewarding than a call with a project leader’s application users • discussing times when the application felt “slow” and pointing out exactly why and exactly what to change to fix it
Use Advanced Tools • Some of our vendor partners take the next step to predict application performance out into the future, based on business provided growth estimates • Well, to be honest, it is also nice when the application owners take you to dinner to celebrate it too. Some of these rewards can surprise you • Over the years I have made a lot of folks very happy by discovering and removing bottlenecks that improved the performance of their applications
Use Advanced Tools • I have received considerable positive regard, often in the form of thoughtful gifts from internal and external customers • One of my favorites was when a nice lady baked me cookies and sent them to my home, just because she could work so much faster • Sure I smiled while I ate the cookies (which were great, believe me) but her manager smiled even more at the increased productivity
Use Advanced Tools • Forecasting multiple applications’ performance into the future is a tool that management will use to prioritize and shift resources to keep the business users happy • “The most reliable way to forecast the future is to try to understand the present.” ~ John Naisbitt6
Use Advanced Tools • Any tool that gives accurate advance warning of application performance in changing business conditions will be treasured by management immediately • Tools that automate the production of this information free the busy capacity planning team up for more application analysis
Use Advanced Tools • Key tool: When you effectively use network queuing theory based tools to shift the focus from • how slow the application machines are to • what changes are needed to remove each roadblock, • you will have improved the process maturity level of your work environment from a reactive victim to a proactive analytic process that can not only put out the fires, but also prevent them from starting
Use Advanced Tools • As all expert capacity planners know, almost 90% of the time, the last remaining bottlenecks will end up being slower physical storage • This is because semi-permanent long term data storage devices, which are getting better all of the time, are still a lot slower than CPUs and volatile memory
Use Advanced Tools • Certainly folks with clouds in their eyes tout cheap and gigantic storage that will be available, • but the problem with all data storage is not only purchase cost, but also service time, i.e. the time that it takes to get from the storage to the machine that needs those IOs • A giant cheap cloud storage solution that is in the next city or country will still deliver really slow IOs, because of all the network travel and switching times
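The service time point can be made concrete with simple arithmetic: if an application issues its IOs one after another, every IO pays the device time plus the network round trip. The sketch below assumes strictly synchronous IOs, and the millisecond figures are hypothetical, not measurements from this paper.

```python
def io_wait_seconds(io_count: int, device_ms: float, network_rtt_ms: float = 0.0) -> float:
    """Elapsed seconds for `io_count` synchronous IOs issued one after another."""
    if io_count < 0 or device_ms <= 0 or network_rtt_ms < 0:
        raise ValueError("io_count and network_rtt_ms must be >= 0, device_ms > 0")
    return io_count * (device_ms + network_rtt_ms) / 1000.0


def slowdown(device_ms: float, network_rtt_ms: float) -> float:
    """How many times longer each IO takes once network travel is added."""
    if device_ms <= 0 or network_rtt_ms < 0:
        raise ValueError("device_ms must be > 0 and network_rtt_ms >= 0")
    return (device_ms + network_rtt_ms) / device_ms


if __name__ == "__main__":
    # Hypothetical job: one million IOs at 0.5 ms each on local storage,
    # versus the same IOs with a 2 ms round trip to a distant data center.
    print(io_wait_seconds(1_000_000, device_ms=0.5))
    print(io_wait_seconds(1_000_000, device_ms=0.5, network_rtt_ms=2.0))
    print(slowdown(device_ms=0.5, network_rtt_ms=2.0))
```

Notice that making the remote device itself faster barely helps once the round trip dominates. Distance is the part you have to remove.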
Some IO ideas to try • Get closer • If you have a virtual environment like VMware, you can test these ideas for yourself, and possibly really speed up some applications • Most firms have multiple data centers, spread out to be on separate power grids for redundancy and safety • Imagine your classic distributed systems architecture of a primary and backup database server, and also primary and secondary application servers, and maybe some Citrix servers that let far flung users appear to be on the application servers and run faster • We have hundreds of applications like this, spread over virtual machines all over our datacenters
Some IO ideas to try • When we examined many of our applications’ performance on similarly sized CPU machines, we noticed that some were noticeably faster than others • Nothing gets folks’ attention like a production system running the same software as a development system that is 4 times faster • What we found was that faster VMware applications tended to have the database, application and user presence machines as physically close together as possible, • ideally on the same backplane
Some IO ideas to try • We tested the idea out by “VMotioning” all the machines in several applications onto the exact same VMware hardware • There is nothing closer than the database and application machines on the exact same hardware • Without exception, the applications were 6 to 8 times faster when placed on the same physical hardware
Some IO ideas to try • Of course you still want to maintain data safety and keep data copies in several data centers, • but it is important to run a production application that needs great speed in the same boxes, or as close as you can • If you are even smarter, you will connect the data storage as close as possible to those same VMware hosts • With this idea, and some easy VMotions, you can really speed things up
Some IO ideas to try • Now do you know why data stored in a cloud doesn’t really get me too excited? • Clouds could store photos that you might want only once in a while (…and you are content to rely on others to ensure efficiency, privacy, integrity and security),7 • …but when a production plant needs speed, keep things close • Are you still reading this paper, • or have you already run off to try it?
Some IO ideas to try • Again, be sure to maintain all of the redundancy and safety needed for business continuity, • but that is often as simple as keeping primary application processing in one data center and all copies and redundancy in another • Then you are still safe, but you get “same frame” speeds
Some IO ideas to try • Get smaller • Let’s face it, the application folks are all afraid to delete old useless data • They would rather delete shareholder profits than delete data • The number of excess cycles expended to index and get past all that old data to process today’s makes most capacity planners want to scream • All the “get closer” efforts in the world can’t win against big application sloth
Some IO ideas to try • “Moore’s Law refers to the growth of hardware (originally the number of effective transistors in a chip) as more or less doubling every year or two (or 18 months) • in an exponential growth that has been true for almost 50 years, whether applied to chips, memory, disk or other hardware items • Parkinson’s Law has been true for even longer, where work expands to fill the time available • or software developer’s ideas for exploiting available resources grow even faster.”8
Some IO ideas to try • Many of us have tried, repeatedly, to get bloated applications to delete, purge, compress and/or summarize data • In most firms, most of us fail • When big application groups have influence in the organization, they usually exercise it to set priorities • Guess what priority data management usually gets?
Some IO ideas to try • Granted, there is the occasional forward thinking leader who plans a viable delete strategy from the beginning, during the design phase, and gets the application group to proactively address this bloat • You tend to find these smart folks in high volume or trading environments, where experience has proven that managing data bloat in high transaction environments is a key to success
Some IO ideas to try • So what do you do to deal with the data pigs in your non-trading firm? • Usually, by putting a dollar cost on the bloat and telling management, you can get some movement • If your forward thinking management would devote a fraction of those cost estimates to bonuses for successfully shrinking things, it is amazing how “priorities” change
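Putting a dollar cost on the bloat can start as very simple arithmetic. The sketch below assumes you can estimate the stale data volume and a fully loaded storage cost per GB per month. Both the rate and the bonus fraction are placeholders to be replaced with your firm's own figures, and the function names are hypothetical.

```python
def annual_bloat_cost(stale_gb: float, cost_per_gb_month: float, copies: int = 1) -> float:
    """Yearly cost of keeping stale data, counting every replica and backup copy."""
    if stale_gb < 0 or cost_per_gb_month < 0 or copies < 1:
        raise ValueError("stale_gb and cost must be >= 0, copies >= 1")
    return stale_gb * cost_per_gb_month * 12 * copies


def bonus_pool(annual_cost: float, fraction: float = 0.1) -> float:
    """Share of the estimated yearly cost offered as a bonus for shrinking the data."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return annual_cost * fraction


if __name__ == "__main__":
    # Hypothetical figures: 2,000 GB of stale data, $0.50 per GB per month, 3 copies.
    cost = annual_bloat_cost(2000, 0.5, copies=3)
    print(cost, bonus_pool(cost, fraction=0.25))
```

This counts storage only. The extra CPU cycles spent indexing and scanning past old data, which the slides complain about, would add to the total.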
Some IO ideas to try • Embarrassing remnants of ideas long past, they run on for years, • usually consuming even more resources due to bloat as time passes • Sure, some might take a while to rise high enough in the priority list to displace some current need, • but often some are not judged to hurt enough yet to stop