Lecture Two: Data Centre Governance & Maintenance. Work & People Organisation

Changes Governance (1/5)
“… then you better start swimmin' or you'll sink like a stone for the times they are a-changin' …”
• A Data Centre is a living thing that experiences continual change
• Good Data Centre governance requires forecasting the changes, analyzing their impact, planning and controlling the corrective and compliance actions, and verifying the results
“… then you better start swimmin' …”

Changes Governance (2/5)
“… Dad, my PC doesn’t respond! …”
• 80% of service interruptions are caused by operator error or poor change control (Gartner)
“What did you change?”

Changes Governance (3/5)
• Changes may concern any Data Centre component (building, hardware, software, people, …) and may originate from internal or external causes
• As a first, rough classification we may distinguish between “ordinary” and “extraordinary” changes
• The main difference between the two categories lies in how they are handled: ordinary changes are generally managed through well-defined, consolidated procedures, whereas extraordinary changes often require an “ad hoc” project

Changes Governance (4/5)
• Common causes of “ordinary” changes are:
  • Users’ requests
  • Legislative requirements
  • Technological innovations
  • Budget-control actions
  • Accidents and mistakes
• Ordinary changes are very frequent (hourly/daily) and their life-cycle is generally short to medium (hours to a few weeks). They affect a limited number of Data Centre components. Managing them involves few resources and a low-to-medium effort

Changes Governance (5/5)
• Examples of causes of “extraordinary” changes are:
  • Major technical or regulatory transformations
  • Wide-ranging company reorganizations
  • Site relocations and consolidations
  • Large, unpredicted accidents (“disasters”)
• Extraordinary changes are sporadic and their life-cycle is long (many months to years). Their impact usually cuts across all the Data Centre components. Managing them involves many resources and a huge effort. These resources are generally organized as a dedicated project team with its own leader
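
The ordinary/extraordinary distinction can be sketched as a simple triage rule. This is only an illustration, not a procedure from the lecture: the `ChangeRequest` fields and the thresholds (a life-cycle of 90 days or more, impact on more than half of the component areas) are assumptions chosen to mirror the wording of the slides.

```python
from dataclasses import dataclass

# Component areas named in the lecture; treating them as a fixed set is an assumption.
COMPONENTS = {"building", "hardware", "software", "people"}


@dataclass
class ChangeRequest:
    description: str
    expected_days: int   # estimated life-cycle of the change
    affected: set        # component areas touched by the change


def classify(change: ChangeRequest, long_days: int = 90) -> str:
    """Return "extraordinary" for long-lived changes that cross most components.

    Both thresholds are hypothetical and would be tuned by each Data Centre.
    """
    unknown = change.affected - COMPONENTS
    if unknown:
        raise ValueError(f"unknown component areas: {sorted(unknown)}")
    wide_impact = len(change.affected) > len(COMPONENTS) / 2
    if change.expected_days >= long_days and wide_impact:
        return "extraordinary"
    return "ordinary"


patch = ChangeRequest("apply an operating system patch", 3, {"software"})
relocation = ChangeRequest(
    "relocate the site", 400, {"building", "hardware", "software", "people"}
)
print(classify(patch))       # ordinary
print(classify(relocation))  # extraordinary
```

In this sketch an "extraordinary" result would be the trigger for setting up an ad hoc project, while "ordinary" changes would follow the existing procedures.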

Organising the Work (2/11)
• Service continuity must always be protected …
• … so changes must be tested in a similar but separate “environment”.
Common Data Centre environments: development, test, trial and production

Organising the Work (3/11)
Development Environment
• It is a “laboratory” where new changes are designed and developed.
• This environment is generally used for software changes, most often application software changes. It may also be used for system software changes, but it is extremely rare to use it for hardware changes.
• The environment is equipped with the tools the technicians use to produce and modify the software. It usually contains a library (“repository”) where all versions of the software are stored: the old ones, the current one and those under development.
• In smaller Data Centres the development environment is often merged with the test environment

Organising the Work (4/11)
Test Environment (1/3)
• This environment is used to test the changes built in the development environment
• The changes must be tested to verify that:
  • They fit the purposes they were designed for.
  • They do not cause problems.
• To test the changes, the changed components (usually software) must run under conditions similar to those of the production environment, so the test environment must be “similar” to the production one

Organising the Work (5/11)
Test Environment (2/3)
• For economic reasons the test environment is usually only similar (not “equal”) to the production one. Duplicating the production environment just to test changes would be extremely expensive and is not usually necessary. For example, to test new software for a bank’s cash-dispenser network of 1,500 devices, it is enough to set up a test network of 4-5 devices (preferably including every device model used in the production network).
• It is not rare to find several “parallel” environments, used to test different changes at the same time.
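
The cash-dispenser example amounts to choosing a small set of devices that still covers every model in production. A minimal sketch follows, assuming a hypothetical inventory of `(device_id, model)` pairs; the model names and identifiers are invented for illustration.

```python
from collections import defaultdict


def pick_test_devices(devices, per_model=1):
    """Pick `per_model` devices of each model so that every model is covered."""
    by_model = defaultdict(list)
    for device_id, model in devices:
        by_model[model].append(device_id)
    selected = []
    for model in sorted(by_model):
        selected.extend(sorted(by_model[model])[:per_model])
    return selected


# Hypothetical production inventory: 1,500 devices spread over four models.
MODELS = ["M-100", "M-200", "M-300", "M-400"]
production = [(f"ATM-{i:04d}", MODELS[i % len(MODELS)]) for i in range(1500)]

test_network = pick_test_devices(production)
print(test_network)  # ['ATM-0000', 'ATM-0001', 'ATM-0002', 'ATM-0003']
```

With four models the test network has four devices, in line with the 4-5 devices suggested on the slide; raising `per_model` adds redundancy at a higher cost.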

Organising the Work (6/11)
Test Environment (3/3)
• Data are one of the main topics to study when designing a test environment
• At first we might think that the best solution is to run the tests on a perfect copy of the production data. However, this choice has three shortcomings:
  • Cost: the volume of production data is often excessive for test purposes
  • Security: some production data are confidential and must not be accessed by the technicians running the tests
  • Reliability: the real data are sometimes only a “subset” of all the possible data, so some theoretically possible cases are never tested
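
One possible way to address the three shortcomings together is to sample the production data (cost), mask the confidential fields (security) and append hand-written edge cases (reliability). The sketch below is an assumption-laden illustration: the record layout, the `holder` field, the masking key and the edge cases are all hypothetical.

```python
import hashlib
import hmac
import random

# Hypothetical secret; in practice it would be kept away from the test team.
MASKING_KEY = b"example-key-not-for-real-use"


def mask(value: str) -> str:
    """Replace a confidential value with a keyed, repeatable token."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def build_test_data(production_rows, sample_size, edge_cases, seed=0):
    """Sample the production rows, mask the holder name, append edge cases."""
    rng = random.Random(seed)  # fixed seed so the test set is repeatable
    size = min(sample_size, len(production_rows))
    sample = rng.sample(production_rows, size)
    masked = [{**row, "holder": mask(row["holder"])} for row in sample]
    return masked + list(edge_cases)


# Hypothetical production table with 1,000 accounts.
production_rows = [
    {"account": i, "holder": f"Customer {i}", "balance": 100.0 * i}
    for i in range(1, 1001)
]
# Cases that may never occur in the real data but are theoretically possible.
edge_cases = [
    {"account": 0, "holder": "", "balance": 0.0},
    {"account": 10**12, "holder": "X" * 200, "balance": -0.01},
]

test_data = build_test_data(production_rows, sample_size=50, edge_cases=edge_cases)
print(len(test_data))  # 52
```

The production rows themselves are left untouched: masking is applied to copies, so the same source can feed several differently sized test sets.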

Organising the Work (7/11)
Trial Environment (1/2)
• The trial is a sort of “test PLUS” environment. Its purpose is a “dry run” of the changes, i.e. the last complete test of the system before its delivery to the production environment.
• The main characteristic of a trial environment, compared with a test one, is its closer resemblance to the production environment, which is required to guarantee the effectiveness of the test. For example, “closer resemblance” means a very recent copy of the production environment (a test environment may have been generated some time ago). Furthermore, a trial environment may include features missing from a test one: for example security systems, which are usually “disabled” in the test environment to speed up the test runs.

Organising the Work (8/11)
Trial Environment (2/2)
• Another common characteristic of a trial environment (not always present in a test environment) is the capacity to simulate the production “workload”. Specific tools are available that stress the systems by generating “transaction flows” comparable to the real workload, in terms of both volume and statistical distribution
• In smaller Data Centres the trial environment is often missing, and the last run before delivery is usually done in the test environment
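
To make "comparable in volume and statistical distribution" concrete, here is a minimal sketch of a synthetic transaction flow. It does not reproduce any specific load-testing tool: the transaction types, their shares and the choice of exponentially distributed inter-arrival times (Poisson arrivals) are assumptions.

```python
import random
from collections import Counter

# Hypothetical production mix: share of each transaction type.
PRODUCTION_MIX = {"balance_inquiry": 0.30, "withdrawal": 0.60, "deposit": 0.10}


def generate_workload(total, per_second, mix=PRODUCTION_MIX, seed=42):
    """Yield (timestamp_in_seconds, transaction_type) pairs.

    Volume is controlled by `per_second` (average arrival rate); the
    distribution over transaction types follows `mix`.
    """
    rng = random.Random(seed)
    kinds = list(mix)
    weights = [mix[kind] for kind in kinds]
    clock = 0.0
    for _ in range(total):
        clock += rng.expovariate(per_second)  # time until the next arrival
        yield clock, rng.choices(kinds, weights=weights)[0]


flow = list(generate_workload(total=10_000, per_second=50))
counts = Counter(kind for _, kind in flow)
for kind, expected_share in PRODUCTION_MIX.items():
    print(f"{kind}: generated {counts[kind] / len(flow):.2%}, target {expected_share:.0%}")
```

A real stress tool would send these transactions to the trial systems at the generated timestamps; here the flow is only produced and compared with the target mix.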

Organising the Work (9/11)
Production Environment
• It is the environment where the real services are delivered to the real users
• Its main characteristic must be perfect isolation from the other environments (development, test and trial), if present. The other environments are usually much less protected and reliable, and if the isolation is not strong enough the production environment may be affected by problems occurring elsewhere
• The best isolation is achieved with two completely distinct Data Centres: one for production and one for development, test and trial together. However, less expensive solutions that still work may be designed using distinct hardware on the same site, or even distinct virtual environments on the same hardware
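
Isolation can be checked, at least on paper, by verifying that no resource used by production is also used by another environment. The following sketch assumes a hypothetical inventory that maps each environment to the hosts and databases it uses; the resource names are invented.

```python
def shared_with_production(environments, production="production"):
    """Return {environment: resources it shares with production}."""
    prod_resources = environments[production]
    return {
        name: sorted(resources & prod_resources)
        for name, resources in environments.items()
        if name != production and resources & prod_resources
    }


# Hypothetical inventory of the resources used by each environment.
environments = {
    "development": {"host-dev-01", "db-shared-01"},
    "test": {"host-tst-01", "db-tst-01"},
    "trial": {"host-trl-01", "db-trl-01"},
    "production": {"host-prd-01", "db-prd-01", "db-shared-01"},
}

violations = shared_with_production(environments)
print(violations)  # {'development': ['db-shared-01']}
```

An empty result would mean that, according to the inventory, production shares nothing with the other environments; it says nothing about indirect coupling such as shared networks or power.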

Organising the Work (10/11)
The Software Lifecycle (1/2)
• The “lifecycle” of software is characterized by some typical phases:
  • Design
  • Development
  • Test
  • Delivery and possible deployment
  • Error correction and functional changes
  • Disuse
• Software is usually delivered in successive “releases”, and the phases repeat cyclically, release after release
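
The cyclic nature of the phases can be sketched as a small state machine. The interpretation used here is an assumption: after the correction/change phase (called `MAINTENANCE` below, a name not used in the lecture) a new release restarts from design, and only the last release ends in disuse.

```python
from enum import Enum


class Phase(Enum):
    DESIGN = 1
    DEVELOPMENT = 2
    TEST = 3
    DELIVERY = 4
    MAINTENANCE = 5  # error correction and functional changes
    DISUSE = 6


def next_phase(phase: Phase, retire: bool = False) -> Phase:
    """Return the phase that follows `phase`; `retire` ends the cycle."""
    if phase is Phase.DISUSE:
        raise ValueError("a disused product has no next phase")
    if phase is Phase.MAINTENANCE:
        return Phase.DISUSE if retire else Phase.DESIGN
    return Phase(phase.value + 1)


def release_cycle(releases: int) -> list:
    """Walk through `releases` full cycles (at least one), then retire."""
    if releases < 1:
        raise ValueError("at least one release is required")
    history, phase = [], Phase.DESIGN
    for release in range(1, releases + 1):
        while True:
            history.append(f"R{release}:{phase.name}")
            if phase is Phase.MAINTENANCE:
                phase = next_phase(phase, retire=(release == releases))
                break
            phase = next_phase(phase)
    history.append(phase.name)
    return history


print(release_cycle(2))
```

For two releases the walk visits the five working phases twice and then ends with `DISUSE`.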

Organising the Work (11/11)
The Software Lifecycle (2/2)
• When replacing the current release of a software product with a new one, it is often important to choose between two approaches: “phased” vs. “big-bang” delivery.
• Consider:
  • Release preparation time
  • Concurrent changes
  • Interactions with other internal/external systems
  • Test complexity
  • Is the date your own choice? (… hardly ever!)
• The phased approach is generally less “painful” but requires more work
[Figure labels: PREP-1, PREP-2, PREP]

Organising the People (1/8)
• The teams working in a Data Centre are typically organized according to the following structure:

Organising the People (2/8)
Applications
• They deal with the lifecycle of the application software
• Two kinds of roles can usually be distinguished:
  • Analysts: they analyze the users’ requests and design the general characteristics of the software to be built. They choose the software’s functionality as well as its general technical architecture (the tools to be used, the structure of the modules, etc.)
  • Programmers: they “write the code”, following the general design laid out by the analysts
• In a medium-to-large organization the Applications “division” is usually structured in two or more “departments”, one for each “Applications Family” (for example, in a bank it is usual to find the departments “Accounts”, “Financial”, “Loans”, “Web-banking”, etc.)

Organising the People (3/8)
Systems (1/2)
• The people working in this division deal with the “Systems”, i.e. hardware, system software and network. In a medium-to-large organization the division is usually structured in three “departments”: “Software”, “Hardware” and “Network”
• “Software” and “Hardware” usually deal with non-network SW & HW (i.e. computers, storage, etc.), while “Network” deals with both the HW and the SW of the network. That is because network components are more tightly linked to each other than non-network ones.

Organising the People (4/8)
Systems (2/2)
• Each department, mainly in big organizations, may be structured in smaller, highly specialized teams (for example, Software people may be organized in teams dealing with operating systems, database systems, middleware, etc.)
• Larger organizations usually have a team dedicated to “Peripheral Systems”, sometimes located inside the Network department and sometimes not. It deals with systems outside the Data Centre (e.g. personal computers, “branch servers”, etc.)
• For the Systems specialists too, just as for the Applications ones, it is usually possible to distinguish between System Analysts (dealing with the general structure of the systems they manage) and System Programmers (with more technical and operational skills)

Organising the People (5/8)
Operations (1/2)
• While the Applications and Systems divisions deal with the design, development and maintenance of the Data Centre components, the Operations division is responsible for its day-to-day functioning.
• The Operations division is responsible for the “Service Levels” negotiated with the users, in terms of service hours, performance, problem resolution times, etc.
• Because of these responsibilities, the Operations division must be the “sole and absolute owner” of the production environment. Nobody else may apply any change to production components without the Operations division’s authorization.
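
The ownership rule can be illustrated as a gate in front of the production environment: a change is applied only if the Operations division has authorized it first. The class and method names below are hypothetical and do not refer to any real change-management product.

```python
class UnauthorizedChange(Exception):
    """Raised when a change to production lacks Operations approval."""


class ProductionEnvironment:
    OWNER = "Operations"  # the only division allowed to authorize changes

    def __init__(self):
        self._authorized = set()
        self.applied = []

    def authorize(self, change_id, division):
        """Record an authorization; only the owner division may grant it."""
        if division != self.OWNER:
            raise UnauthorizedChange(f"{division} cannot authorize production changes")
        self._authorized.add(change_id)

    def apply(self, change_id, requested_by):
        """Apply a change, whoever built it, only if it was authorized."""
        if change_id not in self._authorized:
            raise UnauthorizedChange(
                f"{change_id} requested by {requested_by} is not authorized"
            )
        self.applied.append((change_id, requested_by))


production = ProductionEnvironment()
production.authorize("CHG-001", "Operations")
production.apply("CHG-001", requested_by="Applications")

try:
    production.apply("CHG-002", requested_by="Systems")
except UnauthorizedChange as error:
    print(error)  # CHG-002 requested by Systems is not authorized
```

Note that the change itself may come from any division; what the gate enforces is that the decision to touch production belongs to Operations alone.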

Organising the People (6/8)
Operations (2/2)
• In medium-to-large organizations the Operations division too is often structured in smaller teams. Usually:
  • Computer Room: deals with starting and stopping systems and applications and keeping them working properly. This team usually works 24/7
  • Storage: deals with data maintenance and data recovery
  • NOC (“Network Operation Centre”): deals with the functioning of the network components
  • Help Desk: responsible for communications between the users and the Data Centre. The Help Desk phone number must be the only one the users dial to report malfunctions or other problems

Organising the People (7/8)
Staff (1/2)
• The “Staff” is not always present (but it certainly is in larger organizations) and consists of one or more teams with miscellaneous tasks. These tasks have two characteristics:
  • They concern the whole Data Centre (i.e. they cut across several or all of its components or functions)
  • The Data Centre management must have full and direct view of and control over them (which is why the Staff teams report directly to the management)
• Each Staff team is generally very small, made up of two or three professionals who are highly skilled in their subject

Organising the People (8/8)
Staff (2/2)
• Some Staff teams that are usually (though not always) present are:
  • Security: deals with the physical and logical security systems, user authentication and authorization, etc. When present, this team usually deals with Disaster Recovery systems and procedures as well
  • Procurement: deals with the whole procurement life-cycle, including preparing the cost budget, negotiating with suppliers (sometimes by means of specific invitations to tender), drawing up and monitoring contracts, supervising payments, etc.
  • Standards and Documentation: responsible for setting, maintaining and documenting all the “working rules” of the Data Centre. For example: the responsibilities of each team in each division, which technical architectures and tools qualify as “Standard”, the “naming conventions” for all the Data Centre components, etc.
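
A naming convention is easier to enforce when it can be checked automatically. As an illustration only, the sketch below assumes a hypothetical convention of the form `<environment>-<role>-<two-digit number>` (for example `prd-db-01`); the lecture does not prescribe any particular scheme.

```python
import re

# Hypothetical convention: environment code, role code, two-digit sequence number.
NAME_PATTERN = re.compile(r"(dev|tst|trl|prd)-(app|db|web|net)-(\d{2})")


def check_names(names):
    """Split component names into (compliant, non_compliant) lists."""
    compliant, non_compliant = [], []
    for name in names:
        if NAME_PATTERN.fullmatch(name):
            compliant.append(name)
        else:
            non_compliant.append(name)
    return compliant, non_compliant


ok, bad = check_names(["prd-db-01", "tst-web-12", "Server_7", "prd-db-1"])
print(ok)   # ['prd-db-01', 'tst-web-12']
print(bad)  # ['Server_7', 'prd-db-1']
```

A check like this could run whenever a new component is registered, so that deviations from the documented standard are caught early.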

Data Centres in practice …
… a few numbers about environments …
• The case of a medium-to-large Italian Public Administration (P.A.) Data Centre
… and an example of an extraordinary project …
• 2 Firms merge: Application unification and Site consolidation

Environments in a medium-to-large Italian P.A. Data Centre (1/3)
• The Site:

Environments in a medium-to-large Italian P.A. Data Centre (2/3)
• The Hardware:

Environments in a medium-to-large Italian P.A. Data Centre (3/3)
• The Environments: 614 virtual environments

2 Firms merge: Application unification and Site consolidation (1/3)
Application unification:
• From 2 different Application Systems to a unified one
• “Application System” unification means “Application Software” + “System Software” unification
• Usually the unified system is x% Firm-A system + y% Firm-B system + z% brand-new
Site consolidation:
• From 2 different Sites to a unified one
• Usually the unified site is either the Firm-A or the Firm-B site; a third, brand-new site is very infrequent

2 Firms merge: Application unification and Site consolidation (2/3)
Strategies (PRO/CON):
(A) application unification first, then site consolidation
VS
(B) site consolidation first, then application unification

2 Firms merge: Application unification and Site consolidation (3/3)
Consider:
• HW & SW equipment in peripheral branches (application unification usually requires mass upgrades or replacements, which are very time-consuming)
• Training of people, both in the Data Centres and in peripheral branches: the latter may take many months
• Conversion of one of the original sites into a Disaster Recovery site, a development/test site, or both