Tuesday, July 27, 2010

Choose Your DR Solutions in Cloud Computing

After you’ve set your RTOs and RPOs, it is time to choose the specific solution
that will protect each of your applications and data. Your selection criteria
should include the following:
• The total cost of ownership. Cost is always a factor, but as stated
above, be sure to consider the total cost of ownership rather than
just the sticker price. A product that requires considerable daily
manual maintenance, or that is a significant resource hog, may end up
costing significantly more over its lifetime than one that is almost fully
automated and resource-efficient, even if the former comes with a lower
purchase price.
• The level of RTO and RPO supported. The speed and completeness of
data and process recovery typically vary greatly from one class of DR
solution to another, and even within a solution class these measures
vary somewhat among the products available in the marketplace. Because,
at this point, you will have an estimate of the costs of downtime and
lost data, you will be in a good position to evaluate whether a price
premium (if any) is justified by faster recovery times and/or more
complete protection of your data.
• Performance. Every DR solution consumes some resources, and
resources aren’t free. Thus, consider the performance characteristics
of the solutions you’re evaluating. How stingy are they in their use of
resources?
• Reliability. Obviously, you want all of your technologies and processes
to be reliable, but probably none more so than your DR solution. A trait
of an unreliable DR solution that is not shared with other unreliable
technology is that you may not discover the problem until it’s too late
to do anything about it. After a disaster has destroyed your production
data and applications, there is no fix that you can apply if your DR
solution failed to create complete and accurate backups.
You can get first-hand knowledge of the reliability of a product only
after you’ve bought it, used it extensively and tested it thoroughly.
Unfortunately, you don’t have that first-hand experience when you’re
evaluating products that you are considering buying. Consequently, as
part of the evaluation process, ask for references and then follow up
by asking the reference accounts about the reliability of the products
under consideration.
• Vendor Support. Check out the level and hours of support offered by
the vendor. If you need some assistance from the vendor while you
desperately struggle to recover a lost system, you don’t want to learn
only then that, for example, the vendor provides support solely via
email. And if you use your systems 24x7 and one of them is destroyed
at 6:00 a.m. on a Saturday, you shouldn’t have to wait until 9:00 a.m.
Monday for the vendor’s support team to return from the weekend
before you can get answers to questions about your recovery operations.
The vendor’s support team should be available to you whenever you
need their help.
• Vendor Stability. All products need ongoing maintenance and
enhancements. For example, DR solutions may have to be upgraded to
support new operating system versions as they become available. And
as new DR techniques and technologies are developed, you will want
your DR solution provider to incorporate them into their products so you
can reap the rewards.
Vendors can provide this ongoing support and maintenance only if they
are still in business. Thus, it is important to consider the vendor’s stability
as a factor in your product evaluation. Does the vendor have a track
record of stability? Is it profitable? Is it well financed? If the answer to
any of these questions is “no,” proceed with caution.
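The TCO and RTO/RPO criteria can be combined into a rough lifetime-cost comparison. The Python sketch below is illustrative only. The `DRProduct` fields are assumptions, and so is the linear expected-loss model (incidents per year × (RTO × downtime cost per hour + RPO × data-loss cost per hour)). Every number in the example is made up and is not vendor or survey data.

```python
from dataclasses import dataclass


@dataclass
class DRProduct:
    """Hypothetical profile of a DR product; all fields are illustrative."""
    name: str
    purchase_price: float        # one-time "sticker price"
    annual_admin_hours: float    # manual maintenance effort per year
    annual_resource_cost: float  # extra compute, storage, bandwidth per year
    rto_hours: float             # expected time to restore service
    rpo_hours: float             # expected window of lost data


def total_cost_of_ownership(p: DRProduct, years: int, labor_rate: float) -> float:
    """Sticker price plus recurring labor and resource costs over the lifetime."""
    return p.purchase_price + years * (p.annual_admin_hours * labor_rate
                                       + p.annual_resource_cost)


def expected_disaster_loss(p: DRProduct, years: int, incidents_per_year: float,
                           downtime_cost_per_hour: float,
                           data_loss_cost_per_hour: float) -> float:
    """Expected cost of downtime (driven by RTO) and lost data (driven by RPO)."""
    per_incident = (p.rto_hours * downtime_cost_per_hour
                    + p.rpo_hours * data_loss_cost_per_hour)
    return years * incidents_per_year * per_incident


# Made-up example: a cheap but manual product vs. a pricier automated one.
budget = DRProduct("budget", 5_000, 400, 2_000, rto_hours=24, rpo_hours=24)
automated = DRProduct("automated", 40_000, 20, 3_000, rto_hours=1, rpo_hours=0.25)

for product in (budget, automated):
    tco = total_cost_of_ownership(product, years=5, labor_rate=75.0)
    risk = expected_disaster_loss(product, 5, 0.5, 18_000.0, 10_000.0)
    print(f"{product.name}: TCO ${tco:,.0f}, expected loss ${risk:,.0f}")
```

With these assumed inputs, the product with the lower sticker price still has the higher five-year TCO ($165,000 vs. $62,500). That is the point of the first criterion. The expected-loss term then shows how a faster RTO/RPO can justify a price premium.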

Wednesday, July 21, 2010

What Disaster Recovery as a Service (RaaS) Means for the Enterprise

One of the primary benefits of a business continuity (BC) or disaster recovery (DR) solution is the ability to continue working through unexpected outages.

Organizations that have experienced downtime due to a mail server outage, or a UPS failure that brings down their entire SAN, are well aware of the value of investing in technology that keeps operations online and active during these types of failures (and many others!). There are, however, other benefits inherent in DR/BC solutions that are frequently overlooked and that in some cases provide even better ROI and value to an organization. “Anytime” maintenance windows, the ability to test application upgrades, and the opportunity to confirm that your environment is operating as effectively and efficiently as possible are ancillary benefits that in some cases trump the recovery event altogether.

The emergence of the cloud deployment model and its “killer” Recovery as a Service (RaaS) application has made it possible for organizations of all shapes and sizes to capitalize on these benefits.

Enterprise and mid-market companies often rely on back-end databases or custom-written applications that control anything from assembly lines to core information distribution systems. These systems are frequently patched or updated. Companies that let their IT departments, development teams or core operations groups test new versions, patches, upgrades and modifications to these key systems without touching the production environment avoid the risk of downtime and gain so much value that it’s hard to even put a price tag on it.

Progressive companies are adopting the Recovery as a Service model not only to protect their systems from unscheduled outages, but also to test changes to these systems in a “sandbox” type of environment without risking any downtime or interruption to their key applications.

I have personally seen more “sandbox” roll-backs at customers than I can remember, all because the patch being applied simply did not work properly. These “sandbox” patches were applied during the day without any interruption to ongoing operations, allowing development or IT teams to evaluate stability and usability without taking anything offline. And think of all the weekend and overtime hours that were avoided (and the related lifestyle improvement for key IT employees)!

A quick case study: A hospital, arguably the best example of a 24/7 operation, relies on a database system to deliver the proper food trays to thousands of patients three to four times a day, every day of the week. When it needs to upgrade its database system or perform maintenance, what can it do? What precautions can it take to ensure the upgrade will not affect the distributions already in progress or the upcoming ones? By using RaaS, this hospital was able to test all upgrades, patches, etc. in a sandbox environment in the cloud, in the middle of the day, with zero interruption to its existing system. Its IT team could also work with confidence, knowing it could easily roll back any changes that didn’t deliver the required result.

Through Recovery as a Service technology, the cloud makes it possible to establish a complete replica environment. Organizations no longer need to establish offsite data center locations, invest in substantial hardware, duplicate licenses, and build a complete test environment for sandbox-like testing. With the cloud and RaaS, organizations can readily establish offsite DR presences that not only keep them and their key technologies from going down, but also provide a complete sandbox for maintenance windows, application upgrades and testing. All of this can be done at economies of scale and with a financial investment that is shockingly low (that same hospital, with 500 employees, could protect its database server with RaaS for roughly 3 cents/day/employee).
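The hospital figure is easier to judge in absolute terms. This tiny Python calculation multiplies out the article's rough per-employee rate. The article does not say how the service is actually billed, so treat the per-employee framing as a normalization rather than a price list. The function name is hypothetical.

```python
def raas_cost_dollars(employees: int, cents_per_employee_day: float, days: int) -> float:
    """Cost in dollars when a service is expressed per employee per day."""
    return employees * cents_per_employee_day * days / 100


# The hospital example: 500 employees at roughly 3 cents/day/employee.
per_day = raas_cost_dollars(500, 3, 1)
per_year = raas_cost_dollars(500, 3, 365)
print(f"about ${per_day:.2f} per day, ${per_year:,.2f} per year")
```

At that rate, the protection costs about $15 a day, or roughly $5,475 a year.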

Enterprises that are contemplating testing out the cloud are well advised to put their collective toes in the water (or maybe the sky) by adopting RaaS, a no-risk, high-value solution that can deliver more comfort the next time you are about to click the upgrade button…

RPO and RTO Problems with Tape

The Trouble with Tape
Tape backup can provide for the long-term archival needs of virtual servers; however, tape cannot provide the level of recoverability required for critical business applications. Rebuilding one application from tape can be a difficult and lengthy process. Recovering four or more applications at the same time from tape to rebuild one physical server will result in an excessive period of downtime, likely more than the business can afford.
Organizations may not understand how vulnerable their data and business remain to disaster, even after they've made a huge up-front and ongoing investment in tape-based disaster recovery. An article in SearchSecurity reports that in a survey of 500 IT departments, as many as 20% of routine nightly backups fail to capture all data. Among participants of another survey cited in this article, 40% of IT managers were unable to recover data from a tape when they needed it.[1] This is a significant concern for regulated corporations, which risk being out of compliance if they cannot produce required data when they need it.

Tape backup also places limits on your recovery point objective (RPO), the point in time to which you can recover your systems should disaster strike. Periodic tape backup guarantees hours of lost data in the event of a disaster. Suppose, for example, that a critical system fails sometime today. The best you can do is recover to yesterday's data, which may easily be twelve or more hours old. The later in the day disaster strikes, the older the data from which you'll recover. In addition, when you recover from a disaster, any data that was not backed up is lost for good unless you recreate it.
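The data-loss exposure described above is the time since the last completed backup. Here is a minimal Python sketch of that calculation. It assumes a single nightly backup at a fixed hour (22:00 here, a hypothetical schedule) and treats the backup as capturing data as of its start time. The function name is illustrative.

```python
from datetime import datetime, timedelta


def data_loss_window(failure: datetime, backup_hour: int) -> timedelta:
    """Time between the most recent daily backup (started at `backup_hour`)
    and the failure; everything written in that window is lost."""
    last_backup = failure.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if last_backup > failure:
        # Today's backup has not run yet, so the last good copy is yesterday's.
        last_backup -= timedelta(days=1)
    return failure - last_backup


# Assumed schedule: nightly backup starting at 22:00.
for hour in (6, 9, 17, 21):
    lost = data_loss_window(datetime(2010, 7, 27, hour), backup_hour=22)
    print(f"failure at {hour:02d}:00 -> {lost} of data lost")
```

With this schedule, a failure at 09:00 loses 11 hours of data and one at 21:00 loses 23 hours. This matches the point that the later in the day disaster strikes, the older the recovered data.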

The cost of permanently lost data is high and includes the revenue that the data represents, the business value you can extract from it, and the cost to recreate it. Consider:
● How much money would your agency lose if you lost all your transaction data for the last twelve hours, or even the last ten minutes?
● What is the value of the knowledge contained in your agency's last twelve hours' worth of e-mails and e-mail attachments? What would it cost to have your engineers recreate the last twelve hours' worth of original or edited CAD/CAM drawings?
In The Cost of Lost Data, a Pepperdine University report updated in 2003 (before the advent of Sarbanes-Oxley), Dr. David Smith estimates the average cost of irrecoverably lost data at more than $10,000 per megabyte lost.[2] But if the lost data is business transaction data, or data that's especially expensive to reproduce and key to your company's disaster recovery plan, your costs could be much, much higher.
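Combining a data-loss window with the per-megabyte figure gives a crude estimate of what a given RPO puts at risk. This Python sketch uses an invented write rate. A flat per-MB cost is also a simplification of the report's average. The function name is hypothetical.

```python
def lost_data_cost(hours_exposed: float, mb_written_per_hour: float,
                   cost_per_mb: float = 10_000.0) -> float:
    """Estimated cost of data written since the last backup and never recovered.

    The $10,000/MB default is the average cited in the report; the write rate
    is an assumption you would measure for your own systems.
    """
    return hours_exposed * mb_written_per_hour * cost_per_mb


# Hypothetical: 12 hours since the last backup, 5 MB of new data per hour.
print(f"${lost_data_cost(12, 5):,.0f}")
```

Under these assumptions, a 12-hour window puts $600,000 at risk, while a 15-minute window (`lost_data_cost(0.25, 5)`) would cut that to $12,500.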

The Cost of Downtime
When a large-scale disaster strikes, with tape backup you're out of business until you can restore your systems and your data from your tapes.
This kind of restoration takes a minimum of several hours, and can easily take days or even weeks.
Gartner Group estimates that the average cost of network downtime for larger corporations is $42,000 per hour; Contingency Planning Research pegs the average hourly downtime cost for many businesses at roughly $18,000. The key to a successful disaster recovery plan is to focus not just on the data (RPO) but also on the applications that end users run to gain access to that data. Recovery Time Objective (RTO) is generally defined as the amount of time it takes to regain access to business-critical data. Solutions like tape backup, which have an RTO of hours or days, don't provide the level of recoverability that most companies require.
Full system rebuilds and tape restores are unacceptable recovery methods for meeting the RPO/RTO of mission-critical applications, and they leave organizations vulnerable to lengthy recovery times and potential data loss. Architecting for maximum availability across various types of outages presents a challenge that can be solved through a combination of real-time replication, application availability and virtualization technology. Leveraging virtualization along with high-availability storage solutions and data protection software like Double-Take can help businesses economize.
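The hourly figures above translate directly into a cost per outage once you know the RTO. This Python sketch applies both quoted rates. The 24-hour tape restore and 30-minute replica failover are hypothetical recovery times chosen for comparison. They are not measurements of any product.

```python
HOURLY_DOWNTIME_COST = {
    "Gartner Group (larger corporations)": 42_000,
    "Contingency Planning Research": 18_000,
}

# Hypothetical recovery times for comparison; actual RTOs vary by environment.
SCENARIOS = {"tape restore": 24.0, "replica failover": 0.5}


def downtime_cost(rto_hours: float, cost_per_hour: float) -> float:
    """Cost incurred while systems are unavailable for `rto_hours`."""
    return rto_hours * cost_per_hour


for source, rate in HOURLY_DOWNTIME_COST.items():
    for method, rto in SCENARIOS.items():
        print(f"{source}, {method} ({rto:g} h): ${downtime_cost(rto, rate):,.0f}")
```

With the Gartner figure, the assumed 24-hour tape restore costs $1,008,000, against $21,000 for the 30-minute failover.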

Replication-Based Technologies

Replication-based technologies offer the promise of capturing a data set at a particular point in time with minimal overhead required to capture the data or to restore it later. There are four main methods of interest in today's storage environments:
● Whole-file replication copies files in their entirety. This is normally done as part of a scheduled or batch process, since files copied while their owning applications are open will not be copied properly. The most prevalent use of this technology is for login scripts or other files that don't change frequently.
● Application replication copies a specific application's data. The implementation method (and general usefulness) of this method varies dramatically based on the feature set of the application, the demands of the application and the way in which replication is implemented. This model is almost exclusively implemented for database-type applications.
● Hardware replication copies data from one logical volume to another; the copying is typically done by the storage unit's controller. Normally, replication occurs when data is written to the original volume: the controller writes the same data to the original volume and the replication target at the same time. This replication is usually synchronous, meaning that the I/O operation isn't considered complete until the data has been written to all destination volumes. Hardware replication is most often performed between storage devices attached to a single storage controller, making it poorly suited to replicating data over long distances. Most hardware replication is built out of SAN-type storage or proprietary NAS filers.
● Software replication integrates with the Windows® operating system to copy data by capturing file changes as they pass to the file system. The copied changes are queued and sent to a second server while the original file operation is processed normally, without impact to application performance. Protected volumes may be on the same server, on separate servers on a LAN, connected via a storage-area network (SAN), or across a wide-area network. As long as the network infrastructure being used can accommodate the rate of data change, there is no restriction on the distance between source and target. The result is cost-effective data protection.
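The key difference between the synchronous hardware model and the queued software model is when a write is acknowledged. The toy Python model below illustrates only that ordering. The class names are hypothetical. The model ignores failures, cross-volume write ordering and network transport, so it is not how any particular product works.

```python
from collections import deque
from typing import Optional


class Volume:
    """In-memory stand-in for a storage volume (block number -> data)."""

    def __init__(self) -> None:
        self.blocks = {}

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class SyncMirror:
    """Synchronous replication, as a storage controller does it: the write
    is acknowledged only after every destination volume holds the data."""

    def __init__(self, primary: Volume, *targets: Volume) -> None:
        self.volumes = [primary, *targets]

    def write(self, block: int, data: bytes) -> bool:
        for volume in self.volumes:
            volume.write(block, data)
        return True  # acknowledgement happens after all copies exist


class AsyncReplicator:
    """Queued replication, as in the software approach: the local write
    completes normally, and the captured change is sent to the target later."""

    def __init__(self, primary: Volume, target: Volume) -> None:
        self.primary = primary
        self.target = target
        self.queue = deque()  # captured (block, data) changes awaiting transmission

    def write(self, block: int, data: bytes) -> bool:
        self.primary.write(block, data)
        self.queue.append((block, data))
        return True  # acknowledged before the target has the data

    def drain(self, max_items: Optional[int] = None) -> int:
        """Send queued changes to the target in write order; return the count sent."""
        sent = 0
        while self.queue and (max_items is None or sent < max_items):
            block, data = self.queue.popleft()
            self.target.write(block, data)
            sent += 1
        return sent


primary_lun, mirror_lun = Volume(), Volume()
SyncMirror(primary_lun, mirror_lun).write(7, b"ledger")  # both copies written before ack

source, replica = Volume(), Volume()
replicator = AsyncReplicator(source, replica)
replicator.write(1, b"order-42")
pending_before_drain = len(replicator.queue)  # change captured, not yet on the replica
replicator.drain()
```

Anything still in `queue` when the primary is lost is the data-loss exposure of the asynchronous approach. This is why the network must keep up with the rate of data change.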

To best understand how to protect data, it's important to consider what the data is being protected from. Evaluating the usefulness of replication for particular conditions requires us to examine four separate scenarios in which replication might lead to better business continuity:
● Loss of a single resource - In this scenario, a single important resource fails or is interrupted. For example, losing the web server that end users use for product ordering would cripple any agency that depends on electronic procurement. Likewise, many agencies would be seriously affected by the loss of one of their primary e-mail servers. For these cases, some agencies will investigate fault-tolerant architectures, but many don't invest in fault-tolerance technology for file and print servers, even though the failure of a single file server may simultaneously prevent several departments' employees from accessing their data. Planning for this case usually revolves around providing improved availability and failover for the production resources.
● Loss of an entire facility - In this scenario, entire facilities, and all of their resources, are unavailable. This can happen as the result of natural disasters, extended power outages, failure of the facility's environmental conditioning systems, persistent loss of communications, or terrorist acts. For many agencies, the normal response to the loss of a facility is to initiate a disaster recovery plan and resume operations at another physical site.
● Loss of user data files - This unfortunately common scenario involves the accidental or intentional loss of important data files. The most common mitigation is to restore the lost data from a backup, but this normally means going back to the most recent recovery point, often with some data loss.
● Planned outages for maintenance or migration - The goal of planned maintenance or migrations is usually to restore or repair service in a way that's transparent to the end users.

Top 6 Disaster Recovery Software Tools

Information technology is at the core of almost every organization today, and computer data is one of a company's most valuable assets. Any computer-related disaster can result in irreversible losses for the company. To guard against such unexpected disasters, most companies include disaster recovery planning as part of their business continuity planning. This generally includes planning for the resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure. Even for mid-sized companies, the cost of system downtime ranges anywhere from $5,000 per hour to over $15,000 per minute. To counter this, it's important for every organization to plan its disaster recovery with the help of a disaster recovery tool. Here is my short list of the top 6 disaster recovery software tools that enable efficient disaster recovery. Disaster recovery tools are essentially a part of Disaster Recovery Planning (DRP). DRP documents chalk out the plan of action prior to, during and following a disaster. The DRP helps a business minimize the losses caused by a system crash and recover from a disaster in the shortest possible time by identifying critical systems, processes and methods for restoring those processes.
1. Acronis True Image Echo Server for Windows

This is an enterprise-class disaster recovery solution for backing up Windows and Linux servers. It enables disaster recovery as well as complete system restoration to either an existing system or a new system with different hardware, or to a virtual server.

2. UltraBac's Image-Based Disaster Recovery

This image-based disaster recovery technology works by taking scheduled snapshots of one or more disk partitions. Each image is a replica of the partition frozen at the scheduled time, which ensures a good backup even of files that are open and in use. With this disaster recovery tool, a failed machine can be restored with a minimal set of tools. UltraBac offers two versions of the tool: UBDR Pro for small to medium businesses and UBDR Gold for larger environments.

3. Living Disaster Recovery Planning System (LDRPS)

This is business continuity software designed to support disaster recovery planning. LDRPS can also be delivered as a Software as a Service (SaaS) solution, with Strohl Systems hosting the application in its data center. Key features of LDRPS include customizable best-practices-based plan navigators, customizable reports, dependency maps and location resource management.

4. LBL ContingencyPro Software

This is a web-based software tool that provides best practices for business continuity planning. It also includes hundreds of electronic tools, guides, templates and samples, and it offers a proven methodology for recovering from disaster events.

5. TAMP DRS (Disaster recovery tool)

This tool creates and distributes business contingency plans that include disaster recovery. It allows the user to manage and roll up documents, development plans, inventory lists, spreadsheets, graphics and flowcharts into one plan. It remains fully functional in a disaster-afflicted environment.

6. BIA Professional

This is a disaster recovery tool from Strohl Systems that provides guidelines for developing a Business Impact Analysis (BIA). It identifies financial and operational vulnerabilities and disaster-related impacts, as well as possible strategies for disaster recovery.


Managed Services in Disaster Recovery Using Cloud Computing

TCS managed services include:
• Dedicated "Cloud" Hosting
• Shared Infrastructure "Cloud" Hosting
• Remote Infrastructure Monitoring and Management
• Disaster Recovery & Business Continuity
• 24x7 Help Desk
• Application Management
• Enterprise Outsourcing
• IT Professional Services
Benefits:
The credibility of commercial and government clients lends credence to the power and benefits of TCS's services. Working with TCS:
• Reduces total operating costs: Alleviates the need to provide in-house IT capability and avoids the costly capital investment of doing so.
• Increases IT longevity: Our time-proven methods and services provide long-term, sustainable results.
• Enhances flexibility: Our solutions are adaptable and scalable-they accommodate changing processes and needs.
• Introduces refinement and efficiency: TCS integrates off-the-shelf products and/or customized applications that augment clients' core operations and allow additional client platforms to be managed efficiently.
• Provides access to a deep knowledge-base: TCS has subject matter experts in diverse industry-leading technologies. We continuously evaluate enterprise technology solutions, and only adopt those that provide overall performance and value.
• Improves IT risk management: Improves security by placing the development of IT solutions in the hands of experienced security experts.
• Sharpens focus: Allows organizations to concentrate on their core business functions.
• Utilizes best-in-class processes: Our longevity has led to the development of refined processes that best serve clients. One example: our software delivery lifecycle process framework. This tailored approach incorporates an iterative, business-process-driven methodology that delivers and enhances software applications over several phases to allow solutions that continue to evolve over time to meet clients' changing needs.
• Provides assurance: TCS's 24/7 support, along with its proven and reliable technologies and sustainable methodologies, is there when you need it, any time, any day.
View our client case studies for examples of our managed services at work.

________________________________________
Dedicated "Cloud" Hosting
By delivering enterprise-class computing and support in a "private cloud," TCS offers best-in-class dedicated infrastructure functionality and performance without clients having to worry about supporting the underlying hardware or software, or about technology refresh and replacement costs. The solution may include the entire suite of hardware, software, facilities and support for a flat monthly fee (Infrastructure as a Service), or it may consist of managing client-owned infrastructure. TCS can support either model, or a hybrid delivery model combining the two.
TCS IT Management Services keep business applications, databases, storage devices, and network systems running reliably and securely. They optimally deploy, monitor, and manage core infrastructure through standardization, virtualization, monitoring, automation, and ITIL-based best practices. Services include:
• Managed Infrastructure as a Service (IaaS): Application and Database Hardware/Software, Data Center Facilities, Core network and security infrastructure, Storage
• Application Hosting
• Nightly and off site backups
• Diverse carrier access
• Disaster Recovery
• Virtualization
Shared Infrastructure "Cloud" Hosting
To reduce total cost of ownership and minimize carbon footprint, TCS multi-tenant managed services allow clients to decouple processing and storage capabilities from physical servers, so they can boost server utilization while minimizing power consumption and space requirements. Clients receive best-in-class hardware, software and support bundled at a low fixed monthly fee, with technology experts available on a 24x7 basis.

High levels of security, reliability, performance, customization and service are inherent and clients can scale quickly, while optimizing hardware resources and achieving "pay as you go" billing for services. Hardware, software, skilled technology resources and facilities are bundled at a low fixed monthly fee, allowing business owners to accurately predict IT operations cost while allowing them to focus on their core business. Services include:
• Application and Database Servers
• Custom built Virtual Servers
• Storage Area Network (SAN)/Network Attached Storage (NAS)
• Nightly and off site backups
• Core network infrastructure - Switching, Routers, Firewall, Intrusion Detection Systems, Security Log Analysis
• Software as a Service (SaaS)
• Unified Communications/Voice over IP
• Disaster Recovery/Business Continuity
• Virtual Desktop
• Messaging/Collaboration
• Diverse carrier access
Remote Infrastructure Monitoring and Management (RIMM)
TCS delivers 24x7 remote infrastructure monitoring and management, helping many global enterprises cut the costs of infrastructure monitoring and management while gaining access to expert skill sets. TCS's Enterprise Operations Center (EOC) uses industry-leading monitoring and management tools to proactively manage remote client environments, including, but not limited to, servers, applications, network infrastructure and storage appliances. System alerts and events are analyzed and remediated 24x7 so client resources can focus on their core business functions. TCS's remote proactive monitoring service helps to maximize performance, reduce mean time to respond and decrease downtime. Services include:
• Redundant 24x7 Enterprise Operations Center (EOC)
• Servers
• Virtualized Environments
• Network Infrastructure - Hardware, Carrier management, WAN Acceleration
• Storage Appliances
• Security Systems
• Applications
Disaster Recovery & Business Continuity
TCS designs and implements full disaster recovery and business continuity solutions, with a focus on providing the infrastructure businesses need to restore their critical IT systems and processes after a disaster. Many organizations have more than one disaster recovery strategy in place because different business processes have different costs and service level agreements. For many organizations, the availability of a low-cost, managed disaster recovery facility enables a recovery site strategy without the associated costs of a dedicated secondary site and its operational expenses.

TCS offers a range of disaster recovery and business continuity managed services customized to cost-effectively meet the individual requirements of organizations. While there are many disaster recovery models offered commercially, they generally fall into three categories: hot sites, warm sites and cold sites.
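Different business processes carry different RTOs, so a single organization may map them to different site categories, as described above. Here is a minimal Python sketch of that mapping. The 4-hour and 72-hour cut-offs and the process names are hypothetical. A real choice would also weigh RPO and cost.

```python
def recovery_site_tier(rto_hours: float) -> str:
    """Suggest a recovery-site category for a given RTO.

    The 4-hour and 72-hour cut-offs are illustrative assumptions only.
    """
    if rto_hours <= 4:
        return "hot"   # equipped site with current data, ready to take over
    if rto_hours <= 72:
        return "warm"  # hardware and connectivity in place; data must be restored
    return "cold"      # space, power and cooling only; systems must be installed


# Hypothetical processes with different RTOs (in hours).
processes = {"order entry": 2, "payroll": 48, "archive search": 168}
plan = {name: recovery_site_tier(rto) for name, rto in processes.items()}
print(plan)
```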
24x7 Help Desk
Clients leveraging TCS's Managed Help Desk gain full-time resources for ticketing, third-party escalation, end-user PC support and application support. The service scales on demand, and TCS Help Desk pricing is offered on a per-call or unlimited basis. Services include:
• Redundant facilities
• Service level guarantees
• Remote PC/desktop management tools to assist end users in real time, avoiding the need to deploy resources
• Ticketing platform, tracking all events followed by comprehensive monthly reporting.
• Secure customer portal
• Per call or volume discount pricing
• Comprehensive event resolution or third-party escalation
Application Management
Many companies lack the resources to cover all competencies along the application lifecycle. TCS offers clients the choice and flexibility of support for specific applications or for the entire application portfolio, including custom, off-the-shelf and enterprise solutions (e.g., PeopleSoft, Oracle, Citrix, MS Exchange, Cisco VoIP). We specialize in the evolution, operation, and maintenance of mission-critical applications. The goal is to optimize client application investments, going beyond cost savings by deploying efficient processes and methodologies.

TCS's comprehensive approach delivers significant improvements in operations, risk mitigation, and tangible performance enhancements including a reduction of support costs by 15% to 40%. Clients achieve transparency and increased response KPIs, thereby providing the confidence and flexibility to free budgets, redirect high performing staff, and pursue strategic initiatives. Through proven processes, tools and expertise, clients are able to optimize their application portfolio, resulting in seamless operation and dependable performance. Services include:
• Change Management (Service Desk, Production Support, and User Support)
• Application Maintenance
• Enhancements and Upgrades
• Application Functional Support
• Application Report Enhancement
• Application Updates/Patching
• Custom Development
Enterprise Outsourcing Services
For organizations considering comprehensive management of systems, business processes, staffing as well as leadership, TCS offers enterprise outsourcing programs on a global scale. TCS tailors each engagement to meet specific client business needs by carefully considering size and complexity. TCS leverages its broad spectrum of dedicated/shared managed services, technology experts and program management methodologies to provide a solution that is focused on client objectives while seeking to reduce overall program operating costs. "On-demand" business service engagements can include an entire function or discrete activities within multiple business functions.

TCS has successfully executed wide-scale projects that achieved standardized, repeatable processes under one integrated governance structure. Organizations capitalize on an outsourcing methodology that enables better administration, management and measurement of business KPIs. Savings from enterprise outsourcing result from:
• Standardized Operations
• Simplified Governance
• Lowered Risks through Operational Efficiencies
• Reduced Redundancy Costs
• Cost Maintenance and Budgeting
• Mapping IT to Performance Metrics
• Consolidated Vendor Management
• Improved Workforce Performance through Proven Process Design
• Customers served more effectively
IT Professional Services
TCS's IT consulting services address aspects of reducing cost, increasing agility and enabling IT system transformation. TCS offers focused solutions over a diverse number of infrastructure environments and leverages its proven IT infrastructure assessment tools and methodologies to design solutions that are closely aligned to the client's business strategy.

TCS specialists deliver solutions that focus on the dependencies between business and technology objectives. Whether clients require assistance assessing IT application architecture, managing a complex wide area network implementation, or upgrading legacy platforms, TCS can help. All IT professional services are overseen by TCS's award winning project management.
• Project and Program Management
• Enterprise Strategic Planning
• Business Continuity and Disaster Recovery Planning
• IT Audits and Assessments
• Wide/Local Area Network Architecture & Design
• System upgrade and integration
• Short and long-term on site skilled resources
• ERP Planning and Strategy
• Security Assessments & Remediation

Q&A: Business Continuity and Disaster Recovery

Enterprise Systems: What is business continuity?

Rich Schiesser: Business continuity is a program of plans and activities that ensures critical business processes can be resumed within agreed-upon time frames if a sustained outage occurs. The agreed-upon time frames are referred to as recovery time objectives (RTOs), and the agreed amounts of associated data to be restored are called the recovery point objectives (RPOs). The RTOs and the RPOs are determined from a business impact analysis.

How does business continuity differ from disaster recovery?

Disaster recovery (DR) had its origins in the 1970s and referred to the recovery of a company’s IT infrastructure in general and its IT data center in particular. Business continuity (BC) had its origins in the 1990s and emphasized the continuity of all critical business operations across the entire enterprise, not just IT. Where DR tends to be reactive, BC is more proactive. DR focuses on technical recovery, whereas BC focuses on business recovery.

DR involves mostly technicians, whereas BC involves mostly business users. Finally, DR is usually part of IT and has no specific career path or certifications of its own. BC can be part of risk management or an entity of its own, and it has widely accepted career paths and certifications.

What is a business impact analysis (BIA)? Who prepares it and what does it cover?


A business impact analysis is an enterprise-wide activity in which the effect of prolonged outages to business processes is determined. The purpose of a BIA is to identify and prioritize the most critical business processes in terms of the amount of time a process can be idled before significant business impact is felt. For some processes the amount of allowable time down might be only minutes or a few hours, whereas for others it might be days.

These estimated times are called recovery time objectives and are closely related to the point at which data must be restored (recovery point objectives) to support the recovered business process. The results of the BIA, RTOs, and RPOs are combined with risk management to determine appropriate recovery strategies.

A BIA is usually prepared by a group of business continuity planners from within an organization or by outside consultants. A full BIA covers all major departments of an enterprise, including core competencies, finance, administration, and IT.


What role does IT play in preparing the BIA? What is the business users' role?

IT plays a major role in preparing a BIA. Most business processes today depend on various IT services to operate. This means that if a major disaster disrupts a business process, the IT services that support the business process must first be restored before the business process can be recovered. If a process needs to be recovered in 4 hours, the IT services supporting it, and the associated data, might need to be recovered in 3 hours.

IT’s main role in a BIA is to determine the feasibility and costs of recovering IT services in time to meet the RTOs and RPOs of the business processes. Sometimes the RTOs need to be extended because the cost of the IT recovery would otherwise be prohibitive. Another role of IT in a BIA is to identify the IT dependencies that a particular IT service might have. These dependencies could influence the feasibility and cost of recovery.

The main role of the business users in a BIA is to identify their critical business processes and dependencies and to estimate how long a process can be down before significant impact occurs. Impacts can be financial or legal (among other categories) and need to be quantified by the users.

What tools are available to help an enterprise create the BIA? If created in-house, what are the steps in preparing it? Who's on the team? What expertise is needed?

Three of the major disaster recovery service providers are IBM, HP, and SunGard. Each of these provides software tools that help an enterprise create a BIA. Last year, SunGard acquired Strohl Software Systems, which had developed one of the premier tools of this type, BIA Pro, which among other features has Web interfaces. A few other vendors also supply BIA tools that give users a variety of alternatives based on function, cost, and ease of use.

An in-house BIA consists of five major steps:


Step 1: Acquire executive support to ensure appropriate priority and resources are dedicated to the effort. Included in this step is a clear agreement as to the objectives and scope of the effort. This step needs to occur regardless of whether the BIA is created in-house or by outside consultants.

Step 2: Develop a questionnaire and interview form for planners to use in gathering data about processes from users.

Step 3: Schedule and conduct the interviews with users to determine RTOs, RPOs, and dependencies.

Step 4: Analyze the results and prioritize all processes across the enterprise.

Step 5: Compile the final report and present recommendations and costs of recovery strategies.
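Step 4 can be sketched in a few lines. The sketch below assumes that each interview in Step 3 produced an RTO, an RPO, and an estimated hourly impact for a process; the field names, process names, and figures are hypothetical. It ranks processes with the shortest RTO first and breaks ties by the larger hourly impact, which is one reasonable ordering rather than a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class ProcessResult:
    """One (hypothetical) row of BIA interview results for a business process."""
    name: str
    rto_hours: float        # how long the process can be down before significant impact
    rpo_hours: float        # how much data, measured in time, can be lost
    impact_per_hour: float  # estimated financial impact of each hour of outage


def prioritize(results: list[ProcessResult]) -> list[ProcessResult]:
    """Rank processes for recovery: shortest RTO first, then highest hourly impact."""
    return sorted(results, key=lambda r: (r.rto_hours, -r.impact_per_hour))


# Illustrative interview results (not real data).
interviews = [
    ProcessResult("Payroll", rto_hours=72, rpo_hours=24, impact_per_hour=1_000),
    ProcessResult("Order entry", rto_hours=4, rpo_hours=0.25, impact_per_hour=50_000),
    ProcessResult("Customer support", rto_hours=4, rpo_hours=1, impact_per_hour=5_000),
]

for rank, process in enumerate(prioritize(interviews), start=1):
    print(rank, process.name, f"RTO {process.rto_hours} h")
```

The ranked list is a starting point for Step 5; the final priorities still come from discussion with the business users.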

Business continuity planners, business user sponsors, and IT recovery specialists are usually on the BIA team. Excellent analytical and communication skills and knowledge of business and technical recoveries are the types of expertise needed for this effort.


What is the impact of the BIA on business continuity?

The BIA has significant impact on business continuity. A properly conducted BIA determines the viability and costs of recovering within reasonable time frames for most types of calamities. The BIA helps prioritize business processes for recovery and identify the dependent processes and IT services needed for restoration.

What is meant by risk management?

Risk management involves three major steps: identification, analysis, and recommendation:

Step 1: Identify the threats (causes of major outages) and vulnerabilities (probabilities of the causes occurring) an organization has to the stability of its operations. This is sometimes called a risk assessment.

Step 2: Analyze the levels of threats and vulnerabilities, and propose countermeasures (and their costs) to these exposures. This is often referred to as risk analysis.

Step 3: Weigh the costs and benefits of implementing these countermeasures and recommend and implement appropriate responses.

For each risk, one of three actions is typically taken: the risk is either eliminated, ignored, or mitigated.

The combination of risk assessment, risk analysis, and proposing and implementing recommendations is collectively referred to as risk management.
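To make Step 3 concrete, here is a minimal sketch that weighs a countermeasure's annual cost against the expected loss it avoids, treating vulnerability as a yearly probability (as in Step 1) and expected loss as probability times impact. That expected-loss arithmetic is a common convention, not something prescribed above, and all names and figures are hypothetical. "Ignore" here means accepting the risk as it stands.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    threat: str
    probability: float           # vulnerability: chance the threat occurs in a year (0..1)
    impact: float                # estimated loss if it does occur
    countermeasure_cost: float   # annual cost of the proposed countermeasure
    residual_probability: float  # chance that remains once the countermeasure is in place


def expected_annual_loss(probability: float, impact: float) -> float:
    return probability * impact


def recommend(risk: Risk) -> str:
    """Return 'eliminate', 'mitigate', or 'ignore' by comparing cost with avoided loss."""
    before = expected_annual_loss(risk.probability, risk.impact)
    after = expected_annual_loss(risk.residual_probability, risk.impact)
    if risk.countermeasure_cost >= before - after:
        return "ignore"  # the countermeasure costs at least as much as the loss it avoids
    return "eliminate" if after == 0 else "mitigate"


risks = [
    Risk("Data center fire", 0.01, 5_000_000, 20_000, 0.001),
    Risk("Single disk failure", 0.5, 100_000, 10_000, 0.0),
    Risk("Regional flood", 0.001, 1_000_000, 50_000, 0.0),
]
for r in risks:
    print(r.threat, recommend(r))
```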

What role does risk management play in business continuity?

Risk management, in collaboration with the BIA, helps to determine appropriate recovery strategies for business continuity. Understanding the threats and vulnerabilities an organization has for normal business operations can help minimize these exposures by implementing cost-effective countermeasures.

How are recovery strategies generated?

Recovery strategies are generated by compiling the results of the BIA and the risk assessments and risk analysis. This compilation should identify the appropriate recovery strategies needed to meet the agreed-upon RTOs and RPOs. For example, if a business process has an agreed-upon RTO of four hours, the recovery strategy must be such that all dependent processes and IT services are recovered in less than four hours to ensure the primary business process is operational within the four-hour RTO.
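The four-hour example can be checked mechanically. The sketch below assumes that each dependent process or IT service has an estimated recovery duration, that a component can start recovering only once everything it depends on is back, and that independent components are recovered in parallel; the component names and durations are hypothetical. Under those assumptions, the time to recover the primary process is the longest chain of dependent recoveries.

```python
def completion_times(durations: dict[str, float],
                     depends_on: dict[str, list[str]]) -> dict[str, float]:
    """Earliest time (hours after the outage) each component is back in service."""
    finished: dict[str, float] = {}

    def finish(name: str, path: frozenset = frozenset()) -> float:
        if name in finished:
            return finished[name]
        if name in path:
            raise ValueError(f"circular dependency involving {name!r}")
        # Recovery starts when the slowest dependency is done (or at once if none).
        start = max((finish(dep, path | {name}) for dep in depends_on.get(name, [])),
                    default=0.0)
        finished[name] = start + durations[name]
        return finished[name]

    for name in durations:
        finish(name)
    return finished


durations = {"network": 1.0, "database": 1.5, "order app": 1.0, "order entry": 0.25}
depends_on = {
    "database": ["network"],
    "order app": ["database", "network"],
    "order entry": ["order app"],  # the primary business process
}
RTO_HOURS = 4.0

times = completion_times(durations, depends_on)
print(times["order entry"], "h; meets RTO:", times["order entry"] < RTO_HOURS)
```

If the result exceeds the RTO, either the recovery strategy for some link in the chain must get faster or, as noted earlier, the RTO itself may need to be renegotiated.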


What types of testing are performed, and how often should they be done?

There are three types of testing performed in support of business continuity: verification, simulation, and operational.

A verification test updates the factual contents of a business continuity plan. These contents include current participants, their contact information, call trees, hardware model numbers, software versions and releases, and other types of data that are likely to change over relatively short periods of time. A verification test should be done once every three to six months depending on the dynamics of the environment.

A simulation test, sometimes called a table-top exercise, consists of assembling the business continuity planners, recovery team members, appropriate business users, and other participants in a single room to act out the response to a simulated disaster. The purpose is to validate the accuracy, sequence, and dependencies of the recovery steps. Simulation tests should be performed once every 6 to 12 months.

In an operational test, critical business processes and the IT services that support them are stopped as if a major calamity had rendered them inoperable. IT services and business processes are restored at a designated recovery site. The purpose is to confirm the viability of restoring all critical processes, and to compare the actual recovery times and recovery points to the RTOs and RPOs.
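The comparison at the end of an operational test is simple arithmetic. This minimal sketch assumes the test records, for each process, how long recovery actually took and how many hours of data were lost; the record layout and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperationalTestResult:
    process: str
    rto_hours: float
    rpo_hours: float
    actual_recovery_hours: float   # from the simulated outage until the process ran again
    actual_data_loss_hours: float  # time between the last restorable copy and the outage


def shortfalls(results: list[OperationalTestResult]) -> list[str]:
    """List every process whose measured recovery missed its RTO or RPO."""
    findings = []
    for r in results:
        if r.actual_recovery_hours > r.rto_hours:
            findings.append(f"{r.process}: recovered in {r.actual_recovery_hours} h, RTO {r.rto_hours} h")
        if r.actual_data_loss_hours > r.rpo_hours:
            findings.append(f"{r.process}: lost {r.actual_data_loss_hours} h of data, RPO {r.rpo_hours} h")
    return findings


results = [
    OperationalTestResult("Order entry", 4, 0.25, 5.5, 0.1),
    OperationalTestResult("Payroll", 72, 24, 30, 26),
]
for finding in shortfalls(results):
    print(finding)
```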

What are the biggest mistakes enterprises make in their business continuity plans?

The three biggest mistakes involve participants, dependencies, and testing. Organizations sometimes involve only technical participants in developing technical recovery plans instead of including business users to address the recovery of business processes. Both groups need to participate collaboratively as a team to ensure the business continuity plan covers both the business and technical aspects of recovery.


Another frequent mistake companies make is to omit the dependencies that many business processes and IT services require to make them operational. If a particular IT service needs to be recovered within four hours and it depends on two other services to function, then those two services must also be recovered within that same four-hour window.
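One way to avoid missing indirect dependencies is to walk a dependency map rather than list dependencies from memory. The sketch below assumes a simple map from each service to the services it directly needs (the service names are hypothetical) and collects everything a service needs directly or indirectly; all of those must be recovered within that service's window.

```python
def all_dependencies(service: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Every service that `service` needs, directly or indirectly (iterative depth-first walk)."""
    needed: set[str] = set()
    stack = list(depends_on.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in needed:  # this check also keeps a circular map from looping forever
            needed.add(dep)
            stack.extend(depends_on.get(dep, []))
    return needed


depends_on = {
    "billing": ["database", "authentication"],
    "authentication": ["directory"],
    "database": ["storage"],
}
print(sorted(all_dependencies("billing", depends_on)))
```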

The last mistake is failing to test the plans. So much effort is often spent on developing the plans that little time or few resources are left over to actually plan and conduct testing. Verification, simulation, and operational testing should be conducted approximately every three, six, and 12 months, respectively. Seldom are these done.

What best practices can you recommend to avoid these mistakes?

The best practice to avoid the mistake of improper participation is to ensure the effort to develop business continuity plans has the executive support from both the business community and IT. This support is critical to ensuring both groups collaborate as a team to develop the most comprehensive recovery plan possible.

The best practice for identifying dependencies is to thoroughly review every recovery step with several pairs of eyes to ensure all input and output dependencies are identified. The best practice for testing is to establish a schedule by which verification, simulation, and operational testing are conducted approximately every 3, 6, and 12 months, respectively.
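Such a schedule is easy to track. This sketch uses the 3-, 6-, and 12-month intervals above and flags any test type whose next run is past due; the function names and dates are hypothetical, and the month arithmetic clamps to the end of shorter months (a test run on 31 March is next due 30 June).

```python
import calendar
from datetime import date

# Approximate intervals from the best practice above, in months.
TEST_INTERVAL_MONTHS = {"verification": 3, "simulation": 6, "operational": 12}


def add_months(d: date, months: int) -> date:
    """Same day of the month `months` later, clamped to the last day of shorter months."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_run: dict[str, date]) -> dict[str, date]:
    return {test: add_months(last_run[test], months)
            for test, months in TEST_INTERVAL_MONTHS.items()}


def overdue(last_run: dict[str, date], today: date) -> list[str]:
    return sorted(test for test, due in next_due(last_run).items() if due < today)


last_run = {
    "verification": date(2010, 3, 31),
    "simulation": date(2010, 3, 1),
    "operational": date(2009, 6, 1),
}
print(overdue(last_run, date(2010, 7, 27)))
```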

Can You Leverage Cloud Services For Disaster Recovery?

IT is great at some things, but out of its league in many cases. Business continuity planning is an example of the latter: No matter how well we set up our applications and systems, the human element is always a roadblock. Sure, we can build a complex system to return our CRM system to operation in Duluth, but will anyone be able to use it? Even the best disaster recovery (DR) infrastructure is useless without a business continuity (BC) strategy for everything else.

All IT can offer is to do its best to hold up its side of the deal. IT can design systems with return-to-operations in mind, replicating data and documenting configurations. IT can deploy remote systems and keep them warm and ready should we need them. And IT can create operational plans to rapidly get everything working when disaster strikes.

Although technology alone cannot solve the BC/DR conundrum, new technical solutions to help close the gap do occasionally appear. Data replication was one such key technology, as was server virtualization. Cloud computing will soon be added to the BC/DR hot list.

What do cloud computing and cloud storage services offer to help DR?

  1. Cloud resources are inherently flexible, giving needed capacity on demand. This is especially important for compute resources, since BC operations often have unpredictable usage spikes as systems come online and resume operations.
  2. Cloud resources scale based on usage, reducing the expense when there is no disaster. This is one of the main reasons companies don't invest in disaster recovery capacity: It's so expensive on a daily basis "just" to be prepared!
  3. Cloud resources are available anywhere. Rather than trying to keep displaced employees in close proximity to technology, public cloud systems can be used from anywhere during a disaster.
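Point 2 is a cost argument, and a back-of-the-envelope comparison shows its shape. The sketch assumes a dedicated standby site billed monthly whether or not a disaster occurs, versus cloud capacity with a small always-on baseline (for example, replicated storage) plus compute billed per hour only while it runs; every price and quantity in it is hypothetical, not a quote from any provider.

```python
def standby_cost_per_year(servers: int, monthly_cost_per_server: float) -> float:
    """Dedicated DR capacity: paid for every month, disaster or not."""
    return servers * monthly_cost_per_server * 12


def on_demand_cost_per_year(servers: int, hourly_rate: float,
                            hours_used: float, baseline_monthly: float) -> float:
    """Pay-per-use capacity: an always-on baseline plus compute only for hours used."""
    return baseline_monthly * 12 + servers * hourly_rate * hours_used


# Hypothetical figures: 20 servers, 72 hours of use per year for tests or a real event.
standby = standby_cost_per_year(20, 500.0)
on_demand = on_demand_cost_per_year(20, 0.50, 72, 1_000.0)
print(f"standby ${standby:,.0f} vs. on demand ${on_demand:,.0f} per year")
```

The gap narrows as the hours of use grow, so a long or frequent activation changes the picture.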

Legal Requirements for Disaster Recovery Planning in Cloud computing

Who, Legally, MUST Plan?

With the caveats above, let’s cover a few of the common laws that impose a duty to have a disaster recovery plan. I will try to include the basis for each requirement, note where the mandate is only implied, and explain the difference between the two.

Banks and Financial Institutions MUST Have a Plan

The Federal Financial Institutions Examination Council (Council) was established on March 10, 1979, pursuant to Title X of the Financial Institutions Regulatory and Interest Rate Control Act of 1978 (FIRA), Public Law 95-630. In 1989, Title XI of the Financial Institutions Reform, Recovery and Enforcement Act of 1989 (FIRREA) established the Appraisal Subcommittee within the Council.

The Council is a formal interagency body empowered to prescribe uniform principles, standards, and report forms for the federal examination of financial institutions by the Board of Governors of the Federal Reserve System (FRB), the Federal Deposit Insurance Corporation (FDIC), the National Credit Union Administration (NCUA), the Office of the Comptroller of the Currency (OCC), and the Office of Thrift Supervision (OTS); and to make recommendations to promote uniformity in the supervision of financial institutions. In other words, every bank, savings and loan, credit union, and other financial institution is governed by the principles adopted by the Council.

In March of 2003, the Council released its Business Continuity Planning handbook designed to provide guidance and examination procedures for examiners in evaluating financial institution and service provider risk-management processes.

Stockbrokers MUST Have a Plan

The National Association of Securities Dealers (NASD) has adopted rules that require all its members to have business continuity plans. The NASD oversees the activities of more than 5,100 brokerage firms, approximately 130,800 branch offices and more than 658,770 registered securities representatives.

As of June 14, 2004, the rules apply to all NASD member firms. The requirements, which are specified in Rule 3510, begin with the following:

3510. Business Continuity Plans. (a) Each member must create and maintain a written business continuity plan identifying procedures relating to an emergency or significant business disruption. Such procedures must be reasonably designed to enable the member to meet its existing obligations to customers. In addition, such procedures must address the member’s existing relationships with other broker-dealers and counter-parties. The business continuity plan must be made available promptly upon request to NASD staff.

Electric Utilities WILL Need a Plan

The disaster recovery function relating to the electric utility grid is presently undergoing a change. Prior to 2005, the Federal Energy Regulatory Commission (FERC) could only coordinate voluntary efforts between utilities. This has changed with the adoption of Title XII of the Energy Policy Act of 2005 (16 U.S.C. 824o). That new law authorizes the FERC to create an Electric Reliability Organization (ERO).

The ERO will have the capability to adopt and enforce reliability standards for "all users, owners, and operators of the bulk power system" in the United States. At this time, FERC is in the process of finalizing the rules for the creation of the ERO. Once the ERO is created, it will begin the process of establishing reliability standards.

It is very safe to assume that the ERO will adopt standards for service restoration and disaster recovery, particularly after such widespread disasters as Hurricane Katrina.

Telecommunications Utilities SHOULD Have Plans, but MIGHT NOT

Telecommunications utilities are governed on the federal level by the Federal Communications Commission (FCC) for interstate services and by state Public Utility Commissions (PUCs) for services within the state.

The FCC has created the Network Reliability and Interoperability Council (NRIC). The role of the NRIC is to develop recommendations for the FCC and the telecommunications industry to "insure [sic] optimal reliability, security, interoperability and interconnectivity of, and accessibility to, public communications networks and the internet." The NRIC members are senior representatives of providers and users of telecommunications services and products, including telecommunications carriers, the satellite, cable television, wireless and computer industries, trade associations, labor and consumer representatives, manufacturers, research organizations, and government-related organizations.

There is no explicit provision that we could find that says telecommunications carriers must have a Disaster Recovery Plan. As I have stated frequently in this series of articles on disaster recovery, however, telecommunications facilities are tempting targets for terrorism. I have not changed my mind in that regard and urge caution.

You might also want to consider what the liability of a telephone company is if it does have a disaster that causes loss to your organization. In three words: It’s not much. The following is the statement used in most telephone company tariffs with regard to its liability:

The Telephone Company’s liability, if any, for its gross negligence or willful misconduct is not limited by this tariff. With respect to any other claim or suit, by a customer or any others, for damages arising out of mistakes, omissions, interruptions, delays or errors, or defects in transmission occurring in the course of furnishing services hereunder, the Telephone Company’s liability, if any, shall not exceed an amount equivalent to the proportionate charge to the customer for the period of service during which such mistake, omission, interruption, delay, error or defect in transmission or service occurs and continues. (Source, General Exchange Tariff for major carrier)

All Health Care Providers WILL Need a Disaster Recovery Plan

HIPAA is an acronym for the Health Insurance Portability and Accountability Act of 1996, Public Law 104-191, which amended the Internal Revenue Code of 1986. Also known as the Kennedy-Kassebaum Act, the Act includes a section, Title II, entitled Administrative Simplification, requiring "Improved efficiency in healthcare delivery by standardizing electronic data interchange, and protection of confidentiality and security of health data through setting and enforcing standards."

The legislation called upon the Department of Health and Human Services (HHS) to publish new rules that will ensure security standards protecting the confidentiality and integrity of "individually identifiable health information," past, present, or future.

The final Security Rule was published by HHS on February 20, 2003 and provides for a uniform level of protection of all health information that is housed or transmitted electronically and that pertains to an individual.

The Security Rule requires covered entities to ensure the confidentiality, integrity, and availability of all electronic protected health information (ePHI) that the covered entity creates, receives, maintains, or transmits. It also requires entities to protect against any reasonably anticipated threats or hazards to the security or integrity of ePHI, protect against any reasonably anticipated uses or disclosures of such information that are not permitted or required by the Privacy Rule, and ensure compliance by their workforce.

Required safeguards include application of appropriate policies and procedures, safeguarding physical access to ePHI, and ensuring that technical security measures are in place to protect networks, computers and other electronic devices.

Companies with More than 10 Employees

The United States Department of Labor has adopted numerous rules and regulations in regard to workplace safety as part of the Occupational Safety and Health Act. For example, 29 USC 654 specifically requires:

(a) Each employer:

(1) shall furnish to each of his employees employment and a place of employment which are free from recognized hazards that are causing or are likely to cause death or serious physical harm to his employees;

(2) shall comply with occupational safety and health standards promulgated under this Act.

(b) Each employee shall comply with occupational safety and health standards and all rules, regulations, and orders issued pursuant to this Act which are applicable to his own actions and conduct.

Other Considerations or Expensive Research Topics for Lawyers (Sorry, Eddie!)

  • The Foreign Corrupt Practices Act of 1977
  • Internal Revenue Service (IRS) Law for Protecting Taxpayer Information
  • Food and Drug Administration (FDA) Mandated Requirements
  • Homeland Security and Terrorist Prevention
  • Pandemic (Bird Flu) Prevention
  • ISO 9000 Certification
  • Requirements for Radio and TV Broadcasters
  • Contract Obligations to Customers
  • Document Protection and Retention Laws
  • Personal Identity Theft...and MORE!

Suffice it to say you will need to check with your legal department for specific requirements in your business and industry!

Windows Server 2003 Disaster Recovery - Cloud computing

Planning for High Availability

Windows Server disaster recovery planning can be a chore, but if you have the details and a plan, setup can go smoothly, and the plan will be a lifesaver when your systems start to smoke and your VPs are knocking on your office door asking what the heck is going on! In this section we will look at how to plan for High Availability.

Taking the time to plan and design is the key to your success, and it’s not only the design, but also the study efforts you put in. I always joke with my administrators and tell them they’re doctors of technology. I say, “When you become a doctor, you’re expected to be a professional and maintain that professionalism by educational growth through constant learning and updating of your skills.” Many IT staff technicians think their job is 9 to 5, with no studying done after hours. I have one word for them: Wrong! You need to treat your profession as if you’re a highly trained surgeon, except that instead of working on human life, you’re working on technology. And that’s how planning for High Availability solutions needs to be addressed. You can’t simply wing it and you can’t guess at it. You must be precise; otherwise, your investment goes down the drain, and all the work you put in will be not only useless, but also wasteful.

Plan Your Downtime

You need to achieve as close to 100 percent uptime as possible. You know 100 percent uptime isn’t realistic, though, and it can never be guaranteed. Breakdowns occur because of disk crashes, power or UPS failures, application problems resulting in system crashes, or any other hardware or software malfunction. So the next best thing is 99.999 percent, which is still somewhat reasonable with today’s technology. You can also define in a Service Level Agreement (SLA) what 99.999 percent means to both parties. If you promised 99.999 percent uptime to someone for a single year, that translates to only about five minutes (roughly 5.3 minutes) of downtime. I would plan for a more forgiving target, one that’s more realistic given scheduled outages and any disaster-recovery testing performed by your staff. Go for 99.9 percent uptime, which allows for about 8.8 hours of downtime per year. This is more practical and feasible to obtain. Whether providing or receiving such a service, both sides should test planned outages to see if delivery schedules can be met.

You can figure this out by taking the number of hours in a day (24) and multiplying it by the number of days in the year (365). This equals 8,760 hours in a year. Use the following equation:

percent of uptime per year = (8,760 - total hours down per year) / 8,760 × 100

If you schedule eight hours of downtime per month for maintenance and outages (96 hours total), then the percentage of uptime per year is (8,760 - 96) / 8,760 × 100, so you’d wind up with about 98.9 percent uptime for your systems. This should be an easy way for you to provide an accurate accounting of your downtime.

Remember, you must account for downtime accurately when you plan for high availability. Downtime can be planned or, worse, unexpected. Sources of unexpected downtime include the following:

  • Disk crash or failure
  • Power or UPS failure
  • Application problems resulting in system crashes
  • Any other hardware or software malfunction
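The uptime equation above, and its inverse (how much downtime a given target allows), can be expressed in a few lines. This is a minimal sketch that, like the text, uses a 365-day year and ignores leap years.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, as in the text


def uptime_percent(hours_down_per_year: float) -> float:
    """Percent of uptime per year = (8,760 - hours down) / 8,760 x 100."""
    return (HOURS_PER_YEAR - hours_down_per_year) / HOURS_PER_YEAR * 100


def allowed_downtime_hours(uptime_target_percent: float) -> float:
    """Hours of downtime per year that an uptime target leaves room for."""
    return HOURS_PER_YEAR * (1 - uptime_target_percent / 100)


print(f"{uptime_percent(8 * 12):.1f}%")                      # 98.9%
print(f"{allowed_downtime_hours(99.9):.2f} hours")           # 8.76 hours
print(f"{allowed_downtime_hours(99.999) * 60:.1f} minutes")  # 5.3 minutes
```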

Building the Highly Available Solutions’ Plan

Let’s look at the plan to use a Highly Available design in your organization and review the many questions you need to ask before implementing it ‘live’. Remember, if the server is down, people can’t work, and millions of dollars can be lost within hours. The following is a list of what could happen in sequence:

  1. A company uses a server to access an application that accepts orders and does transactions.
  2. The application, when it runs, serves not only the sales staff, but also three other companies who do business-to-business (B2B) transactions. At peak, it is estimated to bring in more than 2.5 million dollars within one hour.
  3. The server crashes and you don’t have a High Availability solution in place. This means no failover, redundancy, or load balancing exists at all. It simply fails.
  4. It takes you (the systems engineer) 5 minutes to be paged, but about 15 minutes to get onsite. You then take 40 minutes to troubleshoot and resolve the problem.
  5. The company’s server is brought back online and connections are reestablished.

Everything appears functional again. The problem was simple this time: an application glitch caused a service to stop and, once the service was restarted, everything was okay. Now, the problem with this whole scenario is this: although it was a true disaster, it was also a simple one. The systems engineer happened to be nearby and was able to diagnose the problem quite quickly. Even better, the problem was a simple fix. This easy problem still took the companies’ shared application down for at least one hour and, if this had been a peak-time period, over 2 million dollars could have been lost. Management will want to make sure that 2 million dollars in sales never evaporates like this again. Worse still, the companies you connect to and your own clientele start to lose faith in your ability to serve them. This could also cost you revenue and the possibility of acquiring new clients moving forward. People talk, and the uneducated could take this small glitch as a major problem with your company’s people instead of the technology. Let’s look at this scenario again, except with a Highly Available solution in place:

  1. A company uses a server to access an application that accepts orders and does transactions.
  2. The application, when it runs, serves not only the sales staff, but also three other companies who do business-to-business (B2B) transactions. At peak, it is estimated to bring in more than 2.5 million dollars within one hour.
  3. The server crashes, but you do have a Highly Available solution in place. (Note, at this point, it doesn’t matter what the solution is. What matters is that you added redundancy into the service.)
  4. Server and application are redundant, so when a glitch takes place, the redundancy spares the application from failing.
  5. Customers are unaffected. Business resumes as normal. Nothing is lost and no downtime is accumulated.
  6. The ‘one hour’ you saved your business in downtime just paid for the entire Highly Available solution you implemented.
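The cost side of the two scenarios is simple arithmetic. This sketch assumes, as a simplification, that revenue accrues evenly at the estimated peak rate of 2.5 million dollars per hour; the function name is hypothetical.

```python
def outage_cost(minutes_down: float, revenue_per_hour: float) -> float:
    """Revenue at risk while the service is down, assuming a constant hourly rate."""
    return minutes_down / 60 * revenue_per_hour


PEAK_REVENUE_PER_HOUR = 2_500_000

# Without high availability: 5 min to be paged + 15 min to get onsite + 40 min to fix.
without_ha = outage_cost(5 + 15 + 40, PEAK_REVENUE_PER_HOUR)
# With high availability: redundancy keeps the application running, so no downtime accrues.
with_ha = outage_cost(0, PEAK_REVENUE_PER_HOUR)
print(f"${without_ha:,.0f} at risk without HA, ${with_ha:,.0f} with HA")
```

Comparing that figure with the purchase and running costs of the redundant setup is what would support the claim that one avoided hour can pay for the solution.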

Human Resources and Highly Available Solutions

Human Resources (people) need to be trained and work on site to deal with a disaster. They also need to know how to work under fire. As a former United States Marine, I know about the “fog of war,” where you find yourself tired, disoriented, and probably unfocused on the job. These characteristics don’t help your response time with management. In any organization, especially with a system as complex as one that’s highly available, you need the right people to run it.

Managing Your Services

In this section, you see all the factors to consider while designing a Highly Available solution. The following is a list of the main services to remember:

  • Service Management is the management of the true components of Highly Available solutions: the people, the process in place, and the technology needed to create the solution. Keeping this balance to have a truly viable solution is important. Service Management includes the design and deployment phases.

  • Change Management is crucial to the ongoing success of the solution during the production phase. This type of management is used to monitor and log changes on the system.
  • Problem Management addresses the process for Help Desks and Server monitoring.
  • Security Management, as discussed in Chapter 7, is tasked with preventing unauthorized penetrations of the system.
  • Performance Management is discussed in greater detail in this chapter. This type of management addresses the overall performance of the service, availability, and reliability.

Other main services also exist, but the most important ones are highlighted here. Service management is crucial to the development of your Highly Available solution. You must cater to your customer’s demands for uptime. If you promise it, you had better deliver it.

Highly Available System Assessment Ideas

The following is a list of items for you to use during the postproduction-planning phase. Make sure you’ve covered all your bases with this list:

  • Now that you have your solution configured, document it! A lack of documentation will surely spell disaster for you. Documentation isn’t difficult to do; it’s simply tedious, but all that work will pay off in the end if you need it.
  • Train your staff. Make sure your staff has access to a test lab, books to read, and advanced training classes. Go to free seminars to learn more about High Availability. If you can ignore the sales pitch, they’re quite informative.
  • Test your staff with incident response drills and disaster scenarios. Written procedures are important, but live drills are even better to see how your staff responds. Remember, if you have a failure on a system, it could fail over to another system, but you must quickly resolve the problem on the first system that failed. You could have the same issue on the other nodes in your cluster and, if that’s the case, you’re on borrowed time. Set up a scenario and test it.
  • Assess your current business climate, so you know what’s expected of your systems at all times. Plan for future capacity especially as you add new applications, and as hardware and traffic increase.
  • Revisit your overall business goals and objectives. Make sure what you intend to do with your high-availability solution is being provided. If you want faster access to the systems, is it, in fact, faster? When you have a problem, is the failover seamless? Are customers affected? You don’t want to implement a high-availability solution and have performance that gets worse. This won’t look good for you!

  • Do a data-flow analysis on the connections the high-availability solution uses. You’d be surprised at the effect that damaged NICs, the wrong drivers, excessive protocols, bottlenecks, and mismatched port speeds and duplex settings, to name a few problems, can have on the system. I’ve made significant speed improvements in networks simply by analyzing the data flow on the wire. A good example is an old ISA-based NIC that only runs at 10 Mbps. If you plug that system into a port that supports 100 Mbps, it will still run at only 10 Mbps, because that’s as fast as the NIC will go. What would happen if the switch port were set to 100 Mbps and not to autonegotiate? The NIC wouldn’t communicate on the network at all because of the mismatch in speeds. Issues like this are common on networks and could well be the reason for poor or no data flow on your network.

  • Monitor the services you consider essential to operation and make sure they’re always up and operational. Never assume a system will run flawlessly just because no change has been made; at times, systems choke on themselves because of a hung thread or process. You can use network-monitoring tools like GFI, Tivoli, NetIQ, or Argent’s software solutions to monitor such services.
  • Assess your total cost of ownership (TCO) and see if it was all worth it.
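Before buying a monitoring product, you can approximate the most basic service check with a short script. This sketch only tests whether a TCP connection to each host and port succeeds; the host names in the usage note are hypothetical. An open port shows that something is listening, but it won't catch a hung thread behind the listener, so an application-level check (for example, a test query) is a stronger signal.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, or DNS failure.
        return False


def check_services(services: list[tuple[str, int]],
                   timeout: float = 3.0) -> dict[tuple[str, int], bool]:
    """Check every (host, port) pair and return its up/down state."""
    return {(host, port): service_is_up(host, port, timeout)
            for host, port in services}
```

For example, `check_services([("db.example.internal", 1433), ("web.example.internal", 443)])` returns a dictionary you could run from a scheduler and alert on whenever a value is `False`.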

Cost Analysis

Do a final cost analysis to check whether you made the right decision. Because every business model is different, the best way to determine TCO is to use an online TCO calculator that computes TCO from your own answers to its questions about your business. Many such calculators are available; just search for "ROI/TCO calculator" in a search engine (like Google.com), and you will find them.
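At its simplest, a TCO calculation adds the purchase price to the recurring costs over the product's lifetime. The sketch below assumes flat annual costs and no discounting, which real calculators usually refine; the cost categories and dollar figures are hypothetical and only illustrate how a cheaper product with heavy manual maintenance can cost more overall.

```python
def total_cost_of_ownership(purchase: float, annual_costs: dict[str, float],
                            years: int) -> float:
    """Purchase price plus flat recurring annual costs over the lifetime.

    Assumption: costs are constant each year and are not discounted.
    """
    return purchase + years * sum(annual_costs.values())


# Hypothetical product A: low sticker price, lots of manual administration.
tco_a = total_cost_of_ownership(10_000, {"support": 2_000, "admin_labor": 15_000}, 5)
# Hypothetical product B: higher price, largely automated.
tco_b = total_cost_of_ownership(30_000, {"support": 5_000, "admin_labor": 3_000}, 5)
print(tco_a, tco_b)  # 95000 70000
```

Over five years the product with the lower purchase price ends up costing more, which is the point the TCO comparison is meant to surface.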

Testing a High Availability System

Now that you have the planning and design fundamentals down, let's discuss the process of testing your high-availability systems. Make sure each test runs long enough to give you a solid sample of how the system behaves both at rest (without stress or activity) and under load. Use the measurements from normal daily operation as a baseline, so you know how your systems usually perform, and compare against that baseline during periods of heavy activity.
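A baseline only helps if you compare new readings against it consistently. This sketch assumes the metric is a list of numeric samples (for example, response times in milliseconds) and uses a simple mean plus or minus three standard deviations rule; the threshold `k` and the sample values are assumptions, not a standard.

```python
import statistics


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of normal-operation measurements.

    Needs at least two samples (statistics.stdev raises otherwise).
    """
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float],
                 k: float = 3.0) -> bool:
    """Flag a reading more than k standard deviations from the baseline mean."""
    mean, sd = baseline
    return abs(value - mean) > k * sd


# Hypothetical response times (ms) collected during a quiet baseline period.
normal_ms = [120, 118, 125, 122, 119, 121, 124, 120]
baseline = build_baseline(normal_ms)
print(is_anomalous(180, baseline), is_anomalous(123, baseline))  # True False
```

The longer the baseline run, the more trustworthy the mean and spread become, which is why the test period should cover ordinary daily activity.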

In Sum

This should give you a good running start on advanced planning for high availability, and it gives you many things to check and think about, especially when you’re done with your implementation.

Dictionary - Disaster Recovery

Glossary of Business Continuity & Technology Terms

This glossary defines technical terms and abbreviations used on this site and in some of the linked documents.

A
abend Abnormal end of task; the termination of a task before its completion because of an error condition that cannot be resolved by recovery facilities while the task is executing.

Acceptance

Formal agreement that an IT Service, Process, Plan, or other Deliverable is complete, accurate, Reliable and meets its specified Requirements. Acceptance is usually preceded by Evaluation or Testing and is often required before proceeding to the next stage of a Project or Process.

Access Management

The Process responsible for allowing Users to make use of IT Services, data, or other Assets. Access Management helps to protect the Confidentiality, Integrity and Availability of Assets by ensuring that only authorized Users are able to access or modify the Assets. Access Management is sometimes referred to as Rights Management or Identity Management.

Accounting

The Process responsible for identifying actual Costs of delivering IT Services, comparing these with budgeted costs, and managing variance from the Budget.

Accredited

Officially authorized to carry out a Role. For example, an Accredited body may be authorized to provide training or to conduct Audits.

Activity

A set of actions designed to achieve a particular result. Activities are usually defined as part of Processes or Plans, and are documented in Procedures.

Agreement

A Document that describes a formal understanding between two or more parties. An Agreement is not legally binding, unless it forms part of a Contract. See also Service Level Agreement, Operational Level Agreement.

ANTASnnn A generic address space identifier that refers to any one of the six Global Mirror for z/Series (XRC) address spaces running concurrently in a single LPAR. ANTAS000 is used solely for communication and control functions while ANTAS001, ANTAS002, ANTAS003, ANTAS004, or ANTAS005 may be active data movers.
Alternate Site A location, other than the normal facility, used to process data and/or conduct critical business functions in the event of a disaster.

Alert

A warning that a threshold has been reached, something has changed, or a Failure has occurred. Alerts are often created and managed by System Management tools and are managed by the Event Management Process.

Application An application is
  1. A particular customer use to which an information processing system is put - for example, a payroll or general ledger application.
  2. A program, set of programs, or software package designed for a particular purpose such as payroll or general ledger.
  3. Software that provides Functions that are required by an IT Service. Each Application may be part of more than one IT Service. An Application runs on one or more Servers or Clients. See also Application Management. (ITIL V3)
An application may be made up of many different types of data, such as multiple database components, data feeds from other applications or other data sources, flat files and electronic transmissions.
Application Consistency Application consistency is the state in which all related data components (databases, flat files, etc.) are in a transaction-consistent state and are synchronized according to the application design and requirements.

Application Management

The Function responsible for managing Applications throughout their Lifecycle.

Application Recovery A component of Disaster Recovery that deals with the restoration of business system software and data, after the operating system environment has been restored or replaced.

Application Sizing

The Activity responsible for understanding the Resource Requirements needed to support a new Application, or a major Change to an existing Application. Application Sizing helps to ensure that the IT Service can meet its agreed Service Level Targets for Capacity and Performance.

Application system A system made up of one or more host systems that perform the main set of functions for an establishment. This is the system that updates the primary DASD volumes that are being mirrored.

Architecture

The structure of a System or IT Service, including the Relationships of Components to each other and to the environment they are in. Architecture also includes the Standards and Guidelines that guide the design and evolution of the System.

Assessment

Inspection and analysis to check whether a Standard or set of Guidelines is being followed, that Records are accurate, or that Efficiency and Effectiveness targets are being met. See also Audit.

Asset

Any Resource or Capability. The Assets of a Service Provider include anything that could contribute to the delivery of a Service. Assets can be one of the following types: Management, Organization, Process, Knowledge, People, Information, Applications, Infrastructure, and Financial Capital.

Asset Management

Asset Management is the Process responsible for tracking and reporting the value and ownership of financial Assets throughout their Lifecycle. Asset Management is part of an overall Service Asset and Configuration Management Process.

Asynchronous copy Any type of copy operation in which the remote copy function copies updates to the secondary volume at some time after the primary volume is updated.

Asynchronous Data Replication

A process for copying data from one source to another while application processing continues; processing does not wait for an acknowledgement that the data was received at the copy location. Consequently, the content of databases stored in alternate facilities may differ from that at the original storage site, and copies of data may not contain current information at the time of a disruption, because of the time (often fractions of a second) required to transmit the data over a communications network to the alternate facility. This technology is typically used to transfer data over greater distances than synchronous data replication allows.
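The data-loss window described above can be illustrated with a toy model. This is a sketch, not any product's replication engine: the class `AsyncReplica` and its methods are hypothetical, and "shipping" is simulated by moving queued updates from an in-memory queue to a second dictionary.

```python
from collections import deque


class AsyncReplica:
    """Toy model: the primary acknowledges writes immediately and ships
    them to the secondary later, so the secondary can lag behind."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.secondary: dict[str, str] = {}
        self.in_flight: deque = deque()  # updates not yet at the remote site

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value             # applied locally
        self.in_flight.append((key, value))   # queued for the alternate site
        # No wait for a remote acknowledgement: processing continues.

    def ship(self, n: int = 1) -> None:
        """Transmit up to n queued updates to the secondary, in order."""
        for _ in range(min(n, len(self.in_flight))):
            key, value = self.in_flight.popleft()
            self.secondary[key] = value

    def disaster(self) -> list[tuple[str, str]]:
        """Primary site lost: updates still in flight are lost with it."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


r = AsyncReplica()
for k, v in [("a", "1"), ("b", "2"), ("c", "3")]:
    r.write(k, v)
r.ship(2)              # only two updates reached the alternate facility
print(r.disaster())    # [('c', '3')]
```

A synchronous scheme would call `ship` before acknowledging each write, so nothing would be left in flight, at the cost of waiting for the round trip on every update; that latency is why asynchronous replication is preferred over long distances.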

Attribute

A piece of information about a Configuration Item. Examples are: name, location, Version number, and Cost. Attributes of CIs are recorded in the Configuration Management Database (CMDB). See also Relationship.

Audit

Formal inspection and verification to check whether a Standard or set of Guidelines is being followed, that Records are accurate, or that Efficiency and Effectiveness targets are being met. An Audit may be carried out by internal or external groups. See also Certification, Assessment.

Authority Matrix

See RACI.

Automatic Call Distribution

Use of Information Technology to direct an incoming telephone call to the most appropriate person in the shortest possible time. ACD is sometimes called Automated Call Distribution.

Availability

Ability of a Configuration Item or IT Service to perform its agreed Function when required. Availability is determined by Reliability, Maintainability, Serviceability, Performance, and Security. Availability is usually calculated as a percentage. This calculation is often based on Agreed Service Time and Downtime. It is Best Practice to calculate Availability using measurements of the Business output of the IT Service.
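The calculation based on Agreed Service Time (AST) and Downtime is commonly written as (AST minus Downtime) divided by AST, times 100. A minimal sketch, assuming both values are in the same unit (hours here) and that the example figures are hypothetical:

```python
def availability_pct(agreed_service_time: float, downtime: float) -> float:
    """Availability as a percentage: (AST - downtime) / AST * 100."""
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Hypothetical month of 24x7 service (720 hours) with 3.6 hours of downtime.
print(round(availability_pct(720, 3.6), 2))  # 99.5
```

Note that only downtime inside the agreed service hours counts; an outage during a window the SLA excludes does not reduce the figure.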

Availability Management

The Process responsible for defining, analyzing, Planning, measuring and improving all aspects of the Availability of IT services. Availability Management is responsible for ensuring that all IT Infrastructure, Processes, Tools, Roles, etc. are appropriate for the agreed Service Level Targets for Availability.

Availability Management Information System

A virtual repository of all Availability Management data, usually stored in multiple physical locations. See also Service Knowledge Management System.

B
Backup The process of creating a copy of data to protect against its accidental loss.
Backup Window A term much more prevalent in the earlier days of Data Processing - typically before 24x7 processing was required - used to describe the period of time that a system is available to perform a dedicated backup process. In order to obtain a consistent backup of application data, normal processing used to be halted periodically (either daily or weekly) to enable backups to be created without the application files being updated.

Balanced Scorecard

A management tool developed by Drs. Robert Kaplan (Harvard Business School) and David Norton. A Balanced Scorecard enables a Strategy to be broken down into Key Performance Indicators. Performance against the KPIs is used to demonstrate how well the Strategy is being achieved. A Balanced Scorecard has four major areas, each of which has a small number of KPIs. The same four areas are considered at different levels of detail throughout the Organization.

Baseline

A Benchmark used as a reference point. For example:

  • An ITSM Baseline can be used as a starting point to measure the effect of a Service Improvement Plan
  • A Performance Baseline can be used to measure changes in Performance over the lifetime of an IT Service
  • A Configuration Management Baseline can be used to enable the IT Infrastructure to be restored to a known Configuration if a Change or Release fails.

Benchmark

The recorded state of something at a specific point in time. A Benchmark can be created for a Configuration, a Process, or any other set of data. For example, a benchmark can be used in:

  • Continual Service Improvement, to establish the current state for managing improvements
  • Capacity Management, to document performance characteristics during normal operations.

See also Benchmarking, Baseline.

Benchmarking

Comparing a Benchmark with a Baseline or with Best Practice. The term Benchmarking is also used to mean creating a series of Benchmarks over time, and comparing the results to measure progress or improvement.

Best Practice

Proven Activities or Processes that have been successfully used by multiple Organizations. ITIL is an example of a collection of Best Practices that have been collected from IT organizations around the world.

Budget

A list of all the money an Organization or Business Unit plans to receive, and plans to pay out, over a specified period of time. See also Budgeting, Planning.

Budgeting

The Activity of predicting and controlling the spending of money. Consists of a periodic negotiation cycle to set future Budgets (usually annual) and the day-to-day monitoring and adjusting of current Budgets.

Business

An overall corporate entity or Organization formed of a number of Business Units. In the context of ITSM, the term Business includes public sector and not-for-profit organizations, as well as companies. An IT Service Provider provides IT Services to a Customer within a Business. The IT Service Provider may be part of the same Business as its Customer (Internal Service Provider), or part of another Business (External Service Provider).

Business Capacity Management

In the context of ITSM, Business Capacity Management is the Activity responsible for understanding future Business Requirements for use in the Capacity Plan. See also Service Capacity Management.

Business Case

Justification for a significant item of expenditure. Includes information about Costs, benefits, options, issues, Risks, and possible problems.

Business Continuity Business Continuity is an extension of disaster recovery, aimed at allowing an organization to continue functioning after (and ideally, during) a disaster, rather than simply being able to recover following a catastrophic event. This is accomplished through the deployment of redundant hardware and software, the use of fault tolerant systems and data replication techniques as well as a solid backup and recovery strategy.

Business Continuity Management

The Business Process responsible for managing Risks that could seriously impact the Business. BCM safeguards the interests of key stakeholders, reputation, brand and value-creating activities. The BCM Process involves reducing Risks to an acceptable level and planning for the recovery of Business Processes should a disruption to the Business occur. BCM sets the Objectives, Scope and Requirements for IT Service Continuity Management.

Business Continuity Plan (BCP)

A comprehensive written plan to maintain or resume business in the event of a disruption. BCP includes both the technology recovery capability (often referred to as disaster recovery) and the business unit(s) recovery capability.

Business Continuity Planning (BCP) An all-encompassing term covering both disaster recovery planning and business resumption planning.

Business Continuity Strategy

Comprehensive strategies to recover, resume, and maintain all critical business functions.

Business Customer

A recipient of a product or a Service from the Business. For example, if the Business is a car manufacturer then the Business Customer is someone who buys a car.

Business Impact Analysis

The process of analyzing all business functions and the effect that a specific disaster may have upon them.

BIA is the Activity in Business Continuity Management that identifies Vital Business Functions and their dependencies. These dependencies may include Suppliers, people, other Business Processes, IT Services, etc. BIA defines the recovery requirements for IT Services. These requirements include Recovery Time Objectives, Recovery Point Objectives and minimum Service Level Targets for each IT Service.

Business Objective

The Objective of a Business Process, or of the Business as a whole. Business Objectives support the Business Vision, provide guidance for the IT Strategy, and are often supported by IT Services.

Business Operations

The day-to-day execution, monitoring and management of Business Processes.

Business Perspective

An understanding of the Service Provider and IT Services from the point of view of the Business, and an understanding of the Business from the point of view of the Service Provider.

Business Process

A Process that is owned and carried out by the Business. A Business Process contributes to the delivery of a product or Service to a Business Customer. For example, a retailer may have a purchasing Process that helps to deliver Services to its Business Customers. Many Business Processes rely on IT Services.

Business Recovery Process The common critical path that all companies follow during a recovery effort. The path has major nodes that are followed regardless of the organization. The process includes: Immediate response, Environmental restoration or relocation, Functional restoration, Data recovery and synchronization, Restoration of business functions, and Return to normal.

Business Recovery Test / Exercise

An activity that tests an institution’s BCP.

Business Relationship Management

The Process or Function responsible for maintaining a Relationship with the Business. Business Relationship Management usually includes:

  • Managing personal Relationships with Business managers
  • Providing input to Service Portfolio Management
  • Ensuring that the IT Service Provider is satisfying the Business needs of the Customers

This Process has strong links with Service Level Management.

Business Resumption Planning (BRP) The operational piece of business continuity planning.

Business Service

An IT Service that directly supports a Business Process, as opposed to an Infrastructure Service, which is used internally by the IT Service Provider and is not usually visible to the Business.

The term Business Service is also used to mean a Service that is delivered to Business Customers by Business Units. For example, delivery of financial services to Customers of a bank, or goods to the Customers of a retail store. Successful delivery of Business Services often depends on one or more IT Services.

Business Unit

A segment of the Business that has its own Plans, Metrics, income and Costs. Each Business Unit owns Assets and uses these to create value for Customers in the form of goods and Services.

C
Cache Random-access electronic storage (memory) in selected storage controls, used to retain frequently used data for faster access.

Call

A telephone call to the Service Desk from a User. A Call could result in an Incident or a Service Request being logged.

Capability

The ability of an Organization, person, Process, Application, Configuration Item or IT Service to carry out an Activity. Capabilities are intangible Assets of an Organization. See also Resource.

Capability Maturity Model Integration

Capability Maturity Model® Integration (CMMI) is a process improvement approach developed by the Software Engineering Institute (SEI) of Carnegie Mellon University, US. CMMI provides organizations with the essential elements of effective processes. It can be used to guide process improvement across a project, a division, or an entire organization. CMMI helps integrate traditionally separate organizational functions, set process improvement goals and priorities, provide guidance for quality processes, and provide a point of reference for appraising current processes. See www.sei.cmu.edu/cmmi for more information. See also Maturity.

Capacity

The maximum Throughput that a Configuration Item or IT Service can deliver whilst meeting agreed Service Level Targets. For some types of CI, Capacity may be the size or volume, for example a disk drive.

Capacity Management

The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner. Capacity Management considers all Resources required to deliver the IT Service, and plans for short-, medium- and long-term Business Requirements.

Capacity Management Information System

A virtual repository of all Capacity Management data, usually stored in multiple physical locations. See also Service Knowledge Management System.

Capacity Planning

The Activity within Capacity Management responsible for creating a Capacity Plan.

Capital Expenditure

The cost of purchasing something that will become a financial Asset, for example computer equipment and buildings. The value of the Asset is Depreciated over multiple accounting periods.

CapEx

See Capital Expenditure.

Category

A named group of things that have something in common. Categories are used to group similar things together. For example, Cost Types are used to group similar types of Cost, Incident Categories are used to group similar types of Incident, and CI Types are used to group similar types of Configuration Item.

CEC Central electronics complex.

Certification

Issuing a certificate to confirm Compliance to a Standard. Certification includes a formal Audit by an independent and Accredited body. The term Certification is also used to mean awarding a certificate to verify that a person has achieved a qualification.

Central processor complex (CPC) The unit within a cluster that provides the management function for the storage server. It consists of cluster processors, cluster memory, and related logic.
CFIA Component Failure Impact Analysis. A process of analyzing a particular hardware/software configuration to determine the true impact of any individual failed component.

Change

The addition, modification or removal of anything that could have an effect on IT Services. The Scope should include all IT Services, Configuration Items, Processes, Documentation, etc.

Change Advisory Board

A group of people that advises the Change Manager in the Assessment, prioritization and scheduling of Changes. This board is usually made up of representatives from all areas within the IT Service Provider, representatives from the Business and Third Parties such as Suppliers.

Change Management

The Process responsible for controlling the Lifecycle of all Changes. The primary objective of Change Management is to enable beneficial Changes to be made, with minimum disruption to IT Services.

Change Request

See Request for Change.

Change Schedule

A Document that lists all approved Changes and their planned implementation dates. A Change Schedule is sometimes called a Forward Schedule of Change, even though it also contains information about Changes that have already been implemented.

Channel (1) A path along which signals can be sent; for example, data channel and output channel. (2) A functional unit, controlled by the processor, that handles the transfer of data between processor storage and local peripheral equipment.
Channel Interface The circuitry in a storage control that attaches storage paths to a host channel.

Charging

Requiring payment for IT Services. Charging for IT Services is optional, and many Organizations choose to treat their IT Service Provider as a Cost Centre.

Chronological Analysis

A technique used to help identify possible causes of Problems. All available data about the Problem is collected and sorted by date and time to provide a detailed timeline. This can make it possible to identify which Events may have been triggered by others.
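The core step of Chronological Analysis, merging event data from several sources and sorting it by date and time, is easy to automate. This sketch assumes each event arrives as an ISO-8601 timestamp string plus a description; the events themselves are hypothetical.

```python
from datetime import datetime


def timeline(events: list[tuple[str, str]]) -> list[tuple[datetime, str]]:
    """Sort (ISO-8601 timestamp, description) pairs into a timeline."""
    parsed = [(datetime.fromisoformat(ts), text) for ts, text in events]
    return sorted(parsed, key=lambda event: event[0])


# Hypothetical events gathered from different logs during a Problem review.
events = [
    ("2010-07-27T02:15:40", "Cluster failover to node B"),
    ("2010-07-27T02:14:05", "Disk latency alert on node A"),
    ("2010-07-27T02:15:10", "Heartbeat lost from node A"),
]
for when, what in timeline(events):
    print(when.time(), what)
```

Seen in order, the disk alert precedes the lost heartbeat and the failover, which suggests where to look first. Timestamps from different systems are only comparable if their clocks are synchronized and in the same time zone.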

CHPID Channel Path ID.

Client

A generic term that means a Customer, the Business or a Business Customer. For example, Client Manager may be used as a synonym for Account Manager.

The term client is also used to mean:

  • A computer that is used directly by a User, for example a PC, Handheld Computer, or Workstation
  • The part of a Client-Server Application that the User directly interfaces with. For example an e-mail Client.
CLIST TSO command list.

Closed

The final Status in the Lifecycle of an Incident, Problem, Change, etc. When the Status is Closed, no further action is taken.

Closure

The act of changing the Status of an Incident, Problem, Change, etc. to Closed.

Cluster See storage cluster
CNT Formerly Computer Network Technology, now part of Brocade. A provider of storage networking solutions including channel extension devices.

COBIT

Control Objectives for Information and related Technology (COBIT) provides guidance and Best Practice for the management of IT Processes. COBIT is published by the IT Governance Institute. See www.isaca.org for more information.

Code of Practice

A Guideline published by a public body or a Standards Organization, such as ISO or BSI. Many Standards consist of a Code of Practice and a Specification. The Code of Practice describes recommended Best Practice.

COLD SITE An alternate facility that is void of any resources or equipment except air-conditioning, raised flooring and power. Equipment and resources must be installed in such a facility to duplicate the critical business functions of an organization.

Compliance

Ensuring that a Standard or set of Guidelines is followed, or that proper, consistent accounting or other practices are being employed.

Component

A general term that is used to mean one part of something more complex. For example, a computer System may be a component of an IT Service, an Application may be a Component of a Release Unit. Components that need to be managed should be Configuration Items.

Component Capacity Management

The Process responsible for understanding the Capacity, Utilization, and Performance of Configuration Items. Data is collected, recorded and analyzed for use in the Capacity Plan. See also Service Capacity Management.

Component Failure Impact Analysis

A technique that helps to identify the impact of CI failure on IT Services. A matrix is created with IT Services on one edge and CIs on the other. This enables the identification of critical CIs (that could cause the failure of multiple IT Services) and of fragile IT Services (that have multiple Single Points of Failure).
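The CFIA matrix can be sketched as a mapping from each IT Service to the CIs whose individual failure would stop it (its Single Points of Failure). In this hypothetical model, a CI is critical if it appears under more than one service, and a service is fragile if it has more than one such CI; the service and CI names are illustrative only.

```python
from collections import defaultdict


def cfia(spof: dict[str, set[str]]) -> tuple[dict[str, list[str]], list[str]]:
    """spof[service] = CIs whose individual failure stops that service.

    Returns (critical CIs mapped to the services they affect,
             fragile services with multiple single points of failure).
    """
    services_hit: dict[str, list[str]] = defaultdict(list)
    for service, cis in spof.items():
        for ci in cis:
            services_hit[ci].append(service)
    critical = {ci: sorted(s) for ci, s in services_hit.items() if len(s) > 1}
    fragile = sorted(svc for svc, cis in spof.items() if len(cis) > 1)
    return critical, fragile


# Hypothetical matrix: rows are IT Services, entries are non-redundant CIs.
matrix = {"Payroll": {"DB01", "SAN1"}, "Email": {"MX1"}, "Intranet": {"SAN1"}}
print(cfia(matrix))  # ({'SAN1': ['Intranet', 'Payroll']}, ['Payroll'])
```

Here SAN1 is critical because it can take down two services, and Payroll is fragile because two separate CIs can each take it down.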

Concurrency

A measure of the number of users or processes engaged in the same operation at the same time.

Concurrent Copy An ESS function that increases the availability of data by creating an exact copy of the data concurrent with regular processing. This is accomplished by mirroring disk writes to two separate logical volumes within the ESS.

Confidentiality

A security principle that requires that data should only be accessed by authorized people.

Configuration

A generic term, used to describe a group of Configuration Items that work together to deliver an IT Service, or a recognizable part of an IT Service. Configuration is also used to describe the parameter settings for one or more CIs.

Configuration Item

Any Component that needs to be managed in order to deliver an IT Service. Information about each CI is recorded in a Configuration Record within the Configuration Management System and is maintained throughout its Lifecycle by Configuration Management. CIs are under the control of Change Management. CIs typically include IT Services, hardware, software, buildings, people, and formal documentation such as Process documentation and SLAs.

Configuration Management

The Process responsible for maintaining information about Configuration Items required to deliver an IT Service, including their Relationships. This information is managed throughout the Lifecycle of the CI. Configuration Management is part of an overall Service Asset and Configuration Management Process.

Configuration Management Database

A database used to store Configuration Records throughout their Lifecycle. The Configuration Management System maintains one or more CMDBs, and each CMDB stores Attributes of CIs, and Relationships with other CIs.

Configuration Management System

A set of tools and databases that are used to manage an IT Service Provider’s Configuration data. The CMS also includes information about Incidents, Problems, Known Errors, Changes and Releases; and may contain data about employees, Suppliers, locations, Business Units, Customers and Users. The CMS includes tools for collecting, storing, managing, updating, and presenting data about all Configuration Items and their Relationships. The CMS is maintained by Configuration Management and is used by all IT Service Management Processes. See also Configuration Management Database, Service Knowledge Management System.

Consistency Group A consistency group is a means of creating a consistent point-in-time copy across multiple logical volumes and multiple ESSs. Consistency Groups are supported by FlashCopy Version 2 and Global Mirror for z/Series (XRC) as a means of managing the consistency of dependent writes.
Consistency group time The time, expressed as a primary system time-of-day (TOD) value, to which XRC secondary volumes have been updated. This term was previously referred to as “consistency time”.
Consistent copy A copy of a data entity (for example a set of logical volumes) that contains the contents of the entire data entity from a single instant in time.
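The idea behind consistency group time can be illustrated with a simplified model of dependent writes. This is an illustration under stated assumptions, not IBM's XRC algorithm: each primary storage control is assumed to deliver its timestamped writes in order, and the data mover applies only writes up to the oldest "latest timestamp" among all sources, because a source that has not yet reported past that time might still send an earlier write that a later one depends on. The function name and data are hypothetical.

```python
from typing import Optional


def consistency_group(
    updates: dict[str, list[tuple[int, str]]],
) -> tuple[Optional[int], list[tuple[int, str]]]:
    """updates[source] = time-ordered (timestamp, write) pairs received so far.

    Returns (group_time, writes safe to apply to the secondary volumes).
    Simplified model: a source with no data blocks group formation.
    """
    if not updates or any(not writes for writes in updates.values()):
        return None, []
    # Safe point: the latest time every source has reported up to.
    group_time = min(writes[-1][0] for writes in updates.values())
    batch = sorted(
        (ts, write)
        for writes in updates.values()
        for ts, write in writes
        if ts <= group_time
    )
    return group_time, batch


# Hypothetical writes from two storage controls.
updates = {
    "SC1": [(100, "log: begin"), (130, "log: commit")],
    "SC2": [(110, "db: update row")],
}
print(consistency_group(updates))
```

The commit stamped 130 is held back because SC2 has only reported up to 110; applying it now could put a commit on the secondary ahead of a database write it depends on.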

Continual Service Improvement

A stage in the Lifecycle of an IT Service and the title of one of the Core ITIL Version 3 publications. Continual Service Improvement is responsible for managing improvements to IT Service Management Processes and IT Services. The Performance of the IT Service Provider is continually measured and improvements are made to Processes, IT Services and IT Infrastructure in order to increase Efficiency, Effectiveness, and Cost Effectiveness. See also Plan–Do–Check–Act.

Contract

A legally binding Agreement between two or more parties.

Control

A means of managing a Risk, ensuring that a Business Objective is achieved, or ensuring that a Process is followed. Example Controls include Policies, Procedures, Roles, RAID, door locks, etc. A control is sometimes called a Countermeasure or safeguard. Control also means to manage the utilization or behavior of a Configuration Item, System or IT Service.

Control data set A data set that contains consistency group information about the secondary volumes and the journal data set. It contains information necessary for recovery operations and acts as the table of contents for the session. The control data set keeps track of data written to secondary volumes, the location of unwritten data in the journal data set, and which group to start recovery with.

Control Objectives for Information and related Technology (COBIT)

See COBIT.

Control Perspective

An approach to the management of IT Services, Processes, Functions, Assets, etc. There can be several different Control Perspectives on the same IT Service, Process, etc., allowing different individuals or teams to focus on what is important and relevant to their specific Role. Example Control Perspectives include Reactive and Proactive management within IT Operations, or a Lifecycle view for an Application Project team.

Cost

The amount of money spent on a specific Activity, IT Service, or Business Unit. Costs consist of real cost (money), notional cost such as people’s time, and Depreciation.

Cost Center

A Business Unit or Project to which costs are assigned. A Cost Centre does not charge for Services provided. An IT Service Provider can be run as a Cost Centre or a Profit Centre.

Cost Effectiveness

A measure of the balance between the Effectiveness and Cost of a Service, Process or activity. A Cost Effective Process is one that achieves its Objectives at minimum Cost. See also KPI, Return on Investment, Value for Money.

Cost Management

A general term that is used to refer to Budgeting and Accounting, sometimes used as a synonym for Financial Management.

Countermeasure

Can be used to refer to any type of Control. The term Countermeasure is most often used when referring to measures that increase Resilience, Fault Tolerance or Reliability of an IT Service.

Coupled data mover A function of Coupled Extended Remote Copy (CXRC) that allows several data movers running on independent MVS system images to be logically connected so that all volumes in all sessions can be time-consistent. This allows an SDM configuration to grow to support more volumes than can be supported by a single data mover.
Coupled extended remote copy (CXRC) An enhancement of XRC that supports copy operations in large environments that exceed the number of primary volumes that can be supported by a single data mover. Thousands of volumes can be supported by multiple XRC sessions, with coordination between the sessions to ensure that all volumes can be recovered to a consistent point in time.

Course Corrections

Changes made to a Plan or Activity that has already started, to ensure that it will meet its Objectives. Course corrections are made as a result of Monitoring progress.

CRC Errors See Cyclic Redundancy Checking
CRISIS A critical event, which, if not handled in an appropriate manner, may dramatically impact an organization's profitability, reputation, or ability to operate.
CRITICAL FUNCTIONS Business activities or information that could not be interrupted or made unavailable for several business days without significantly jeopardizing the operation of the organization.
CRITICAL RECORDS Records or documents which, if damaged or destroyed, would cause considerable inconvenience and/or require replacement or recreation at considerable expense.

Critical Success Factor

Something that must happen if a Process, Project, Plan, or IT Service is to succeed. KPIs are used to measure the achievement of each CSF. For example, a CSF of ‘protect IT Services when making Changes’ could be measured by KPIs such as ‘percentage reduction of unsuccessful Changes’, ‘percentage reduction in Changes causing Incidents’, etc.

Culture

A set of values that is shared by a group of people, including expectations about how people should behave, their ideas, beliefs, and practices. See also Vision.

Customer

Someone who buys goods or Services. The Customer of an IT Service Provider is the person or group that defines and agrees the Service Level Targets. The term Customers is also sometimes informally used to mean Users, for example ‘this is a Customer-focused Organization’.

CUA Control unit address.
CXRC Coupled extended remote copy.
D
Dark fiber An optical fiber infrastructure using cabling and repeaters for communications. Usually a dedicated fiber link between two sites that is used by one client.
DASD Direct Access Storage Device.

Dashboard

A graphical representation of overall IT Service Performance and Availability. Dashboard images may be updated in real-time, and can also be included in management reports and web pages. Dashboards can be used to support Service Level Management, Event Management or Incident Diagnosis.

Data consistency Data consistency summarizes the validity, accuracy, usability and integrity of related data between applications and across the IT enterprise. This ensures that each user observes a consistent view of the data, including visible changes made by the user's own transactions and transactions of other users or processes.
Data can be considered consistent in one or more of several states, such as Point-in-Time, Transaction, and Application consistency.
Data in transit The data on primary system volumes that is being sent to the recovery system for writing to volumes on the recovery system.
Data mover See system data mover.

Data-to-Information-to-Knowledge-to-Wisdom

A way of understanding the relationships between data, information, knowledge, and wisdom. DIKW shows how each of these builds on the others.

Deliverable

Something that must be provided to meet a commitment in a Service Level Agreement or a Contract. Deliverable is also used in a more informal way to mean a planned output of any Process.

Demand Management

Activities that understand and influence Customer demand for Services and the provision of Capacity to meet these demands. At a Strategic level Demand Management can involve analysis of Patterns of Business Activity and User Profiles. At a tactical level it can involve use of Differential Charging to encourage Customers to use IT Services at less busy times. See also Capacity Management.

Deming Cycle

See Plan–Do–Check–Act.

Dependency

The direct or indirect reliance of one Process or Activity on another.

Dependent Write An application I/O that cannot be issued until a previous application I/O has completed.
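
The dependent-write rule is what a remote copy must preserve: if a write has reached the recovery site, every write it depended on must have reached it too. The following Python sketch checks that property for a set of replicated writes. The write names and the dependency map are hypothetical, and the log/update/commit sequence is only an illustration of a typical database pattern.

```python
from typing import Dict, Set

# Hypothetical model: each write ID maps to the IDs of the writes that had to
# complete before it could be issued (its direct dependencies).
Dependencies = Dict[str, Set[str]]


def is_dependent_write_consistent(deps: Dependencies, replicated: Set[str]) -> bool:
    """Return True if every replicated write also has all of its direct
    dependencies replicated. Checking direct dependencies is enough: applied
    to every write in the set, it also covers the indirect ones."""
    return all(deps.get(w, set()) <= replicated for w in replicated)


# Illustrative database sequence: log the intent, update the data,
# then mark the log entry complete.
deps: Dependencies = {
    "log_intent": set(),
    "db_update": {"log_intent"},
    "log_complete": {"db_update"},
}

print(is_dependent_write_consistent(deps, {"log_intent", "db_update"}))     # True
print(is_dependent_write_consistent(deps, {"log_intent", "log_complete"}))  # False
```

The second set is inconsistent because the log claims the update is complete while the update itself never reached the copy.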

Deployment

The Activity responsible for movement of new or changed hardware, software, documentation, Process, etc. to the Live Environment. Deployment is part of the Release and Deployment Management Process.

Design

An Activity or Process that identifies Requirements and then defines a solution that is able to meet these Requirements. See also Service Design.

Destage The asynchronous write of new or updated data from cache or nonvolatile storage to disk. The fast write, dual copy, and remote copy functions destage data.

Detection

A stage in the Incident Lifecycle. Detection results in the Incident becoming known to the Service Provider. Detection can be automatic, or can be the result of a user logging an Incident.

Development

The Process responsible for creating or modifying an IT Service or Application. Also used to mean the Role or group that carries out Development work.

Device address Three or four hexadecimal digits that uniquely define a physical I/O device on a channel path in System/390 mode.
Device blocking See Device pacing
Device pacing Device pacing is designed to react to a workload peak against a single volume in such a manner as to constrain update activity for that single volume, while allowing update activity against all other volumes to operate in an unrestricted manner. The SDM is designed to pace the application write I/Os against a single volume at a rate that matches the SDM capability to offload those write I/Os.
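
As a rough illustration of the idea, the sketch below keeps a backlog count per volume and delays writes only to the volume whose backlog exceeds a threshold. The `DevicePacer` class, the fixed threshold, and the constant delay are simplifying assumptions for illustration, not how the SDM actually computes pacing.

```python
from collections import defaultdict
from typing import Dict


class DevicePacer:
    """Hypothetical per-volume pacing model (not the actual SDM algorithm).

    Each volume has a backlog of application writes the data mover has not
    yet offloaded. Writes to a volume whose backlog exceeds the threshold are
    delayed; writes to all other volumes proceed without delay."""

    def __init__(self, threshold: int, delay_ms: float) -> None:
        self.threshold = threshold
        self.delay_ms = delay_ms
        self.backlog: Dict[str, int] = defaultdict(int)

    def on_write(self, volume: str) -> float:
        """Record an application write; return the delay (ms) to impose on it."""
        self.backlog[volume] += 1
        return self.delay_ms if self.backlog[volume] > self.threshold else 0.0

    def on_offload(self, volume: str, count: int) -> None:
        """Record that the data mover has offloaded `count` writes for this volume."""
        self.backlog[volume] = max(0, self.backlog[volume] - count)


pacer = DevicePacer(threshold=3, delay_ms=2.0)
print([pacer.on_write("VOL001") for _ in range(5)])  # [0.0, 0.0, 0.0, 2.0, 2.0]
print(pacer.on_write("VOL002"))                      # 0.0, other volumes unaffected
```
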
Device Support Facilities program (ICKDSF) An IBM utility program used to initialize DASD at installation and perform media maintenance.

Diagnosis

A stage in the Incident and Problem Lifecycles. The purpose of Diagnosis is to identify a Workaround for an Incident or the Root Cause of a Problem.

Disaster In general, any damaging or destructive event that overwhelms available resources and causes serious loss, destruction, hardship, unhappiness, or death.
For IT environments, a disaster is any event that leaves an organization unable to provide critical business functions for an extended period of time. It may entail the loss of data and processing capability.
Disaster Recovery Recovery after a disaster, such as a fire or an earthquake, that destroys or otherwise disables a system. Disaster recovery techniques typically involve restoring data to a second (recovery) system, then using the recovery system in place of the destroyed or disabled application system. See also recovery, backup, and recovery system.
Disk Mirroring Disk mirroring is the replication of data on separate disks in real time to ensure continuous availability, currency and accuracy. Disk mirroring can function as a disaster recovery solution by performing the mirroring remotely and over distance. Depending on the technologies used, mirroring can be performed Synchronously, Asynchronously, Semi-synchronously, or point-in-time.
True synchronous mirroring can achieve a Recovery Point Objective of zero lost data. Asynchronous mirroring can achieve a Recovery Point Objective of just a few seconds, while the remaining methodologies provide a Recovery Point Objective of a few minutes to perhaps several hours.
Related terms: data mirroring, data replication, file shadowing, journaling.
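
One way to see why the achievable Recovery Point Objective differs is to model the secondary as trailing the primary by a fixed lag and count the writes still in flight when the primary fails. This Python sketch assumes a constant lag (real asynchronous lag varies with workload and link bandwidth) and uses a lag of zero to stand in for synchronous mirroring; the function name and data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical write record: (time written at the primary in seconds, payload).
Write = Tuple[float, str]


def lost_writes(writes: List[Write], failure_time: float, lag_s: float) -> List[str]:
    """Writes that reached the primary before the failure but had not yet been
    applied at the secondary, assuming a constant replication lag of lag_s.
    lag_s = 0 models synchronous mirroring, where a write is not acknowledged
    until the remote copy has it."""
    return [p for t, p in writes if t <= failure_time and t + lag_s > failure_time]


writes = [(0.0, "w1"), (1.0, "w2"), (2.0, "w3"), (2.5, "w4")]
print(lost_writes(writes, failure_time=3.0, lag_s=0.0))  # []  (synchronous)
print(lost_writes(writes, failure_time=3.0, lag_s=1.5))  # ['w3', 'w4']  (asynchronous)
```
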

Document

Information in readable form. A Document may be paper or electronic. For example, a Policy statement, Service Level Agreement, Incident Record, diagram of computer room layout. See also Record.

Downtime

The time when a Configuration Item or IT Service is not Available during its Agreed Service Time. The Availability of an IT Service is often calculated from Agreed Service Time and Downtime.
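
A commonly used form of that calculation is Availability (%) = (Agreed Service Time − Downtime) / Agreed Service Time × 100. A minimal Python sketch, assuming both figures are expressed in hours over the same reporting period:

```python
def availability_pct(agreed_service_time_h: float, downtime_h: float) -> float:
    """Availability % = (Agreed Service Time - Downtime) / Agreed Service Time * 100."""
    if agreed_service_time_h <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_h <= agreed_service_time_h:
        raise ValueError("downtime must lie within the agreed service time")
    return (agreed_service_time_h - downtime_h) / agreed_service_time_h * 100


# Illustrative figures: a 24x7 service over a 30-day month (720 h) with 3.6 h of Downtime.
print(round(availability_pct(720, 3.6), 2))  # 99.5
```

Only Downtime that falls inside the Agreed Service Time counts, which is why the sketch rejects a downtime figure larger than the agreed time.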

Driver

Something that influences Strategy, Objectives or Requirements. For example, new legislation or the actions of competitors.

Dual copy A high availability function made possible by the nonvolatile storage in cached IBM storage controls. Dual copy maintains two functionally identical copies of designated DASD volumes in the logical storage subsystem, and automatically updates both copies every time a write operation is issued to the dual copy logical volume.
dump A capture of valuable storage information at the time of an error.
duplex pair A volume made up of two physical devices within the same or different storage subsystems that are defined as a pair by a dual copy, PPRC, or XRC operation, and are in neither suspended nor pending state. The operation records the same data onto each volume.
DWDM Dense Wavelength Division Multiplexing. A technique used to transmit several independent bit streams over a single fiber link.
E

Economies of scale

The reduction in average Cost that is possible from increasing the usage of an IT Service or Asset.

Effectiveness

A measure of whether the Objectives of a Process, Service or Activity have been achieved. An Effective Process or activity is one that achieves its agreed Objectives. See also KPI.

Efficiency

A measure of whether the right amount of resources has been used to deliver a Process, Service or Activity. An Efficient Process achieves its Objectives with the minimum amount of time, money, people or other resources. See also KPI.

Electronic Vaulting Transfer of data to an offsite storage facility via a communication link rather than via portable media.
Emergency A sudden, unexpected event requiring immediate action due to potential threat to health and safety, the environment, or property.
Emergency Preparedness The discipline that ensures an organization's or community's readiness to respond to an emergency in a coordinated, timely, and effective manner.

Environment

A subset of the IT Infrastructure that is used for a particular purpose. For example: Live Environment, Test Environment, Build Environment. It is possible for multiple Environments to share a Configuration Item, for example Test and Live Environments may use different partitions on a single mainframe computer. Also used in the term Physical Environment to mean the accommodation, air conditioning, power system, etc.

Environment is also used as a generic term to mean the external conditions that influence or affect something.

ERP Error recovery procedure.

Error

A design flaw or malfunction that causes a Failure of one or more Configuration Items or IT Services. A mistake made by a person or a faulty Process that affects a CI or IT Service is also an Error.

Escalation

An Activity that obtains additional Resources when these are needed to meet Service Level Targets or Customer expectations. Escalation may be needed within any IT Service Management Process, but is most commonly associated with Incident Management, Problem Management and the management of Customer complaints. There are two types of Escalation, Functional Escalation and Hierarchic Escalation.

ESCON Enterprise System Connection. This is a set of IBM products and services that provides a dynamically connected environment within an enterprise.
ESS Enterprise Storage Server. IBM 2105 and 2107 DASD.

eSourcing Capability Model for Service Providers

A framework to help IT Service Providers develop their IT Service Management Capabilities from a Service Sourcing perspective. eSCM–SP was developed by Carnegie Mellon University, US.

Evaluation

The Process responsible for assessing a new or Changed IT Service to ensure that Risks have been managed and to help determine whether to proceed with the Change.

Evaluation is also used to mean comparing an actual Outcome with the intended Outcome, or comparing one alternative with another.

Event

A change of state that has significance for the management of a Configuration Item or IT Service.

The term Event is also used to mean an Alert or notification created by any IT Service, Configuration Item or Monitoring tool. Events typically require IT Operations personnel to take actions, and often lead to Incidents being logged.

Event Management

The Process responsible for managing Events throughout their Lifecycle. Event Management is one of the main Activities of IT Operations.

Expanded Incident Lifecycle

(Availability Management) Detailed stages in the Lifecycle of an Incident. The stages are Detection, Diagnosis, Repair, Recovery, Restoration. The Expanded Incident Lifecycle is used to help understand all contributions to the Impact of Incidents and to Plan how these could be controlled or reduced.

External Customer

A Customer who works for a different Business to the IT Service Provider. See also External Service Provider, Internal Customer.

External Service Provider

An IT Service Provider that is part of a different Organization to its Customer. An IT Service Provider may have both Internal Customers and External Customers.

eXtended Remote Copy (XRC) See Global Mirror for zSeries.
F

Failure

Loss of ability to Operate to Specification, or to deliver the required output. The term Failure may be used when referring to IT Services, Processes, Activities, Configuration Items, etc. A Failure often causes an Incident.

Fault

See Error.

Fault Tolerance

The ability of an IT Service or Configuration Item to continue to Operate correctly after Failure of a Component part. See also Resilience, Countermeasure.

Fault Tree Analysis

A technique that can be used to determine the chain of events that leads to a Problem. Fault Tree Analysis represents a chain of events using Boolean notation in a diagram.
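
To make the Boolean notation concrete, the sketch below represents a fault tree as nested AND/OR gates over basic events and evaluates whether the top event occurs. The `Gate` type, the event names, and the example tree are hypothetical; real Fault Tree Analysis tools also support other gate types and probability calculations, which are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str              # "AND" or "OR"
    inputs: List["Node"]   # child gates or basic events


Node = Union[str, Gate]    # a plain string names a basic event


def occurs(node: Node, events: Dict[str, bool]) -> bool:
    """Evaluate whether the event at this node occurs, given which basic events occurred."""
    if isinstance(node, str):
        return events.get(node, False)   # unlisted basic events are assumed not to occur
    results = [occurs(child, events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: the service is lost if power fails, or if both mirrored disks fail.
top = Gate("OR", ["power_failure", Gate("AND", ["disk_a_fails", "disk_b_fails"])])

print(occurs(top, {"disk_a_fails": True}))                        # False
print(occurs(top, {"disk_a_fails": True, "disk_b_fails": True}))  # True
```
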

Fiber optic cable A fiber, or bundle of fibers, in a structure built to meet optic, mechanical, and environmental specifications.
File Shadowing The asynchronous duplication of the production database on separate media to ensure data availability, currency and accuracy. File shadowing can be used as a disaster recovery solution if performed remotely, to improve both the recovery time and recovery point objectives. SIMILAR TERMS: Data Replication, Journaling, Disk Mirroring.

Financial Management

The Function and Processes responsible for managing an IT Service Provider’s Budgeting, Accounting and Charging Requirements.

Fit for Purpose

An informal term used to describe a Process, Configuration Item, IT Service, etc. that is capable of meeting its objectives or Service Levels. Being Fit for Purpose requires suitable design, implementation, control and maintenance.

Fixed utility volume A simplex volume assigned by the storage administrator to a logical storage subsystem to serve as the address that XRC uses for its functions.
FlashCopy A point-in-time copy services function that can quickly copy data from a source location to a target location.
Floating utility volume Any volume from a pool of simplex volumes assigned by the storage administrator to a logical storage subsystem to serve as an address for XRC functions.
Forward Recovery The process of recovering a database to the point of failure by applying active journal or log data to the current backup files of the database.
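
The sketch below illustrates forward recovery under simplifying assumptions: the database is a key/value dictionary, each journal entry is a hypothetical (sequence number, key, new value) tuple, and deletions are not modeled. Entries recorded after the backup was taken are replayed in sequence order on top of a copy of the backup.

```python
from typing import Dict, List, Tuple

# Hypothetical journal entry: (sequence_number, key, new_value).
JournalEntry = Tuple[int, str, str]


def forward_recover(backup: Dict[str, str], backup_seq: int,
                    journal: List[JournalEntry]) -> Dict[str, str]:
    """Rebuild the database as of the point of failure by replaying, in order,
    every journal entry written after the backup was taken."""
    db = dict(backup)                    # work on a copy; the backup stays untouched
    for seq, key, value in sorted(journal):
        if seq > backup_seq:             # earlier entries are already in the backup
            db[key] = value
    return db


backup = {"acct1": "100", "acct2": "50"}                               # taken at sequence 10
journal = [(9, "acct1", "100"), (11, "acct2", "75"), (12, "acct1", "80")]
print(forward_recover(backup, 10, journal))  # {'acct1': '80', 'acct2': '75'}
```
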

Fulfillment

Performing Activities to meet a need or Requirement. For example, by providing a new IT Service, or meeting a Service Request.

Function

A team or group of people and the tools they use to carry out one or more Processes or Activities. For example the Service Desk.

The term Function also has two other meanings:

  • An intended purpose of a Configuration Item, Person, Team, Process, or IT Service. For example one Function of an e-mail Service may be to store and forward outgoing mails, one Function of a Business Process may be to dispatch goods to Customers.
  • To perform the intended purpose correctly, for example ‘The computer is Functioning’.
G

Gap Analysis

An Activity that compares two sets of data and identifies the differences. Gap Analysis is commonly used to compare a set of Requirements with actual delivery. See also Benchmarking.
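
In its simplest form, comparing a set of Requirements with actual delivery is a pair of set differences. The requirement names below are hypothetical examples, and real Gap Analysis usually also weighs how far each item falls short rather than only whether it is present.

```python
from typing import Dict, Set


def gap_analysis(required: Set[str], delivered: Set[str]) -> Dict[str, Set[str]]:
    """Classify items as missing, unrequested, or met."""
    return {
        "missing": required - delivered,       # required but not delivered
        "unrequested": delivered - required,   # delivered but never required
        "met": required & delivered,           # required and delivered
    }


result = gap_analysis(
    required={"nightly backup", "offsite copy", "4h RTO"},
    delivered={"nightly backup", "4h RTO", "self-service restore"},
)
print(sorted(result["missing"]))      # ['offsite copy']
print(sorted(result["unrequested"]))  # ['self-service restore']
```
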

GB Gigabyte.
GDPS Geographically Dispersed Parallel Sysplex.
Geographically Dispersed Parallel Sysplex (GDPS) A multi-site application availability solution that provides the capability to manage remote copy configuration storage subsystems and automate Parallel Sysplex operational tasks. All GDPS functions can be performed from a single point of control, thereby simplifying system resource management. GDPS is designed to minimize and potentially eliminate the impact of any failure or planned site outage.
Gigabyte 1,073,741,824 bytes
Global Copy Previously known as PPRC-XD (Peer-to-Peer Remote Copy Extended Distance), and also referred to as Global Copy for ESS. An asynchronous, controller-based disk mirroring solution.
Global Mirror A term describing the functionality of asynchronous mirroring over distance with data consistency. There are two implementations of Global Mirror: Global Mirror for ESS and Global Mirror for z/Series.
Global Mirror for ESS Formerly known as Asynchronous PPRC. Global Mirror for ESS combines PPRC Extended Distance (PPRC-XD) functionality with FlashCopy consistency groups to provide time-consistent data at the secondary site. This is coordinated by a master session in the hardware configuration, which controls all updates to the secondary volumes of the PPRC-XD pairs and, by issuing FlashCopy commands at the secondary site, creates time-consistent tertiary copies at user-specified intervals.
Global Mirror for z/Series Was previously known as eXtended Remote Copy (XRC). This is a z/Series asynchronous disk mirroring technique which is effective over any distance. It keeps the data time consistent across multiple ESS (Enterprise Storage Server) or HDS (Hitachi Data Systems) disk subsystems at the recovery site.
XRC functions as a combination of disk (IBM ESS or HDS licensed) Microcode and application code running on a z/Series host and provides a recovery point that is time consistent across multiple disk subsystems.
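
The idea of a recovery point that is time consistent across multiple disk subsystems can be sketched with timestamps. In this simplified Python model (an assumption, not IBM's System Data Mover algorithm), each subsystem's updates carry a timestamp, and a consistency group can only extend to the earliest of the latest timestamps read from each subsystem, because any subsystem might still hold unread updates newer than its last one. The subsystem, volume, and record names are hypothetical, and the sketch simply refuses to form a group until every subsystem has reported at least one update.

```python
from typing import Dict, List, Tuple

# Hypothetical update record: (timestamp, volume, record), read in timestamp order.
Update = Tuple[float, str, str]


def form_consistency_group(reads: Dict[str, List[Update]]) -> Tuple[float, List[Update]]:
    """Return the consistency time and the updates that are safe to apply
    to the secondary volumes, in timestamp order.

    Updates newer than the consistency time wait for a later group, because
    some other subsystem may still hold unread updates older than them."""
    if not reads or any(not updates for updates in reads.values()):
        raise ValueError("need at least one update from every subsystem")
    consistency_time = min(updates[-1][0] for updates in reads.values())
    group = sorted(u for updates in reads.values() for u in updates
                   if u[0] <= consistency_time)
    return consistency_time, group


reads = {
    "SUBSYS_A": [(1.0, "VOL1", "r1"), (4.0, "VOL1", "r2")],
    "SUBSYS_B": [(2.0, "VOL7", "r3")],
}
time, group = form_consistency_group(reads)
print(time)                              # 2.0
print([rec for _, _, rec in group])      # ['r1', 'r3']  (r2 waits for the next group)
```
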

Governance

Ensuring that Policies and Strategy are actually implemented, and that required Processes are correctly followed. Governance includes defining Roles and responsibilities, measuring and reporting, and taking actions to resolve any issues identified.

Guideline

A Document describing Best Practice, which recommends what should be done. Compliance with a guideline is not normally enforced. See also Standard.

H
Health Insurance Portability and Accountability Act The Health Insurance Portability and Accountability Act (HIPAA) is a U.S. regulation that affects the health care industry and regulates how patient information can be used. The regulation also addresses the obligations of healthcare providers and health plans to protect patient health information in both paper and electronic formats.

Help Desk

A point of contact for Users to log Incidents. A Help Desk is usually more technically focused than a Service Desk and does not provide a Single Point of Contact for all interaction. The term Help Desk is often used as a synonym for Service Desk.

High Availability Systems or applications requiring a very high level of reliability and availability. High availability systems typically operate 24x7 and usually require built-in redundancy to minimize the risk of downtime due to hardware and/or telecommunication failures.
HIPAA See Health Insurance Portability and Accountability Act.
Hot site An alternate facility that has the equipment and resources to recover the business functions affected by the occurrence of a disaster. Hot-sites may vary in type of facilities offered (such as data processing, communication, or any other critical business functions needing duplication). Location and size of the hot-site will be proportional to the equipment and resources needed.
I
identifier (ID) A sequence of bits or characters that identifies a program, device, storage control, or system.

Impact

A measure of the effect of an Incident, Problem or Change on Business Processes. Impact is often based on how Service Levels will be affected. Impact and Urgency are used to assign Priority.
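
Many organizations express this as an Impact/Urgency matrix. The Python sketch below uses one common 3×3 example that yields priorities from 1 (most urgent) to 5; the level names and the mapping are assumptions, and each organization defines its own matrix.

```python
# Hypothetical 3x3 matrix: lower numbers mean higher impact or urgency.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Priority 1 (most urgent) to 5 (least urgent) from Impact and Urgency."""
    return LEVELS[impact] + LEVELS[urgency] - 1


print(priority("high", "high"), priority("medium", "low"), priority("low", "low"))  # 1 4 5
```

Adding the two levels makes the matrix symmetric, so a high-impact, low-urgency Incident gets the same priority as a low-impact, high-urgency one; an organization that weights Impact more heavily would need an explicit lookup table instead.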

Incident

An unplanned interruption to an IT Service or reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident, for example Failure of one disk from a mirror set.

Incident Management

The Process responsible for managing the Lifecycle of all Incidents. The primary Objective of Incident Management is to return the IT Service to Customers as quickly as possible.

Incident Record

A Record containing the details of an Incident. Each Incident record documents the Lifecycle of a single Incident.

Indirect Cost

A Cost of providing an IT Service, which cannot be allocated in full to a specific customer. For example, the Cost of providing shared Servers or software licenses. Also known as Overhead.

Information Security Management

The Process that ensures the Confidentiality, Integrity and Availability of an Organization’s Assets, information, data and IT Services. Information Security Management usually forms part of an Organizational approach to Security Management that has a wider scope than the IT Service Provider, and includes handling of paper, building access, phone calls, etc., for the entire Organization.

Information Security Management System

The framework of Policy, Processes, Standards, Guidelines and tools that ensures an Organization can achieve its Information Security Management Objectives.

Information Technology

The use of technology for the storage, communication or processing of information. The technology typically includes computers, telecommunications, Applications and other software. The information may include Business data, voice, images, video, etc. Information Technology is often used to support Business Processes through IT Services.

Integrity

A security principle that ensures data and Configuration Items are modified only by authorized personnel and Activities. Integrity considers all possible causes of modification, including software and hardware Failure, environmental Events, and human intervention.

Internal Customer

A Customer who works for the same Business as the IT Service Provider. See also Internal Service Provider, External Customer.

Internal Service Provider

An IT Service Provider that is part of the same Organization as its Customer. An IT Service Provider may have both Internal Customers and External Customers.

International Organization for Standardization

The International Organization for Standardization (ISO) is the world’s largest developer of Standards. ISO is a non-governmental organization that is a network of the national standards institutes of 156 countries. See www.iso.org for further information about ISO.

International Standards Organization

See International Organization for Standardization.

I/O device An addressable input/output unit, such as a direct access storage device, magnetic tape device, or printer.
IPL Initial program load.

Ishikawa Diagram

A technique that helps a team to identify all the possible causes of a Problem. Originally devised by Kaoru Ishikawa, the output of this technique is a diagram that looks like a fishbone.

ISO 9000

A generic term that refers to a number of international Standards and Guidelines for Quality Management Systems. See www.iso.org for more information. See also International Organization for Standardization.

ISO/IEC 17799

ISO Code of Practice for Information Security Management. See also Standard.

ISO/IEC 20000

ISO Specification and Code of Practice for IT Service Management. ISO/IEC 20000 is aligned with ITIL Best Practice.

ISO/IEC 27001

ISO Specification for Information Security Management. The corresponding Code of Practice is ISO/IEC 17799. See also Standard.

IT Infrastructure

All of the hardware, software, networks, facilities, etc. that are required to develop, Test, deliver, Monitor, Control or support IT Services. The term IT Infrastructure includes all of the Information Technology but not the associated people, Processes and documentation.

IT Operations Management

The Function within an IT Service Provider that performs the daily Activities needed to manage IT Services and the supporting IT Infrastructure. IT Operations Management includes IT Operations Control and Facilities Management.

IT Service

A Service provided to one or more Customers by an IT Service Provider. An IT Service is based on the use of Information Technology and supports the Customer’s Business Processes. An IT Service is made up from a combination of people, Processes and technology and should be defined in a Service Level Agreement.

IT Service Continuity Management

The Process responsible for managing Risks that could seriously affect IT Services. ITSCM ensures that the IT Service Provider can always provide minimum agreed Service Levels, by reducing the Risk to an acceptable level and Planning for the Recovery of IT Services. ITSCM should be designed to support Business Continuity Management.

IT Service Management

The implementation and management of Quality IT Services that meet the needs of the Business. IT Service Management is performed by IT Service Providers through an appropriate mix of people, Process and Information Technology. See also Service Management.

IT Service Management Forum

The IT Service Management Forum is an independent Organization dedicated to promoting a professional approach to IT Service Management. The itSMF is a not-for-profit membership Organization with representation in many countries around the world (itSMF Chapters). The itSMF and its membership contribute to the development of ITIL and associated IT Service Management Standards. See www.itsmf.com for more information.

IT Service Provider

A Service Provider that provides IT Services to Internal Customers or External Customers.

ITIL

A set of Best Practice guidance for IT Service Management. ITIL is owned by the OGC and consists of a series of publications giving guidance on the provision of Quality IT Services, and on the Processes and facilities needed to support them. See www.itil.co.uk for more information.

J
JCL Job Control Language
Job control language (JCL) A problem-oriented language used to identify the job or describe its requirements to an MVS operating system.

Job Description

A Document that defines the Roles, responsibilities, skills and knowledge required by a particular person. One Job Description can include multiple Roles, for example the Roles of Configuration Manager and Change Manager may be carried out by one person.

Journal A checkpoint data set that contains work to be done. For XRC, the work to be done consists of all changed records from the primary volumes. Changed records are collected and formed into a “consistency group”, and then the group of updates is applied to the secondary volumes.
Journal data set A checkpoint data set that contains data to be written to the secondary volume. For XRC, this data consists of all changed records from the primary volumes. Changed records are collected and formed into a “consistency group”, and then the group of updates is applied to the secondary volumes.
Journaling The process of logging changes or updates to a database since the last full backup. Journals can be used to recover previous versions of a file before updates were made, or to facilitate disaster recovery, if performed remotely, by applying changes to the last safe backup.
K
KB Kilobyte

Kepner & Tregoe Analysis

A structured approach to Problem solving. The Problem is analyzed in terms of what, where, when and extent. Possible causes are identified. The most probable cause is tested. The true cause is verified.

Key Performance Indicator

A Metric that is used to help manage a Process, IT Service or Activity. Many Metrics may be measured, but only the most important of these are defined as KPIs and used to actively manage and report on the Process, IT Service or Activity. KPIs should be selected to ensure that Efficiency, Effectiveness, and Cost Effectiveness are all managed. See also Critical Success Factor.

keyword A name that identifies a parameter in a command string. Keywords can be entered in their entirety or as abbreviations identified in the syntax diagram for the command.
kilobyte (KB) 1,024 bytes.
km Kilometer

Knowledge Base

A logical database containing the data used by the Service Knowledge Management System.

Knowledge Management

The Process responsible for gathering, analyzing, storing and sharing knowledge and information within an Organization. The primary purpose of Knowledge Management is to improve Efficiency by reducing the need to rediscover knowledge. See also Data-to-Information-to-Knowledge-to-Wisdom and Service Knowledge Management System.

L
Licensed Internal Code (LIC) Microcode that is licensed to the customer to provide particular functions. LIC is implemented in a part of storage that is not addressable by user programs.

Lifecycle

The various stages in the life of an IT Service, Configuration Item, Incident, Problem, Change, etc. The Lifecycle defines the Categories for Status and the Status transitions that are permitted. For example:

  • The Lifecycle of an Application includes Requirements, Design, Build, Deploy, Operate, Optimize
  • The Expanded Incident Lifecycle includes Detect, Respond, Diagnose, Repair, Recover, Restore
  • The Lifecycle of a Server may include: Ordered, Received, In Test, Live, Disposed, etc.

Live Environment

A controlled Environment containing Live Configuration Items used to deliver IT Services to Customers.

logical partition (LPAR) The ESA/390 term for a set of functions that create the programming environment that is defined by the ESA/390 architecture. ESA/390 architecture uses this term when more than one LPAR is established on a processor. An LPAR is conceptually similar to a virtual machine environment except that the LPAR is a function of the processor. Also, the LPAR does not depend on an operating system to create the virtual machine environment.
logical storage subsystem A collection of addresses that are associated with the same logical subsystem.
logical subsystem The logical functions of a storage controller that allow one or more host I/O interfaces to access a set of devices. The controller aggregates the devices according to the addressing mechanisms of the associated I/O interfaces. One or more logical subsystems exist on a storage controller. In general, the controller associates a given set of devices with only one logical subsystem.
LPAR Logical PARtition. A logical segmentation of a mainframe’s memory and other resources that allows it to run its own copy of the operating system and associated applications. See also virtual server.
LSS Logical storage subsystem
M

Maintainability

A measure of how quickly and Effectively a Configuration Item or IT Service can be restored to normal working after a Failure. Maintainability is often measured and reported as MTRS.

Maintainability is also used in the context of Software or IT Service Development to mean ability to be Changed or Repaired easily.

Managed Services

A perspective on IT Services which emphasizes the fact that they are managed. The term Managed Services is also used as a synonym for Outsourced IT Services.

Management Information

Information that is used to support decision making by managers. Management Information is often generated automatically by tools supporting the various IT Service Management Processes. Management Information often includes the values of KPIs such as ‘Percentage of Changes leading to Incidents’, or ‘first-time fix rate’.

Management of Risk

The OGC methodology for managing Risks. M_o_R includes all the Activities required to identify and Control the exposure to Risk, which may have an impact on the achievement of an Organization’s Business Objectives. See www.m-o-r.org for more details.

Management System

The framework of Policy, Processes and Functions that ensures an Organization can achieve its Objectives.

Master data set The master data set ensures consistency among all XRC subsystems contained within the coupled XRC system.
Master session A logical entity that is used to coordinate session commands and data consistency across multiple XRC sessions. A master session exists as long as there is an XRC session coupled to the master session.

Maturity

A measure of the Reliability, Efficiency and Effectiveness of a Process, Function, Organization, etc. The most mature Processes and Functions are formally aligned to Business Objectives and Strategy, and are supported by a framework for continual improvement.

Maturity Level

A named level in a Maturity model such as the Carnegie Mellon Capability Maturity Model Integration.

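Mb Megabit. 1,048,576 bits in storage contexts; in data-communication line rates, such as those quoted for OC3, OC12 and OC48, a megabit is 1,000,000 bits.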
MB Megabyte. 1,048,576 bytes.

Mean Time Between Failures

A Metric for measuring and reporting Reliability. MTBF is the average time that a Configuration Item or IT Service can perform its agreed Function without interruption. This is measured from when the CI or IT Service starts working, until it next fails.
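As a worked illustration of the MTBF definition, the sketch below averages the observed uptime periods of one CI. The function name `mtbf` and the sample log are hypothetical; it assumes each period is recorded as a (started working, next failed) pair in hours.

```python
from statistics import mean


def mtbf(uptime_intervals_hours: list[tuple[float, float]]) -> float:
    """Average time a CI or IT Service works without interruption.

    Each tuple is (time it started working, time it next failed), in hours.
    """
    if not uptime_intervals_hours:
        raise ValueError("need at least one observed uptime interval")
    durations = [failed - started for started, failed in uptime_intervals_hours]
    if any(d < 0 for d in durations):
        raise ValueError("a failure cannot precede the start of its interval")
    return mean(durations)


# Hypothetical log for one server: three uptime periods of 700, 900 and 800 hours.
log = [(0.0, 700.0), (704.0, 1604.0), (1610.0, 2410.0)]
print(mtbf(log))  # 800.0
```

Note that the gaps between periods (the repair time) are excluded, because MTBF measures only from when the CI starts working until it next fails.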

Metric

Something that is measured and reported to help manage a Process, IT Service or Activity. See also KPI.

Metro Mirror (previously known as synchronous Peer-to-Peer Remote Copy, or PPRC) provides real-time mirroring of logical volumes between two ESSs that can be located up to 300 km from each other. It is a synchronous copy solution where write operations are completed on both copies (local and remote site) before they are considered to be complete.
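The synchronous behaviour of Metro Mirror can be modelled in a few lines. This is a conceptual toy, not IBM's implementation or API: `Volume` and `synchronous_write` are hypothetical names, and real storage controllers handle a failed remote write in more elaborate ways than simply reporting the write as incomplete.

```python
class Volume:
    """Toy logical volume: maps track numbers to data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tracks: dict[int, bytes] = {}
        self.available = True  # set to False to simulate a failed site or link

    def write(self, track: int, data: bytes) -> bool:
        if not self.available:
            return False
        self.tracks[track] = data
        return True


def synchronous_write(primary: Volume, secondary: Volume, track: int, data: bytes) -> bool:
    """Return True only when both the local and the remote copy hold the update."""
    if not primary.write(track, data):
        return False
    # The caller also waits for the remote copy; this round trip is why
    # synchronous mirroring is limited in distance.
    return secondary.write(track, data)


local, remote = Volume("local"), Volume("remote")
print(synchronous_write(local, remote, 7, b"payroll"))  # True
remote.available = False
print(synchronous_write(local, remote, 8, b"ledger"))   # False
```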

Middleware

Software that connects two or more software Components or Applications. Middleware is usually purchased from a Supplier, rather than developed within the IT Service Provider.

MIPS Million instructions per second.

Mission Statement

The Mission Statement of an Organization is a short but complete description of the overall purpose and intentions of that Organization. It states what is to be achieved, but not how this should be done.

Model

A representation of a System, Process, IT Service, Configuration Item, etc. that is used to help understand or predict future behavior.

Modeling

A technique that is used to predict the future behavior of a System, Process, IT Service, Configuration Item, etc. Modeling is commonly used in Financial Management, Capacity Management and Availability Management.

Monitor Control Loop

Monitoring the output of a Task, Process, IT Service or Configuration Item; comparing this output to a predefined Norm; and taking appropriate action based on this comparison.
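The Monitor Control Loop maps naturally onto a small loop: read an output, compare it with the Norm, and act if the deviation exceeds a tolerance. A sketch, assuming a numeric output and a fixed tolerance band; `monitor_control_loop` and its parameters are hypothetical.

```python
from typing import Callable


def monitor_control_loop(
    read_output: Callable[[], float],
    norm: float,
    tolerance: float,
    act: Callable[[float], None],
    cycles: int,
) -> int:
    """Monitor, compare with the norm, act on deviations; return the number of actions."""
    actions = 0
    for _ in range(cycles):
        observed = read_output()          # monitor the output
        deviation = observed - norm       # compare with the predefined norm
        if abs(deviation) > tolerance:    # outside the acceptable band?
            act(deviation)                # take appropriate action
            actions += 1
    return actions


readings = iter([95.0, 120.0, 99.0, 130.0])  # hypothetical response times in ms
alerts: list[float] = []
n = monitor_control_loop(lambda: next(readings), norm=100.0, tolerance=10.0,
                         act=alerts.append, cycles=4)
print(n, alerts)  # 2 [20.0, 30.0]
```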

Monitoring

Repeated observation of a Configuration Item, IT Service or Process to detect Events and to ensure that the current status is known.

Multiple eXtended Remote Copy (MXRC) An enhancement to XRC that supports up to five XRC sessions within a single LPAR.
MVS Multiple Virtual Storage. An IBM mainframe operating system.
N
N + 1 A fault-tolerant strategy that includes multiple systems or components protected by one backup system or component (a many-to-one relationship).
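To see why one spare helps, the sketch below computes the probability that at least N of the N + 1 components are working. It assumes identical components that fail independently and that any N of them can carry the full load; both are simplifying assumptions, and `n_plus_1_availability` is a hypothetical name.

```python
from math import comb


def n_plus_1_availability(n: int, unit_availability: float) -> float:
    """Probability that at least n of n + 1 identical, independent units are up."""
    if n < 1 or not 0.0 <= unit_availability <= 1.0:
        raise ValueError("n must be >= 1 and availability must lie in [0, 1]")
    a = unit_availability
    total = n + 1
    # Sum the binomial terms for "all units up" (k = n + 1) and
    # "exactly one unit down" (k = n), the case the single spare covers.
    return sum(comb(total, k) * a**k * (1 - a) ** (total - k) for k in (total, n))


print(round(n_plus_1_availability(2, 0.99), 6))  # 0.999702
```

With N = 2 and components that are each 99% available, the N + 1 arrangement is available about 99.97% of the time, compared with 98.01% for two components and no spare.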
Nonvolatile storage (NVS) Random access electronic storage with a backup battery power source, used to retain data during a power failure. Nonvolatile storage, accessible from all cached IBM storage clusters, stores data during remote copy operations.
NVS Nonvolatile storage.
O

Objective

The defined purpose or aim of a Process, an Activity or an Organization as a whole. Objectives are usually expressed as measurable targets. The term Objective is also informally used to mean a Requirement. See also Outcome.

OC12 A high-speed telecommunications link supporting a speed of 622 megabits (Mb) per second. Equivalent bandwidth to four OC3s.
OC3 A high-speed telecommunications link supporting a speed of 155 megabits (Mb) per second.
OC48 A high-speed telecommunications link supporting a speed of 2.488 gigabits (Gb) per second. Equivalent bandwidth to four OC12s.
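The OC figures above follow a simple rule: an OC-n link runs at n times the SONET base rate of 51.84 Mb/s, where telecommunications rates use decimal megabits (1,000,000 bits). The sketch below, with a hypothetical helper name, reproduces the three line rates and checks the "four OC3s" and "four OC12s" equivalences. These are raw line rates; usable payload is somewhat lower because of framing overhead.

```python
OC1_KBPS = 51_840  # SONET OC-1 line rate (51.84 Mb/s), kept in kb/s so arithmetic stays exact


def oc_line_rate_kbps(n: int) -> int:
    """Line rate of an OC-n link in kilobits per second."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * OC1_KBPS


for n in (3, 12, 48):
    print(f"OC{n}: {oc_line_rate_kbps(n) / 1000} Mb/s")
# OC3: 155.52 Mb/s
# OC12: 622.08 Mb/s
# OC48: 2488.32 Mb/s

assert oc_line_rate_kbps(12) == 4 * oc_line_rate_kbps(3)
assert oc_line_rate_kbps(48) == 4 * oc_line_rate_kbps(12)
```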

Office of Government Commerce

OGC owns the ITIL brand (copyright and trademark). OGC is a UK Government department that supports the delivery of the government’s procurement agenda through its work in collaborative procurement and in raising levels of procurement skills and capability with departments. It also provides support for complex public sector projects.

Off-Site Storage Any place physically located a significant distance away from the primary site, where duplicated and vital records (hard copy or electronic and/or equipment) may be stored for use during recovery.

Operate

To perform as expected. A Process or Configuration Item is said to Operate if it is delivering the Required outputs. Operate also means to perform one or more Operations. For example, to Operate a computer is to do the day-to-day Operations needed for it to perform as expected.

Operating system Software that controls the execution of programs. An operating system may provide services such as resource allocation, scheduling, input/output control, and data management.

Operation

Day-to-day management of an IT Service, System, or other Configuration Item. Operation is also used to mean any pre-defined Activity or Transaction. For example loading a magnetic tape, accepting money at a point of sale, or reading data from a disk drive.

Operational

The lowest of three levels of Planning and delivery (Strategic, Tactical, Operational). Operational Activities include the day-to-day or short-term Planning or delivery of a Business Process or IT Service Management Process. The term Operational is also a synonym for Live.

Operational Cost

Cost resulting from running the IT Services, often in the form of repeating payments. For example, staff costs, hardware maintenance and electricity (also known as ‘current expenditure’ or ‘revenue expenditure’). See also Capital Expenditure.

Operational Expenditure

See Operational Cost.

OpEx

See Operational Expenditure, Operational Cost.

Operational Level Agreement

An Agreement between an IT Service Provider and another part of the same Organization. An OLA supports the IT Service Provider’s delivery of IT Services to Customers. The OLA defines the goods or Services to be provided and the responsibilities of both parties. For example there could be an OLA:

  • Between the IT Service Provider and a procurement department to obtain hardware in agreed times
  • Between the Service Desk and a Support Group to provide Incident Resolution in agreed times.

See also Service Level Agreement.

Operations Management

See IT Operations Management.

Optimize

Review, Plan and request Changes, in order to obtain the maximum Efficiency and Effectiveness from a Process, Configuration Item, Application, etc.

Organization

A company, legal entity or other institution. Examples of Organizations that are not companies include the International Organization for Standardization (ISO) and itSMF. The term Organization is sometimes used to refer to any entity that has People, Resources and Budgets. For example a Project or Business Unit.

Outcome

The result of carrying out an Activity; following a Process; delivering an IT Service, etc. The term Outcome is used to refer to intended results, as well as to actual results. See also Objective.

Outsourcing

Using an External Service Provider to manage IT Services.

Overhead

See Indirect cost.

P
Parallel access volume (PAV) PAVs are logical addresses that allow your system to access a single volume from a single host with multiple concurrent requests.
PAV Parallel access volume.

Partnership

A relationship between two Organizations that involves working closely together for common goals or mutual benefit. The IT Service Provider should have a Partnership with the Business, and with Third Parties who are critical to the delivery of IT Services.

Peer-to-peer remote copy A hardware-based remote copy option that provides disk volume copy across storage subsystems for Disaster Recovery, device migration, and workload migration.
Pending The initial state of a defined volume pair, before it becomes a duplex pair. During this state, the contents of the primary volume are copied to the secondary volume.

Performance

A measure of what is achieved or delivered by a System, person, team, Process, or IT Service.

Performance Management

The Process responsible for day-to-day Capacity Management Activities. These include monitoring, threshold detection, Performance analysis and Tuning, and implementing changes related to Performance and Capacity.

Pilot

A limited Deployment of an IT Service, a Release or a Process to the Live Environment. A pilot is used to reduce Risk and to gain User feedback and Acceptance. See also Test, Evaluation.

PiT Point-in-Time.

Plan

A detailed proposal that describes the Activities and Resources needed to achieve an Objective. For example a Plan to implement a new IT Service or Process. ISO/IEC 20000 requires a Plan for the management of each IT Service Management Process.

Plan–Do–Check–Act

A four-stage cycle for Process management, attributed to W. Edwards Deming. Plan–Do–Check–Act is also called the Deming Cycle.

PLAN: Design or revise Processes that support the IT Services.

DO: Implement the Plan and manage the Processes.

CHECK: Measure the Processes and IT Services, compare with Objectives and produce reports.

ACT: Plan and implement Changes to improve the Processes.

Planning

An Activity responsible for creating one or more Plans. For example, Capacity Planning.

PMBOK

A Project management Standard maintained and published by the Project Management Institute. PMBOK stands for Project Management Body of Knowledge. See www.pmi.org for more information. See also PRINCE2.

Point-in-Time Consistency Data is Point in Time consistent if all of the related data components (either a group of data sets or a set of logical volumes) are as they were at any single instant in time.

Policy

Formally documented management expectations and intentions. Policies are used to direct decisions, and to ensure consistent and appropriate development and implementation of Processes, Standards, Roles, Activities, IT Infrastructure, etc.

Port (1) An access point for data entry or exit. (2) A receptacle on a device to which a cable for another device is attached.

Post-Implementation Review

A Review that takes place after a Change or a Project has been implemented. A PIR determines if the Change or Project was successful, and identifies opportunities for improvement.

PPRC Peer-to-peer remote copy. Now referred to as Metro Mirror for ESS.
PPRC/XD Peer-to-peer remote copy over an extended distance. Now referred to as Global Copy for ESS.
PPRC dynamic address switching (P/DAS) A software function that provides the ability to dynamically redirect all application I/O from one PPRC volume to another PPRC volume.

Practice

A way of working, or a way in which work must be done. Practices can include Activities, Processes, Functions, Standards and Guidelines. See also Best Practice.

Pricing

The Activity for establishing how much Customers will be Charged.

Primary device One device of a remote copy volume pair. All primary application channel commands are directed to the primary device. The data on the primary device is duplicated on the secondary device. See also secondary device.
Primary system A system made up of one or more host systems that perform the main set of functions for an establishment. This is the system that updates the primary disk volumes that are being copied by a copy services function. Also referred to as application system.

PRINCE2

The standard UK government methodology for Project management. See www.ogc.gov.uk/prince2 for more information. See also PMBOK.

Priority

A Category used to identify the relative importance of an Incident, Problem or Change. Priority is based on Impact and Urgency, and is used to identify required times for actions to be taken. For example the SLA may state that Priority 2 Incidents must be resolved within 12 hours.
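Many organizations implement Priority as a lookup from Impact and Urgency. The sketch assumes a three-level scale and a frequently used 3 × 3 matrix in which Priority works out to Impact level + Urgency level - 1 (1 = highest); the resolution targets are hypothetical apart from the 12-hour Priority 2 figure taken from the example above.

```python
LEVEL = {"high": 1, "medium": 2, "low": 3}  # 1 is most severe

# Hypothetical SLA targets in hours; only the Priority 2 value comes from the text.
RESOLUTION_TARGET_HOURS = {1: 4, 2: 12, 3: 24, 4: 72, 5: 120}


def priority(impact: str, urgency: str) -> int:
    """Map Impact and Urgency ('high', 'medium', 'low') to Priority 1 (highest) to 5."""
    return LEVEL[impact] + LEVEL[urgency] - 1


p = priority("medium", "high")
print(p, RESOLUTION_TARGET_HOURS[p])  # 2 12
```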

Proactive Problem Management

Part of the Problem Management Process. The Objective of Proactive Problem Management is to identify Problems that might otherwise be missed. Proactive Problem Management analyses Incident Records, and uses data collected by other IT Service Management Processes to identify trends or significant problems.

Problem

A cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management Process is responsible for further investigation.

Problem Management

The Process responsible for managing the Lifecycle of all Problems. The primary objectives of Problem Management are to prevent Incidents from happening, and to minimize the Impact of Incidents that cannot be prevented.

Problem Record

A Record containing the details of a Problem. Each Problem Record documents the Lifecycle of a single Problem.

Procedure

A Document containing steps that specify how to achieve an Activity. Procedures are defined as part of Processes. See also Work Instruction.

Process

A structured set of Activities designed to accomplish a specific Objective. A Process takes one or more defined inputs and turns them into defined outputs. A Process may include any of the Roles, responsibilities, tools and management Controls required to reliably deliver the outputs. A Process may define Policies, Standards, Guidelines, Activities, and Work Instructions if they are needed.

Process Control

The Activity of planning and regulating a Process, with the Objective of performing the Process in an Effective, Efficient, and consistent manner.

Process Manager

A Role responsible for Operational management of a Process. The Process Manager’s responsibilities include Planning and coordination of all Activities required to carry out, monitor and report on the Process. There may be several Process Managers for one Process, for example regional Change Managers or IT Service Continuity Managers for each data centre. The Process Manager Role is often assigned to the person who carries out the Process Owner Role, but the two Roles may be separate in larger Organizations.

Process Owner

A Role responsible for ensuring that a Process is Fit for Purpose. The Process Owner’s responsibilities include sponsorship, Design, Change Management and continual improvement of the Process and its Metrics. This Role is often assigned to the same person who carries out the Process Manager Role, but the two Roles may be separate in larger Organizations.

Production Environment

See Live Environment.

Programme

A number of Projects and Activities that are planned and managed together to achieve an overall set of related Objectives and other Outcomes.

Project

A temporary Organization, with people and other Assets required to achieve an Objective or other Outcome. Each Project has a Lifecycle that typically includes initiation, Planning, execution, Closure, etc. Projects are usually managed using a formal methodology such as PRINCE2.

PRojects IN Controlled Environments (PRINCE2)

See PRINCE2

PTAM Pick-up Truck Access method. Refers to the process of sending backup tapes physically offsite.
PTF Program temporary fix.
PtP VTS Peer-to-Peer Virtual Tape Server: A PtP VTS consists of two IBM automated tape libraries (3494 or 3584), two Virtual Tape Servers (VTS) and an additional frame that houses the VTCs (Virtual Tape Controllers). Each VTS contains a disk cache and appears to the system as a large number of tape drives.
This type of configuration is frequently used to support enterprise-wide Disaster Recovery and Business Continuity. Data can be written into one or both VTSs and then automatically copied to the peer VTS.
Q
QSAM Queued Sequential Access Method. A file access method for reading, writing and updating sequential data sets and partitioned data set members.

Qualification

An Activity that ensures that IT Infrastructure is appropriate, and correctly configured, to support an Application or IT Service. See also Validation.

Quality

The ability of a product, Service, or Process to provide the intended value. For example, a hardware Component can be considered to be of high Quality if it performs as expected and delivers the required Reliability. Process Quality also requires an ability to monitor Effectiveness and Efficiency, and to improve them if necessary. See also Quality Management System.

Quality Assurance

The Process responsible for ensuring that the Quality of a product, Service or Process will provide its intended Value.

Quality Management System

The set of Processes responsible for ensuring that all work carried out by an Organization is of a suitable Quality to reliably meet Business Objectives or Service Levels. See also ISO 9000.

Quick Win

An improvement Activity that is expected to provide a Return on Investment in a short period of time with relatively small Cost and effort.

R

RACI

A Model used to help define Roles and Responsibilities. RACI stands for Responsible, Accountable, Consulted and Informed. See also Stakeholder.
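A RACI matrix is easy to check mechanically. The sketch below applies two widely used conventions that the definition above does not state: every Activity has exactly one Accountable role and at least one Responsible role. The function name and the sample matrix are hypothetical; a cell may combine letters, such as "AR", when one role is both.

```python
def raci_problems(matrix: dict[str, dict[str, str]]) -> list[str]:
    """matrix[activity][role] holds letters from 'RACI', e.g. 'R' or 'AR'."""
    problems = []
    for activity, cells in matrix.items():
        for role, letters in cells.items():
            if not letters or set(letters) - set("RACI"):
                problems.append(f"{activity}: invalid entry {letters!r} for {role}")
        accountable = sum("A" in letters for letters in cells.values())
        if accountable != 1:
            problems.append(f"{activity}: needs exactly one Accountable role, found {accountable}")
        if not any("R" in letters for letters in cells.values()):
            problems.append(f"{activity}: no Responsible role")
    return problems


matrix = {
    "Resolve Incident": {"Service Desk": "R", "Incident Manager": "A", "User": "I"},
    "Approve Change": {"Change Manager": "R", "CAB": "C"},
}
print(raci_problems(matrix))
# ['Approve Change: needs exactly one Accountable role, found 0']
```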

Record

A Document containing the results or other output from a Process or Activity. Records are evidence of the fact that an activity took place and may be paper or electronic. For example, an Audit report, an Incident Record, or the minutes of a meeting.

Recovery

Returning a Configuration Item or an IT Service to a working state. Recovery of an IT Service often includes recovering data to a known consistent state. After Recovery, further steps may be needed before the IT Service can be made available to the Users (Restoration).

Redundancy

See Fault Tolerance.

The term Redundant also has a generic meaning of obsolete, or no longer needed.

RAS Reliability, Availability, Serviceability. A term coined by IBM to describe a computer system's overall reliability, its ability to continue processing following a component failure, and its ability to undergo maintenance without being shut down completely.
Read hit A read operation for which the requested data is already in the cache.
Read miss A read operation for which the requested data is not in the cache.
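Read hits and misses are usually summarized as a hit ratio. To show where the counts come from, the sketch replays a trace of track reads against a small cache. It assumes least-recently-used (LRU) replacement purely for illustration; real storage controllers use their own caching algorithms. The function name and the trace are hypothetical.

```python
from collections import OrderedDict


def count_hits_and_misses(reads: list[int], cache_slots: int) -> tuple[int, int]:
    """Replay track reads through an LRU cache holding `cache_slots` tracks."""
    cache: OrderedDict[int, None] = OrderedDict()
    hits = misses = 0
    for track in reads:
        if track in cache:
            hits += 1                      # read hit: data already in the cache
            cache.move_to_end(track)       # mark as most recently used
        else:
            misses += 1                    # read miss: data must come from disk
            cache[track] = None
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used track
    return hits, misses


hits, misses = count_hits_and_misses([1, 2, 1, 3, 4, 1, 2], cache_slots=3)
print(hits, misses, round(hits / (hits + misses), 3))  # 2 5 0.286
```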
Recovery The process of rebuilding data after it has been damaged or destroyed. In the case of remote copy, this involves applying data from secondary volume copies.
Recovery Point Objective (RPO) The point in time to which data must be restored in order to resume processing transactions. RPO is the basis on which a data protection strategy is developed.
Recovery Time The period from the disaster declaration to the recovery of the critical functions.
Recovery Time Objective (RTO) The period of time within which systems, applications, or functions must be recovered after an outage (e.g. one business day). RTOs are often used as the basis for the development of recovery strategies, and as a determinant as to whether or not to implement the recovery strategies during a disaster situation. Similar Terms: Maximum Allowable Downtime
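Tying the RPO and RTO definitions back to choosing a DR solution, the sketch below screens candidate solutions against both objectives. It is a simplification under stated assumptions: the worst-case data loss is taken to be one full interval between copies (zero for synchronous mirroring), transfer delays are ignored, and every candidate, name and figure is hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DRCandidate:
    name: str
    copy_interval: timedelta   # time between successive copies of the data
    recovery_time: timedelta   # estimated time from disaster declaration to recovery


def meets_objectives(c: DRCandidate, rpo: timedelta, rto: timedelta) -> bool:
    # Worst case: the disaster strikes just before the next copy,
    # so one full interval of updates is lost.
    worst_case_data_loss = c.copy_interval
    return worst_case_data_loss <= rpo and c.recovery_time <= rto


candidates = [
    DRCandidate("nightly tape sent off-site", timedelta(hours=24), timedelta(hours=48)),
    DRCandidate("asynchronous replication", timedelta(minutes=5), timedelta(hours=4)),
    DRCandidate("synchronous mirroring", timedelta(0), timedelta(hours=1)),
]
rpo, rto = timedelta(minutes=15), timedelta(hours=8)
for c in candidates:
    print(f"{c.name}: {'meets' if meets_objectives(c, rpo, rto) else 'misses'} RPO/RTO")
```

Candidates that pass this screen still have to be compared on total cost of ownership, performance and reliability.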
Recovery system A system that is used in place of a primary application system that is no longer available for use. Data from the application system must be available for use on the recovery system. This is usually accomplished through backup and recovery techniques, or through various disk copying techniques, such as remote copy.

Relationship

A connection or interaction between two people or things. In Business Relationship Management it is the interaction between the IT Service Provider and the Business. In Configuration Management it is a link between two Configuration Items that identifies a dependency or connection between them. For example, Applications may be linked to the Servers they run on, and IT Services have many links to all the CIs that contribute to them.

Release

A collection of hardware, software, documentation, Processes or other Components required to implement one or more approved Changes to IT Services. The contents of each Release are managed, tested, and deployed as a single entity.

Release and Deployment Management

The Process responsible for both Release Management and Deployment.

Release Management

The Process responsible for Planning, scheduling and controlling the movement of Releases to Test and Live Environments. The primary Objective of Release Management is to ensure that the integrity of the Live Environment is protected and that the correct Components are released. Release Management is part of the Release and Deployment Management Process.

Release Record

A Record in the CMDB that defines the content of a Release. A Release Record has Relationships with all Configuration Items that are affected by the Release.

Reliability

A measure of how long a Configuration Item or IT Service can perform its agreed Function without interruption. Usually measured as MTBF or MTBSI. The term Reliability can also be used to state how likely it is that a Process, Function, etc. will deliver its required outputs. See also Availability.

Remote copy A storage-based disaster recovery and workload migration function that can copy data in real time to a remote location. Two options of remote copy are available. See peer-to-peer remote copy and extended remote copy.

Repair

The replacement or correction of a failed Configuration Item.

Request for Change

A formal proposal for a Change to be made. An RFC (Request For Change) includes details of the proposed Change, and may be recorded on paper or electronically. The term RFC is often misused to mean a Change Record, or the Change itself.

Requirement

A formal statement of what is needed. For example, a Service Level Requirement, a Project Requirement or the required Deliverables for a Process.

Resilience

The ability of an organization to absorb the impact of a business interruption, and continue to provide a minimum acceptable level of service.

The ability of a Configuration Item or IT Service to resist Failure or to Recover quickly following a Failure. For example an enterprise that has implemented an advanced recovery methodology such as data replication may be able to switch processing to an alternate site without any loss of service. See also Fault Tolerance.

Resolution

Action taken to repair the Root Cause of an Incident or Problem, or to implement a Workaround. In ISO/IEC 20000, Resolution Processes is the Process group that includes Incident and Problem Management.

Resource

A generic term that includes IT Infrastructure, people, money or anything else that might help to deliver an IT Service. Resources are considered to be Assets of an Organization. See also Capability, Service Asset.

Response Time

A measure of the time taken to complete an Operation or Transaction. Used in Capacity Management as a measure of IT Infrastructure Performance, and in Incident Management as a measure of the time taken to answer the phone, or to start Diagnosis.

Responsiveness

A measurement of the time taken to respond to something. This could be Response Time of a Transaction, or the speed with which an IT Service Provider responds to an Incident or Request for Change, etc.

Restoration of Service

See Restore.

Restore

Taking action to return an IT Service to the Users after Repair and Recovery from an Incident. This is the primary Objective of Incident Management.

Resynchronization A track image copy from the primary volume to the secondary volume of only the tracks which have changed since the volume was last in duplex mode.
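Resynchronization is efficient because only changed tracks are copied. The toy model below records which tracks are written while a pair is suspended and copies just those when the pair returns to duplex. `MirroredPair` and its methods are hypothetical and ignore the pending (initial full copy) state and all real controller details.

```python
class MirroredPair:
    """Toy primary/secondary volume pair with change tracking while suspended."""

    def __init__(self, tracks: int) -> None:
        self.primary = [b""] * tracks
        self.secondary = [b""] * tracks
        self.duplex = True
        self.changed: set[int] = set()  # tracks updated since the pair left duplex

    def write(self, track: int, data: bytes) -> None:
        self.primary[track] = data
        if self.duplex:
            self.secondary[track] = data  # mirrored immediately
        else:
            self.changed.add(track)       # remember it for resynchronization

    def suspend(self) -> None:
        self.duplex = False

    def resynchronize(self) -> int:
        """Copy only the changed tracks, return to duplex, report how many were copied."""
        for track in self.changed:
            self.secondary[track] = self.primary[track]
        copied = len(self.changed)
        self.changed.clear()
        self.duplex = True
        return copied


pair = MirroredPair(tracks=1000)
pair.write(1, b"a")
pair.suspend()
for track, data in [(5, b"b"), (7, b"c"), (5, b"d")]:
    pair.write(track, data)
print(pair.resynchronize())  # 2: tracks 5 and 7, not all 1000
```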

Retire

Permanent removal of an IT Service, or other Configuration Item, from the Live Environment. Retired is a stage in the Lifecycle of many Configuration Items.

Return on Investment

A measurement of the expected benefit of an investment. In the simplest sense it is the net profit of an investment divided by the net worth of the assets invested. See also Value on Investment.
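In the simplest sense described above, ROI is a single division. A sketch with a hypothetical function name and hypothetical figures: an investment of 200,000 in assets that yields a net profit of 50,000.

```python
def return_on_investment(net_profit: float, assets_invested: float) -> float:
    """Simplest-sense ROI: net profit divided by the net worth of the assets invested."""
    if assets_invested <= 0:
        raise ValueError("assets_invested must be positive")
    return net_profit / assets_invested


print(f"{return_on_investment(50_000, 200_000):.0%}")  # 25%
```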

Review

An evaluation of a Change, Problem, Process, Project, etc. Reviews are typically carried out at predefined points in the Lifecycle, and especially after Closure. The purpose of a Review is to ensure that all Deliverables have been provided, and to identify opportunities for improvement. See also Post-Implementation Review.

Risk

A possible event that could cause harm or loss, or affect the ability to achieve Objectives. A Risk is measured by the probability of a Threat, the Vulnerability of the Asset to that Threat, and the Impact it would have if it occurred.

Risk Assessment

The initial steps of Risk Management. Analyzing the value of Assets to the business, identifying Threats to those Assets, and evaluating how Vulnerable each Asset is to those Threats. Risk Assessment can be quantitative (based on numerical data) or qualitative.

Risk Management

The discipline which ensures that an organization does not assume an unacceptable level of risk.

The Process responsible for identifying, assessing and controlling Risks. See also Risk Assessment.

Role

A set of responsibilities, Activities and authorities granted to a person or team. A Role is defined in a Process. One person or team may have multiple Roles, for example the Roles of Configuration Manager and Change Manager may be carried out by a single person.

Rolling disaster In disaster situations, it is unlikely that the entire complex will fail at the same moment. Failures tend to be intermittent and gradual, and a disaster can occur over many seconds, even minutes. Because some data may have been processed and other data lost in this transition, data integrity on the secondary volumes is exposed. This situation is called a rolling disaster. The mirrored data at the recovery site must be managed so that cross-volume or LSS data consistency is preserved during the intermittent or gradual failure.
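One way to picture how a recovery site preserves consistency during a rolling disaster: hold back any write newer than the latest instant for which updates from every volume have arrived. The sketch is a simplified model of this consistency-group idea, not the algorithm of any particular product; it assumes each volume's writes arrive in timestamp order, and `Write` and `consistent_subset` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    volume: str
    timestamp: float
    track: int


def consistent_subset(received: dict[str, list[Write]]) -> list[Write]:
    """Writes safe to apply at the secondary without breaking cross-volume consistency."""
    if not received or any(not writes for writes in received.values()):
        return []  # some volume has reported nothing yet, so no common instant is known
    # Each list is in timestamp order, so its last entry is that volume's latest write.
    consistency_time = min(writes[-1].timestamp for writes in received.values())
    # Anything newer is held back: a write from another volume for that period may be lost.
    safe = [w for writes in received.values() for w in writes if w.timestamp <= consistency_time]
    return sorted(safe, key=lambda w: w.timestamp)


# The log volume's write at t=3 arrived, but the database volume went silent after t=2.
received = {
    "LOG": [Write("LOG", 1.0, 10), Write("LOG", 3.0, 11)],
    "DB": [Write("DB", 2.0, 42)],
}
print([(w.volume, w.timestamp) for w in consistent_subset(received)])
# [('LOG', 1.0), ('DB', 2.0)]
```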

Root Cause

The underlying or original cause of an Incident or Problem.

Root Cause Analysis

An Activity that identifies the Root Cause of an Incident or Problem. RCA typically concentrates on IT Infrastructure failures. See also Service Failure Analysis.

RPO Recovery Point Objective.
RTO Recovery Time Objective.
S
SAM Sequential access method.
Sarbanes-Oxley Sarbanes-Oxley (SOx) is U.S. legislation, applying to public companies, that is intended to strengthen corporate governance. SOx specifies requirements for audits, financial reporting and disclosure, conflicts of interest, and corporate governance. It also establishes new supervisory mechanisms for accountants and accounting firms that conduct external audits of public companies. This legislation also imposes a legal duty on directors of companies to ensure that they have taken adequate information security precautions to protect and preserve financial information.

Scalability

The ability of an IT Service, Process, Configuration Item, etc. to perform its agreed Function when the Workload or Scope changes.

Scope

The boundary, or extent, to which a Process, Procedure, Certification, Contract, etc. applies. For example, the Scope of Change Management may include all Live IT Services and related Configuration Items; the Scope of an ISO/IEC 20000 Certificate may include all IT Services delivered out of a named data centre.

SDF Status Display facility.
SDM System Data Mover.
Secondary device One of the devices in a dual copy or remote copy logical volume pair that contains a duplicate of the data on the primary device. Unlike the primary device, the secondary device may only accept a limited subset of channel commands.

Security

See Information Security Management.

Security Management

See Information Security Management.

Server

A computer that is connected to a network and provides software Functions that are used by other Computers.

Service

A means of delivering value to Customers by facilitating Outcomes Customers want to achieve without the ownership of specific Costs and Risks.

Service Asset

Any Capability or Resource of a Service Provider. See also Asset.

Service Asset and Configuration Management

The Process responsible for both Configuration Management and Asset Management.

Service Capacity Management

The Activity responsible for understanding the Performance and Capacity of IT Services. The Resources used by each IT Service and the pattern of usage over time are collected, recorded, and analyzed for use in the Capacity Plan. See also Business Capacity Management, Component Capacity Management.

Service Catalogue

A database or structured Document with information about all Live IT Services, including those available for Deployment. The Service Catalogue is the only part of the Service Portfolio published to Customers, and is used to support the sale and delivery of IT Services. The Service Catalogue includes information about deliverables, prices, contact points, ordering and request Processes.

Service Culture

A Customer-oriented Culture. The major Objectives of a Service Culture are Customer satisfaction and helping Customers to achieve their Business Objectives.

Service Design

A stage in the Lifecycle of an IT Service. Service Design includes a number of Processes and Functions and is the title of one of the Core ITIL Version 3 publications. See also Design.

Service Desk

The Single Point of Contact between the Service Provider and the Users. A typical Service Desk manages Incidents and Service Requests, and also handles communication with the Users.

Service Failure Analysis

An Activity that identifies underlying causes of one or more IT Service interruptions. SFA identifies opportunities to improve the IT Service Provider’s Processes and tools, and not just the IT Infrastructure. SFA is a time-constrained, project-like activity, rather than an ongoing process of analysis. See also Root Cause Analysis.

Service Improvement Plan

A formal Plan to implement improvements to a Process or IT Service.

Service Knowledge Management System

A set of tools and databases that are used to manage knowledge and information. The SKMS includes the Configuration Management System, as well as other tools and databases. The SKMS stores, manages, updates, and presents all information that an IT Service Provider needs to manage the full Lifecycle of IT Services.

Service Level

Measured and reported achievement against one or more Service Level Targets. The term Service Level is sometimes used informally to mean Service Level Target.

Service Level Agreement

An Agreement between an IT Service Provider and a Customer. The SLA describes the IT Service, documents Service Level Targets, and specifies the responsibilities of the IT Service Provider and the Customer. A single SLA may cover multiple IT Services or multiple customers. See also Operational Level Agreement.

Service Level Management

The Process responsible for negotiating Service Level Agreements, and ensuring that these are met. SLM is responsible for ensuring that all IT Service Management Processes, Operational Level Agreements, and Underpinning Contracts, are appropriate for the agreed Service Level Targets. SLM monitors and reports on Service Levels, and holds regular Customer reviews.

Service Level Requirement

A Customer Requirement for an aspect of an IT Service. SLRs are based on Business Objectives and are used to negotiate agreed Service Level Targets.

Service Level Target

A commitment that is documented in a Service Level Agreement. Service Level Targets are based on Service Level Requirements, and are needed to ensure that the IT Service design is Fit for Purpose. Service Level Targets should be SMART, and are usually based on KPIs.

Service Management

Service Management is a set of specialized organizational capabilities for providing value to customers in the form of services.

Service Management Lifecycle

An approach to IT Service Management that emphasizes the importance of coordination and Control across the various Functions, Processes, and Systems necessary to manage the full Lifecycle of IT Services. The Service Management Lifecycle approach considers the Strategy, Design, Transition, Operation and Continuous Improvement of IT Services.

Service Manager

A manager who is responsible for managing the end-to-end Lifecycle of one or more IT Services. The term Service Manager is also used to mean any manager within the IT Service Provider. It is most commonly used to refer to a Business Relationship Manager, a Process Manager, an Account Manager or a senior manager with responsibility for IT Services overall.

Service Operation

A stage in the Lifecycle of an IT Service. Service Operation includes a number of Processes and Functions and is the title of one of the Core ITIL Version 3 publications. See also Operation.

Service Owner

A Role that is accountable for the delivery of a specific IT Service.

Service Package

A detailed description of an IT Service that is available to be delivered to Customers. A Service Package includes a Service Level Package and one or more Core Services and Supporting Services.

Service Portfolio

The complete set of Services that are managed by a Service Provider. The Service Portfolio is used to manage the entire Lifecycle of all Services, and includes three Categories: Service Pipeline (proposed or in Development); Service Catalogue (Live or available for Deployment); and Retired Services. See also Service Portfolio Management.

Service Portfolio Management

The Process responsible for managing the Service Portfolio. Service Portfolio Management considers Services in terms of the Business value that they provide.

Service Provider

An Organization supplying Services to one or more Internal Customers or External Customers. Service Provider is often used as an abbreviation for IT Service Provider.

Service Reporting

The Process responsible for producing and delivering reports of achievement and trends against Service Levels. Service Reporting should agree the format, content and frequency of reports with Customers.

Service Request

A request from a User for information or advice, for a Standard Change, or for Access to an IT Service; for example, to reset a password or to provide standard IT Services for a new User. Service Requests are usually handled by a Service Desk and do not require an RFC to be submitted.

Service Strategy

The title of one of the Core ITIL Version 3 publications. Service Strategy establishes an overall Strategy for IT Services and for IT Service Management.

Service Transition

A stage in the Lifecycle of an IT Service. Service Transition includes a number of Processes and Functions and is the title of one of the Core ITIL Version 3 publications. See also Transition.

Serviceability

The ability of a Third-Party Supplier to meet the terms of its Contract. This Contract will include agreed levels of Reliability, Maintainability or Availability for a Configuration Item.

Shadow File Processing An approach to data backup in which real-time duplicates of critical files are maintained at a remote processing site. SIMILAR TERMS: Remote Mirroring.
SHARE A Large Systems Users Group.
Sidefile A storage area used to maintain copies of tracks within a concurrent copy domain. A concurrent copy operation maintains a sidefile in storage control cache and another in processor storage.

Simplex state A volume is in the simplex state if it is not part of a dual copy or a remote copy volume pair. Ending a volume pair returns the two devices to the simplex state. In this case, there is no longer any capability for either automatic updates of the secondary device or logging of changes, as would be the case in a suspended state.

Simulation modeling

A technique that creates a detailed model to predict the behavior of a Configuration Item or IT Service. Simulation Models can be very accurate but are expensive and time-consuming to create. A Simulation Model is often created by using the actual Configuration Items that are being modeled, with artificial Workloads or Transactions. They are used in Capacity Management when accurate results are important. A simulation model is sometimes called a Performance Benchmark.

Single Point of Contact

Providing a single consistent way to communicate with an Organization or Business Unit. For example, a Single Point of Contact for an IT Service Provider is usually called a Service Desk.

Single Point of Failure

A unique hardware component, data path, or source of a service, activity, or process for which there is no alternate, so that the loss of that element could lead to a catastrophic failure of a critical function.

Any Configuration Item that can cause an Incident when it fails, and for which a Countermeasure has not been implemented. A SPOF may be a person, or a step in a Process or Activity, as well as a Component of the IT Infrastructure. See also Failure.

SLAM Chart

A Service Level Agreement Monitoring Chart is used to help monitor and report achievements against Service Level Targets. A SLAM Chart is typically color coded to show whether each agreed Service Level Target has been met, missed, or nearly missed during each of the previous 12 months.
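One row of a SLAM Chart can be sketched as a classification of each month's measured Service Level against its target. This is a minimal illustration under stated assumptions: availability percentages are the measure, green/amber/red are the colours, and the 0.2 percentage-point "nearly missed" band is hypothetical; the names `SlamColour`, `slam_status` and `slam_row` are illustrative, not part of any monitoring tool.

```python
from enum import Enum


class SlamColour(Enum):
    # Assumed colour convention; the glossary only says "color coded".
    MET = "green"
    NEARLY_MISSED = "amber"
    MISSED = "red"


def slam_status(achieved: float, target: float, near_miss_band: float = 0.2) -> SlamColour:
    """Classify one month's achieved Service Level (%) against its target (%).

    near_miss_band is a hypothetical margin above the target that is still
    reported as 'nearly missed'.
    """
    if achieved < target:
        return SlamColour.MISSED
    if achieved < target + near_miss_band:
        return SlamColour.NEARLY_MISSED
    return SlamColour.MET


def slam_row(monthly_results: list[float], target: float) -> list[str]:
    """Colour each of the previous 12 months (oldest first) for one target."""
    return [slam_status(m, target).value for m in monthly_results[-12:]]


print(slam_row([99.9, 99.6, 99.2], target=99.5))  # ['green', 'amber', 'red']
```

A real chart would hold one such row per agreed Service Level Target.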

SMART

An acronym for helping to remember that targets in Service Level Agreements and Project Plans should be Specific, Measurable, Achievable, Relevant and Timely.

SMF System Management Facilities.
SMS Storage Management Subsystem.

Snapshot

The current state of a Configuration as captured by a discovery tool. Also used as a synonym for Benchmark. See also Baseline.

SOx

See Sarbanes-Oxley

Specification

A formal definition of Requirements. A Specification may be used to define technical or Operational Requirements, and may be internal or external. Many public Standards consist of a Code of Practice and a Specification. The Specification defines the Standard against which an Organization can be Audited.

SPOF See Single Point of Failure.
SRM System resources manager.
SSID Subsystem identifier.

Stakeholder

All people who have an interest in an Organization, Project, IT Service, etc. Stakeholders may be interested in the Activities, targets, Resources, or Deliverables. Stakeholders may include Customers, Partners, employees, shareholders, owners, etc. See also RACI.

Standard

A mandatory Requirement. Examples include ISO/IEC 20000 (an international Standard), an internal security standard for Unix configuration, or a government standard for how financial Records should be maintained. The term Standard is also used to refer to a Code of Practice or Specification published by a Standards Organization such as ISO or BSI. See also Guideline.

stage The process of writing data from a disk to the cache.
State data set A data set that contains the status of the XRC session and of the associated volumes that XRC is managing.

Status

The name of a required field in many types of Record. It shows the current stage in the Lifecycle of the associated Configuration Item, Incident, Problem, etc.

Storage cluster A power and service region that runs channel commands and controls the storage devices. Each storage cluster contains both channel and device interfaces. Storage clusters also perform the disk control functions.
Storage control The component in a storage subsystem that handles interaction between processor channel and storage devices, runs channel commands, and controls storage devices.
Storage control session A logical entity that is created for the purpose of processing updates to the XRC primary volumes. It is used to group sets of primary XRC volumes that are being processed by an XRC session within the storage control.

Storage Management

The Process responsible for managing the storage and maintenance of data throughout its Lifecycle.

Storage Management Subsystem (SMS) A component of MVS/DFP that is used to automate and centralize the management of storage by providing the storage administrator with control over data class, storage class, management class, storage group, aggregate group and automatic class selection routine definitions.

Strategic

The highest of three levels of Planning and delivery (Strategic, Tactical, Operational). Strategic Activities include Objective setting and long-term Planning to achieve the overall Vision.

Strategy

A Strategic Plan designed to achieve defined Objectives.

Subsystem identifier (SSID) A user-assigned number that identifies a disk subsystem. This number is set by the service representative at the time of installation and is included in the vital product data.

Supplier

A Third Party responsible for supplying goods or Services that are required to deliver IT services. Examples of suppliers include commodity hardware and software vendors, network and telecom providers, and outsourcing Organizations. See also Underpinning Contract, Supply Chain.

Supplier Management

The Process responsible for ensuring that all Contracts with Suppliers support the needs of the Business, and that all Suppliers meet their contractual commitments.

Supply Chain

The Activities in a Value Chain carried out by Suppliers. A Supply Chain typically involves multiple Suppliers, each adding value to the product or Service.

Support Group

A group of people with technical skills. Support Groups provide the Technical Support needed by all of the IT Service Management Processes. See also Technical Management.

Suspended state The state of a dual copy or remote copy volume pair in which only one of the devices is being updated, because of either a permanent error condition or an authorized user command. All writes to the remaining functional device are logged, which allows automatic resynchronization of both volumes when the volume pair is reset to the active duplex state.
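The relationship between the duplex, suspended and simplex states can be sketched as a toy model. This is an illustration only, not an IBM interface: the class `VolumePair` and its methods are hypothetical, and a Python set stands in for whatever change-recording mechanism a real storage control uses. It shows why logging writes during suspension lets resynchronization copy only the changed tracks, and why ending the pair (simplex) gives up that capability.

```python
class VolumePair:
    """Toy model of a dual copy / remote copy volume pair (hypothetical API)."""

    def __init__(self, tracks: int) -> None:
        self.primary = [b""] * tracks
        self.secondary = [b""] * tracks
        self.state = "duplex"
        self.changed_tracks: set[int] = set()  # writes logged while suspended

    def write(self, track: int, data: bytes) -> None:
        self.primary[track] = data
        if self.state == "duplex":
            self.secondary[track] = data  # both copies kept in step
        elif self.state == "suspended":
            self.changed_tracks.add(track)  # log the write for later resync
        # simplex: no secondary update and no change logging

    def suspend(self) -> None:
        """Stop updating the secondary (error condition or user command)."""
        if self.state == "duplex":
            self.state = "suspended"

    def resync(self) -> int:
        """Copy only the logged tracks, return to duplex, report tracks copied."""
        if self.state != "suspended":
            raise ValueError("resynchronization requires the suspended state")
        for track in sorted(self.changed_tracks):
            self.secondary[track] = self.primary[track]
        copied = len(self.changed_tracks)
        self.changed_tracks.clear()
        self.state = "duplex"
        return copied

    def end_pair(self) -> None:
        """Return both devices to simplex; change logging stops."""
        self.state = "simplex"
        self.changed_tracks.clear()


pair = VolumePair(tracks=4)
pair.write(0, b"a")
pair.suspend()
pair.write(1, b"b")
pair.write(1, b"c")  # the same track is logged only once
pair.write(3, b"d")
print(pair.resync())  # 2
```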

SWOT Analysis

A technique that reviews and analyses the internal strengths and weaknesses of an Organization and of the external opportunities and threats that it faces. SWOT stands for Strengths, Weaknesses, Opportunities and Threats.

Synchronous operation A type of operation in which the remote copy function copies updates to the secondary volume of the pair at the same time that the primary volume is updated. Contrast with asynchronous operation.
Synchronization An initial volume copy. This is a track image copy of each primary track on the volume to the secondary volume.
Sysplex A set of MVS or z/OS systems that are communicating and cooperating with each other through certain multisystem hardware components and software services, such as CXRC, to process workloads. This term is derived from “system complex”.
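The contrast drawn under Synchronous operation can be sketched with a toy mirrored volume. The class `MirroredVolume` and its methods are hypothetical and do not model any vendor's remote copy protocol; the point is only that in asynchronous mode writes complete before the secondary has them, so any updates still queued when the primary site is lost are the data-loss exposure that an RPO has to allow for.

```python
from collections import deque


class MirroredVolume:
    """Toy contrast of synchronous vs asynchronous remote copy (hypothetical API)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: dict[int, bytes] = {}
        self.secondary: dict[int, bytes] = {}
        self.in_flight: deque = deque()  # updates not yet applied to the secondary

    def write(self, track: int, data: bytes) -> None:
        self.primary[track] = data
        if self.synchronous:
            # The write is not complete until the secondary also has it.
            self.secondary[track] = data
        else:
            # The write completes now; the update is shipped to the secondary later.
            self.in_flight.append((track, data))

    def apply_pending(self) -> None:
        """Apply queued updates to the secondary in the order they were written."""
        while self.in_flight:
            track, data = self.in_flight.popleft()
            self.secondary[track] = data


async_vol = MirroredVolume(synchronous=False)
async_vol.write(1, b"x")
print(len(async_vol.in_flight))  # 1 update would be lost if the primary failed now
```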

System

A number of related things that work together to achieve an overall Objective. For example:

  • A computer System including hardware, software and Applications
  • A management System, including multiple Processes that are planned and managed together. For example, a Quality Management System
  • A Database Management System or Operating System that includes many software modules that are designed to perform a set of related Functions.
System Data Mover (SDM) A system that interacts with storage controls that have attached XRC primary volumes. The SDM copies updates made to the XRC primary volumes to a set of XRC-managed secondary volumes.
System-managed data set A data set that has been assigned a storage class and is managed by SMS.
T

Tactical

The middle of three levels of Planning and delivery (Strategic, Tactical, Operational). Tactical Activities include the medium-term Plans required to achieve specific Objectives, typically over a period of weeks to months.

Technical Management

The Function responsible for providing technical skills in support of IT Services and management of the IT Infrastructure. Technical Management defines the Roles of Support Groups, as well as the tools, Processes and Procedures required.

Technical Observation

A technique used in Service Improvement, Problem investigation and Availability Management. Technical support staff meet to monitor the behavior and Performance of an IT Service and make recommendations for improvement.

Technical Support

See Technical Management.

Tension Metrics

A set of related Metrics, in which improvements to one Metric have a negative effect on another. Tension Metrics are designed to ensure that an appropriate balance is achieved.

Tertiary volumes Volumes that the recovery site uses when it takes over the primary site’s workload or as a backup of the secondary volumes to ensure that there is a consistent copy of all recoverable volumes in place while XRC secondary volumes are resynchronized. Rather than have the recovery site use the XRC secondary volumes when taking over the primary site’s workload, the restart will use the tertiary volumes.

Test

An Activity that verifies that a Configuration Item, IT Service, Process, etc. meets its Specification or agreed Requirements. See also Acceptance.

Test Environment

A controlled Environment used to Test Configuration Items, Builds, IT Services, Processes, etc.

Third Party

A person, group, or Business that is not part of the Service Level Agreement for an IT Service, but is required to ensure successful delivery of that IT Service. For example, a software Supplier, a hardware maintenance company, or a facilities department. Requirements for Third Parties are typically specified in Underpinning Contracts or Operational Level Agreements.

Threat

Anything that might exploit a Vulnerability. Any potential cause of an Incident can be considered to be a Threat. For example, a fire is a Threat that could exploit the Vulnerability of flammable floor coverings. This term is commonly used in Information Security Management and IT Service Continuity Management, but also applies to other areas such as Problem and Availability Management.

Threshold

The value of a Metric that should cause an Alert to be generated, or management action to be taken. For example: ‘Priority 1 Incident not solved within four hours’, ‘more than five soft disk errors in an hour’, or ‘more than 10 failed changes in a month’.
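Two of the example Thresholds above can be expressed as simple rules checked against observed Metric values. This is a minimal sketch: the metric names (`soft_disk_errors_per_hour`, `failed_changes_per_month`), the `Threshold` class and the `breached` function are hypothetical, and "more than" is read as a strict greater-than comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # hypothetical metric name
    limit: float      # an Alert is raised when the observed value exceeds this
    description: str


def breached(thresholds: list[Threshold], observed: dict[str, float]) -> list[str]:
    """Return the description of every Threshold whose metric exceeds its limit."""
    return [t.description for t in thresholds
            if t.metric in observed and observed[t.metric] > t.limit]


rules = [
    Threshold("soft_disk_errors_per_hour", 5, "more than five soft disk errors in an hour"),
    Threshold("failed_changes_per_month", 10, "more than 10 failed changes in a month"),
]
print(breached(rules, {"soft_disk_errors_per_hour": 6, "failed_changes_per_month": 10}))
# ['more than five soft disk errors in an hour']
```

Exactly 10 failed changes does not trigger the second rule, because the Threshold says "more than 10".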

Throughput

A measure of the number of Transactions, or other Operations, performed in a fixed time. For example, 5,000 e-mails sent per hour, or 200 disk I/Os per second.

TOD Time of day.
Tiered Storage A data storage environment consisting of two or more kinds of storage delineated by differences in at least one of these four attributes: Price, Performance, Capacity and Function.

Any significant difference in one or more of the four defining attributes can be sufficient to justify a separate storage tier.

Examples:

  • Disk and Tape: Two separate storage tiers identified by differences in all four defining attributes.
  • Old technology disk and new technology disk: Two separate storage tiers identified by differences in one or more of the attributes.
  • High performing disk storage and less expensive, slower disk of the same capacity and function: Two separate tiers.
  • Identical Enterprise class disk configured to utilize different functions such as RAID level or replication: A separate storage tier for each set of unique functions.

Note: Storage Tiers are NOT delineated by differences in vendor, architecture, or geometry except where those differences result in clear changes to Price, Performance, Capacity or Function.
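The tiering rule above can be sketched as grouping storage configurations by their four defining attributes only. In this illustration the attribute values are made-up labels, and `StorageConfig` and `group_into_tiers` are hypothetical names; vendor is carried along but ignored, so two otherwise identical configurations from different vendors land in the same tier, while adding replication creates a separate tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    price: str            # illustrative labels, e.g. "high" / "low"
    performance: str
    capacity: str
    functions: frozenset  # e.g. frozenset({"RAID-5", "replication"})
    vendor: str = ""      # not a tiering attribute on its own

    def tier_key(self) -> tuple:
        """Only the four defining attributes decide the tier."""
        return (self.price, self.performance, self.capacity, self.functions)


def group_into_tiers(configs: list[StorageConfig]) -> list[list[StorageConfig]]:
    """Put configurations with identical defining attributes into the same tier."""
    tiers: dict[tuple, list[StorageConfig]] = {}
    for cfg in configs:
        tiers.setdefault(cfg.tier_key(), []).append(cfg)
    return list(tiers.values())


raid5_a = StorageConfig("high", "fast", "large", frozenset({"RAID-5"}), vendor="A")
raid5_b = StorageConfig("high", "fast", "large", frozenset({"RAID-5"}), vendor="B")
replicated = StorageConfig("high", "fast", "large", frozenset({"RAID-5", "replication"}), vendor="A")
print(len(group_into_tiers([raid5_a, raid5_b, replicated])))  # 2
```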

Time Sharing Option (TSO) An MVS and z/OS operating system option that provides interactive time sharing from remote terminals.
Timestamp The affixed value of the system time-of-day clock at a common point of reference for all write I/O operations directed to active XRC primary volumes. The UTC format is yyyy.ddd hh:mm:ss.thmiju.
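The display layout yyyy.ddd hh:mm:ss.thmiju can be produced and parsed with Python's standard `datetime` module. This sketch assumes that "ddd" is the day of the year (001 to 366) and that "thmiju" stands for six fractional-second digits (tenths through microseconds); `XRC_UTC_FORMAT`, `format_utc` and `parse_utc` are hypothetical names. It only renders the human-readable form and says nothing about how the time-of-day clock itself is stored.

```python
from datetime import datetime, timezone

# yyyy.ddd hh:mm:ss.thmiju: year, day of year, time, then six
# fractional-second digits (assumed here to run from tenths to microseconds).
XRC_UTC_FORMAT = "%Y.%j %H:%M:%S.%f"


def format_utc(ts: datetime) -> str:
    """Render an aware datetime in the yyyy.ddd hh:mm:ss.thmiju layout (UTC)."""
    return ts.astimezone(timezone.utc).strftime(XRC_UTC_FORMAT)


def parse_utc(text: str) -> datetime:
    """Parse the layout back into an aware UTC datetime."""
    return datetime.strptime(text, XRC_UTC_FORMAT).replace(tzinfo=timezone.utc)


stamp = datetime(2010, 7, 27, 14, 5, 9, 123456, tzinfo=timezone.utc)
print(format_utc(stamp))  # 2010.208 14:05:09.123456
```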

Total Cost of Ownership

A methodology used to help make investment decisions. TCO assesses the full Lifecycle Cost of owning a Configuration Item, not just the initial Cost or purchase price.
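As a worked illustration of why TCO can reverse a decision based on purchase price alone, the sketch below adds each year's recurring costs to the initial cost. The figures are invented for the example, the function name `total_cost_of_ownership` is hypothetical, and no discounting of future costs is applied, although a fuller TCO analysis often would.

```python
def total_cost_of_ownership(purchase_price: float, yearly_costs: list[float]) -> float:
    """Initial cost plus every recurring cost over the lifecycle (no discounting)."""
    return purchase_price + sum(yearly_costs)


# Hypothetical five-year comparison: a cheaper product with heavy manual upkeep
# versus a more expensive, largely automated one.
manual = total_cost_of_ownership(10_000, [4_000] * 5)     # 30000
automated = total_cost_of_ownership(20_000, [1_000] * 5)  # 25000
print(manual > automated)  # True
```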

Total Quality Management

A methodology for managing continual Improvement by using a Quality Management System. TQM establishes a Culture involving all people in the Organization in a Process of continual monitoring and improvement.

Transaction

(1) A series of programmed tasks within an application that accomplishes a particular result, often to satisfy a user request. Example: A single CICS transaction might entail dispensing $40 from an ATM and updating the customer's account balance.

(2) In processing terms, the work that occurs between the beginning of a unit of work and commit or rollback. A transaction defines the set of operations that is part of an integral set for which consistency must be maintained.

Transaction Consistency The state of all data related to a transaction either:
  • Before the transaction has begun, or
  • After the transaction has successfully completed or been backed out.
Transaction Consistency is not maintained during the life of a transaction, but should be present both before and after each transaction.
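The ATM example from the Transaction entry can illustrate Transaction Consistency using Python's standard `sqlite3` module rather than CICS. The table names and the `withdraw` and `balance` helpers are hypothetical. A simulated failure between the debit and the dispense record causes the whole unit of work to be backed out, so the data is seen only in its consistent "before" or "after" state, never half-updated.

```python
import sqlite3


def withdraw(conn: sqlite3.Connection, account: int, amount: int,
             fail_midway: bool = False) -> bool:
    """Debit an account and record the cash dispense as one transaction.

    fail_midway simulates a failure between the two updates.
    """
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, account))
            if fail_midway:
                raise RuntimeError("simulated failure")
            conn.execute("INSERT INTO dispenses (account, amount) VALUES (?, ?)",
                         (account, amount))
        return True
    except RuntimeError:
        return False  # the debit above was rolled back


def balance(conn: sqlite3.Connection, account: int) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?",
                        (account,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE dispenses (account INTEGER, amount INTEGER)")
with conn:
    conn.execute("INSERT INTO accounts VALUES (1, 100)")

withdraw(conn, 1, 40, fail_midway=True)
print(balance(conn, 1))  # 100: consistent "before" state
withdraw(conn, 1, 40)
print(balance(conn, 1))  # 60: consistent "after" state
```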

Transition

A change in state, corresponding to a movement of an IT Service or other Configuration Item from one Lifecycle status to the next.

Trend Analysis

Analysis of data to identify time-related patterns. Trend Analysis is used in Problem Management to identify common Failures or fragile Configuration Items, and in Capacity Management as a Modeling tool to predict future behavior. It is also used as a management tool for identifying deficiencies in IT Service Management Processes.

TSO Time Sharing Option.

Tuning

The Activity responsible for Planning changes to make the most efficient use of Resources. Tuning is part of Performance Management, which also includes Performance monitoring and implementation of the required Changes.

U

Underpinning Contract

A Contract between an IT Service Provider and a Third Party. The Third Party provides goods or Services that support delivery of an IT Service to a Customer. The Underpinning Contract defines targets and responsibilities that are required to meet agreed Service Level Targets in an SLA.

Unit Cost

The Cost to the IT Service Provider of providing a single Component of an IT Service. For example, the Cost of a single desktop PC, or of a single Transaction.

UltraNet Storage Director Brocade (formerly McData, formerly CNT) Channel Extension equipment.
Universal Time, Coordinated Replaces Greenwich Mean Time (GMT) as a global time reference. The format is yyyy.ddd hh:mm:ss.thmiju.

Urgency

A measure of how long it will be until an Incident, Problem or Change has a significant Impact on the Business. For example, a high Impact Incident may have low Urgency if the Impact will not affect the Business until the end of the financial year. Impact and Urgency are used to assign Priority.

User

A person who uses the IT Service on a day-to-day basis. Users are distinct from Customers, as some Customers do not use the IT Service directly.

Utility

Functionality offered by a Product or Service to meet a particular need. Utility is often summarized as ‘what it does’.

utility volume A volume that is available to be used by the extended remote copy function to perform SDM I/O for a primary site storage control’s XRC-related data.
UTC Universal Time, Coordinated.
V

Validation

An Activity that ensures a new or changed IT Service, Process, Plan, or other Deliverable meets the needs of the Business. Validation ensures that Business Requirements are met even though these may have changed since the original design. See also Verification, Acceptance, Qualification.

Value Chain

A sequence of Processes that creates a product or Service that is of value to a Customer. Each step of the sequence builds on the previous steps and contributes to the overall product or Service.

Value for Money

An informal measure of Cost Effectiveness. Value for Money is often based on a comparison with the Cost of alternatives.

Value on Investment

A measurement of the expected benefit of an investment. VOI considers both financial and intangible benefits. See also Return on Investment.

Variance

The difference between a planned value and the actual measured value. Commonly used in Financial Management, Capacity Management and Service Level Management, but could apply in any area where Plans are in place.

Verification

An Activity that ensures a new or changed IT Service, Process, Plan, or other Deliverable is complete, accurate, Reliable and matches its design specification. See also Validation, Acceptance.

Version

A Version is used to identify a specific Baseline of a Configuration Item. Versions typically use a naming convention that enables the sequence or date of each Baseline to be identified. For example, Payroll Application Version 3 contains updated functionality from Version 2.

Vision

A description of what the Organization intends to become in the future. A Vision is created by senior management and is used to help influence Culture and Strategic Planning.

Vital Business Function

A Function of a Business Process that is critical to the success of the Business. Vital Business Functions are an important consideration of Business Continuity Management, IT Service Continuity Management and Availability Management.

Vital Records

Records or documents that, for legal, regulatory, or operational reasons, cannot be irretrievably lost or damaged without materially impairing the organization's ability to conduct business.

volser Volume serial number.
volume The disk space identified by a common serial number and accessed by any of a set of related addresses. See also device.
VSM VSM and Virtual Storage Manager are trademarks or registered trademarks of Storage Technology Corp. (Sun/STK). Sun's StorageTek Virtual Storage Manager (VSM) system is a virtual tape solution for mainframe environments consisting of a disk front-end, automated-tape back-end, and STK's proven robotics.
VTL Virtual Tape Library: VTL is a generic (non-vendor specific) term that describes a storage technology that makes it possible to save data as if it were being stored on magnetic tape without the constraints normally seen by writing directly to physical tape. In the large systems environment, this is generally considered to be a disk "front-end" buffer combined with an automated tape library and code that makes it appear as if a great many tape drives were defined to the system. Benefits of virtual tape systems include improvements to the RTO and RPO as well as reduced operating costs.
VTOC Volume table of contents.
Virtual Tape Server (VTS) A storage device that combines a disk cache and automated tape library to improve performance and maximize the use of the physical tape media. The disk cache appears to the system as a large number of tape drives. Data intended for tape storage is written to these virtual tape drives (disk cache) and later written to physical tape in the background. Generally, the most recently used data remains in the disk cache to be available for immediate reuse.

Vulnerability

A weakness that could be exploited by a Threat. For example, an open firewall port, a password that is never changed, or a flammable carpet. A missing Control is also considered to be a Vulnerability.

W
Warm Site An alternate processing site that is only partially equipped (as compared to a Hot Site, which is fully equipped).

Work Instruction

A Document containing detailed instructions that specify exactly what steps to follow to carry out an Activity. A Work Instruction contains much more detail than a Procedure and is only created if very detailed instructions are needed.

Workaround

Reducing or eliminating the Impact of an Incident or Problem for which a full Resolution is not yet available, for example by restarting a failed Configuration Item. Workarounds for Problems are documented in Known Error Records. Workarounds for Incidents that do not have associated Problem Records are documented in the Incident Record.

Workload

The Resources required to deliver an identifiable part of an IT Service. Workloads may be Categorized by Users, groups of Users, or Functions within the IT Service. This is used to assist in analyzing and managing the Capacity, Performance and Utilization of Configuration Items and IT Services. The term Workload is sometimes used as a synonym for Throughput.

Workload migration The process of moving an application’s data from one set of disks to another for the purpose of balancing performance needs, moving to new hardware, or temporarily relocating data.
X
XRC Extended remote copy. See Global Mirror for z/Series.
Z
z/OS The IBM operating system that includes and integrates functions previously provided by many IBM software products (including the MVS operating system). Runs on the z/Series processors.
z/OS Global Mirror See Global Mirror for z/Series.
z/Series IBM enterprise servers based on z/Architecture.

Recovery Specialties, LLC: Business Continuity and Disaster Recovery Consulting Services
Enterprise Storage Solutions, Business Continuity and Disaster Recovery consulting for z/OS environments

This document was printed from http://recoveryspecialties.com/



Recovery Specialties, LLC. All rights reserved. 2007