DISASTER RECOVERY & BUSINESS CONTINUITY

Overview

In today’s market conditions, business continuity and disaster recovery are a top priority for senior management of most global enterprises. A global enterprise has to deal with economic fluctuations, rapid changes in market trends and competitive pressures on a 24x7 basis, and must be able to swiftly and efficiently deal with unforeseen business interruptions.

VESL Technologies, with more than 15 years of global experience implementing Oracle technologies, offers Oracle Active Data Guard as the most effective solution available today for Business Continuity and Disaster Recovery. It protects the core asset of any enterprise, its data, and keeps that data available on a 24x7 basis even in the face of disasters and other calamities. This document describes Oracle Active Data Guard technology, briefly outlines what has been implemented, and explains why it is one of the most important elements in the business continuity infrastructure of any enterprise.

IMPACT OF DISASTERS

With the proliferation of E-Business and Corporate Governance, an enterprise today operates in an extremely complex, highly networked global economy, and is more susceptible to interruptions than in the past. The cost of such interruptions, or downtime, varies across industries and can run to millions of dollars. While that number is staggering, the reasons are quite obvious. The Internet has brought millions of customers directly to electronic storefronts. Interdependent business concerns such as customer relationships, competitive advantage, legal obligations, industry reputation and shareholder confidence are even more critical now because of their increased vulnerability to business disruptions and downtime.

HIGH AVAILABILITY CHALLENGES

Downtime that affects a business can be either unplanned or planned. Unplanned downtime may be caused by hardware or system failures, data/storage failures, human errors, computer viruses, software glitches, natural disasters and malicious acts. A business may also have to undergo planned downtime because of scheduled maintenance such as system upgrades.

A company designing its business continuity strategy must create a Business Continuity Plan (BCP) that can effectively deal with each of these challenges. One of the underlying foundations of such a BCP is that it must protect company data, since data is one of the most critical company assets, whether it is payroll/employee information, customer records, valuable research, financial records or historical operations information. If a company loses its data, that data may be irreplaceable, and rebuilding or regenerating it will likely be extremely expensive, if not impossible, critically impacting the company’s ability to stay in business.

MAXIMUM PROTECTION

Maximum protection mode offers the highest level of data protection for the primary database, ensuring a comprehensive zero-data-loss disaster recovery solution. When operating in maximum protection mode, redo records are synchronously transmitted from the primary database to the standby database, and a transaction is not committed on the primary database until it has been confirmed that the transaction data is available on at least one standby database. This mode should be configured with at least two standby databases, thus offering double failure protection. If the last participating standby database becomes unavailable, processing stops on the primary database as well. This ensures that no transactions are lost when the primary database loses contact with all of its standby databases.

The maximum protection mode provides the highest degree of data protection at the standby site. However, it can potentially impact primary database performance. The impact on performance can be minimized by configuring a network with sufficient throughput for peak transaction load and with low round trip latency. Stock exchanges, currency exchanges, and financial institutions are examples of businesses that require maximum protection.
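The commit rule described above can be sketched as a toy model. This is only an illustration of the rule and not Oracle code: the function, the exception and the standby name "chicago" are hypothetical, and a dictionary of reachable flags stands in for real redo transport.

```python
class PrimaryHalted(Exception):
    """Raised when no standby can acknowledge redo (toy model only)."""


def commit_max_protection(standby_reachable):
    """Return the standbys that acknowledged the redo for one transaction.

    standby_reachable maps a hypothetical standby name to True or False.
    The commit succeeds only if at least one standby acknowledges;
    otherwise the primary stops processing, as in maximum protection mode.
    """
    acks = [name for name, up in standby_reachable.items() if up]
    if not acks:
        raise PrimaryHalted("no standby available; primary stops processing")
    return acks


# One of two standbys is down: the commit still succeeds.
print(commit_max_protection({"boston": True, "chicago": False}))  # ['boston']
```

With two standbys, a single standby failure leaves the primary running, which is the double failure protection the text recommends.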

MAXIMUM AVAILABILITY

Maximum availability mode has the next highest level of data protection for the primary database, offering zero data loss and protecting against single component failures. As with the maximum protection mode, redo data is synchronously transmitted from the primary database to the standby database. The transaction is not complete on the primary database until it has been confirmed that the transaction data is available on the standby database. However, if a standby database becomes unavailable, for example because of network connectivity problems, processing continues on the primary database. The standby database may temporarily diverge from the primary database, but when it is available again, the databases will automatically synchronize with no data loss.
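The behaviour described above, where the primary keeps working during a standby outage and the standby catches up afterwards, can be illustrated with a small model. The class and its method names are hypothetical, and strings stand in for redo records; it is a sketch of the idea, not of how Data Guard is implemented.

```python
class MaxAvailabilityModel:
    """Toy model of maximum availability mode (illustrative only)."""

    def __init__(self):
        self.primary_redo = []   # redo generated on the primary
        self.standby_redo = []   # redo received by the standby
        self.standby_up = True

    def commit(self, record):
        self.primary_redo.append(record)
        if self.standby_up:
            # Synchronous path: the standby holds the redo before commit returns.
            self.standby_redo.append(record)
        # If the standby is down, the primary keeps processing anyway.

    def gap(self):
        """Redo present on the primary but not yet on the standby."""
        return self.primary_redo[len(self.standby_redo):]

    def reconnect(self):
        """Standby returns: ship the missing redo so both sides match again."""
        self.standby_redo.extend(self.gap())
        self.standby_up = True


db = MaxAvailabilityModel()
db.commit("tx1")
db.standby_up = False      # simulated network outage
db.commit("tx2")           # primary continues; standby diverges
print(db.gap())            # ['tx2']
db.reconnect()
print(db.gap())            # []
```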

MAXIMUM PERFORMANCE

Maximum performance mode is the default protection mode. It offers slightly less primary data protection, but higher performance, than maximum availability mode. In this mode, as the primary database processes transactions, redo data is asynchronously shipped to the standby database. The commit operation of the primary database does not wait for the standby to acknowledge receipt before completing the write on the primary. If any standby destination becomes unavailable, processing continues on the primary database and there is little or no effect on performance.

In the case of a failure of the primary, there may be some transactions that were committed on the primary that had not completed shipping to the standby. If the network has sufficient throughput to keep up with peaks in redo traffic, the number of lost transactions should be very small or zero.
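To make the relationship between network throughput and potential data loss concrete, the following sketch tracks how much redo is still waiting to be shipped after each second. The function name and all figures are made up for illustration; a real estimate would use measured redo rates and link capacity.

```python
def unshipped_redo(redo_mb_per_sec, network_mb_per_sec):
    """Return the redo backlog in MB after each one-second interval.

    redo_mb_per_sec: redo generated by the primary in each second.
    network_mb_per_sec: redo the network can ship per second (assumed constant).
    The backlog is the redo that would be lost if the primary failed then.
    """
    backlog = 0.0
    history = []
    for generated in redo_mb_per_sec:
        backlog = max(0.0, backlog + generated - network_mb_per_sec)
        history.append(backlog)
    return history


# A two-second burst at 20 MB/s over a 10 MB/s link (hypothetical numbers).
print(unshipped_redo([5, 5, 20, 20, 5, 5], network_mb_per_sec=10))
# [0.0, 0.0, 10.0, 20.0, 15.0, 10.0]
```

While the link keeps up, the backlog stays at zero. During the burst it grows, and it drains only once the redo rate falls back below the network throughput.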

FAILOVER AND SWITCHOVER

Oracle Active Data Guard offers two easy-to-use methods to handle planned and unplanned outages of the production site. These methods are called switchover and failover, and they can be initiated using the Data Guard Manager GUI, the broker’s command line interface, or directly through SQL.

Failover is the operation of taking the primary database offline on one site and bringing one of the standby databases online as the new primary database. A failover operation can be invoked when an unplanned catastrophic failure occurs on the primary database, and there is no possibility of recovering the primary database in a timely manner.

The failover operation is invoked on the standby database that will assume the primary role. After a failover, the original primary database must be re-instantiated to assume its new role as a standby database.

The following figure shows the result of a failover operation from a primary database in San Francisco to a physical standby database in Boston.

Figure: Failover to a Standby Database

The switchover option, on the other hand, is the planned role reversal of the primary and standby databases, to handle planned maintenance on the primary database. The main difference between a switchover operation and a failover operation is that switchover is performed when the primary database is still available, and it does not require re-instantiation of the database. This allows the primary database to assume the role of a standby database almost immediately. As a result, scheduled maintenance can be performed more easily and frequently. For example, switchover may be used to perform an upgrade on the primary site by switching over all of the database clients to the standby site as hardware is upgraded on the primary site.
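The difference between the two role transitions can be summarised in a short model. It is a sketch only: the functions, the role labels and the site names used as dictionary keys are illustrative and do not correspond to Data Guard commands.

```python
def _current_primary(roles):
    """Return the name of the database that currently holds the primary role."""
    return next(name for name, role in roles.items() if role == "primary")


def switchover(roles, target):
    """Planned role reversal: the old primary becomes a standby right away."""
    if roles.get(target) != "standby":
        raise ValueError(f"{target} is not a standby")
    new_roles = dict(roles)
    new_roles[_current_primary(roles)] = "standby"
    new_roles[target] = "primary"
    return new_roles


def failover(roles, target):
    """Unplanned: the old primary must be re-instantiated before it can serve
    as a standby again."""
    if roles.get(target) != "standby":
        raise ValueError(f"{target} is not a standby")
    new_roles = dict(roles)
    new_roles[_current_primary(roles)] = "needs re-instantiation"
    new_roles[target] = "primary"
    return new_roles


roles = {"san_francisco": "primary", "boston": "standby"}
print(switchover(roles, "boston"))
# {'san_francisco': 'standby', 'boston': 'primary'}
print(failover(roles, "boston"))
# {'san_francisco': 'needs re-instantiation', 'boston': 'primary'}
```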

Switchover operations do not lose data. A failover guarantees zero data loss if Data Guard was running in maximum protection mode, or in maximum availability mode with the standby database synchronized, at the time of the failure.

Failover and switchover operations also work seamlessly when multiple standby databases are included in the configuration. For example, if multiple standby databases are configured and the primary database goes down, the administrator has the flexibility to choose one of the available standbys to become the primary. Data Guard fully automates the process of redirecting the other standby databases to use the new primary, including shipping any missing or incomplete redo data.
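As an illustration of a multi-standby failover, the sketch below picks the standby that has received the most redo and lists what each remaining standby would still need from the new primary. Choosing the most up-to-date standby is an assumption made for this example; the administrator is free to choose any available standby. The function names, the standby names "chicago" and "denver", and the sequence numbers are all hypothetical.

```python
def pick_failover_target(received_seq):
    """Return the standby with the highest received redo sequence number.

    received_seq maps a standby name to the last redo sequence it received.
    """
    return max(received_seq, key=received_seq.get)


def missing_redo(received_seq, target):
    """List the redo sequences each remaining standby needs from the target."""
    top = received_seq[target]
    return {
        name: list(range(seq + 1, top + 1))
        for name, seq in received_seq.items()
        if name != target
    }


standbys = {"boston": 120, "chicago": 118, "denver": 120}
target = pick_failover_target(standbys)
print(target)                          # boston
print(missing_redo(standbys, target))  # {'chicago': [119, 120], 'denver': []}
```

When two standbys are equally current, this sketch simply returns the first one listed.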