Business managers and IT managers generally agree on the importance of achieving their organization’s goals, and on the strategies and methods to be used to achieve those goals – with one glaring and potentially calamitous exception: backups.
Yes, the “b” word once again arises as a point of contention between business units and IT departments. A recent survey by security firm Bluelock found that one out of three companies has experienced a “technology-related disruption” in the past two years, yet 11.5 percent of the firms surveyed have no disaster-recovery plan at all. Zip. Nada. Goose egg.
This despite the fact that 73 percent of business executives have “high confidence” in their company’s ability to recover lost systems in a timely manner. Only 45 percent of IT professionals share the assurance of the business managers, according to the survey, which eWeek’s Nathan Eddy reports on in an August 10, 2016, article.
Translating data-loss risk into disaster-recovery planning… and spending
The Bluelock survey highlights the continuing disconnect between the expectations of business units and the reality of data risks that IT managers face every day. While 80 percent of the IT managers surveyed said they place a “high value” on protecting their organizations from disruptions, only half of the business managers indicated the same.
Perhaps the most telling finding of the survey is that 60 percent of IT managers said it is extremely important to protect against technology-related disruptions, yet none of the VP and C-level executives participating in the survey placed disaster recovery in the “extremely important” category. Tell that to the top brass of companies now attempting to bounce back after the recent 100-year flood that struck Louisiana.
While most system outages have only a negligible impact on the organization’s long-term success, those lasting several days can push a company to the brink of bankruptcy. Source: Evolving Solutions
Considering the budget-stretching features of cloud services, it’s no surprise that IT departments would expect similar savings by moving their backup and disaster-recovery operations to the cloud. After all, offsite backup has been a key component of disaster planning for about as long as there have been IT departments.
Still, a little hesitancy on the part of IT managers about relying on third parties as their last line of defense is understandable. When it comes to backup, no single characteristic is more important than dependability – not cost, not timeliness, not simplicity. Whatever your disaster-recovery strategy, it has to be rock-solid. Slowly, cloud-based disaster recovery is amassing a track record for reliability.
Consider disaster recovery an investment, not insurance
You would think that a Fortune 100 company such as Delta Air Lines would have a disaster-recovery plan in place that could weather any misfortune. Yet all it took to knock the airline’s data systems offline for more than a day was a single cascading power outage, as The Next Platform’s Nicole Hemsoth explains in an August 9, 2016, article.
Delta is far from the only large organization to be shut down by a system-wide outage, nor is it the only company to find itself locked out of its data following a failure, with no real-time disaster-recovery or failover plan ready to be activated. Hemsoth states the matter plainly: Any company that can’t countenance even a moment of downtime has no choice but to implement geo-replication across multiple zones with high availability and resiliency built in.
Disaster-recovery expert Joe Rodden blames the widespread lack of disaster preparation on companies’ failure to conduct adequate testing of their restoration process on “real machines.” The testing isn’t being done, according to Rodden, because the process is so difficult. For example, a company may believe its systems are protected because it uses two separate network providers in different locations. But when they trace the links back far enough, they find that both networks originate from the same source, which creates an invisible single point of failure.
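The hidden dependency Rodden describes can be illustrated with a small dependency graph. The sketch below is a minimal example, not a real network-discovery tool: the topology, the node names, and the assumption that every node needs all of its listed upstream nodes are hypothetical. It walks upstream from each "independent" link and reports anything the two have in common.

```python
from typing import Dict, List, Set


def upstream_nodes(topology: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable by following upstream links from `start`."""
    seen: Set[str] = set()
    stack = list(topology.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(topology.get(node, []))
    return seen


def shared_dependencies(topology: Dict[str, List[str]], link_a: str, link_b: str) -> Set[str]:
    """Nodes that both links rely on; each one is a potential single point of failure."""
    return upstream_nodes(topology, link_a) & upstream_nodes(topology, link_b)


# Hypothetical topology: each key depends on the nodes listed as its value.
topology = {
    "provider_a_link": ["provider_a_pop"],
    "provider_b_link": ["provider_b_pop"],
    "provider_a_pop": ["regional_carrier_hub"],
    "provider_b_pop": ["regional_carrier_hub"],
    "regional_carrier_hub": [],
}

print(shared_dependencies(topology, "provider_a_link", "provider_b_link"))
# prints {'regional_carrier_hub'}
```

The hard part in practice is not the graph walk but assembling an accurate topology, which is why testing on real machines matters.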
Another common mistake is to equate “high availability” and “resiliency” with disaster recovery. A company with multiple redundant databases may believe it is protected against failure when in fact all of the databases are housed in a single data center.
Justifying the cost of real-time DR remains an uphill battle
The problem is, few companies can afford the most-reliable backup approach: duplicate clusters running the same workload in different locations, plus replication in a cloud-based data center for an added layer of protection. Also, convincing the people who control the purse strings to spend even a fraction of the potential loss from a system outage on disaster recovery is all but impossible, according to Rodden. Spend $2 million to prevent a network failure resulting in $20 million in losses? No, thanks. Most companies would rather accept the risk of an outage that may not happen than invest in disaster recovery.
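One way to frame the $2 million versus $20 million question is as an expected-loss calculation. The sketch below is a deliberately simplified model: it assumes the DR investment would eliminate the loss entirely and looks at a single budget period. The outage probabilities are illustrative assumptions, not figures from Rodden or from any survey cited here.

```python
def expected_loss(outage_probability: float, loss_per_outage: float) -> float:
    """Expected loss for the period if nothing is spent on disaster recovery."""
    return outage_probability * loss_per_outage


def break_even_probability(dr_cost: float, loss_per_outage: float) -> float:
    """Outage probability at which the DR spend equals the expected loss."""
    return dr_cost / loss_per_outage


DR_COST = 2_000_000        # from the example in the text
OUTAGE_LOSS = 20_000_000   # from the example in the text
ASSUMED_PROBABILITIES = [0.02, 0.05, 0.20]  # hypothetical, for illustration only

threshold = break_even_probability(DR_COST, OUTAGE_LOSS)
print(f"Break-even outage probability: {threshold:.0%}")

for p in ASSUMED_PROBABILITIES:
    loss = expected_loss(p, OUTAGE_LOSS)
    verdict = "DR spend is justified" if p > threshold else "DR spend exceeds expected loss"
    print(f"p={p:.0%}: expected loss ${loss:,.0f} -> {verdict}")
```

On these numbers, the spend pays for itself only if the organization believes the chance of such an outage in the period is above roughly one in ten. That goes some way toward explaining the reluctance, and it is why the argument tends to turn on how likely, and how costly, an outage really is.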
Perhaps the best way to make the case for disaster-recovery spending is to point out that all companies are tech companies these days. No matter your industry, your organization runs on data, and when that data becomes inaccessible, your operation grinds to a halt. At today’s rapid pace of business, even a brief outage can have long-lasting negative repercussions, and an extended failure can truly be disastrous to the company’s bottom line.
A 2015 survey of IT and business professionals found that improving the efficiency and recovery time of their DR projects was the most important goal, while cost was, by far, the most important consideration. Source: TechTarget
Taking the risk rather than taking precautions becomes more difficult to justify considering the availability of such services as the Morpheus cloud application management platform, which automatically backs up every new database or app stack component. You decide the time, day, and frequency of the backups, as well as the destination targets for the backups (locally or in the cloud), without requiring that any custom cron jobs be written.
Morpheus makes it easy to define roles and access privileges for teams or individuals based on geographic location, server groups, individual apps, or databases. The service’s automatic logging and monitoring lets you identify and address trouble spots quickly and simply. Every time you provision a system, uptime monitoring is configured automatically, including proactive targeted alerts when performance or uptime issues arise.
Picture a data backup you can ‘snap’ like a photograph
Snapshot backups are gaining in popularity, but as an August 19, 2016, article on CIOReview explains, snapshots were not initially considered a viable backup option for VMware because of incompatibilities with application servers. Also, snapshots were perceived as lacking application-level support. These days, snapshots are much more application-aware and are able to retain such app information as state, required resources, and use patterns. In particular, redirect-on-write (ROW) snapshots are much more efficient than copy-on-write techniques because ROW minimizes the impact of the process on app performance.
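The efficiency gap between the two snapshot techniques comes down to how much I/O each one performs when a block is first overwritten after a snapshot is taken. The toy model below is a sketch under simplifying assumptions: it counts only data-block reads and writes, ignores pointer and metadata updates, and ignores the later cost of deleting or merging snapshots. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IoCount:
    reads: int = 0
    writes: int = 0


def copy_on_write(blocks_overwritten: int) -> IoCount:
    """Copy-on-write: read the original block, copy it to the snapshot area,
    then write the new data in place (one read and two writes per block)."""
    return IoCount(reads=blocks_overwritten, writes=2 * blocks_overwritten)


def redirect_on_write(blocks_overwritten: int) -> IoCount:
    """Redirect-on-write: write the new data to a fresh location and leave the
    original block where the snapshot already points (one write per block)."""
    return IoCount(reads=0, writes=blocks_overwritten)


blocks = 1_000  # hypothetical number of blocks changed after the snapshot
print("COW:", copy_on_write(blocks))
print("ROW:", redirect_on_write(blocks))
```

In this simplified accounting, copy-on-write performs three data-block operations for every one that redirect-on-write performs, which is the overhead the running application experiences.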
Flat backups replicate snapshots to other secure locations to protect against losing a snapshot due to media corruption. For disaster recovery, the standard approach is to use three storage locations: two onsite and one offsite specifically for disaster recovery. The offsite system is configured to accept input from, and send output to, both onsite systems, so it will likely be a higher-performance system than either of the onsite systems.
Since a snapshot can be taken at any time, administrators can determine the frequency of snapshots based on available storage, bandwidth, and processing power, as well as on the nature of the workload itself in terms of sensitivity, timeliness, and value. Snapshot metadata serves as a virtual database detailing your backup history.
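As a rough illustration of that trade-off, the sketch below picks a snapshot interval from a storage budget and a workload target. It rests on loudly simplified assumptions: every snapshot is treated as consuming the same average amount of space, only storage is modeled (not bandwidth or processing power), and all names and numbers are hypothetical.

```python
import math


def min_interval_hours(storage_budget_gb: float, avg_snapshot_gb: float,
                       retention_days: int) -> float:
    """Shortest interval the storage budget allows, assuming a fixed average
    snapshot size and that every snapshot is kept for the full retention period."""
    max_retained = math.floor(storage_budget_gb / avg_snapshot_gb)
    if max_retained < 1:
        raise ValueError("storage budget cannot hold even one snapshot")
    return (retention_days * 24) / max_retained


def choose_interval_hours(target_hours: float, storage_budget_gb: float,
                          avg_snapshot_gb: float, retention_days: int) -> float:
    """Use the workload's target interval unless storage forces a longer one."""
    floor_hours = min_interval_hours(storage_budget_gb, avg_snapshot_gb, retention_days)
    return max(target_hours, floor_hours)


# Hypothetical inputs: 500 GB budget, 5 GB per snapshot, 14-day retention.
print(choose_interval_hours(target_hours=1, storage_budget_gb=500,
                            avg_snapshot_gb=5, retention_days=14))
print(choose_interval_hours(target_hours=6, storage_budget_gb=500,
                            avg_snapshot_gb=5, retention_days=14))
```

A sensitive workload that wants hourly snapshots is held back by storage in the first call, while the less demanding six-hour target in the second call fits comfortably within the budget.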
The size of a snapshot will be affected by the update pattern: the greater the number of pages updated during the life of the snapshot, the larger the sparse file used to store the original pages. Source: Microsoft Developer Network
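The relationship in the caption can be sketched in a few lines. The example assumes the 8 KB page size used by SQL Server and ignores how the filesystem actually allocates space for sparse files. The function name and the update sequence are hypothetical.

```python
from typing import Iterable

PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def sparse_file_size_kb(page_updates: Iterable[int]) -> int:
    """Estimate snapshot size: the original copy of a page is preserved only
    the first time that page is updated during the life of the snapshot."""
    preserved_pages = set()
    for page_id in page_updates:
        preserved_pages.add(page_id)  # repeat updates to a page add nothing
    return len(preserved_pages) * PAGE_SIZE_KB


# Five updates touching three distinct pages.
print(sparse_file_size_kb([1, 2, 2, 3, 1]))  # prints 24
```

What drives the size is the number of distinct pages touched, not the raw number of updates, so a workload that repeatedly rewrites the same few pages produces a small snapshot.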
In some instances, companies may prefer to stick with traditional backup systems, particularly if they are concerned about security and longevity of backup media. They may also lack the bandwidth required for cloud-based backup and disaster recovery, or they may encounter memory mismatches that prevent reliable storage and recovery of their data and apps. However, even in these settings, the most effective approach is likely to combine the best features of onsite and cloud backup.
The unique backup and recovery needs of big-data operations
The increasing reliance of organizations on data analytics makes it imperative to have a plan in place to ensure that customers’ access to these critical analysis tools isn’t interrupted. In an August 26, 2016, vendor-sponsored article in NetworkWorld, Talena executive Jay Desai presents seven myths relating to big-data backup and recovery.
The first of the misconceptions described by Desai is that having multiple copies of data stored on widely distributed servers and racks is tantamount to a backup/recovery strategy. While this approach may protect against hardware failure, it leaves the organization vulnerable to user errors, accidental deletions, and data corruptions, among other problems.
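A toy model shows why replication alone falls short. In the sketch below, a hypothetical in-memory store (not the API of Hadoop, Cassandra, or any real platform) faithfully applies every operation to all replicas, including an accidental delete. Only a separate point-in-time copy lets the data be recovered.

```python
import copy
from typing import Dict, List, Optional


class ReplicatedStore:
    """Hypothetical store that applies every write and delete to all replicas."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]

    def put(self, key: str, value: str) -> None:
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key: str) -> None:
        for replica in self.replicas:
            replica.pop(key, None)  # the mistake is replicated too

    def get(self, key: str) -> Optional[str]:
        return self.replicas[0].get(key)

    def backup(self) -> Dict[str, str]:
        """Point-in-time copy kept outside the replication path."""
        return copy.deepcopy(self.replicas[0])

    def restore(self, snapshot: Dict[str, str]) -> None:
        for key, value in snapshot.items():
            self.put(key, value)


store = ReplicatedStore(replica_count=3)
store.put("q3_sales", "critical figures")
saved = store.backup()

store.delete("q3_sales")      # accidental deletion
print(store.get("q3_sales"))  # prints None: gone from all three replicas

store.restore(saved)
print(store.get("q3_sales"))  # prints critical figures
```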
While few companies can afford a comprehensive backup and recovery system for a petabyte of data, nearly all can afford to protect a subset of such a humongous data store representing the firm’s most critical data resources.
Similarly, a script approach to backup and recovery may be suitable when you’re dealing with relatively small amounts of data, but scripts are impractical for systems comprising many terabytes of data. The scripts would have to be written for each platform separately (Hadoop, Cassandra, Couchbase, etc.), and they would have to be tested at scale and retested each time the platform was updated.
Someday perhaps restoring a company’s data systems following a disaster will be as easy as flipping a switch. Clearly, that day is a long way off. In the interim, the keepers of the company’s valuable data assets will be left to minimize the impact of outages by adapting traditional backup approaches to today’s virtual, infinitely scalable data environments.