What is Failover?

Article by Mark Muller, published Sep 29, 2009

Failover is an enterprise-style fault-tolerance strategy for high-availability clusters built from computers with redundant parts. Here is what you need to know about failover and failback.

Failover is the automatic switch from a primary IT system to a standby system when the primary system fails to deliver its service. Fault tolerance, including failover technology, is used in systems designed for high availability, for instance the computing systems that process retail credit card payments or control ATM withdrawals. Suppose a hypothetical system for managing ATMs consists of computer A and computer B. If the primary computer A fails, the system should fail over (switch automatically) so that transactions run via computer B, while clients withdrawing money remain unaware of the technical outage.

One way to think of failover is the concept of a master (the primary device, software or solution) and a slave (the standby system), with the slave periodically checking whether the master is alive by means of a heartbeat signal. If the master does not respond within a threshold time, it is presumed dead and the roles of master and slave are promptly reversed without human intervention. Operators and technicians should then be alerted via consoles and text messages to their cell phones so that they can investigate and fix the problem that led to the failover. If the standby system is less capable than the primary system, IT staff will aim to fail back, that is, return service to the repaired primary system, as soon as possible. In a cluster of identical hardware this is not needed, as the roles are hardware-independent.
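The heartbeat check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `FailoverMonitor`, the node names "A" and "B", the 3-second threshold and the `alerts` list (a stand-in for console and text-message notifications) are all hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailoverMonitor:
    """Hypothetical standby-side monitor that swaps roles when heartbeats stop."""

    threshold: float              # seconds of silence tolerated (assumed value)
    last_heartbeat: float = 0.0   # time at which the last heartbeat was seen
    active: str = "A"             # node currently delivering the service
    standby: str = "B"            # node waiting to take over
    alerts: List[str] = field(default_factory=list)

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Fail over if the active node has been silent longer than the threshold."""
        if now - self.last_heartbeat <= self.threshold:
            return False
        failed = self.active
        # Reverse the roles without human intervention.
        self.active, self.standby = self.standby, self.active
        # Restart the silence timer for the newly active node.
        self.last_heartbeat = now
        # Stand-in for alerting operators via console or text message.
        self.alerts.append(f"failover: {failed} -> {self.active} at t={now}")
        return True


monitor = FailoverMonitor(threshold=3.0)
monitor.heartbeat(now=1.0)
print(monitor.check(now=2.0))   # silence is within the threshold
print(monitor.check(now=5.5))   # silence exceeds the threshold, roles are swapped
print(monitor.active, monitor.alerts)
```

A real cluster would run such a check on a timer and exchange heartbeats over the network; failback in this sketch would simply be a second role swap once the repaired node reports in again.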

Computers in a high-availability cluster have important parts, such as cooling fans, at least in duplicate (redundancy), so that a computer can continue providing its service without failing over when one redundant piece of hardware fails. Failovers are sometimes caused not by hardware failures but by abnormal logical conditions that make the primary system hang, stall or terminate abnormally. So that the standby system does not sit idle most of the time, failover can be paired with load balancing, which also helps reduce outages caused by heavy load. Please note that failover technology and cluster-aware software are enterprise-style computing, with price tags for failover, load-balancing and cluster technology considerably higher than for stand-alone systems.
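Pairing failover with load balancing can be pictured as a dispatcher that spreads requests over all nodes while they are healthy and skips any node marked as failed. The following is a minimal sketch, not a description of any particular product; the class name `Balancer`, its methods and the round-robin policy are assumptions chosen for illustration.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class Balancer:
    """Hypothetical round-robin dispatcher that skips unhealthy nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self.healthy: Dict[str, bool] = {node: True for node in nodes}
        self._ring: Iterator[str] = cycle(nodes)   # endless round-robin order
        self._count = len(nodes)

    def mark(self, node: str, healthy: bool) -> None:
        """Record the result of a health check for one node."""
        self.healthy[node] = healthy

    def pick(self) -> str:
        """Return the next healthy node, trying each node at most once."""
        for _ in range(self._count):
            node = next(self._ring)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy node available")


lb = Balancer(["A", "B"])
print([lb.pick() for _ in range(4)])   # both nodes share the load
lb.mark("A", healthy=False)            # node A hangs or fails
print([lb.pick() for _ in range(2)])   # all traffic now goes to node B
```

With both nodes in service neither sits idle, and when one fails the remaining node carries the full load, which is the failover case.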

 