How to Ensure 24/7 High Availability For An Online Retail Business

November 4, 2022

If an online business wants to sell products and meet the needs of customers on a 24-by-7 basis, it’s going to need a way to ensure that its infrastructure remains not only online but also operational and accessible.

That “operational and accessible” part is often overlooked, experts say. Cloud service providers can offer high availability (HA) configurations with a service level agreement (SLA), guaranteeing that at least one node in a multi-node cluster will be online 99.99% of the time. However, that SLA doesn’t ensure that the applications or data powering an online business will be operational or accessible.
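To put that figure in perspective, the short sketch below converts an availability percentage into a yearly downtime budget. It is illustrative arithmetic only: the function name and the 365-day year are assumptions made for this example, not terms drawn from any provider's SLA. At 99.99%, the budget works out to roughly 52.6 minutes per year, and that budget covers only the node being online, not the applications or data.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime that an availability percentage permits over a period."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few common availability tiers.
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```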

The node can be online, but if it cannot access the applications or the data supporting the business, whether because of human error, compatibility issues, storage that has gone offline, or any of a dozen other reasons, then the business is effectively offline.

Online retailers that want to avoid this fate need to configure their infrastructures to ensure the uninterrupted availability of critical applications and data, and that requires more than a redundant hardware infrastructure.

They need to ensure that their active infrastructure can fail over to a standby infrastructure located in a separate data center, one that will not be affected by whatever incident has taken the active infrastructure offline. They also need to ensure that the standby infrastructure can access all applications and data.

A failover cluster (FC) always involves at least two nodes. Optimally, each node is located in a physically separate data center for disaster protection. One node might be on-premises and the other in the cloud, both could be in geographically separated on-premises data centers, or both could be in the cloud in different availability zones. Typically, one of the nodes in the FC operates as the primary node, and the others act as secondary or standby nodes.

An FC relies on cluster failover management software that monitors the health of the nodes in the cluster. If the cluster management software detects that the primary node has gone offline, it orchestrates a failover of operations to one of the secondary nodes. That (formerly) secondary node then becomes the primary node actively supporting operations. The cluster management software should also perform related housekeeping tasks, such as updating routing tables, logical names, and the like to ensure that your operations can continue on the new primary infrastructure without interruption.
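The monitoring and failover behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not the implementation of any particular clustering product: the `Node` and `FailoverCluster` classes, the `check` method, and the `repoint_clients` placeholder are all hypothetical names invented for illustration, and a real product would probe node health over the network rather than read a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True  # stand-in for a real health probe


@dataclass
class FailoverCluster:
    nodes: list    # every node in the cluster
    primary: Node  # the node currently serving operations
    events: list = field(default_factory=list)  # simple audit trail

    def standbys(self):
        """All nodes other than the current primary."""
        return [n for n in self.nodes if n is not self.primary]

    def check(self):
        """One monitoring pass: promote a healthy standby if the primary is down."""
        if self.primary.healthy:
            return self.primary
        for candidate in self.standbys():
            if candidate.healthy:
                self.events.append(f"failover: {self.primary.name} -> {candidate.name}")
                self.primary = candidate
                self.repoint_clients(candidate)
                return candidate
        raise RuntimeError("no healthy standby node available")

    def repoint_clients(self, node):
        # Hypothetical placeholder for housekeeping such as updating
        # routing tables and logical names.
        self.events.append(f"clients repointed to {node.name}")


if __name__ == "__main__":
    east, west = Node("dc-east"), Node("dc-west")
    cluster = FailoverCluster(nodes=[east, west], primary=east)
    east.healthy = False  # simulate the primary going offline
    cluster.check()
    print(cluster.primary.name, cluster.events)
```

Because the standby list is derived from whichever node is currently primary, a recovered former primary is treated as a standby on the next pass without any extra bookkeeping.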

When the former primary node becomes operational again, the cluster management software should automatically recognize it as a secondary node in the cluster that can be called into service in case a second failover should become necessary. However, these features of a failover cluster don’t ensure access to data that had been used by the applications running on the old primary infrastructure.

There are several ways to meet that challenge.

First, some well-known database vendors, including Oracle, Microsoft, and SAP, offer services that can automatically replicate database content from one node to another.

In Microsoft SQL Server, for example, you’d configure the databases on each cluster node in an “Availability Group” (AG), and the AG feature in SQL Server would automatically replicate any updates to the database on the primary node to instances of the database sitting on each of the secondary nodes.

SAP and Oracle have similar kinds of data replication offerings. However, they share the one weakness that undercuts the utility of SQL Server’s AG functionality: these services replicate only the data associated with particular SAP, Oracle, and SQL Server databases. If you have any other critical data residing in storage, that data won’t be replicated by these application-specific services.

Also, depending on how many databases you want to replicate and how many secondary nodes you want to replicate them to, you may have to upgrade your database licenses to gain access to the replication services you seek.

Second, you can accomplish the same data replication goals through third-party tools that are fundamentally application agnostic. These tools create what is known as a SANless cluster, and they perform synchronous, block-level data replication from storage on one node to storage on another.

It doesn’t matter whether the data is associated with an Oracle database, a SQL Server database, a media file, or a text file. The SANless Clustering software isn’t paying attention to the content of a given data block. It simply copies each changed block from storage on one node to storage on another.
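A minimal sketch can make the idea of synchronous, block-level replication concrete. Everything here is an assumption made for illustration: the `BlockVolume` and `SyncReplicatedVolume` classes, the in-memory storage, and the 4,096-byte block size are hypothetical and do not reflect how any specific SANless clustering product is built. The point is only that a write returns after both copies are updated, and that the replicator never inspects what the bytes mean.

```python
BLOCK_SIZE = 4096  # bytes per block; an arbitrary value chosen for this example


class BlockVolume:
    """A toy storage volume: a fixed number of fixed-size blocks held in memory."""

    def __init__(self, block_count: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data


class SyncReplicatedVolume:
    """Mirrors every block write from a primary volume to a secondary volume."""

    def __init__(self, primary: BlockVolume, secondary: BlockVolume):
        self.primary = primary
        self.secondary = secondary

    def write_block(self, index: int, data: bytes) -> None:
        # Synchronous: the call returns (acknowledging the write) only
        # after both copies hold the new block. The contents are opaque.
        self.primary.write_block(index, data)
        self.secondary.write_block(index, data)


if __name__ == "__main__":
    primary, secondary = BlockVolume(8), BlockVolume(8)
    volume = SyncReplicatedVolume(primary, secondary)
    volume.write_block(3, b"x" * BLOCK_SIZE)  # could be database pages or media
    print(primary.blocks == secondary.blocks)
```

A production tool would also need to handle a failed or slow secondary write, which this sketch deliberately leaves out.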

The advantage of a third-party approach is that you can use a SANless Clustering solution with any software infrastructure that might be supporting your online retail operations, whether it comes from Microsoft, Oracle, SAP, or any other vendor. Moreover, because the SANless Clustering tools are application agnostic, there are no limitations on the number of databases you might want to replicate or the number of secondary nodes you might want to copy to.

The downside is that SANless Clustering means dealing with yet another vendor and licensing additional software to provide replication functionality that may already be present in the database software you’re using.

SANless Clustering software is essentially a set-it-and-forget-it solution from a management standpoint, but it is one more solution that your system admins will need to understand. At the same time, if your need for data replication extends beyond the narrow confines of the replication systems built into the solutions you are already using, the assurance of HA that these third-party products provide is well worth the management burden of relying on them to support uninterrupted access to your online retail solution.
