Disaster recovery: ZeRT0 vision for a virtual world

Several times in my career I have faced a system issue that needed a “time machine” to solve. Usually a restore from backup was enough, provided a backup had been scheduled. In other situations, if and when the company I worked for could afford a traditional DR solution, I was lucky enough to rebuild the system from its mirror. Many times I had to rebuild the whole system, losing data that had usually not been considered critical enough to back up.

Then virtualization came. Snapshots were a godsend: the main thing was to schedule them, even if they could have an impact on performance, because the pros far outweighed the cons.

After some time, when virtualization became the norm rather than the exception, snapshots started to be annoying. In recent years, having a backup was no longer enough for a critical production system: too many hours of data could be lost. And snapshots were no longer adequate either, because they were taken either frequently, with an impact on performance, or occasionally, between one backup and the next, in which case a lot of data could still be lost.
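
To put numbers on that trade-off: the data lost in a failure is the time elapsed since the last usable recovery point, whether that point is a backup or a snapshot. Here is a minimal Python sketch of the arithmetic. The schedule, the date and the function name are made up for illustration and do not come from any real environment.

```python
from datetime import datetime, timedelta


def data_loss_window(recovery_points, failure_time):
    """Time between the failure and the latest recovery point taken before it."""
    earlier = [point for point in recovery_points if point <= failure_time]
    if not earlier:
        raise ValueError("no recovery point precedes the failure")
    return failure_time - max(earlier)


# Hypothetical day: nightly backup at 01:00, snapshots at 09:00 and 13:00.
day = datetime(2015, 6, 1)
points = [day + timedelta(hours=h) for h in (1, 9, 13)]

# A failure at 12:45 falls just before the second snapshot.
failure = day + timedelta(hours=12, minutes=45)
loss = data_loss_window(points, failure)
print(loss)  # 3:45:00
```

Shrinking that window with snapshots alone means taking them more often, which is exactly where the performance impact comes from.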

At VMworld 2012 I visited ZeRT0’s booth. I was amazed by their solution, because it was almost synchronous, not snapshot-based, did not need an alternative datacenter (either hot or warm) and, above all, was hypervisor-based.

Over the years the application has evolved, and today it even covers hybrid-hypervisor environments. So it is no longer just HW agnostic, but hypervisor agnostic too.

We recently upgraded to the latest version, 4.0, which is a major release.

Before reviewing the latest version, I have to say that the upgrade process took very little time and, above all, very few operations: it was a really easy walk even in a complex environment like ours.

First of all, the architecture. We’ll talk about VRA (Virtual Replication Appliance), ZVM (Zert0 Virtual Manager), ZCM (Zert0 Cloud Manager) and VPG (Virtual Protection Group), just to name some acronyms.

At the lowest layer are the VRAs: small Linux appliances, installed one per hypervisor host, that take care of every single VM: all its pointers, all its network data, where its storage is, etc.

Just above them sits the ZVM: it manages all the VRAs, applies the user’s settings entered via an intuitive (and nice) GUI, and connects to its paired ZVM at the recovery site.

In the case of cloud providers, or simply in environments where more than one ZVM is needed, you’ll have to install a ZCM: an orchestrator for the ZVMs but, above all, a tool that offers cloud multitenancy. It exposes a self-service interface for the end user (ZSSP) and deals with the installation of the Connector, a proxy that acts as the paired ZVM for the customer while guaranteeing complete privacy: each customer sees only its own organization, kept separate from all the other customers’ ones.
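
The multitenancy idea can be illustrated with a toy data model. This is only a sketch of the isolation principle, not ZeRT0’s API: the `VPG` class, its fields, the `visible_vpgs` helper and the customer names are all hypothetical, and I’m assuming a VPG is simply a named group of VMs owned by one organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VPG:
    """A Virtual Protection Group, modelled as VMs protected together."""
    name: str
    organization: str
    vms: tuple


def visible_vpgs(all_vpgs, organization):
    """Return only the VPGs owned by `organization`, as a tenant view would."""
    return [vpg for vpg in all_vpgs if vpg.organization == organization]


# Hypothetical provider-side inventory spanning two customers.
inventory = [
    VPG("erp", "acme", ("erp-db", "erp-app")),
    VPG("web", "acme", ("web-01",)),
    VPG("mail", "globex", ("mail-01",)),
]

acme_view = visible_vpgs(inventory, "acme")
print([vpg.name for vpg in acme_view])  # ['erp', 'web']
```

The provider keeps the full inventory, while each customer only ever receives its own filtered view.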

Some of the benefits we have experienced in these years, and not only “marketing words”:

  • RTO of minutes (hence the name ZeRT0: Zero RTO), RPO of seconds
  • Native multitenant architecture
  • HW agnostic
  • Full vCloud Director compatibility and integration
  • Zero impact on production VMs
  • Full automation of failover, failback and test
  • Partial recovery

The main difference I found compared with the other solutions on the market was the RTO/RPO timing (low enough to call it not only DR but BC too), and the absence of snapshots thanks to a journal file, in which all the modifications written to a protected vDisk are reported to the replicated one and consolidated, one by one, after the chosen history time has elapsed.
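
The journaling idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not ZeRT0’s implementation: the `Journal` class, its block-level granularity and its method names are hypothetical. Writes are appended to a journal next to the replica, entries older than the history window are applied to the replica in order, and any point in time still inside the window can be rebuilt by replaying the journal over the replica.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy write journal kept next to a replica disk (all names hypothetical)."""
    history: float                                  # seconds of history to keep
    replica: dict = field(default_factory=dict)     # block number -> data
    entries: deque = field(default_factory=deque)   # (timestamp, block, data)

    def record(self, timestamp, block, data):
        """Append a write from the protected disk, then age out old entries."""
        self.entries.append((timestamp, block, data))
        self.consolidate(now=timestamp)

    def consolidate(self, now):
        """Apply, oldest first, every entry older than the history window."""
        while self.entries and now - self.entries[0][0] > self.history:
            _, block, data = self.entries.popleft()
            self.replica[block] = data

    def view_at(self, checkpoint):
        """Disk contents as of `checkpoint`; the replica itself is untouched."""
        image = dict(self.replica)
        for timestamp, block, data in self.entries:
            if timestamp > checkpoint:
                break
            image[block] = data
        return image


journal = Journal(history=10)
journal.record(0, block=1, data="a")
journal.record(5, block=1, data="b")
journal.record(12, block=2, data="c")   # the write from t=0 is now consolidated

print(journal.replica)       # {1: 'a'}
print(journal.view_at(5))    # {1: 'b'}
print(journal.view_at(12))   # {1: 'b', 2: 'c'}
```

In this model the protected disk is never paused or snapshotted, and the price is that you cannot go back further than the history window, because older writes have already been folded into the replica.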

Compared to a traditional solution, the difference is even wider: you can recover a single VM rather than a whole storage volume, and there is no need for a mirrored DC, with the consequent cost savings.

Now, what this solution is missing, from my point of view.

  • Since version 3.5 ZeRT0 has implemented extended recovery, a middle way between DR and backup. I’d like to reduce the number of vendors for many reasons, so if backup could be covered by ZeRT0 I’d be satisfied. But today we can’t use it for backup, for two main reasons: it doesn’t dedupe, and restore from backup isn’t integrated in vCloud Director the way recovery from DR is, so customers wouldn’t be autonomous.
  • From a CSP point of view, another missing feature is the ability to connect to customers residing on another CSP.
  • It is a virtual-only solution, but this is not a limitation for our needs.
  • It does not replicate IDE disks, and some virtual network appliances are based on IDE disks; Watchguard is one of them.
  • It doesn’t replicate vShield Edge rules, neither the org ones nor the vApp ones. The first case can be solved by replicating those rules by hand before any disaster occurs, but the vApp ones cannot, since the vApp is created only when a failover starts.

My opinion, anyway, is that this is a real enterprise solution, where the benefits far outweigh the disadvantages. A particular mention goes to the Support guys. They’re highly professional, responsive at any hour of the day (and the night) and, if a WebEx is needed, they won’t leave you until the issue is solved. No sentences like “I’ll call you after my senior engineer has evaluated the issue”: that senior engineer will be online, ready for your issue.

Be aware, anyway, that ZeRT0 is a great tool, but it’s a TOOL, not a DR plan. You need a plan to decide objectively when to declare a disaster and which procedures to follow. ZeRT0 fits inside that plan as one part of it. Don’t make the mistake some of our customers made.

I’m sure I have forgotten some important features; I hope ZeRT0 will forgive me. Any suggestion or correction is welcome in the comments section below.
