Part 2 : Automatic Failover of a Unifi Cluster

The first part of this tutorial covered the implementation of a Unifi cluster through MongoDB replication. It is available here ( https://wireless.fr/en/part-1-setting-up-a-unifi-server-with-high-availability.php ) for those who have not followed it yet!

This cluster is functional but lacks automation: the failover between the two servers must be done manually, which is a rather crude method! In this Part 2 we will therefore automate the failover when Unifi or MongoDB crashes on one of the two servers, in order to ensure high availability.

So we pick up where we left off:

  • Two dedicated servers, the Unifi1 server of IP X.X.X.X and the Unifi2 server of IP Y.Y.Y.Y.
  • A Floating IP Z.Z.Z.Z, which can be assigned to one of the two servers as required.

These are public IPs, accessible from the Internet.
We also have three DNS records on the domain of your choice, here yourdomain.com for the example:

  • unifi1.yourdomain.com pointing to X.X.X.X,
  • unifi2.yourdomain.com pointing to Y.Y.Y.Y,
  • unifi.yourdomain.com pointing to Z.Z.Z.Z.

The two dedicated servers are hosted on the Dedi-Online platform (a French hosting platform), as is the IP Failover. This tutorial will therefore use the Dedi-Online API to automatically move the IP Failover between servers in case of a problem. Rest assured, this tutorial can be used on other platforms: you will only have to adapt the script shown below to your platform. The rest of the tutorial does not change!

Note that the tutorial also works with private addresses on a local network, if your Unifi cluster is local. I will point out in the tutorial how to adapt it to your network infrastructure 🙂

1. FAILOVER IMPLEMENTATION with KEEPALIVED

To achieve this automation, we will use the Keepalived package (http://www.keepalived.org/), which allows us to build a cluster using the VRRP protocol, often used in routers. Keepalived has many functions (load balancing, failover, master/slave clustering, etc.) and will allow us to automate our Unifi cluster!

We start by installing Keepalived on both servers. It is available in the Ubuntu/Debian repositories (we are still working on Ubuntu 16.04 in this tutorial, as in Part 1):

root@unifi1:~#  apt-get install -y keepalived
root@unifi2:~#  apt-get install -y keepalived

Once it is installed, we edit the Keepalived configuration file. First, on our primary server:

root@unifi1:~#  nano /etc/keepalived/keepalived.conf
vrrp_script chk_mongod {
 script "/usr/bin/pgrep mongod"
 interval 2
}
vrrp_script chk_unifi {
 script "/bin/systemctl status unifi"
 interval 2
}
vrrp_instance VI_1 {
 # The interface keepalived will manage
 interface enp0s20
 state BACKUP
 # How often to send out VRRP advertisements
 advert_int 2
 # The virtual router id number to assign the routers to
 virtual_router_id 51
 # The priority to assign to this device. This controls
 # who will become the MASTER and BACKUP for a given
 # VRRP instance (a lower number gets less priority).
 priority 50
 authentication {
 auth_type PASS
 auth_pass password
 }
 unicast_src_ip X.X.X.X
 unicast_peer {
 Y.Y.Y.Y
 }
 track_script {
 chk_mongod
 chk_unifi
 }
 # The virtual IP addresses to float between nodes.
 virtual_ipaddress {
 Z.Z.Z.Z
 }
 notify_master "/etc/keepalived/failover.sh"
}

Several things must be modified in this file, depending on your configuration:

  • “interface enp0s20” defines the name of your main interface. You can use the “ip a” command to list the interfaces of your server. Replace “enp0s20” with the name of your interface. In this example, the interface of our primary server is named “enp0s20”.
  • “unicast_src_ip X.X.X.X” defines the IP of the server you are working on, here the primary server. Replace X.X.X.X with the IP of your primary server.
  • “unicast_peer { Y.Y.Y.Y }” defines the IP of the secondary server it should contact. Replace Y.Y.Y.Y with the IP of your secondary server.
  • “virtual_ipaddress { Z.Z.Z.Z }” defines the IP that must “float” or switch between servers. Replace Z.Z.Z.Z with your Floating IP.
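
If you are not sure which interface is the main one, the interface carrying the default route is usually a good candidate. Here is a minimal sketch, assuming a Linux server with the “ip” command; the helper name default_iface is only an illustration and is not part of Keepalived:

```shell
#!/bin/sh
# Hypothetical helper: print the device name found on the first
# "default" route line read from stdin.
default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Typical use on a server (commented out so the snippet has no side effects):
# ip route show default | default_iface
```

The printed name is the value to put after “interface” in keepalived.conf. Double-check it against the output of “ip a” if your server has several uplinks.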

Two check scripts are also defined at the beginning of the configuration file: chk_mongod and chk_unifi. Every 2 seconds (“interval 2”), they check that the MongoDB and Unifi services (set up in Part 1 of the tutorial) are running, and return a non-zero exit code if they are not. If Keepalived detects that one of the scripts fails, it triggers a switch to the secondary server.

Once the primary server is configured, the same operations are performed on the secondary server:
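
To make the exit-code logic concrete: Keepalived only looks at the exit status of a check command, 0 meaning success and anything else meaning failure. The sketch below reproduces that logic by hand so you can try your checks before putting them in keepalived.conf. The helper name all_ok is an assumption made for this example:

```shell
#!/bin/sh
# Hypothetical helper: run each command passed as an argument and
# succeed (exit status 0) only if every one of them succeeds.
all_ok() {
    for cmd in "$@"; do
        # Output is discarded: only the exit status matters here.
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "FAILED: $cmd"
            return 1
        fi
    done
    echo "OK"
    return 0
}

# Example with the two checks from keepalived.conf (commented out):
# all_ok "/usr/bin/pgrep mongod" "/bin/systemctl status unifi"
```

If this prints “FAILED: …” on a healthy server, the corresponding vrrp_script would also fail in Keepalived, so it is worth fixing the check first.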

root@unifi2:~#  nano /etc/keepalived/keepalived.conf
vrrp_script chk_mongod {
 script "/usr/bin/pgrep mongod"
 interval 2
}
vrrp_script chk_unifi {
 script "/bin/systemctl status unifi"
 interval 2
}
vrrp_instance VI_1 {
 # The interface keepalived will manage
 interface enp1s0
 state BACKUP
 # How often to send out VRRP advertisements
 advert_int 2
 # The virtual router id number to assign the routers to
 virtual_router_id 51
 # The priority to assign to this device. This controls
 # who will become the MASTER and BACKUP for a given
 # VRRP instance (a lower number gets less priority).
 priority 50
 authentication {
 auth_type PASS
 auth_pass password
 }
 unicast_src_ip Y.Y.Y.Y
 unicast_peer {
 X.X.X.X
 }
 track_script {
 chk_mongod
 chk_unifi
 }
 # The virtual IP addresses to float between nodes.
 virtual_ipaddress {
 Z.Z.Z.Z
 }
 notify_master "/etc/keepalived/failover.sh"
}

The configuration is similar to the one on the primary server, except that unicast_src_ip and unicast_peer are reversed, since we are on the secondary server 😉

  • “interface enp1s0” defines the name of your main interface. You can use the “ip a” command to list the interfaces of your server. Replace “enp1s0” with the name of your interface.
  • “unicast_src_ip Y.Y.Y.Y” defines the IP of the server you are working on, here the secondary server. Replace Y.Y.Y.Y with the IP of your secondary server.
  • “unicast_peer { X.X.X.X }” defines the IP of the primary server it should contact. Replace X.X.X.X with the IP of your primary server.
  • “virtual_ipaddress { Z.Z.Z.Z }” defines the IP that must “float” or switch between servers. Replace Z.Z.Z.Z with your Floating IP.

Note in both configurations the line : notify_master “/etc/keepalived/failover.sh”

The notify_master option names a script that Keepalived runs on a node when that node transitions to the MASTER state, that is, when it takes over the Floating IP. It accepts a single script.
It is typically used to notify the administrator that an incident has occurred, by email for example. Here, however, it will be used to trigger the failover through the Dedi-Online API.

Keepalived moves our Floating IP at the system level between the primary and secondary servers, which is the expected behavior. However, as is often the case with failover IPs on hosting platforms, the IP must also be switched on the platform side (web interface or API) so that it is actually routed to the other server: otherwise, the IP will have moved on the system side but will not respond correctly, because the route will still point to the old server.

If you are using a local configuration (with private addresses on a LAN), you can skip section 2 of this page, after first removing the “notify_master” line from the Keepalived configuration. Don’t forget to restart the Keepalived service on both servers.

2. IMPLEMENTATION OF THE SWITCHOVER SCRIPT VIA THE API

This part configures the “/etc/keepalived/failover.sh” script, which switches our Floating IP on the Dedi-Online side. The script must be adapted to your platform; refer to its documentation. If you need help with this part, feel free to contact me or comment on the article 🙂

As is often the case with Floating IP, an API is available on the hosting platform to integrate the switchover into scripts or applications. On Dedi-Online, the API is available on this link: https://console.online.net/fr/api/

On both servers, we create the script “/etc/keepalived/failover.sh”. It simply calls a PHP script that we will create afterwards:

root@unifi1:~# nano /etc/keepalived/failover.sh
root@unifi2:~# nano /etc/keepalived/failover.sh
#!/usr/bin/env bash
php /etc/keepalived/failover.php

Before creating the PHP switchover script, we install PHP and PHP-Curl on both servers:

root@unifi1:~# apt-get install -y php php-curl
root@unifi2:~# apt-get install -y php php-curl

On the primary server, create the script “/etc/keepalived/failover.php” with the following content:

root@unifi1:~# nano /etc/keepalived/failover.php
<?php

function call_online_api($token, $http_method, $endpoint, $get = array(), $post = array())
{
    if (!empty($get)) {
        $endpoint .= '?' . http_build_query($get);
    }

    $call = curl_init();
    curl_setopt($call, CURLOPT_URL, 'https://api.online.net/api/v1' . $endpoint);
    curl_setopt($call, CURLOPT_HTTPHEADER, array('Authorization: Bearer ' . $token, 'X-Pretty-JSON: 1'));
    curl_setopt($call, CURLOPT_RETURNTRANSFER, true);

    if ($http_method == 'POST') {
        curl_setopt($call, CURLOPT_POST, true);
        curl_setopt($call, CURLOPT_POSTFIELDS, http_build_query($post));
    }

    return curl_exec($call);
}

$token = "56484846a46468464641za6d465a468464";

//$user_info = call_online_api($token, 'GET', '/user/info');
//echo $user_info;

//$failovers = call_online_api($token, 'GET', '/server/failover');
//echo $failovers;

// edit a failover IP: route the Floating IP to this server (the primary)
$post = array(
    'source' => 'Z.Z.Z.Z',
    'destination' => 'X.X.X.X',
);
$move_failover = call_online_api($token, 'POST', '/server/failover/edit', array(), $post);
var_export($move_failover);

As a reminder, Keepalived runs this script through notify_master when the primary server becomes MASTER, that is, when it takes over the Floating IP. Consequently, the following information must be modified in the script:

  • “$token = "56484846a46468464641za6d465a468464"” is the token that authenticates the connection to the Dedi-Online API. You can generate it directly via the following link: https://console.online.net/fr/api/access
  • “'source' => 'Z.Z.Z.Z'” defines the Floating IP that will be switched. Replace Z.Z.Z.Z with your Floating IP.
  • “'destination' => 'X.X.X.X'” defines the IP of the server to which the Floating IP will be routed, here the primary server itself. Replace X.X.X.X with the IP of your primary server.

On the secondary server, we also modify the script “/etc/keepalived/failover.php”:

root@unifi2:~# nano /etc/keepalived/failover.php
<?php

function call_online_api($token, $http_method, $endpoint, $get = array(), $post = array())
{
    if (!empty($get)) {
        $endpoint .= '?' . http_build_query($get);
    }

    $call = curl_init();
    curl_setopt($call, CURLOPT_URL, 'https://api.online.net/api/v1' . $endpoint);
    curl_setopt($call, CURLOPT_HTTPHEADER, array('Authorization: Bearer ' . $token, 'X-Pretty-JSON: 1'));
    curl_setopt($call, CURLOPT_RETURNTRANSFER, true);

    if ($http_method == 'POST') {
        curl_setopt($call, CURLOPT_POST, true);
        curl_setopt($call, CURLOPT_POSTFIELDS, http_build_query($post));
    }

    return curl_exec($call);
}

$token = "56484846a46468464641za6d465a468464";

//$user_info = call_online_api($token, 'GET', '/user/info');
//echo $user_info;

//$failovers = call_online_api($token, 'GET', '/server/failover');
//echo $failovers;

// edit a failover IP: route the Floating IP to this server (the secondary)
$post = array(
    'source' => 'Z.Z.Z.Z',
    'destination' => 'Y.Y.Y.Y',
);
$move_failover = call_online_api($token, 'POST', '/server/failover/edit', array(), $post);
var_export($move_failover);

As a reminder, Keepalived runs this script through notify_master when the secondary server becomes MASTER, that is, when it takes over the Floating IP. Consequently, the following information must be modified in the script:

  • “$token = "56484846a46468464641za6d465a468464"” is the token that authenticates the connection to the Dedi-Online API. You can generate it directly via the following link: https://console.online.net/fr/api/access
  • “'source' => 'Z.Z.Z.Z'” defines the Floating IP that will be switched. Replace Z.Z.Z.Z with your Floating IP.
  • “'destination' => 'Y.Y.Y.Y'” defines the IP of the server to which the Floating IP will be routed, here the secondary server itself. Replace Y.Y.Y.Y with the IP of your secondary server.

On both servers, do not forget to make the scripts executable:

root@unifi1:~# chmod +x /etc/keepalived/failover.*
root@unifi2:~# chmod +x /etc/keepalived/failover.*
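
If you would rather not install PHP, the same POST request can probably be sent with curl alone. The sketch below only builds and prints the command, reusing the endpoint and the “source” / “destination” fields from the PHP script above. The function name move_failover_cmd is an assumption made for this example, and I have not tested this variant against the Dedi-Online API:

```shell
#!/bin/sh
# Hypothetical helper: print the curl command mirroring the POST
# request made by failover.php. Nothing is sent by this function.
# Usage: move_failover_cmd TOKEN FLOATING_IP DESTINATION_IP
move_failover_cmd() {
    printf 'curl -s -X POST -H "Authorization: Bearer %s" -d "source=%s" -d "destination=%s" %s\n' \
        "$1" "$2" "$3" "https://api.online.net/api/v1/server/failover/edit"
}

# Print the command for review (commented out):
# move_failover_cmd "your-token" "Z.Z.Z.Z" "X.X.X.X"
```

Printing the command first lets you review it before running it by hand. Once you are happy with it, the resulting curl line could replace the “php …” line in failover.sh.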

Finally, on both servers, the Keepalived service is restarted to take into account the changes made:

root@unifi1:~# service keepalived restart
root@unifi2:~# service keepalived restart

Congratulations, your cluster is now functional and fully automated !

3. CHECKS ON THE PROPER FUNCTIONING OF OUR CLUSTER

On both servers, you can then check that Keepalived is working well by using the “service keepalived status” command :

root@unifi1:~# service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
   Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-11-20 14:49:15 CET; 8s ago
  Process: 9919 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 9921 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─9921 /usr/sbin/keepalived
           ├─9922 /usr/sbin/keepalived
           └─9923 /usr/sbin/keepalived

Nov 20 14:49:15 unifi1 Keepalived_vrrp[9923]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 20 14:49:15 unifi1 Keepalived_healthcheckers[9922]: Registering Kernel netlink reflector
Nov 20 14:49:15 unifi1 Keepalived_healthcheckers[9922]: Registering Kernel netlink command channel
Nov 20 14:49:15 unifi1 Keepalived_healthcheckers[9922]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 20 14:49:15 unifi1 Keepalived_healthcheckers[9922]: Using LinkWatch kernel netlink reflector...
Nov 20 14:49:15 unifi1 Keepalived_vrrp[9923]: VRRP_Script(chk_mongod) succeeded
Nov 20 14:49:15 unifi1 Keepalived_vrrp[9923]: VRRP_Script(chk_unifi) succeeded
Nov 20 14:49:21 unifi1 Keepalived_vrrp[9923]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 20 14:49:23 unifi1 Keepalived_vrrp[9923]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 20 14:49:23 unifi1 Keepalived_vrrp[9923]: Opening script file /etc/keepalived/failover.sh

root@unifi2:~# service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
   Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-11-20 14:49:21 CET; 5s ago
  Process: 11015 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 11017 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─11017 /usr/sbin/keepalived
           ├─11018 /usr/sbin/keepalived
           └─11019 /usr/sbin/keepalived

Nov 20 14:49:21 unifi2 Keepalived_vrrp[11019]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 20 14:49:21 unifi2 systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Nov 20 14:49:21 unifi2 Keepalived_vrrp[11019]: Using LinkWatch kernel netlink reflector...
Nov 20 14:49:21 unifi2 Keepalived_vrrp[11019]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 20 14:49:21 unifi2 Keepalived_healthcheckers[11018]: Registering Kernel netlink reflector
Nov 20 14:49:21 unifi2 Keepalived_healthcheckers[11018]: Registering Kernel netlink command channel
Nov 20 14:49:21 unifi2 Keepalived_healthcheckers[11018]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 20 14:49:21 unifi2 Keepalived_healthcheckers[11018]: Using LinkWatch kernel netlink reflector...
Nov 20 14:49:21 unifi2 Keepalived_vrrp[11019]: VRRP_Script(chk_unifi) succeeded
Nov 20 14:49:21 unifi2 Keepalived_vrrp[11019]: VRRP_Script(chk_mongod) succeeded

We see that the Unifi1 server is in the “Master” state and the Unifi2 server is in the “Backup” state. We also see that on both servers, the chk_unifi and chk_mongod scripts are in the “succeeded” state, which means that the Unifi and MongoDB services run correctly on both servers. So everything is fine!

With the “ip a” command, you can also check which server holds the Floating IP and verify at the same time that it matches the Keepalived status: only the Master (“VRRP_Instance(VI_1) Entering MASTER STATE”) carries the cluster IP. In the case above, Unifi1:
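
If you check the state often, reading the whole log quickly becomes tedious. Here is a small sketch that extracts the last state announced in log lines such as the ones above. The helper name vrrp_state is an assumption, and it relies on the “Entering … STATE” wording of this Keepalived version:

```shell
#!/bin/sh
# Hypothetical helper: read Keepalived log lines on stdin and print
# the last state announced by an "Entering <STATE> STATE" message.
vrrp_state() {
    awk '$NF == "STATE" && $(NF - 2) == "Entering" { state = $(NF - 1) }
         END { if (state != "") print state }'
}

# Typical use on a server (commented out):
# journalctl -u keepalived --no-pager | vrrp_state
```

On the Unifi1 log shown above it would print MASTER, since the last “Entering” line is the MASTER one.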

root@unifi1:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s20: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:07:cb:03:b3:32 brd ff:ff:ff:ff:ff:ff
    inet X.X.X.X/24 brd X.X.X.X scope global enp0s20
       valid_lft forever preferred_lft forever
    inet Z.Z.Z.Z/32 scope global enp0s20
       valid_lft forever preferred_lft forever
    inet6 ff:ff:ff:ff:ff:ff/64 scope link
       valid_lft forever preferred_lft forever
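
The same check can be scripted. Here is a minimal sketch, assuming the one-line output format of “ip -o -4 addr show”, where the fourth field is the address with its prefix length. The helper name holds_ip is only an illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the IP given as $1
# appears as an IPv4 address in "ip -o -4 addr show" output on stdin.
holds_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) found = 1 }
                    END { exit (found ? 0 : 1) }'
}

# Typical use on a server (commented out):
# ip -o -4 addr show | holds_ip Z.Z.Z.Z && echo "this server holds the Floating IP"
```

Only the server in MASTER state should report that it holds the Floating IP.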

If you want to test that everything is fully functional, run a continuous ping from your PC to the Floating IP Z.Z.Z.Z. Then trigger the switchover to the secondary server by stopping the Unifi service (“service unifi stop”) or MongoDB (“service mongod stop”) on the primary server. If everything goes as planned, you will see 30 seconds to 1 minute of lost pings while the Floating IP routing is changed. Then your secondary server takes over 🙂
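
To put a number on that downtime, you can record one “up” or “down” sample per second during the test and look for the longest run of “down”. This is a rough sketch: the helper name longest_outage is an assumption, and the result is in samples, which only approximates seconds:

```shell
#!/bin/sh
# Hypothetical helper: read one "up" or "down" sample per line on stdin
# and print the length of the longest consecutive run of "down".
longest_outage() {
    awk '$1 == "down" { run++; if (run > max) max = run; next }
         { run = 0 }
         END { print max + 0 }'
}

# Example probe from your PC, about 2 minutes long (commented out):
# for i in $(seq 1 120); do
#     ping -c 1 -W 1 Z.Z.Z.Z >/dev/null 2>&1 && echo up || echo down
#     sleep 1
# done | longest_outage
```

Start the probe, stop the Unifi or MongoDB service on the primary server, and read the number printed at the end.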

Conclusion

Congratulations, you have a fully functional and automatic Unifi cluster! Run as many tests as you can to make sure that everything works as you want, by stopping the Keepalived, Unifi or MongoDB services, restarting the servers, etc.

Note that after a switch to the secondary server, there is no automatic return to the primary: you will have to do it manually. Keepalived is configured this way on purpose (both servers use “state BACKUP” with the same priority), so that the Floating IP does not move back as soon as the primary server restarts or becomes functional again. Switching back to the primary requires a manual action, once tests and verifications have confirmed that the primary server is functional!

You can also go further by adding check scripts to Keepalived’s configuration, if you want to check other information in order to generate a switchover:)

If you have any ideas to improve the configuration of this tutorial, feel free to react via the comments or via our contact form!

See you soon! The next part of our tutorial will be dedicated to the implementation of HTTPS for our beautiful new cluster!
