
Building High Availability



When running mission-critical services, you don’t want to depend on a single (virtual) machine to provide those services. Even if your systems never crash or hang, from time to time you will need to do some maintenance and restart some services or even the whole machine. Fortunately, clusters were designed to overcome these problems and make near-100% uptime for your services achievable.
Introduction
There are a lot of different scenarios and types of clusters, but here I will focus on a simple, two-node, high availability cluster that serves a website. The focus is on availability, not on balancing the load over multiple nodes or on improving performance. Of course this example can be expanded or customized to whatever your requirements are.
To reach the service(s) offered by our simple cluster, we will create a virtual IP which represents the cluster nodes, regardless of how many there are. The client only needs to know our virtual IP and doesn’t have to bother with the “real” IP addresses of the nodes or with which node is the active one.
In a stable situation, our cluster should look something like this:
[Figure: cluster_normal, node01 owns the virtual IP and serves the website while node02 waits as standby]
There is one owner of the virtual IP, in this case node 01. The owner of the virtual IP also provides the service for the cluster at that moment. A client trying to reach our website via 192.168.202.100 will be served the webpages by the webserver running on node 01. In the above situation, the second node does nothing besides waiting for node 01 to fail so it can take over. This scenario is called active-passive.
In case something happens to node 01 (the system crashes, the node is no longer reachable, or the webserver stops responding), node 02 will become the owner of the virtual IP and start its webserver to provide the same services as were running on node 01:
[Figure: cluster_failure, node01 has failed and node02 now owns the virtual IP and serves the website]
For the client, nothing changes since the virtual IP remains the same. The client doesn’t know that the first node is no longer reachable and sees the same website as before (assuming that the webservers on node 01 and node 02 serve the same webpages).
When we need to do maintenance on one of the nodes, we can manually move the virtual IP and the service to the other node, do the maintenance on the first node, switch back, and then do the maintenance on the second node. Without downtime.
Building the cluster
To build this simple cluster, we need a few basic components:
- A service which you want to be always available (webserver, mailserver, file server, ...)
- A resource manager that can start and stop resources (like Pacemaker)
- A messaging component which is responsible for communication and membership (like Corosync or Heartbeat)
- Optionally: file synchronization which keeps filesystems equal on all cluster nodes (with DRBD or GlusterFS)
- Optionally: a cluster manager to easily manage the cluster settings on all nodes (like PCS)
The example is based on CentOS 7 but should work without modifications on basically all el6 and el7 platforms and with some minor modifications on other Linux distributions as well.
The components we will use are Apache (webserver) as our service, Pacemaker as resource manager, Corosync for messaging (Heartbeat is considered deprecated since CentOS 7) and PCS to manage our cluster easily.
In the examples given, pay attention to the host where the command is executed since that can be critical in getting things to work.
Preparation
Start by configuring both cluster nodes with a static IP and a proper hostname, and make sure that they are in the same subnet and can reach each other by node name. This seems very obvious but is easily forgotten and can cause problems later down the road.
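If you don’t run DNS for these hosts, a couple of entries in /etc/hosts on both nodes is enough. A minimal sketch, using the addresses from the examples below:

[cirax@node0x ~]$ cat /etc/hosts
 127.0.0.1       localhost localhost.localdomain
 192.168.202.101 node01
 192.168.202.102 node02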

[cirax@node01 ~]$ uname -n
 node01
 [cirax@node01 ~]$ ip a|grep "inet "
 inet 127.0.0.1/8 scope host lo
 inet 192.168.202.101/24 brd 192.168.202.255 scope global eno16777736


[cirax@node02 ~]$ uname -n
 node02
 [cirax@node02 ~]$ ip a|grep "inet "
 inet 127.0.0.1/8 scope host lo
 inet 192.168.202.102/24 brd 192.168.202.255 scope global eno16777736


[cirax@node01 ~]$ ping -c1 node02
 PING node02 (192.168.202.102) 56(84) bytes of data.
 64 bytes from node02 (192.168.202.102): icmp_seq=1 ttl=64 time=1.31 ms
--- node02 ping statistics ---
 1 packets transmitted, 1 received, 0% packet loss, time 0ms
 rtt min/avg/max/mdev = 1.311/1.311/1.311/0.000 ms


[cirax@node02 ~]$ ping -c1 node01
 PING node01 (192.168.202.101) 56(84) bytes of data.
 64 bytes from node01 (192.168.202.101): icmp_seq=1 ttl=64 time=0.640 ms
--- node01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
 rtt min/avg/max/mdev = 0.640/0.640/0.640/0.000 ms
Firewall
Before we can take any actions for our cluster, we need to allow cluster traffic through the firewall (if it’s active on any of the nodes). The details of these firewall rules can be found elsewhere; just assume that this is what you have to open:
Open UDP-ports 5404 and 5405 for Corosync:
[cirax@node01 ~]$ sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT


[cirax@node02 ~]$ sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
Open TCP-port 2224 for PCS

[cirax@node01 ~]$ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT

[cirax@node02 ~]$ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
Allow IGMP-traffic
[cirax@node01 ~]$ sudo iptables -I INPUT -p igmp -j ACCEPT


[cirax@node02 ~]$ sudo iptables -I INPUT -p igmp -j ACCEPT
Allow multicast-traffic

[cirax@node01 ~]$ sudo iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT

[cirax@node02 ~]$ sudo iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
Save the changes you made to iptables:
On CentOS 7, firewalld is the default firewall. If you use the iptables service as in these examples, mask and disable firewalld first:
# systemctl mask firewalld
# systemctl disable firewalld

[cirax@node01 ~]$ sudo service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]


[cirax@node02 ~]$ sudo service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
When testing the cluster, you could temporarily disable the firewall to be sure that blocked ports aren’t causing unexpected problems.
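Assuming you use the iptables service as in the examples above, that could look like this (the saved rules are loaded again when the service is started):

[cirax@node01 ~]$ sudo service iptables stop
[cirax@node01 ~]$ sudo service iptables start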
Installation
After setting up the basics, we need to install the packages for the components that we planned to use:

[cirax@node01 ~]$ sudo yum install corosync pcs pacemaker
 ...
 Complete!


[cirax@node02 ~]$ sudo yum install corosync pcs pacemaker
 ...
 Complete!
To manage the cluster nodes, we will use PCS. This allows us to have a single interface to manage all cluster nodes. By installing the necessary packages, Yum also created a user, hacluster, which can be used together with PCS to do the configuration of the cluster nodes. Before we can use PCS, we need to configure public key authentication or give the user a password on both nodes:

[cirax@node01 ~]$ sudo passwd hacluster
 Changing password for user hacluster.
 New password:
 Retype new password:
 passwd: all authentication tokens updated successfully.


[cirax@node02 ~]$ sudo passwd hacluster
 Changing password for user hacluster.
 New password:
 Retype new password:
 passwd: all authentication tokens updated successfully.
Next, start the pcsd service on both nodes:
[cirax@node01 ~]$ sudo systemctl start pcsd


[cirax@node02 ~]$ sudo systemctl start pcsd
Since we will configure all nodes from one point, we need to authenticate on all nodes before we are allowed to change the configuration. Use the previously configured hacluster user and password to do this.

[cirax@node01 ~]$ sudo pcs cluster auth node01 node02
 Username: hacluster
 Password:
 node01: Authorized
 node02: Authorized
From here, we can control the cluster by using PCS from node01. It’s no longer required to repeat all commands on both nodes (imagine you need to configure a 100-node cluster without automation).
Create the cluster and add nodes
We’ll start by adding both nodes to a cluster named cluster_web:

[cirax@node01 ~]$ sudo pcs cluster setup --name cluster_web node01 node02
 ...
 node01: Succeeded
 node02: Succeeded
The above command creates the cluster node configuration in /etc/corosync/corosync.conf. The syntax of that file is quite readable in case you would like to automate/script this.
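To give an idea of what to expect, the generated file looks roughly like this. This is only a sketch; the exact contents depend on your Corosync and PCS versions:

totem {
    version: 2
    cluster_name: cluster_web
}

nodelist {
    node {
        ring0_addr: node01
        nodeid: 1
    }
    node {
        ring0_addr: node02
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}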
After creating the cluster and adding nodes to it, we can start it. The cluster won’t do a lot yet since we didn’t configure any resources.

[cirax@node01 ~]$ sudo pcs cluster start --all
 node02: Starting Cluster...
 node01: Starting Cluster...
You could also start the pacemaker and corosync services on both nodes (as will happen at boot time) to accomplish this.
To check the status of the cluster after starting it:

[cirax@node01 ~]$ sudo pcs status cluster
 Cluster Status:
 Last updated: Fri Aug 22 11:07:32 2014
 Last change: Fri Aug 22 11:03:45 2014 via cibadmin on node01
 Stack: corosync
 Current DC: node01 (1) - partition with quorum
 Version: 1.1.10-32.el7_0-368c726
 2 Nodes configured
 1 Resources configured
To check the status of the nodes in the cluster:

[cirax@node01 ~]$ sudo pcs status nodes
 Pacemaker Nodes:
 Online: node01 node02
 Standby:
 Offline:


[cirax@node01 ~]$ sudo corosync-cmapctl | grep members
 runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.202.101)
 runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.1.status (str) = joined
 runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.202.102)
 runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.2.status (str) = joined


[cirax@node01 ~]$ sudo pcs status corosync
Membership information
 ----------------------
 Nodeid Votes Name
 1 1 node01 (local)
 2 1 node02
Cluster configuration
To check the configuration for errors (and at this point there still are some):

[cirax@node01 ~]$ sudo crm_verify -L -V
 error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
 error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
 error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
 Errors found during check: config not valid
The above message tells us that there still is an error regarding STONITH (Shoot The Other Node In The Head), which is a mechanism to ensure that you don’t end up with two nodes that both think they are active and both claim the service and virtual IP, also called a split brain situation. Since we have a simple cluster, we’ll just disable the stonith option:
[cirax@node01 ~]$ sudo pcs property set stonith-enabled=false
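To confirm, you can run the same check again; with a valid configuration, crm_verify prints no errors and exits with status 0:

[cirax@node01 ~]$ sudo crm_verify -L -V
[cirax@node01 ~]$ echo $?
 0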
While configuring the behavior of the cluster, we can also configure the quorum settings. The quorum describes the minimum number of nodes in the cluster that need to be active in order for the cluster to be considered operational. This can be handy in a situation where a lot of nodes provide simultaneous computing power: when the number of available nodes is too low, it’s better to stop the cluster rather than deliver a non-working service. By default, a cluster only has quorum when more than half of its nodes are active. For a two-node cluster that means both nodes need to be available in order for the cluster to have quorum, which would completely defeat the purpose of our cluster.
To ignore a low quorum:

[cirax@node01 ~]$ sudo pcs property set no-quorum-policy=ignore
[cirax@node01 ~]$ sudo pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-32.el7_0-368c726
 no-quorum-policy: ignore
 stonith-enabled: false
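Should you want to see how Corosync itself counts the votes, corosync-quorumtool prints a summary of the quorum state (output not shown here):

[cirax@node01 ~]$ sudo corosync-quorumtool -s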
Virtual IP address
The next step is to actually let our cluster do something. We will add a virtual IP to our cluster. This virtual IP is the IP address that clients will contact to reach the services (the webserver in our case). A virtual IP is a resource. To add the resource:
[cirax@node01 ~]$ sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.202.100 cidr_netmask=32 op monitor interval=30s
[cirax@node01 ~]$ sudo pcs status resources
 virtual_ip (ocf::heartbeat:IPaddr2): Started
As you can see in the output of the second command, the resource is marked as started, so the new virtual IP address should be reachable.

[cirax@node01 ~]$ ping -c1 192.168.202.100
 PING 192.168.202.100 (192.168.202.100) 56(84) bytes of data.
 64 bytes from 192.168.202.100: icmp_seq=1 ttl=64 time=0.066 ms
--- 192.168.202.100 ping statistics ---
 1 packets transmitted, 1 received, 0% packet loss, time 0ms
 rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
To see who is the current owner of the resource/virtual IP:

[cirax@node01 ~]$ sudo pcs status|grep virtual_ip
 virtual_ip (ocf::heartbeat:IPaddr2): Started node01
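You can also verify on the owning node that the address was actually added to the network interface (eno16777736 is the interface name from the earlier ip output; since we used cidr_netmask=32 the address appears with a /32 prefix, output approximate):

[cirax@node01 ~]$ ip addr show eno16777736 | grep 192.168.202.100
 inet 192.168.202.100/32 scope global eno16777736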
Apache webserver configuration
Once our virtual IP is up and running, we will install and configure the service that we want to make highly available on both nodes: Apache. To start, install Apache and put a simple, different static webpage on each node. This is only temporary, to verify the functioning of our cluster; later the webpages on node 01 and node 02 should be synchronized in order to serve the same website regardless of which node is active.
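As a rough sketch of such a synchronization, assuming root SSH access from node01 to node02 and purely as an illustration, a periodic rsync of the document root would already do; DRBD or GlusterFS, mentioned in the introduction, are the more robust options:

[root@node01 ~]# rsync -a --delete /var/www/html/ node02:/var/www/html/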
Install Apache on both nodes:

[cirax@node01 ~]$ sudo yum install httpd
 ...
 Complete!


[cirax@node02 ~]$ sudo yum install httpd
 ...
 Complete!
Make sure that the firewall allows traffic through TCP-port 80:

[cirax@node01 ~]$ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
 [cirax@node01 ~]$ sudo service iptables save
 iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]


[cirax@node02 ~]$ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
 [cirax@node02 ~]$ sudo service iptables save
 iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
In order for the cluster to check if Apache is still active and responding on the active node, we need to create a small test mechanism. For that, we will add a status-page that will be regularly queried. The page won’t be available to the outside in order to avoid getting the status of the wrong node.
Create a file /etc/httpd/conf.d/serverstatus.conf with the following contents on both nodes:

[cirax@node0x ~]$ cat /etc/httpd/conf.d/serverstatus.conf
 Listen 127.0.0.1:80
 <Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
 Allow from 127.0.0.1
 </Location>
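If you prefer to create the file from the command line instead of an editor, something like this works on both nodes (a sketch):

[cirax@node0x ~]$ sudo tee /etc/httpd/conf.d/serverstatus.conf > /dev/null <<'EOF'
Listen 127.0.0.1:80
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
EOF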
Disable the current Listen-statement in the Apache configuration in order to avoid trying to listen multiple times on the same port.

[cirax@node01 ~]$ sudo sed -i 's/Listen/#Listen/' /etc/httpd/conf/httpd.conf


[cirax@node02 ~]$ sudo sed -i 's/Listen/#Listen/' /etc/httpd/conf/httpd.conf
Start Apache on both nodes and verify if the status page is working:

[cirax@node01 ~]$ sudo systemctl restart httpd
[cirax@node01 ~]$ wget http://127.0.0.1/server-status

[cirax@node02 ~]$ sudo systemctl restart httpd
[cirax@node02 ~]$ wget http://127.0.0.1/server-status
Put a simple webpage in the document-root of the Apache server that contains the node name in order to know which one of the nodes we reach. This is just temporary.

[cirax@node01 ~]$ cat /var/www/html/index.html
 <html>
 <h1>node01</h1>
 </html>


[cirax@node02 ~]$ cat /var/www/html/index.html
 <html>
 <h1>node02</h1>
 </html>
Let the cluster control Apache
Now we will stop the webserver on both nodes. From now on, the cluster is responsible for starting and stopping it. First we need to enable Apache to listen to the outside world again (remember, we disabled the Listen-statement in the default configuration). Since we want our website to be served on the virtual IP, we will configure Apache to listen on that IP address.
First stop Apache:

[cirax@node01 ~]$ sudo systemctl stop httpd


[cirax@node02 ~]$ sudo systemctl stop httpd
Then configure where to listen:
[cirax@node01 ~]$ echo "Listen 192.168.202.100:80"|sudo tee --append /etc/httpd/conf/httpd.conf


[cirax@node02 ~]$ echo "Listen 192.168.202.100:80"|sudo tee --append /etc/httpd/conf/httpd.conf
Now that Apache is ready to be controlled by our cluster, we’ll add a resource for the webserver. Remember that we only need to do this from one node since all nodes are configured by PCS:
[cirax@node01 ~]$ sudo pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min
By default, the cluster will try to balance the resources over the cluster. That means that the virtual IP, which is a resource, will be started on a different node than the webserver-resource. Starting the webserver on a node that isn’t the owner of the virtual IP will cause it to fail since we configured Apache to listen on the virtual IP. In order to make sure that the virtual IP and webserver always stay together, we can add a constraint:

[cirax@node01 ~]$ sudo pcs constraint colocation add webserver virtual_ip INFINITY
To avoid the situation where the webserver would start before the virtual IP is started or owned by a certain node, we need to add another constraint which determines the order of availability of both resources:

[cirax@node01 ~]$ sudo pcs constraint order virtual_ip then webserver
Adding virtual_ip webserver (kind: Mandatory) (Options: first-action=start then-action=start)
When the cluster nodes are not equally powerful machines and you would like the resources to be available on the most powerful one, you can add another constraint for location:

[cirax@node01 ~]$ sudo pcs constraint location webserver prefers node01=50
To look at the configured constraints:

[cirax@node01 ~]$ sudo pcs constraint
Location Constraints:
 Resource: webserver
 Enabled on: node01 (score:50)
Ordering Constraints:
 start virtual_ip then start webserver
Colocation Constraints:
 webserver with virtual_ip
After configuring the cluster with the correct constraints, restart it and check the status:

[cirax@node01 ~]$ sudo pcs cluster stop --all && sudo pcs cluster start --all
node02: Stopping Cluster...
node01: Stopping Cluster...
node02: Starting Cluster...
node01: Starting Cluster...
[cirax@node01 ~]$ sudo pcs status
Cluster name: cluster_web
Last updated: Fri Aug 22 13:27:28 2014
Last change: Fri Aug 22 13:25:17 2014 via cibadmin on node01
Stack: corosync
Current DC: node02 (2) - partition with quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured
Online: [ node01 node02 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started node01
webserver (ocf::heartbeat:apache): Started node01
PCSD Status:
node01: Online
node02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
As you can see, the virtual IP and the webserver are both running on node01. If all goes well, you should be able to reach the website on the virtual IP address (192.168.202.100):
[Screenshot: cluster_node01, the website on the virtual IP shows the node01 test page]
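The same check can be done from the command line; a request to the virtual IP should return the temporary test page of the active node:

[cirax@node01 ~]$ curl http://192.168.202.100/
 <html>
 <h1>node01</h1>
 </html>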
If you want to test the failover, you can stop the cluster for node01 and see if the website is still available on the virtual IP:

[cirax@node01 ~]$ sudo pcs cluster stop node01
node01: Stopping Cluster...


[cirax@node02 ~]$ sudo pcs status
Cluster name: cluster_web
...
Online: [ node02 ]
OFFLINE: [ node01 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started node02
webserver (ocf::heartbeat:apache): Started node02
A refresh of the same URL gives us the webpage served by node02. Since we created small but different webpages on each node, we can see which node we actually end up on:
[Screenshot: cluster_node02, the website on the virtual IP now shows the node02 test page]
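When you are done testing, don’t forget to let node01 join the cluster again:

[cirax@node01 ~]$ sudo pcs cluster start node01
node01: Starting Cluster...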
Enable the cluster-components to start up at boot
To have the cluster setup and its related components start automatically when a machine boots, simply enable the services:

[cirax@node01 ~]$ sudo systemctl enable pcsd
ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
[cirax@node01 ~]$ sudo systemctl enable corosync
ln -s '/usr/lib/systemd/system/corosync.service' '/etc/systemd/system/multi-user.target.wants/corosync.service'
[cirax@node01 ~]$ sudo systemctl enable pacemaker
ln -s '/usr/lib/systemd/system/pacemaker.service' '/etc/systemd/system/multi-user.target.wants/pacemaker.service'



[cirax@node02 ~]$ sudo systemctl enable pcsd
ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
[cirax@node02 ~]$ sudo systemctl enable corosync
ln -s '/usr/lib/systemd/system/corosync.service' '/etc/systemd/system/multi-user.target.wants/corosync.service'
[cirax@node02 ~]$ sudo systemctl enable pacemaker
ln -s '/usr/lib/systemd/system/pacemaker.service' '/etc/systemd/system/multi-user.target.wants/pacemaker.service'
Unfortunately, after rebooting the system, the cluster is not starting and the following messages appear in /var/log/messages:

Nov 21 10:43:36 node01 corosync: Starting Corosync Cluster Engine (corosync): [FAILED]^M[  OK  ]
Nov 21 10:43:36 node01 systemd: corosync.service: control process exited, code=exited status=1
Nov 21 10:43:36 node01 systemd: Failed to start Corosync Cluster Engine.
Nov 21 10:43:36 node01 systemd: Dependency failed for Pacemaker High Availability Cluster Manager.
Nov 21 10:43:36 node01 systemd:
Nov 21 10:43:36 node01 systemd: Unit corosync.service entered failed state.
Apparently, this is a known bug which is described in Red Hat Bugzilla bug #1030583.
It seems that the network interfaces report to systemd that they are available, so the network-online target is reached, while they actually still need some time before they can be used.
A possible (not so clean) workaround is to delay the Corosync start by 10 seconds in order to be sure that the network interfaces are available. To do so, edit the systemd service file for Corosync: /usr/lib/systemd/system/corosync.service

[Unit]
Description=Corosync Cluster Engine
ConditionKernelCommandLine=!nocluster
Requires=network-online.target
After=network-online.target

[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
ExecStop=/usr/share/corosync/corosync stop
Type=forking

[Install]
WantedBy=multi-user.target
The ExecStartPre line (line 8) was added to get the desired delay when starting Corosync.
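Instead of editing the vendor unit file directly, the same delay could also be added via a systemd drop-in, which survives package updates. A sketch, with the arbitrary file name delay.conf:

[cirax@node01 ~]$ sudo mkdir -p /etc/systemd/system/corosync.service.d
[cirax@node01 ~]$ sudo tee /etc/systemd/system/corosync.service.d/delay.conf > /dev/null <<'EOF'
[Service]
ExecStartPre=/usr/bin/sleep 10
EOF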
After changing the service files (customized files should actually reside in /etc/systemd/system), reload the systemd daemon:
[cirax@node01 ~]$ sudo systemctl daemon-reload

[cirax@node02 ~]$ sudo systemctl daemon-reload