Saturday, January 9, 2010

install openais high-availability cluster

Openais is a free open-source solution to build clusters. It forked from the heartbeat project and offers more features than the latter. With Openais, you can configure a multi-node high-availability or load-balancing cluster. In my view, it proves to be a great and efficient solution and can substitute for expensive software such as Sun Cluster, IBM HACMP or Red Hat Cluster Suite.

This article aims to present a quick howto for installing a two-node high-availability cluster.
The clustered service here is just an Apache web server. Quite easy indeed, but if you understand how to do it, you'll be able to handle very complicated clusters. The servers are RHEL 5 64-bit, although I have also tried it successfully on RHEL 4 64-bit. The only difference is that on RHEL 4 you need to install a more recent version of python (I compiled the 2.4.6 version and it worked great) and change the shebang of the crm python command. The hardware is Xen domU virtual servers, but I don't think it really matters, as long as you manage to have a shared quorum disk (on a SAN for instance).
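For the RHEL 4 case, changing the shebang just means editing the first line of the crm script so that it points at the freshly compiled interpreter. The interpreter path below is an assumption, adapt it to wherever you installed python (the crm script itself is usually /usr/sbin/crm) :
#!/usr/local/bin/python2.4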
A last word about "hardware": you must ensure that multicast is enabled on the switches that link the two nodes.

OK, so let's first start this howto by creating the quorum disk. If you don't use Xen but real hardware, you just need to know that I am going to create a 512 KB disk that is visible on both nodes as /dev/xvdc. My Xen domU names are srxmtest7.example.com and srxmtest8.example.com. On my dom0s, each domU has its own folder under /data/xm. So, here are the commands to execute on the dom0:
cd /data/xm/srxmtest7/
dd if=/dev/zero of=sharedDiskSbdClustertest7 bs=1k seek=512 count=1
Edit the domU configuration file and add the following line to the disk section :
"tap:aio:/data/xm/srxmtest7/sharedDiskSbdClustertest7,xvdc,w"
Then, we add the disk on the second node :
cd ../srxmtest8
ln -s /data/xm/srxmtest7/sharedDiskSbdClustertest7 .
And in the srxmtest8 configuration file we add :
"tap:aio:/data/xm/srxmtest8/sharedDiskSbdClustertest7,xvdc,w"
Then, start the two servers.

OK, now we've finished with the Xen configuration and we are going to work on srxmtest7 and srxmtest8. To distinguish them, I will use [srxmtest7] for commands to be executed on srxmtest7, [srxmtest8] for commands to be executed on srxmtest8, and [both] for commands to be executed on both servers. You may use the clusterssh software for the latter (see my previous post on this blog).
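For instance, clusterssh opens one terminal per node plus a small input window, and everything typed there is sent to both servers at once :
cssh srxmtest7 srxmtest8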

Now, we are going to install the RPMs we need. Red Hat does not support the whole openais/pacemaker solution, so we cannot use its repositories and have to get the packages from the pacemaker website. Take care not to mix those openais RPMs with the Red Hat openais RPMs, because they are not compatible. Here are the RPMs you must install from that repository :
[both] openais pacemaker cluster-glue cluster-glue-libs heartbeat libopenais2 pacemaker-libs resource-agents
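Assuming the pacemaker repository has been declared in yum, a single command installs them all :
[both] yum install openais pacemaker cluster-glue cluster-glue-libs heartbeat libopenais2 pacemaker-libs resource-agents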
If you want to ensure that you won't install any Red Hat RPM that could update those SUSE RPMs, modify your yum configuration :
[both] echo -e "\n\nexclude=openais pacemaker cluster-glue cluster-glue-libs heartbeat libopenais2 pacemaker-libs resource-agents" >> /etc/yum.conf
Then, configure automatic startup :
[both] chkconfig heartbeat off
[both] chkconfig openais on
In order to avoid a stonith deathmatch (which could happen if you have a multicast network problem, for instance), you may substitute the /etc/init.d/openais script on each host with this one. The change I made detects whether the server has rebooted more than 3 times since the start of the day and, if so, does not start openais.
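If you prefer to patch the script yourself, here is a minimal sketch of that guard, to be placed at the beginning of the start section. It simply counts today's reboot records in wtmp, and the threshold of 3 is of course arbitrary :
# refuse to start if the server already rebooted more than 3 times today
REBOOTS_TODAY=$(last reboot | grep -c "$(date +'%a %b %e')")
if [ "$REBOOTS_TODAY" -gt 3 ]; then
    echo "openais: more than 3 reboots today, not starting (suspected stonith deathmatch)"
    exit 1
fi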

We must now set up the basic openais configuration :
[both] cd /etc/ais
[srxmtest7] ais-keygen
[srxmtest7] scp authkey root@srxmtest8:/etc/ais
[both] vim openais.conf
In the configuration file, we will change the bindnetaddr parameter. My servers srxmtest7 and srxmtest8 have the following IPs: 172.18.6.52/24 and 172.18.6.53/24, so bindnetaddr will be 172.18.6.0. You must also ensure that the mcastaddr/mcastport couple is unique among all your clusters. What I usually do is replace the last two bytes of the multicast address 226.94.0.0 with the last two bytes of my first node's IP, so mcastaddr would be 226.94.6.52. Another thing I like to change is the logging facility, to make it unique, "local4" for instance. The purpose is to have all the openais logs in a separate file.
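For reference, here is a sketch of the relevant parts of openais.conf with those values. The surrounding options are left out and key names may vary with your openais version, so treat it as an illustration only :
totem {
        interface {
                ringnumber: 0
                bindnetaddr: 172.18.6.0
                mcastaddr: 226.94.6.52
                mcastport: 5405
        }
}
logging {
        to_syslog: yes
        syslog_facility: local4
}
Once the facility is set, adapt the syslog configuration :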
[both] vim /etc/syslog.conf
Change the default catch-all line to :
*.info;mail.none;authpriv.none;local4.none /var/log/messages
And add the following line :
local4.* /var/log/openais.log
The last thing we have to do is configure the log rotation :
[both] mkdir /var/log/openais
[both] vim /etc/logrotate.d/openais
Write the file like this :
/var/log/openais.log {
        rotate 28
        daily
        compress
        olddir openais
        postrotate
                /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
        endscript
}
We can now restart the syslog service :
[both] service syslog restart
We also want to regularly delete the PEngine temporary files. Edit the /etc/cron.d/openais file :
# erase PEngine files every Saturday
33 23 * * 6 root /usr/bin/find /var/lib/pengine -daystart -type f -ctime +7 -exec /bin/rm -f {} \;

The basic openais configuration is done, so we may start the cluster service :
[srxmtest7] service openais start
You can watch how the nodes join the cluster with the crm_mon command :
[srxmtest7] crm_mon
(use Ctrl-C to quit). After a few seconds, start the cluster on the other node :
[srxmtest8] service openais start
Once the second node has joined the cluster, you should see something like this :
[root@srxmtest7 ~]# crm_mon -1
============
Last updated: Mon Jan 11 08:58:28 2010
Stack: openais
Current DC: srxmtest7.example.com - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
0 Resource configured.
============

Online: [ srxmtest7.example.com srxmtest8.example.com ]

The cluster seems to work, so the next step is to secure the fencing behavior with the quorum disk. This is done with the sbd stonith software.
[both] cat > /etc/sysconfig/sbd <<END
SBD_DEVICE="/dev/xvdc"
SBD_OPTS="-W"
END
The sbd cluster resource script is quite buggy, so you should replace the /usr/lib64/stonith/plugins/external/sbd file with this one. The change I made fixes the status function. Then, we must initialize the quorum disk :
[srxmtest7] sbd -d /dev/xvdc create
[srxmtest7] sbd -d /dev/xvdc allocate srxmtest7.example.com
[srxmtest7] sbd -d /dev/xvdc allocate srxmtest8.example.com
Now we're ready to configure our first openais service. This is done with the crm configure primitive command :
[srxmtest7] crm configure primitive sbdFencing stonith:external/sbd params sbd_device="/dev/xvdc"
And to have this service executed on both nodes :
[srxmtest7] crm configure clone Fencing sbdFencing
Quorum behavior is quite special for a two-node cluster, and we don't want the cluster to stop all its resources if one node is down :
[srxmtest7] crm configure property no-quorum-policy=ignore
To get stonith fully operational, we must restart the openais service :
[both] service openais restart
We can check that stonith is working with the following commands :
[root@srxmtest7 ~]# sbd -d /dev/xvdc list
0 srxmtest7.example.com clear
1 srxmtest8.example.com clear
[root@srxmtest7 ~]# pgrep -lf sbd
27527 /usr/sbin/sbd -d /dev/xvdc -D -W watch

Now, let's configure the cluster IP service. For all your clients and remote applications, this is the only relevant IP: the one to which all the clustered services should be bound and the one that should never be down.
[srxmtest7] crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=172.18.6.54 cidr_netmask=24 op monitor interval=10s
The IP should be on the same network as one of your real IP addresses. Don't be surprised if you can't see the new address with the ifconfig -a command (more on that below). If you want to check the resource definition, you may use the crm configure show command :
[root@srxmtest7 ~]# crm configure show
node srxmtest7.example.com
node srxmtest8.example.com
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="172.18.6.54" cidr_netmask="24" \
op monitor interval="10s" \
meta target-role="Started"
primitive sbdFencing stonith:external/sbd \
params sbd_device="/dev/xvdc"
clone Fencing sbdFencing \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore" \
last-lrm-refresh="1258646041"
And to know which node holds the cluster IP, you can use the crm_mon command :
[root@srxmtest7 ~]# crm_mon -1
...
Online: [ srxmtest7.example.com srxmtest8.example.com ]

Clone Set: Fencing
Started: [ srxmtest7.example.com srxmtest8.example.com ]
ClusterIP (ocf::heartbeat:IPaddr2): Started srxmtest7.example.com
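About the ifconfig remark above: IPaddr2 adds the address with the iproute2 tools, so on the node that currently holds it you can display it with the ip command instead :
[srxmtest7] ip addr show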
To give more stability to your services, I would recommend configuring resource stickiness :
[srxmtest7] crm configure property default-resource-stickiness=100
And in a basic configuration, you just want your cluster to be event-driven, so you don't need periodic time-based rechecks :
[srxmtest7] crm_attribute -n cluster-recheck-interval -v 0

The second service we must install is the Apache one. We use a classical Red Hat installation :
[both] yum install httpd
To enable the openais monitoring of the service, we must create the /var/www/html/index.html file. Just write whatever you want in it, the hostname of the server for instance.
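A one-liner does the job :
[both] hostname > /var/www/html/index.html
Then, you can register the service in openais :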
[srxmtest7] crm configure primitive apache ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=10s

Now we want to make sure that the web service runs on the same node as the cluster IP service :
[srxmtest7] crm configure colocation website-with-ip INFINITY: apache ClusterIP
What is more, we want the Apache service to start after the cluster IP service. Because of the colocation constraint we just created, this is not strictly necessary, but it matters for stopping the services in the proper order (especially if you have an LVM or filesystem resource instead of a mere IP service!) :
[srxmtest7] crm configure order apache-after-ip mandatory: ClusterIP apache
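At this point, you can check from any machine on the network that the web service answers on the cluster IP, for instance with curl (adjust the address if yours differs) :
curl http://172.18.6.54/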
Et voila! The cluster configuration is finished and works fine!

To better understand what you have just done and to improve your skills in openais/pacemaker, you may refer to the following links :
Another howto on installing an Apache cluster, explained in more depth, with DRBD and OCFS2 resources.
Pacemaker Explained: this PDF will tell you all you need to know to administer your cluster.
If you want to use your cluster with software not supported by pacemaker, you will need to write a resource script :
This website, although no longer maintained, explains the basics of writing such a script.
But to really understand what you need to write, the OCF draft specification is the reference.
Finally, you may also want to have a look at the mailing list archives: the linux-ha mailing list and the pacemaker mailing list.