Friday, July 23, 2010

A scheduler for Netapp snapmirrors

Snapmirroring is a good way to protect the data on your Netapp filer, replicating every night, for instance, all or part of the data of your production filer to a remote hot-spare filer. If you want to use an asynchronous snapmirror policy, you will have to rely on some sort of cron table to launch your replications.

But in my view, such a cron-based approach is not a great idea if you want to prevent too many snapmirrors from running at the same time (and thus slowing down your network and increasing the latency of your filer). Of course, you can configure the cron entries to leave enough time for some replications to finish before launching new ones. But this turns out to be a bit complicated to manage if you have a lot of volumes or qtrees to replicate. What is more, this way you may waste part of your night window, and replications may end up running during business hours, impacting your responsiveness at peak times.

To cope with such problems, I decided to create a sort of scheduler that takes a list of volumes/qtrees to replicate and ensures that only a certain number of replications (and thus a certain amount of bandwidth) run at the same time. When a slot is freed, it launches a new replication. If you know Veritas Netbackup, you'll see where I got my inspiration.
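The core of the script is simply a bounded pool of concurrent transfers. Here is a minimal sketch of that idea in Python (the real script is written in Perl, and MAX_SLOTS, start_transfer and is_finished are hypothetical names standing in for the ONTAPI calls):

import time

MAX_SLOTS = 4  # hypothetical limit on simultaneous snapmirror transfers

def schedule(replications, start_transfer, is_finished):
    # launch the replications in order, never exceeding MAX_SLOTS at once
    pending = list(replications)      # top of the configuration file first
    running = []
    while pending or running:
        # free the slots of transfers that have completed
        running = [r for r in running if not is_finished(r)]
        # fill the freed slots with the next replications in the list
        while pending and len(running) < MAX_SLOTS:
            job = pending.pop(0)
            start_transfer(job)
            running.append(job)
        time.sleep(60)                # poll the destination filer every minute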

The script is written in Perl and you'll need the Netapp ONTAPI API (well, in fact just the Perl part). You can download it here.

Install the scheduler:
mkdir /opt/netapp-manageability-sdk
tar xvfz /tmp/snapmirrorScheduler.tar.gz -C /opt/netapp-manageability-sdk
In the /opt/netapp-manageability-sdk/prod directory, install the ONTAPI API (just the Perl part, don't bother with C, Java, etc.). Then, you may need to patch a file in the API: /opt/netapp-manageability-sdk/prod/lib/perl/NetApp/NaServer.pm. The invoke function of my API version did not handle the case where it received a reference to an object as an argument instead of a scalar. So here are the changes you may need to apply:
diff -r1.1 NaServer.pm
762c762,768
<         $xi->child_add(new NaElement($key, $value));
---
>         unless (ref($value)) {  # we have a scalar value
>             $xi->child_add(new NaElement($key, $value));
>         } else {                # we have a reference to an object: it must be handled differently (Netapp bug)
>             my $newElement = new NaElement($key);
>             $newElement->child_add($value);
>             $xi->child_add($newElement);
>         }

Now, let's set the root password of your filer in the script. Look for the file lib/MyNetapp.pm and replace the TOCHANGE string with your password. Snapmirrors are launched on the filers that receive the replication, so it should be the password of those filers. I assumed that all the filers have the same password; if your configuration differs, you'll have to tweak the constructor block a bit.

Finally, you should edit the configuration files. The tarball contains two configuration files: etc/scheduleSnapMirror_nas01a-ibm.conf and etc/scheduleSnapMirror_nas01b-ibm.conf. The files you'll have to create must be named as follows: etc/scheduleSnapMirror_<filer-hostname>.conf. There must be a configuration file for each receiving filer. On each line, you describe one replication you want to launch. Replications at the top of the file will be launched first and the ones at the bottom last. You can replicate just qtrees, whole volumes, or a mix of them.
To fully understand the syntax of the configuration file, you must know that in my case I have two primary filers, nas1a and nas1b, and two filers in my backup data center, nas01a-ibm and nas01b-ibm. nas1a replicates to nas01a-ibm and nas1b replicates to nas01b-ibm. What is more, if I have a volume myVol13 on nas1a, the replicated volume on nas01a-ibm will be named R_myVol13. This convention allows me to keep my replication lines short (if you have different conventions, you may hack the code a bit, as explained later).
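Purely as an illustration (the exact syntax is shown in the sample files shipped in the tarball, and these volume/qtree names are made up), a file such as etc/scheduleSnapMirror_nas01a-ibm.conf could simply list one replication per line, most urgent first:

myVol13
myVol07
myVol02/qtree_users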

And now, it should work! Just execute the script:
/opt/netapp-manageability-sdk/bin/scheduleSnapmirror.pl --verbose nas01a-ibm
And you should see the first replications begin.
If everything is OK, you can put such a command in cron to run it every night, for instance:

mkdir /var/log/netapp
cat > /etc/cron.d/netapp <<-EOF
# schedule snapmirror execution on the Netapp filers in order not to launch too many replications at the same time
03 22 * * * root /usr/local/bin/scheduleSnapmirror.sh nas01a-ibm
05 22 * * * root /usr/local/bin/scheduleSnapmirror.sh nas01b-ibm
EOF

Some explanations about the script:
Let's go through scheduleSnapmirror.pl a bit.
One function you might want to change is maxTransfersAllowed. It defines how many transfers you allow at the same time. I defined a business-hours policy (the lowTransfer variable) and an out-of-business-hours policy (the fullTransfer variable).
If you use the verbose mode, you'll find detailed information about your transfers in the /tmp/.scheduleSnapMirror_<filer>-ibm.debug file. Every 5 minutes, the script writes to that file which transfers are running and how many bytes have been transmitted.
You will also see many eval blocks in the script, there to catch errors. I did that because I sometimes get XML serialization problems that I have not really understood. What is more, you don't want your replication policy to stop if you lose network connectivity for a few seconds.
By default, the transfer rate (maxRate) is 8704 kb/s; you may want to change it.
The last thing that may be worth changing is in the launchEasyTransfer function. There, you'll see the code that maps the source volume (filerA:volume) to the destination volume (filerA-ibm:R_volume). You may need to adapt it to your environment.
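To illustrate the maxTransfersAllowed policy and the launchEasyTransfer naming rule in Python (the real code is Perl, and the limits, hours and name mapping below are only placeholders for my own setup), the two ideas boil down to something like:

import datetime

LOW_TRANSFER, FULL_TRANSFER = 2, 6   # hypothetical business-hours / off-hours limits

def max_transfers_allowed(now=None):
    # same idea as maxTransfersAllowed: fewer simultaneous transfers during business hours
    hour = (now or datetime.datetime.now()).hour
    return LOW_TRANSFER if 8 <= hour < 20 else FULL_TRANSFER

def destination_of(source_filer, volume):
    # same idea as launchEasyTransfer: nas1a:myVol13 -> nas01a-ibm:R_myVol13
    return "%s-ibm:R_%s" % (source_filer.replace("nas1", "nas01"), volume)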

Sunday, June 6, 2010

Monitor xen domUs from dom0 with xenstore

The easiest solution to monitor your domU servers may be to treat them as normal servers and install the Nagios NRPE plugin, or whichever monitoring agent you prefer, on each of them. But this means having a new daemon running on every domU, listening on a new port, and new firewall rules to manage...
To avoid such a burden, it is possible to monitor the virtual guests directly from the physical servers, using the xenstore database to pass information from the domUs to the host. The idea is the following: every 5 minutes, each domU executes a script that monitors whatever the system administrator thinks is important. The results are written to xenstore. Then, the dom0 collects all that information. Of course, if a domU fails to execute its script, the dom0 will detect it. Finally, a script on another server does an snmpget against all the dom0s in order to gather all the information about your xen domUs.

The solution I wrote follows this idea; it is written in Python and runs on RHEL 5, although it should work on any xen infrastructure. You can download it here.

Install the monitoring solution:
The first thing we need to do is create the xenstore location on our dom0s. I chose to use /tool/myManagement/domUunsecureFiles/basicMonitoring. /tool/myManagement/ is the xenstore directory where I decided to store all the non-standard information. I then created domUunsecureFiles to make clear that every subdirectory in it has loose permissions: every domU may write in basicMonitoring. And as there is no sticky bit in xenstore, that means any domU may delete or modify information another domU wrote before. There is a way to enforce stricter permissions in xenstore, but it is more complicated. What is more, I trust the people who administrate the domUs, so I don't feel the need to go deep into those security concerns.

Let's add the commands to rc.local and execute them:
cat >> /etc/rc.d/rc.local <<-EOF

# create the xenstore directories we need (write then remove a dummy key) and open write access to basicMonitoring
xenstore-write /tool/myManagement/domUunsecureFiles/basicMonitoring/toDelete nothing
xenstore-rm /tool/myManagement/domUunsecureFiles/basicMonitoring/toDelete
xenstore-chmod /tool/myManagement/domUunsecureFiles/basicMonitoring w

xenstore-write /tool/myManagement/privateInfo/migration/toDelete nothing
xenstore-rm /tool/myManagement/privateInfo/migration/toDelete
EOF

. /etc/rc.d/rc.local

Then, copy the scripts analyseBasicXenstoreMonitoring.py, listXenCluster.sh and sendSnmpBasicXenstoreMonitoring.sh into the /usr/local/bin directory of your dom0s.

We want to execute the monitoring script every 5 minutes:
cat > /etc/cron.d/basicXenMonitoring <<-EOF
# run xen basic monitoring every 5 minutes, one minute after the domUs
1,6,11,16,21,26,31,36,41,46,51,56 * * * * root /usr/local/bin/analyseBasicXenstoreMonitoring.py
EOF

Then, we must configure the snmpd daemon on each dom0. In /etc/snmp/snmpd.conf we add the following lines:
extend basicXenstoreMonitoring /usr/local/bin/sendSnmpBasicXenstoreMonitoring.sh
extend listXenCluster /usr/local/bin/listXenCluster.sh
And we restart the service:
/etc/init.d/snmpd restart


Secondly, let's focus on your domUs. For the script to work, the xenstore-write and xenstore-rm commands need to be installed. On Debian it is quite easy, because there is a small package for them and all you have to do is:
apt-get install xenstore-utils
On Red Hat it is more complicated, because those commands belong to the xen package. So you will have to install the two commands on every domU and also copy the /usr/lib64/libxenstore.so.3.0 library.

Put the script ping.py in /usr/local/lib.

Then, copy the script basicXenstoreMonitoring.py into the /usr/local/bin directory. You will need to change the gateway definitions in the checkNetworkConnectivity function (see the explanations below).

As we did for the dom0s, we create the cron file:
cat >/etc/cron.d/basicXenMonitoring <<-EOF
*/5 * * * * root /usr/local/bin/basicXenstoreMonitoring.py
EOF

Finally, on your monitoring server (or any server you find convenient for collecting the monitoring information), make sure the /usr/bin/snmpget binary is installed. Copy the getBasicXenMonitoring.py script into /usr/local/bin. And change the snmpget community (search for the toChange token at the end of the script).

Use it:
On that server, just run the command:
getBasicXenMonitoring.py
This command is made for "humans": it separates errors from warnings and lists the domU problems grouped by dom0.
To make it easier for your monitoring system to parse the results, it may prove more useful to run the following command:
getBasicXenMonitoring.py raw

Some explanations about the scripts:

basicXenstoreMonitoring.py:
We only check 4 resources on the domU: CPU, swapping, disk and network connectivity. Those are the basic resources I consider we should monitor on every server (hence the name of the scripts...). Anyway, if you believe there are other resources that should be monitored on every server, I think the script makes it quite easy to add them.
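For example, a new check would only have to compute its status and publish it with xenstore-write. Here is a hedged sketch of what that could look like (the per-domU key layout under basicMonitoring, the publish helper and the load threshold are assumptions, not the shipped code):

import socket
import subprocess

BASE = "/tool/myManagement/domUunsecureFiles/basicMonitoring"

def publish(check_name, status):
    # write the result under a per-domU key, e.g. .../myDomU/load (hypothetical layout)
    key = "%s/%s/%s" % (BASE, socket.gethostname(), check_name)
    subprocess.call(["xenstore-write", key, status])

def check_load(threshold=8.0):
    # 1-minute load average read from /proc, in the spirit of the other checks
    load1 = float(open("/proc/loadavg").read().split()[0])
    publish("load", "OK" if load1 < threshold else "WARNING load=%s" % load1)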

I tried to use the /proc filesystem to get information about the servers instead of calling other commands or reading configuration files (like /etc/fstab for instance). This is because I prefer getting the information from memory rather than accessing the disk.

In the checkDiskSpace function, you can see that I discard all the filesystems whose path contains the .snapshots token. This is because, for every Netapp NFS mount, a lot of snapshot filesystems can be mounted that don't provide valuable space information.
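In Python terms, the filter is essentially the following (a sketch; the actual code in checkDiskSpace may be organized differently):

def interesting_mounts(mountpoints):
    # drop the Netapp snapshot mounts, which carry no useful space information
    return [m for m in mountpoints if ".snapshots" not in m]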

The most difficult function to understand may be checkNetworkConnectivity. The basic idea is to list the network definitions of all my interfaces and then try to ping the gateway of each of those networks. In Xen, you can fix the MAC address of each interface, and in my case there is a direct translation between the IP and the MAC (I consider it a good way to avoid MAC conflicts...). I build the MAC this way: MAC = 00:16 + <IP in hex>. That is why, if I have a NIC whose MAC address begins with 00:16:ac:13:0a, I know that the corresponding IP belongs to the 172.19.10.0/24 network and that the gateway IP I should try to ping is 172.19.10.254.
Surely you will have to adapt this part.
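To make the convention concrete, here is a minimal sketch of the translation (the real checkNetworkConnectivity code may differ, and the .254 gateway convention is specific to my networks):

def gateway_for_mac(mac):
    # "00:16:ac:13:0a:0c" -> IP 172.19.10.12 -> gateway 172.19.10.254
    ip_octets = [int(byte, 16) for byte in mac.split(":")[2:]]
    return "%d.%d.%d.254" % tuple(ip_octets[:3])

print(gateway_for_mac("00:16:ac:13:0a:0c"))   # prints 172.19.10.254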

analyseBasicXenstoreMonitoring.py:
This script monitors not only the domUs but also the dom0 itself. If you don't like that feature, just comment out the call to checkDom0 at the end of the script.
That function only monitors memory and disk space. I chose not to include the network connectivity check because it would be a bit complicated with my bridge configuration. What is more, if there is a problem in one of my VLANs/bridges, surely a domU will detect it!
Nor do I check the CPU with this function. That is just because I already have another script that computes the Xen CPU usage (running xm list and looking at the values that command reports).

getBasicXenMonitoring.py:
This script does an snmpget against all the dom0s in order to collect the monitoring results. That means it needs to know the list of all the dom0s. I could have hard-coded that list in the script, but I did not think it would be easy to maintain (you would have to change the script every time a dom0 is added). So I preferred another solution. On an NFS share mounted on every dom0, I have a description of all my hosts:
[root@srxen09 xenAdmin]# ls /data/config/auto/*
/data/config/auto/srxen01:
servidor1.cfg srxmwebmail.cfg www01.cfg www02.cfg wwwp02.cfg

/data/config/auto/srxen02:
srxmbf.cfg srxmcca-pre02.cfg srxmcolumbusd.cfg srxmldap-pre.cfg srxmlinuxweb01.cfg srxmtest2.cfg srxmtest.cfg srxmwasd02.cfg

/data/config/auto/srxen03:
servidor2.cfg www03.cfg wwwp01.cfg

/data/config/auto/srxen09:

/data/config/auto/srxen10:
Here, srxen01, srxen02, srxen03, srxen09 and srxen10 are some of my hosts, and you can see the configuration files of the domUs running on each of them. So the script getBasicXenMonitoring.py makes an snmpget call to the listXenCluster script in order to obtain the dom0 list. As I have 2 datacenters with 2 NFS shares, I must call listXenCluster twice (doing an snmpget to the principal dom0 of each datacenter).
Such a solution may seem a bit complicated, so please don't hesitate to change it and hard-code your dom0 list if you prefer.
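For reference, the collection essentially boils down to snmpget calls against the two extend scripts declared in snmpd.conf above. Here is a rough sketch of that flow (not the actual getBasicXenMonitoring.py: the community string and the choice of srxen01 as the principal dom0 are placeholders, and the real script may query the extend data differently):

import subprocess

COMMUNITY = "public"   # placeholder: use your real SNMP community

def extend_output(host, name):
    # read the output of an snmpd "extend" entry on the given host
    oid = 'NET-SNMP-EXTEND-MIB::nsExtendOutputFull."%s"' % name
    out = subprocess.check_output(["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", host, oid])
    return out.decode().strip()

# ask a principal dom0 for the dom0 list, then query the monitoring data of each dom0
for dom0 in extend_output("srxen01", "listXenCluster").split():
    print(dom0)
    print(extend_output(dom0, "basicXenstoreMonitoring"))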