Saturday, January 12, 2008

lucreate can screw your libc

I bet there are plenty of people using the Solaris Live Upgrade facility. I have been one of them for more than five years. I thought the procedure was well tested and bulletproof. Nonetheless, I recently tripped over quite an amusing bug in lucreate/lumake. Well, actually it is in /usr/lib/lu/lucopy, which is called internally during Boot Environment (BE) population.

Let's see the problem:

zero:/root# uname -a
SunOS zero 5.11 snv_78 i86pc i386 i86pc

zero:/root# lucreate -n busted_libc -m /:c1d0s5:ufs
Discovering physical storage devices
Discovering logical storage devices
Cross referencing storage devices with boot environment configurations
Determining types of file systems supported
Validating file system requests
The device name expands to device path


Creating compare databases for boot environment .
Creating compare database for file system .
Updating compare databases on boot environment .
Updating compare databases on boot environment .
Updating compare databases on boot environment .
Making boot environment bootable.
Updating bootenv.rc on ABE .
Population of boot environment successful.
Creation of boot environment successful.

OK, after a fresh copy is created, one would expect all the files to be the same, right? Wrong!

zero:/root# lucompare -i /tmp/aa busted_libc
Determining the configuration of busted_libc ...

< snv_78
> busted_libc

Processing Global Zone
Comparing / ...


Links, Sizes differ
01 < /usr/lib/libc/
02 > /usr/lib/libc/



What? /usr/lib/libc/ is different??? How can that be?

At first glance it seems absurd. One would understand if /lib/ were different, had the copy forgotten to unmount it (at boot, one of the hardware-specific versions from /usr/lib/libc is mounted on top of /libc/). Something must be really wrong here. Where are the Live Upgrade sources?

Unfortunately, the Live Upgrade suite is not open source and probably never will be. However, a clever observer may notice that most of the LU files are, in fact, shell scripts. Fortunately for us, the problematic file is a script as well.

Looking at /usr/lib/lu/lucopy near line 1266, one can see what appears to be correct handling of the mounted /lib/. The caveat is that, at this point, /lib/ and /usr/lib/libc/ on the new BE are hard links to each other!!!
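If you want to check whether two names on a BE really point at the same file, comparing inode numbers is enough. The following is a minimal sketch, not part of Live Upgrade: the `same_inode` helper and the scratch files are hypothetical, and it assumes both paths live on the same file system (hard links cannot cross file systems anyway).

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths have the same inode number.
# Assumes both paths are on the same file system.
same_inode() {
    i1=$(ls -di "$1" | awk '{print $1}')
    i2=$(ls -di "$2" | awk '{print $1}')
    [ "$i1" = "$i2" ]
}

# Demonstration in a scratch directory instead of a real BE.
tmp=$(mktemp -d)
echo "hwcap libc" > "$tmp/hwcap.so"
ln "$tmp/hwcap.so" "$tmp/generic.so"    # second name for the same inode
echo "unrelated" > "$tmp/other.so"

same_inode "$tmp/hwcap.so" "$tmp/generic.so" && echo "linked"
same_inode "$tmp/hwcap.so" "$tmp/other.so" || echo "not linked"
rm -rf "$tmp"
```

On a real system you would point it at the two libc paths inside the mounted new BE.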

It turns out that cpio, which copied the entire content of the original BE to the new BE, was smart enough to notice that both files had the same inode number and re-created them as hard links on the new BE. So when lucopy thinks it is copying only the generic library, it also inadvertently overwrites the hardware-specific one!
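The failure mode is easy to reproduce outside Live Upgrade. The sketch below assumes that lucopy's `$CP_CMD` behaves like a plain `cp` onto an existing file (the actual value of `$CP_CMD` is not shown here); the file names are hypothetical stand-ins for the real libc paths. When the destination already exists, `cp` opens it with truncation and writes into it, so every other name for that inode sees the new content.

```shell
#!/bin/sh
# Sketch: two names for one inode (as cpio re-created them on the new BE),
# then a plain cp onto one of the names. Paths are hypothetical.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/be" "$tmp/src"

echo "hardware-specific libc" > "$tmp/be/libc_hwcap.so"
ln "$tmp/be/libc_hwcap.so" "$tmp/be/libc.so"    # same inode, two names

echo "generic libc" > "$tmp/src/libc.so"

# cp truncates and rewrites the existing destination in place,
# so the shared inode (and thus libc_hwcap.so) gets the new content.
cp "$tmp/src/libc.so" "$tmp/be/libc.so"

result=$(cat "$tmp/be/libc_hwcap.so")
echo "$result"    # the hardware-specific name now holds the generic copy
rm -rf "$tmp"
```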

OK, we see the problem, but is there a workaround? One simple way to avoid the trashing is to unlink the new /lib/ right before copying. That way we never harm any other file that could be hard-linked to it.
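Here is the same scratch setup with the unlink step added, as a minimal sketch of why removing the destination first helps. Again the paths are hypothetical, and the real fix does this inside lucopy after the `umount`. `rm` drops only one name; the inode survives because the hardware-specific name still refers to it, and `cp` then creates a brand-new file for the generic copy.

```shell
#!/bin/sh
# Sketch of the unlink-before-copy workaround. Paths are hypothetical.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/be" "$tmp/src"

echo "hardware-specific libc" > "$tmp/be/libc_hwcap.so"
ln "$tmp/be/libc_hwcap.so" "$tmp/be/libc.so"    # same inode, two names
echo "generic libc" > "$tmp/src/libc.so"

# Remove only the destination name; libc_hwcap.so keeps the old inode.
rm -f "$tmp/be/libc.so"
# cp now creates a new inode instead of writing into the shared one.
cp "$tmp/src/libc.so" "$tmp/be/libc.so"

hwcap=$(cat "$tmp/be/libc_hwcap.so")
generic=$(cat "$tmp/be/libc.so")
echo "$hwcap / $generic"
rm -rf "$tmp"
```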

--- lucopy Thu Nov 15 23:44:28 2007
+++ Fri Jan 11 22:01:13 2008
@@ -1273,6 +1273,7 @@
# inside a shared file system. In that case, we needn't update
# anything and we thus ignore errors.
/usr/sbin/umount $mountpoint && (
+ /bin/rm -f "$dest$mountpoint" > /dev/null 2>&1
$CP_CMD $mountpoint "$dest$mountpoint" 2>/dev/null
/usr/sbin/mount -OF $fstype $special $mountpoint

After another lucreate/lucompare cycle, /usr/lib/libc/ looks perfectly fine.