Details: Bug - Resolution: Fixed - Minor - Lustre 2.1.0 - None - CentOS 6.0 - 3 - 6489
Description
I believe I have found a race condition in module loading when several OSTs are mounted at the same time.
I have an OSS with four OSTs. If I mount them one at a time, then they always mount just fine. That is, the following always works:
mount -t lustre /dev/mapper/map00 /mnt/ost00
mount -t lustre /dev/mapper/map01 /mnt/ost01
mount -t lustre /dev/mapper/map02 /mnt/ost02
mount -t lustre /dev/mapper/map03 /mnt/ost03
If I mount them all at the same time, as follows, the mounts sometimes fail:
mount -t lustre /dev/mapper/map00 /mnt/ost00 &
mount -t lustre /dev/mapper/map01 /mnt/ost01 &
mount -t lustre /dev/mapper/map02 /mnt/ost02 &
mount -t lustre /dev/mapper/map03 /mnt/ost03 &
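Because the backgrounded mounts above do not report which one failed, a small wrapper can help when reproducing the problem. The following is only a sketch: the function name mount_all_parallel and the MOUNT_CMD override are hypothetical helpers introduced here (the override exists so the loop can be dry-run without root), while the device and mount point names are the ones from this report.

```shell
#!/bin/sh
# Sketch: start all four OST mounts in the background, then collect each
# exit status so the failing targets are identified.
# MOUNT_CMD is a hypothetical override, e.g. MOUNT_CMD=true for a dry run.
MOUNT_CMD=${MOUNT_CMD:-mount}

mount_all_parallel() {
    pids=""
    for i in 00 01 02 03; do
        $MOUNT_CMD -t lustre "/dev/mapper/map$i" "/mnt/ost$i" &
        pids="$pids $i:$!"          # remember which PID belongs to which OST
    done

    failed=0
    for entry in $pids; do
        idx=${entry%%:*}
        pid=${entry##*:}
        if wait "$pid"; then
            echo "ost$idx: mounted"
        else
            echo "ost$idx: FAILED"
            failed=$((failed + 1))
        fi
    done
    return "$failed"                # number of mounts that failed
}
```

Calling mount_all_parallel as root, repeatedly and with the Lustre modules unloaded between attempts, should show how often the race is hit and which targets are affected.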
The failures occur because some modules do not load successfully. I get errors such as:
kernel: lov: gave up waiting for init of module osc.
kernel: lov: Unknown symbol osc_update_enqueue
To track this down, I added printk calls to osc_init() in osc_request.c and to init_lustre_quota() in quota_interface.c (these are the module init routines for the osc and lquota modules, respectively).
If I mount the targets without the ampersand (and sometimes when I mount them with the ampersand), init_lustre_quota() runs before osc_init(). In these cases, everything mounts just fine.
In the cases where there is a problem, osc_init() is called before init_lustre_quota().
osc_init() calls:
cfs_request_module("lquota");
Using the printk output, I have shown that when osc_init() runs before init_lustre_quota(), the call to cfs_request_module() does not return quickly, meaning that the system is NOT loading the lquota.ko module right away. I believe this is because multiple Lustre modules are trying to load lquota at once.
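Since the mounts succeed whenever lquota has initialized first, one possible workaround is to load lquota explicitly before starting the parallel mounts. This is an untested assumption drawn from the observation above, not a confirmed fix. The helper names module_loaded and preload_lquota are hypothetical; the optional second argument to module_loaded exists only so the check can be exercised against a file other than /proc/modules.

```shell
#!/bin/sh
# Sketch: make sure lquota is loaded before any concurrent mount starts.

# Return 0 if module $1 is listed in the module table.
# $2 (default /proc/modules) is only there to allow testing.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Load lquota up front if it is not already present (needs root).
preload_lquota() {
    module_loaded lquota || modprobe lquota
}
```

Running preload_lquota once, before the four backgrounded mount commands, would ensure that no mount has to wait on a concurrent request for lquota.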
Question: Why do several Lustre modules call cfs_request_module("lquota")?
Are they using a service or a variable exported by lquota? I don't think so. If they were, modprobe would force lquota to be loaded first, which is not the case. In particular, the lustre and osc modules DO NOT have a dependency on lquota. So why are these modules calling cfs_request_module("lquota")?
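The dependency claim above can be checked with modinfo, whose -F depends option prints a module's declared dependencies as a comma-separated list. The sketch below assumes the Lustre modules are installed where modinfo can find them; depends_on and report_lquota_deps are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: report whether osc and lustre declare lquota as a dependency.

# Succeed if module $2 appears in the comma-separated list $1.
depends_on() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

report_lquota_deps() {
    for mod in osc lustre; do
        deps=$(modinfo -F depends "$mod" 2>/dev/null)
        if depends_on "$deps" lquota; then
            echo "$mod depends on lquota"
        else
            echo "$mod does not list lquota"
        fi
    done
}
```

If report_lquota_deps shows that neither module lists lquota, modprobe has no reason to load lquota ahead of them, which is consistent with the ordering problem described here.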
Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com <http://www.terascala.com/>