[LU-986] Possible Race Condition Created: 12/Jan/12 Updated: 06/Mar/12 Resolved: 06/Mar/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Roger Spellman (Inactive) | Assignee: | Peter Jones |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | CentOS 6.0 |
| Severity: | 3 |
| Rank (Obsolete): | 6489 |
| Description |
|
I believe that I have found a possible race condition. I have an OSS with four OSTs. If I mount them one at a time, then they always mount just fine. That is, the following always works:

    mount -t lustre /dev/mapper/map00 /mnt/ost00

If I mount them all at the same time, like the following, then it sometimes fails.

    mount -t lustre /dev/mapper/map00 /mnt/ost00 &

The failures are because some modules do not load successfully. I get errors such as:

    kernel: lov: gave up waiting for init of module osc.

To track this down, I added printk's to osc_init() in osc_request.c, and to init_lustre_quota() in quota_interface.c (these are the module init routines for those two modules).

If I mount the targets without the ampersand (and sometimes when I mount the targets with the ampersand), then lquota is initialized first, before osc_init(). In these cases, everything mounts just fine.

In the cases when there is a problem, osc_init() is called before lquota is initialized. osc_init() calls:

    cfs_request_module("lquota");

Using printk's, I have shown that when osc_init() runs before init_lustre_quota(), that call to cfs_request_module() does not return quickly, meaning that the system is NOT loading the lquota.ko module right away. I believe that this is because multiple lustre modules are trying to load lquota at once.

Question: Why do several lustre modules call cfs_request_module("lquota")? Are they using a service or a variable exported by lquota? I don't think so. If they were, then modprobe would force lquota to be loaded first, which is not the case. In particular, the lustre and the osc modules DO NOT have a dependency on lquota. So, why are these modules calling request_module("lquota")?

Roger Spellman |
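The parallel-mount reproduction described above could be scripted roughly as follows. This is only a sketch, not something taken from the ticket: the helper name `mount_parallel`, the `DEV:DIR` argument format, and the overridable `MOUNT_CMD` variable are all hypothetical, introduced so that the logic can be dry-run without root or real devices. It starts every mount in the background, then waits on each one individually so that a failed mount is reported by name.

```shell
#!/bin/bash
# Sketch: start several Lustre target mounts at once and report failures.
# MOUNT_CMD is a hypothetical override so the logic can be dry-run
# (for example MOUNT_CMD=true) without root or real devices.
MOUNT_CMD="${MOUNT_CMD:-mount -t lustre}"

# Usage: mount_parallel DEV:DIR [DEV:DIR ...]
# Returns the number of mounts that failed.
mount_parallel() {
    local pids=() specs=() failed=0 i spec
    for spec in "$@"; do
        # Split "device:mountpoint" and launch the mount in the background.
        # MOUNT_CMD is left unquoted on purpose so its options word-split.
        $MOUNT_CMD "${spec%%:*}" "${spec#*:}" &
        pids+=("$!")
        specs+=("$spec")
    done
    for i in "${!pids[@]}"; do
        # "wait PID" returns the exit status of that background job.
        if ! wait "${pids[$i]}"; then
            echo "FAILED: ${specs[$i]}"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

A call such as `mount_parallel /dev/mapper/map00:/mnt/ost00` followed by the remaining device and mount point pairs for the OSS would exercise the same path as the `&` form above. Only the `map00` pair appears in this ticket; any other pairs are site-specific.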
| Comments |
| Comment by Peter Jones [ 02/Mar/12 ] |
|
Roger, do you still see this issue with the latest master code? Peter |
| Comment by Roger Spellman (Inactive) [ 05/Mar/12 ] |
|
Peter, I'm waiting to get onto our system so that I can retest it. The system is in the middle of a multi-day test that should complete tomorrow. |
| Comment by Roger Spellman (Inactive) [ 06/Mar/12 ] |
|
Peter, I was not able to reproduce this bug on 2.1.1.RC4. Roger |
| Comment by Peter Jones [ 06/Mar/12 ] |
|
Thanks for confirming, Roger. I knew that there had been some quota-related fixes. |