Details
Bug
Resolution: Duplicate
Minor
Description
Mounting targets in parallel when ldiskfs is not already loaded can hit a race in the kernel's module-loading path, which can cause the second mount to fail. The race is not specific to ldiskfs; it can affect any module that does not protect itself with some form of locking.
This bug was fixed in kernel 3.7.0 and is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=771285
That report links to the fix here: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709
We have confirmed that this fix has not yet been backported to the 2.6.32 kernel, and we have opened a bug with Red Hat about it here: https://bugzilla.redhat.com/show_bug.cgi?id=1009704
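As a rough way to tell whether a node is likely exposed, the running kernel version can be compared against 3.7, where the fix landed upstream. The sketch below is only a heuristic: the function name `kernel_predates_fix` is made up for illustration, and a distribution kernel could carry a backport that the version number does not show (per the above, 2.6.32 currently does not).

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): succeeds (exit 0) when the given
# kernel release string is older than 3.7, i.e. older than the upstream fix.
# Assumption: the string starts with "MAJOR.MINOR", as `uname -r` output does.
kernel_predates_fix() {
    ver="${1:-$(uname -r)}"
    major="${ver%%.*}"          # text before the first dot
    rest="${ver#*.}"            # text after the first dot
    minor="${rest%%[!0-9]*}"    # leading digits of what follows
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 7 ]; }
}
```

For example, `kernel_predates_fix "2.6.32-358.el6.x86_64"` succeeds, while `kernel_predates_fix "3.7.0"` does not.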
This can also cause parallel calls to mkfs.lustre to fail, because the mounts in ldiskfs_write_ldd can hit this race if ldiskfs is not already loaded.
I think there are two outstanding questions here:
(1) Do we want to try to do the backport ourselves and not wait on RedHat?
(2) Is it safe to explicitly "modprobe ldiskfs" prior to calling mkfs.lustre to protect ourselves against the race? Or could loading the module explicitly cause some other issues with Lustre?
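For reference, the workaround in question (2) could look roughly like the sketch below: load ldiskfs once, serially, before any parallel formatting starts. It does not answer whether an explicit load is safe for Lustre. The helper names and the `MODPROBE` and `PROC_MODULES` override variables are assumptions added for illustration and dry runs; only `modprobe` and `/proc/modules` are standard.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre). MODPROBE and PROC_MODULES are
# illustrative overrides so the logic can be exercised without root.
MODPROBE="${MODPROBE:-modprobe}"

# Succeeds if the named module appears in the loaded-module list.
# Each line of /proc/modules starts with the module name and a space.
module_loaded() {
    grep -q "^$1 " "${PROC_MODULES:-/proc/modules}"
}

# Load the module only if it is not already listed as loaded.
ensure_module() {
    module_loaded "$1" && return 0
    $MODPROBE "$1"
}
```

A caller would run `ensure_module ldiskfs || exit 1` once before starting the parallel mkfs.lustre invocations, so that none of them triggers the module load on its own.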