
LU-3975: Race loading ldiskfs with parallel mounts

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      Parallel calls to mount targets when ldiskfs is not already loaded can hit a race in the kernel's module loading, which can cause the second mount to fail. This race is not unique to ldiskfs and can affect any module that does not protect itself with some sort of locking mechanism.

      This bug was fixed in kernel-3.7.0, and is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=771285

      That report links to the fix here: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

      We have confirmed that this fix has not been backported to the 2.6.32 kernel yet. We have opened a bug with RedHat regarding the issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1009704

      This can cause parallel calls to mkfs.lustre to fail as well, as the mounts in ldiskfs_write_ldd can hit this race if ldiskfs is not already loaded.
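
      For illustration, here is a minimal reproducer sketch of the racing pattern (the device and mount-point paths are hypothetical placeholders). Each mount(2) triggers an implicit request_module() for ldiskfs, so on an affected kernel the loser of the load race can see its mount fail:

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mount.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
      	/* Placeholder targets; substitute two real ldiskfs targets. */
      	const char *devs[2] = { "/dev/sdX1", "/dev/sdX2" };
      	const char *mnts[2] = { "/mnt/t1", "/mnt/t2" };
      	int i;

      	for (i = 0; i < 2; i++) {
      		if (fork() == 0) {
      			/* Each child's mount(2) makes the kernel try to
      			 * autoload ldiskfs if it is not loaded yet. */
      			if (mount(devs[i], mnts[i], "ldiskfs", 0, NULL))
      				fprintf(stderr, "mount %s: %s\n",
      					devs[i], strerror(errno));
      			_exit(0);
      		}
      	}
      	wait(NULL);
      	wait(NULL);
      	return 0;
      }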

      I think there are two outstanding questions here:

      (1) Do we want to try to do the backport ourselves and not wait on RedHat?
      (2) Is it safe to explicitly "modprobe ldiskfs" prior to calling mkfs.lustre to protect ourselves against the race? Or could loading the module explicitly cause some other issues with Lustre?

          Activity


            adilger Andreas Dilger added a comment -

            Closing this as a duplicate of LU-1279.

            paf Patrick Farrell (Inactive) added a comment - edited

            Hongchao,

            Trying to port this kernel patch back to CentOS 6.4. It doesn't land cleanly at all; in fact, it seems to depend on a fairly significant rewrite of module.c.

            For example, the second component of this patch shows module_mutex being locked (in load_module), but in 2.6.32, that mutex isn't referenced in load_module at all, as best I can tell.

            There are other differences as well: there is no 'free_arch_cleanup' label in load_module, and the code that returns EEXIST appears to have been relocated within load_module.

            I'm trying to get the kernel git repo checked out so I can try tracking down more of the patch history here, but I'm getting 404s trying to check out the kernel.org Linux repo... We'll see.

            Kelsey, has there been any update on the RedHat bug? It'd be great if they'd do the porting.

            Here's the problematic part of the patch I was referring to:

            +again:
             	mutex_lock(&module_mutex);
            -	if (find_module(mod->name)) {
            +	if ((old = find_module(mod->name)) != NULL) {
            +		if (old->state == MODULE_STATE_COMING) {
            +			/* Wait in case it fails to load. */
            +			mutex_unlock(&module_mutex);
            +			err = wait_event_interruptible(module_wq,
            +					       finished_loading(mod->name));
            +			if (err)
            +				goto free_arch_cleanup;
            +			goto again;
            +		}
             		err = -EEXIST;
             		goto unlock;
             	}
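
            For context, the wait in that hunk relies on a finished_loading() helper added by the same patch. As best I can reconstruct it from the upstream commit (a sketch, possibly not verbatim), it looks like:

            static bool finished_loading(const char *name)
            {
            	struct module *mod;
            	bool ret;

            	/* Done once the module is either fully live or has
            	 * disappeared again after a failed init. */
            	mutex_lock(&module_mutex);
            	mod = find_module(name);
            	ret = !mod || mod->state != MODULE_STATE_COMING;
            	mutex_unlock(&module_mutex);

            	return ret;
            }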
            
            pjones Peter Jones added a comment -

            Hongchao

            Could you please confirm whether this is a duplicate of LU-1279?

            Thanks

            Peter


            apittman Ashley Pittman (Inactive) added a comment -

            This looks like a duplicate of LU-1279. We are able to avoid this by pre-loading modules in the correct order, although that approach isn't without its problems: the module list we need to load is very long and Lustre-version specific. We've also found LU-3948.

            brian Brian Murrell (Inactive) added a comment -

            From my understanding of the problem, modprobing the module doesn't really eliminate the race; it just mitigates it by allowing enough time to pass that everyone who wants the module has it loaded by the time they get to the point of mounting something. So for mkfs.lustre to benefit from this workaround, it should issue the modprobe before starting mke2fs, to give as much time as possible before its mount(2) needs the module to be there.

            adilger Andreas Dilger added a comment -

            The ldiskfs module is standalone, so there shouldn't be any problems loading it explicitly. It would be good to load it in mount.lustre and mkfs.lustre so that this problem is fixed for all Lustre users and we don't have to debug it again. There is also no problem with having IML load ldiskfs and with patching mkfs.lustre and mount.lustre to do the same.

            It would also be possible to patch the kernel, since we already patch the RHEL kernel for other reasons, though we are trying to eliminate the kernel patches. This would be better in that the patch would naturally disappear once the fix is backported (unlike the workarounds), but it has the problem that it only works for the specific kernels that are patched (i.e. we'd need to patch SLES11 in addition to RHEL6), and it would likely take more effort to cover the various Lustre kernels in use.

            Summary: any/all of the above fixes/workarounds are acceptable.

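
            Following the suggestions above, here is a minimal sketch of what the explicit preload could look like in mount.lustre and mkfs.lustre (a hypothetical helper, not the actual tool code). Per Brian's comment, this narrows the race window rather than eliminating it, so it should be issued as early as possible in each tool:

            #include <stdlib.h>

            /* Load ldiskfs before anything that will call mount(2).
             * modprobe exits 0 if the module is already loaded, so this
             * is harmless when ldiskfs is already present. */
            static int preload_ldiskfs(void)
            {
            	return system("/sbin/modprobe ldiskfs");
            }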

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: kelsey Kelsey Prantis (Inactive)
              Votes: 0
              Watchers: 11
