
LU-3975: Race loading ldiskfs with parallel mounts

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      Parallel calls to mount targets when ldiskfs is not already loaded can hit a race in the kernel's module loading, which can cause the second mount to fail. This race is not unique to ldiskfs and can affect any module that does not protect itself with some sort of locking mechanism.

      This bug was fixed in kernel-3.7.0, and is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=771285

      That report links to the fix here: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

      We have confirmed that this fix has not been backported to the 2.6.32 kernel yet. We have opened a bug with RedHat regarding the issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1009704

      This can cause parallel calls to mkfs.lustre to fail as well, as the mounts in ldiskfs_write_ldd can hit this race if ldiskfs is not already loaded.
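
      For illustration, here is a minimal reproducer sketch of the racing pattern (the device and mount-point paths are hypothetical placeholders). Each mount(2) triggers an implicit request_module() for ldiskfs, so on an affected kernel the loser of the load race can see its mount fail:

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mount.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
      	/* Placeholder targets; substitute two real ldiskfs targets. */
      	const char *devs[2] = { "/dev/sdX1", "/dev/sdX2" };
      	const char *mnts[2] = { "/mnt/t1", "/mnt/t2" };
      	int i;

      	for (i = 0; i < 2; i++) {
      		if (fork() == 0) {
      			/* Each child's mount(2) makes the kernel try to
      			 * autoload ldiskfs if it is not loaded yet. */
      			if (mount(devs[i], mnts[i], "ldiskfs", 0, NULL))
      				fprintf(stderr, "mount %s: %s\n",
      					devs[i], strerror(errno));
      			_exit(0);
      		}
      	}
      	wait(NULL);
      	wait(NULL);
      	return 0;
      }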

      I think there are two outstanding questions here:

      (1) Do we want to try to do the backport ourselves and not wait on RedHat?
      (2) Is it safe to explicitly "modprobe ldiskfs" prior to calling mkfs.lustre to protect ourselves against the race? Or could loading the module explicitly cause some other issues with Lustre?

          Activity


            adilger Andreas Dilger added a comment -

            Closing this as a duplicate of LU-1279.

            paf Patrick Farrell (Inactive) added a comment - edited

            Hongchao,

            Trying to port this kernel patch back to CentOS 6.4. It doesn't land cleanly at all; in fact, it seems to depend on a fairly significant rewrite of module.c.

            For example, the second component of this patch shows module_mutex being locked (in load_module), but in 2.6.32, that mutex isn't referenced in load_module at all, as best I can tell.

            There are other differences as well: there is no 'free_arch_cleanup' label in load_module, and the code that returns EEXIST appears to have been relocated within load_module.

            I'm trying to get the kernel git repo checked out so I can try tracking down more of the patch history here, but I'm getting 404s trying to check out the kernel.org Linux repo... We'll see.

            Kelsey, has there been any update on the RedHat bug? It'd be great if they'd do the porting.

            Here's the problematic part of the patch I was referring to:

            +again:
             	mutex_lock(&module_mutex);
            -	if (find_module(mod->name)) {
            +	if ((old = find_module(mod->name)) != NULL) {
            +		if (old->state == MODULE_STATE_COMING) {
            +			/* Wait in case it fails to load. */
            +			mutex_unlock(&module_mutex);
            +			err = wait_event_interruptible(module_wq,
            +					       finished_loading(mod->name));
            +			if (err)
            +				goto free_arch_cleanup;
            +			goto again;
            +		}
             		err = -EEXIST;
             		goto unlock;
             	}
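
            For context, the wait in that hunk relies on a finished_loading() helper added by the same patch. As best I can reconstruct it from the upstream commit (a sketch, possibly not verbatim), it looks like:

            static bool finished_loading(const char *name)
            {
            	struct module *mod;
            	bool ret;

            	/* Done once the module is either fully live or has
            	 * disappeared again after a failed init. */
            	mutex_lock(&module_mutex);
            	mod = find_module(name);
            	ret = !mod || mod->state != MODULE_STATE_COMING;
            	mutex_unlock(&module_mutex);

            	return ret;
            }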
            
            pjones Peter Jones added a comment -

            Hongchao

            Could you please confirm whether this is a duplicate of LU-1279?

            Thanks

            Peter


            apittman Ashley Pittman (Inactive) added a comment -

            This looks like a duplicate of LU-1279. We are able to avoid this by pre-loading modules in the correct order, although that approach isn't without its problems: the module list we need to load is very long and Lustre-version specific. We've also found LU-3948.

            brian Brian Murrell (Inactive) added a comment -

            From my understanding of the problem, modprobing the module doesn't really eliminate the race; it just mitigates it by allowing enough time to pass that everyone who wants the module has it loaded by the time they get to the point of mounting something. So for mkfs.lustre to benefit from this workaround, it should issue the modprobe before starting mke2fs, to give as much time as possible before its mount(2) needs the module to be there.

            adilger Andreas Dilger added a comment -

            The ldiskfs module is standalone, so there shouldn't be any problems loading it explicitly. It would be good to load it in mount.lustre and mkfs.lustre so that this problem is fixed for all Lustre users and we don't have to debug it again. There is also no problem with having IML load ldiskfs and with patching mkfs.lustre and mount.lustre to do the same.

            It would also be possible to patch the kernel, since we already patch the RHEL kernel for other reasons, though we are trying to eliminate the kernel patches. This would be better in that the patch would naturally disappear once the fix is backported (unlike the workarounds), but it has the problem that it only works for the specific kernels that are patched (i.e. we'd need to patch SLES11 in addition to RHEL6), and it would likely take more effort to cover the various Lustre kernels in use.

            Summary: any/all of the above fixes/workarounds are acceptable.

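
            Following the suggestions above, here is a minimal sketch of what the explicit preload could look like in mount.lustre and mkfs.lustre (a hypothetical helper, not the actual tool code). Per Brian's comment, this narrows the race window rather than eliminating it, so it should be issued as early as possible in each tool:

            #include <stdlib.h>

            /* Load ldiskfs before anything that will call mount(2).
             * modprobe exits 0 if the module is already loaded, so this
             * is harmless when ldiskfs is already present. */
            static int preload_ldiskfs(void)
            {
            	return system("/sbin/modprobe ldiskfs");
            }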

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: kelsey Kelsey Prantis (Inactive)
              Votes: 0
              Watchers: 11
