  Lustre / LU-3975

Race loading ldiskfs with parallel mounts


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • 3
    • 10609

    Description

      Mounting targets in parallel when ldiskfs is not already loaded can hit a race in the kernel as it attempts to load the module, which can cause the second mount to fail. This race is not unique to ldiskfs; it can affect any module that does not protect itself with some sort of locking mechanism.

      This bug was fixed in kernel-3.7.0, and is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=771285

      That report links to the fix here: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

      We have confirmed that this fix has not yet been backported to the 2.6.32 kernel. We have opened a bug with Red Hat regarding the issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1009704
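
As a rough illustration of the version boundary mentioned above, the following sketch flags kernels whose upstream version predates 3.7. It is only a heuristic under stated assumptions: the helper names are hypothetical, and a vendor kernel that has received a backport would still be reported as possibly affected.

```python
import platform
import re

# From the description: the race was fixed in kernel-3.7.0.
FIXED_IN = (3, 7)


def kernel_version(release):
    """Return (major, minor) parsed from a release string such as '2.6.32-358.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def may_have_module_load_race(release=None):
    """Heuristic: True when the upstream version predates 3.7.

    Vendor backports are not detected, so True means "possibly affected".
    """
    if release is None:
        release = platform.release()  # release string of the running kernel
    return kernel_version(release) < FIXED_IN
```

Calling `may_have_module_load_race()` with no argument checks the running kernel; passing a release string checks an arbitrary one.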

      The same race can cause parallel calls to mkfs.lustre to fail, because the mounts performed in ldiskfs_write_ldd can hit it when ldiskfs is not already loaded.

      I think there are two outstanding questions here:

      (1) Do we want to try to do the backport ourselves and not wait on Red Hat?
      (2) Is it safe to explicitly "modprobe ldiskfs" prior to calling mkfs.lustre to protect ourselves against the race? Or could loading the module explicitly cause some other issues with Lustre?
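
The workaround raised in question (2) could look roughly like the sketch below: load the module once, serially, and only then start the parallel work. It does not settle whether explicit preloading is safe for Lustre. `format_targets`, `run`, and the `runner` parameter are hypothetical names, and the caller supplies the complete mkfs.lustre command lines; none are assumed here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run(cmd):
    """Run a command (argv list) and raise CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
    return cmd


def format_targets(commands, module="ldiskfs", runner=run):
    """Load `module` once, then run the given commands in parallel.

    commands: list of argv lists, for example complete mkfs.lustre
    invocations built by the caller.
    """
    # Serialized step: a single modprobe before any parallel work starts,
    # so no two workers trigger the kernel's module load at the same time.
    runner(["modprobe", module])
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        # list() consumes every result, so a failed command re-raises here.
        return list(pool.map(runner, commands))
```

The `runner` argument exists so the ordering can be exercised without root privileges or real devices; in actual use the default would invoke modprobe, which normally requires root.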


            People

              hongchao.zhang Hongchao Zhang
              kelsey Kelsey Prantis (Inactive)
              Votes: 0
              Watchers: 11
