Lustre / LU-986

Possible Race Condition

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.1.1
    • Lustre 2.1.0
    • None
    • Centos 6.0
    • 3
    • 6489

    Description

      I believe that I have found a possible race condition.

      I have an OSS with four OSTs. If I mount them one at a time, then they always mount just fine. That is, the following always works:

      mount -t lustre /dev/mapper/map00 /mnt/ost00
      mount -t lustre /dev/mapper/map01 /mnt/ost01
      mount -t lustre /dev/mapper/map02 /mnt/ost02
      mount -t lustre /dev/mapper/map03 /mnt/ost03

      If I mount them all at the same time, as follows, the mounts sometimes fail:

      mount -t lustre /dev/mapper/map00 /mnt/ost00 &
      mount -t lustre /dev/mapper/map01 /mnt/ost01 &
      mount -t lustre /dev/mapper/map02 /mnt/ost02 &
      mount -t lustre /dev/mapper/map03 /mnt/ost03 &
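
      To see which of the parallel mounts fails on a given run, the exit status of each background job can be collected with wait. The following is only a minimal sketch, assuming bash; the helper name run_parallel is hypothetical and is not part of Lustre.

```shell
#!/bin/bash
# run_parallel: hypothetical helper, not part of Lustre.
# Starts each argument as a background command, then waits for every job
# and prints whether it succeeded. Returns the number of failed commands.
run_parallel() {
    local pids=() cmds=() failed=0 cmd i
    for cmd in "$@"; do
        bash -c "$cmd" &
        pids+=("$!")
        cmds+=("$cmd")
    done
    for i in "${!pids[@]}"; do
        if wait "${pids[$i]}"; then
            echo "ok:     ${cmds[$i]}"
        else
            echo "FAILED: ${cmds[$i]}"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example with the mounts from this report (not run here):
#   run_parallel "mount -t lustre /dev/mapper/map00 /mnt/ost00" \
#                "mount -t lustre /dev/mapper/map01 /mnt/ost01" \
#                "mount -t lustre /dev/mapper/map02 /mnt/ost02" \
#                "mount -t lustre /dev/mapper/map03 /mnt/ost03"
```

      Because the failure is intermittent, such a helper would need to be run repeatedly (unmounting in between) to catch a failing case.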

      The failures occur because some modules fail to load. I get errors such as:

      kernel: lov: gave up waiting for init of module osc.
      kernel: lov: Unknown symbol osc_update_enqueue

      To track this down, I added printk's to osc_init() in osc_request.c, and to init_lustre_quota() in quota_interface.c (these are the module init routines for those two modules).

      If I mount the targets without the ampersand (and sometimes even with the ampersand), then lquota is initialized before osc_init() runs. In these cases, everything mounts just fine.

      In the cases where there is a problem, osc_init() is called before init_lustre_quota().

      osc_init() calls:

      cfs_request_module("lquota");

      Using printk's, I have shown that when osc_init() runs before init_lustre_quota(), the call to cfs_request_module() does not return quickly, meaning that the system is NOT loading the lquota.ko module right away. I believe that this is because multiple lustre modules are trying to load lquota at once.

      Question: Why do several lustre modules call cfs_request_module("lquota")?

      Are they using a service or a variable exported by lquota? I don't think so. If they were, then modprobe would force lquota to be loaded first, which is not the case. In particular, the lustre and the osc modules DO NOT have a dependency on lquota. So, why are these modules calling request_module("lquota")?
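
      The dependency claim above can be checked against what each module declares. The following is a sketch only, assuming the standard modinfo tool, whose "-F depends" option prints the module's dependencies as a comma-separated list; the helper name lists_dep is hypothetical.

```shell
# lists_dep: hypothetical helper, not part of Lustre or kmod.
# Reads a comma-separated dependency list on stdin (the format printed by
# `modinfo -F depends <module>`) and succeeds only if the named module
# appears in it as a whole entry.
lists_dep() {
    tr ',' '\n' | grep -qx -- "$1"
}

# Example on a live system (not run here):
#   modinfo -F depends osc | lists_dep lquota \
#       && echo "osc declares a dependency on lquota" \
#       || echo "osc does not declare a dependency on lquota"
```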

      Roger Spellman
      Staff Engineer
      Terascala, Inc.
      508-588-1501
      www.terascala.com <http://www.terascala.com/>

          People

            pjones Peter Jones
            rspellman Roger Spellman (Inactive)