LU-1279 (Lustre): failure trying to mount two targets at the same time after boot

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.2
    • Environment: lustre-modules-2.1.1-2.6.32_220.4.2.el6_lustre.gcbb4fad.x86_64_gae03fc8.x86_64
    • 3
    • 4025

    Description

      After booting an OSS, two OSTs are mounted simultaneously, and the mounts fail because module loading fails:

      Lustre: OBD class driver, http://wiki.whamcloud.com/
      Lustre:         Lustre Version: 2.1.1
      Lustre:         Build Version: jenkins-gae03fc8-PRISTINE-2.6.32-220.4.2.el6_lustre.gcbb4fad.x86_64
      Lustre: Lustre LU module (ffffffffa0578c60).
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RQF_FLD_QUERY
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_client_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_queue_wait
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_fini
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_init
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_set
      INFO: task hydra-agent:1590 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      hydra-agent   D 0000000000000000     0  1590      1 0x00000080
       ffff88003db09d68 0000000000000082 ffff88003d740a88 ffff88003bad0250
       ffff88003db09d68 ffffffff8113fb78 800000002c760065 0000000000000086
       ffff880037c1c678 ffff88003db09fd8 000000000000f4e8 ffff880037c1c678
      Call Trace:
       [<ffffffff8113fb78>] ? vma_adjust+0x128/0x590
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810aaafd>] m_start+0x1d/0x40
       [<ffffffff81198cc0>] seq_read+0x90/0x3f0
       [<ffffffff811dae0e>] proc_reg_read+0x7e/0xc0
       [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
       [<ffffffff810d4582>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81176df1>] sys_read+0x51/0x90
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1679 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1679   1651 0x00000080
       ffff88002ed35aa8 0000000000000082 ffff88002ed35a58 ffffffff810097cc
       ffff88003ef260f8 0000000000000000 0000000000d35a68 ffff880002213b00
       ffff880037415a78 ffff88002ed35fd8 000000000000f4e8 ffff880037415a78
      Call Trace:
       [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
       [<ffffffff814ecd0e>] ? thread_return+0x4e/0x760
       [<ffffffff814edb75>] schedule_timeout+0x215/0x2e0
       [<ffffffff8104c9e9>] ? __wake_up_common+0x59/0x90
       [<ffffffff814ed7f3>] wait_for_common+0x123/0x180
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffff8108b741>] ? __queue_work+0x41/0x50
       [<ffffffff814ed90d>] wait_for_completion+0x1d/0x20
       [<ffffffff81089c90>] call_usermodehelper_exec+0xe0/0xf0
       [<ffffffffa04966d2>] ? lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffff81089feb>] __request_module+0x18b/0x210
       [<ffffffffa0498e00>] ? lnet_parse_networks+0x90/0x7e0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa04966d2>] lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa0496c85>] LNetNIInit+0x125/0x1f0 [lnet]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa05f1c89>] ptlrpc_ni_init+0x29/0x170 [ptlrpc]
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffffa05f2053>] ptlrpc_init_portals+0x13/0xd0 [ptlrpc]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa06aa21a>] init_module+0xe0/0x597 [ptlrpc]
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
       [<ffffffff810af4e1>] sys_init_module+0xe1/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1688 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1688   1687 0x00000080
       ffff88003d6d3eb8 0000000000000086 ffff88003d6d3e18 0000000000000082
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ac0
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ab8
      Call Trace:
       [<ffffffff814f39dd>] ? kprobes_module_callback+0xdd/0x170
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810af533>] sys_init_module+0x133/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_at_set_req_timeout
      type=1305 audit(1333395888.750:31878): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
      type=1305 audit(1333395888.750:31879): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
      readahead-collector: starting delayed service auditd
      readahead-collector: sorting
      readahead-collector: finished
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_alloc_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_OPC
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_set_replen
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_MDFLD
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_req_finished
      LustreError: 1679:0:(socklnd.c:2420:ksocknal_base_startup()) Can't spawn socknal scheduler[0]: -513
      LustreError: 105-4: Error -100 starting up LNI tcp
      LustreError: 1679:0:(events.c:728:ptlrpc_init_portals()) network initialisation failed
      

      Attachments

      Issue Links

      Activity

            spitzcor Cory Spitz added a comment - edited

            I'm sure that #5512 will workaround this problem, but isn't there a larger issue to resolve? Patrick noted that modules like mdd and lquota also fail in the same manner. Should we investigate fixing racy module load for all modules?


            James:

            Possibly. I have yet to replicate those on a system where I can install arbitrary Lustre code (it's only happening on some of our systems and seems very timing-dependent), but I'll give it a try. (Replication first of all...)

            paf Patrick Farrell (Inactive) added a comment

            Patrick, can you try http://review.whamcloud.com/5512 to see if the fsfilt issues go away for you?

            simmonsja James A Simmons added a comment

            It is important to note that the fsfilt code that loads the fsfilt_ldiskfs module will be deleted in Lustre 2.6. The code is almost obsolete in Lustre 2.5, but the patch to remove it (http://review.whamcloud.com/5512) didn't land before the feature freeze. Any fix made in this area (e.g. having mount.lustre load fsfilt_ldiskfs for ldiskfs servers, or putting a mutex around the probe/load/check code) does not need to land for Lustre 2.5 or later.

            I'd prefer to just land 5512 on b2_5, deleting the affected code, to fix this problem instead. The patch may be large, but it is mostly just moving code around so that lvfs and fsfilt can be deleted.

            adilger Andreas Dilger added a comment

            I've also attached the module-init-tools source with my instrumented modprobe. See the note on the attachment for more details.

            One further note: this ticket gives the affected version as 2.1. We're seeing the related issue I described above in 2.4.1.

            paf Patrick Farrell (Inactive) added a comment - edited

            Full source of module-init-tools-3.9.tar.gz, with lightly instrumented modprobe.c.

            This version of modprobe (built just by running configure and make; the binary ends up in the build/ directory) creates and appends debug output to a file called /tmp/modprobe_output.

            It's made from a version of module-init-tools very close to the one used with CentOS 6.4, and can be dropped in place of the existing modprobe in /sbin/modprobe.

            paf Patrick Farrell (Inactive) added a comment

            I'm not sure it necessarily represents -EPERM, but -1 is the return value from init_module. This is the code from modprobe (inside the function 'insmod', around line 690 in modprobe.c), with the debug line I added:

                    ret = init_module(module->data, module->len, optstring);
                    if (ret != 0) {
                            fprintf(debug_fp,"pid %d: init_module failed, error: %d\n",(int) pid, ret);
            

            The fourth mount in the file is the one which failed (line 596 is one of the fsfilt_ldiskfs failures from that mount; note there are three failures and one success for fsfilt_ldiskfs in that mount).

            paf Patrick Farrell (Inactive) added a comment

            Is the error -1 (-EPERM) the actual error returned by init_module?

            The modules could be loaded more than once in class_get_type() (obdclass/genops.c) if the previous loads failed.

            BTW, which mount is the failed one in the attached log?

            Thanks!

            hongchao.zhang Hongchao Zhang added a comment

            I think you may find that this patch is not sufficient to allow parallel mounting of multiple targets. We've recently observed other issues with that same operation.

            Cray uses a script to mount all OSTs at the same time; the standard mount -t lustre commands are issued in parallel.

            Recently, we've started occasionally noticing that the fsfilt_ldiskfs module sometimes fails to load.
            The error comes from fsfilt_get_ops, which loads the module like this:

                if (!(rc = cfs_request_module("%s", name))) {
                        fs_ops = fsfilt_search_type(type);
                        CDEBUG(D_INFO, "Loaded module '%s'\n", name);
                        if (!fs_ops)
                                rc = -ENOENT;
                }

                if (rc) {
                        CERROR("Can't find %s interface\n", name);
                        RETURN(ERR_PTR(rc < 0 ? rc : -rc));
                        /* u
                }
            

            So when the module load request returns zero (success), the code checks whether the module was really loaded, using fsfilt_search_type (which looks for a pointer to something registered by the module).
            If that search comes up empty, rc is set to -ENOENT. It's converted to something else and ends up back at userspace as something like "ENOTSUPPORTED", but this is the source of it.

            We get the "Can't find fsfilt_ldiskfs interface" message, and the mount fails.

            In investigating this, I switched to using an instrumented version of modprobe to see what step was failing.

            I've logged a number of mount attempts with a four OST OSS, some of which failed with the error above but most of which succeeded.

            I'll attach the log from it, but here's an example of what I found, when logging a (successful, in this case) Lustre mount:
            pid 28287 Modprobe started.
            pid 28287: modulearg is: osd-ldiskfs
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/net/lustre/libcfs.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/lvfs.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/net/lustre/lnet.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/obdclass.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/ptlrpc.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/fld.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/fid.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/mdd.ko result: 1
            pid 28287: init_module failed, error: -1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/kernel/fs/jbd2/jbd2.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/lquota.ko result: 1
            pid 28287: init_module failed, error: -1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/kernel/fs/mbcache.ko result: 1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre-ldiskfs/ldiskfs.ko result: 1
            pid 28287: init_module failed, error: -1
            pid 28287: File open of file /lib/modules/2.6.32-358.6.1.el6.x86_64/updates/kernel/fs/lustre/osd_ldiskfs.ko result: 1
            pid 28287: init_module failed, error: -1

            Modulearg is the module name modprobe was called with. It then attempts to load any dependencies, and finally the module given in the argument.
            The pid is the pid of the particular instance of modprobe, then the file being opened is the module it is trying to load.
            When you see "init_module" failed, that is a failure to init the module listed on the "File open of file" line above.
            In the attached log file, mounts of Lustre are separated by "Mount starting" and "Mount complete".

            There is a separate modprobe of osd-ldiskfs which succeeded as part of that mount.

            If you look through the attached file, you will see many modules failing to init on and off on various mounts, and various module inits being attempted more than once.
            Often, fsfilt_ldiskfs will fail to init one or more times, but the mount will still succeed.
            It's also worth noting that even when the mount fails, fsfilt_ldiskfs is init'ed successfully at least once.
            This suggests that possibly the check that the module is loaded is racing with the module load itself.

            Running modprobe fsfilt_ldiskfs before doing the mount avoids the mount failure.

            paf Patrick Farrell (Inactive) added a comment - edited
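            The "mutex around the probe/load/check code" idea Andreas mentions amounts to serializing the load-then-recheck sequence, so a second mounter can never observe the window where the module load has "failed" (because it is already in progress) but the module has not yet registered itself. A userspace sketch of the pattern (hypothetical names; a pthread mutex standing in for a kernel mutex):

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Sketch only: registered_type stands in for the registry that
 * fsfilt_search_type() consults, and request_module_stub() for the
 * module load. Holding the lock across load + re-check closes the
 * race where one mounter sees the load fail (module already loading)
 * and concludes the module is absent before registration finishes. */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *registered_type;

static int request_module_stub(const char *name)
{
        registered_type = name;   /* load + register; racy without the lock */
        return 0;
}

static int get_ops_locked(const char *name)
{
        int rc = 0;

        pthread_mutex_lock(&load_lock);
        /* Re-check under the lock: another mounter may already have
         * loaded and registered the module. */
        if (registered_type == NULL || strcmp(registered_type, name) != 0)
                rc = request_module_stub(name);
        if (rc == 0 &&
            (registered_type == NULL || strcmp(registered_type, name) != 0))
                rc = -ENOENT;     /* loaded but never registered */
        pthread_mutex_unlock(&load_lock);
        return rc;
}
```

            The "loaded but never registered" branch is exactly the window the fsfilt_ldiskfs mounts were hitting; with the lock, concurrent mounters either perform the load themselves or see the completed registration.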

            the patch is updated

            hongchao.zhang Hongchao Zhang added a comment

            This is needed for https://jira.hpdd.intel.com/browse/HYD-2311. Could we try to get this landed for 2.5 please?

            bryon Bryon Neitzel (Inactive) added a comment

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: brian Brian Murrell (Inactive)
              Votes: 0
              Watchers: 24
