Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1279

failure trying to mount two targets at the same time after boot

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.2
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Labels:
    • Environment:
      lustre-modules-2.1.1-2.6.32_220.4.2.el6_lustre.gcbb4fad.x86_64_gae03fc8.x86_64
    • Severity:
      3
    • Rank (Obsolete):
      4025

      Description

      After booting an OSS, two OSTs are mounted simultaneously. The mounts fail due to module loading failure:

      Lustre: OBD class driver, http://wiki.whamcloud.com/
      Lustre:         Lustre Version: 2.1.1
      Lustre:         Build Version: jenkins-gae03fc8-PRISTINE-2.6.32-220.4.2.el6_lustre.gcbb4fad.x86_64
      Lustre: Lustre LU module (ffffffffa0578c60).
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RQF_FLD_QUERY
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_client_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_queue_wait
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_fini
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_init
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_set
      INFO: task hydra-agent:1590 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      hydra-agent   D 0000000000000000     0  1590      1 0x00000080
       ffff88003db09d68 0000000000000082 ffff88003d740a88 ffff88003bad0250
       ffff88003db09d68 ffffffff8113fb78 800000002c760065 0000000000000086
       ffff880037c1c678 ffff88003db09fd8 000000000000f4e8 ffff880037c1c678
      Call Trace:
       [<ffffffff8113fb78>] ? vma_adjust+0x128/0x590
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810aaafd>] m_start+0x1d/0x40
       [<ffffffff81198cc0>] seq_read+0x90/0x3f0
       [<ffffffff811dae0e>] proc_reg_read+0x7e/0xc0
       [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
       [<ffffffff810d4582>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81176df1>] sys_read+0x51/0x90
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1679 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1679   1651 0x00000080
       ffff88002ed35aa8 0000000000000082 ffff88002ed35a58 ffffffff810097cc
       ffff88003ef260f8 0000000000000000 0000000000d35a68 ffff880002213b00
       ffff880037415a78 ffff88002ed35fd8 000000000000f4e8 ffff880037415a78
      Call Trace:
       [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
       [<ffffffff814ecd0e>] ? thread_return+0x4e/0x760
       [<ffffffff814edb75>] schedule_timeout+0x215/0x2e0
       [<ffffffff8104c9e9>] ? __wake_up_common+0x59/0x90
       [<ffffffff814ed7f3>] wait_for_common+0x123/0x180
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffff8108b741>] ? __queue_work+0x41/0x50
       [<ffffffff814ed90d>] wait_for_completion+0x1d/0x20
       [<ffffffff81089c90>] call_usermodehelper_exec+0xe0/0xf0
       [<ffffffffa04966d2>] ? lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffff81089feb>] __request_module+0x18b/0x210
       [<ffffffffa0498e00>] ? lnet_parse_networks+0x90/0x7e0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa04966d2>] lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa0496c85>] LNetNIInit+0x125/0x1f0 [lnet]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa05f1c89>] ptlrpc_ni_init+0x29/0x170 [ptlrpc]
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffffa05f2053>] ptlrpc_init_portals+0x13/0xd0 [ptlrpc]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa06aa21a>] init_module+0xe0/0x597 [ptlrpc]
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
       [<ffffffff810af4e1>] sys_init_module+0xe1/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1688 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1688   1687 0x00000080
       ffff88003d6d3eb8 0000000000000086 ffff88003d6d3e18 0000000000000082
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ac0
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ab8
      Call Trace:
       [<ffffffff814f39dd>] ? kprobes_module_callback+0xdd/0x170
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810af533>] sys_init_module+0x133/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_at_set_req_timeout
      type=1305 audit(1333395888.750:31878): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
      type=1305 audit(1333395888.750:31879): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
      readahead-collector: starting delayed service auditd
      readahead-collector: sorting
      readahead-collector: finished
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_alloc_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_OPC
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_set_replen
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_MDFLD
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_req_finished
      LustreError: 1679:0:(socklnd.c:2420:ksocknal_base_startup()) Can't spawn socknal scheduler[0]: -513
      LustreError: 105-4: Error -100 starting up LNI tcp
      LustreError: 1679:0:(events.c:728:ptlrpc_init_portals()) network initialisation failed
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                hongchao.zhang Hongchao Zhang
                Reporter:
                brian Brian Murrell (Inactive)
              • Votes:
                0 Vote for this issue
                Watchers:
                24 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: