Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: Lustre 2.4.0, Lustre 2.1.2
- Environment: lustre-modules-2.1.1-2.6.32_220.4.2.el6_lustre.gcbb4fad.x86_64_gae03fc8.x86_64
- Severity: 3
- Rank: 4025
Description
After booting an OSS, two OSTs are mounted simultaneously. The mounts fail because the Lustre kernel modules fail to load:
Lustre: OBD class driver, http://wiki.whamcloud.com/
Lustre: Lustre Version: 2.1.1
Lustre: Build Version: jenkins-gae03fc8-PRISTINE-2.6.32-220.4.2.el6_lustre.gcbb4fad.x86_64
Lustre: Lustre LU module (ffffffffa0578c60).
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RQF_FLD_QUERY
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_client_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_queue_wait
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_fini
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_init
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_set
INFO: task hydra-agent:1590 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hydra-agent D 0000000000000000 0 1590 1 0x00000080
 ffff88003db09d68 0000000000000082 ffff88003d740a88 ffff88003bad0250
 ffff88003db09d68 ffffffff8113fb78 800000002c760065 0000000000000086
 ffff880037c1c678 ffff88003db09fd8 000000000000f4e8 ffff880037c1c678
Call Trace:
 [<ffffffff8113fb78>] ? vma_adjust+0x128/0x590
 [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
 [<ffffffff810aaafd>] m_start+0x1d/0x40
 [<ffffffff81198cc0>] seq_read+0x90/0x3f0
 [<ffffffff811dae0e>] proc_reg_read+0x7e/0xc0
 [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
 [<ffffffff810d4582>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81176df1>] sys_read+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:1679 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 1679 1651 0x00000080
 ffff88002ed35aa8 0000000000000082 ffff88002ed35a58 ffffffff810097cc
 ffff88003ef260f8 0000000000000000 0000000000d35a68 ffff880002213b00
 ffff880037415a78 ffff88002ed35fd8 000000000000f4e8 ffff880037415a78
Call Trace:
 [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
 [<ffffffff814ecd0e>] ? thread_return+0x4e/0x760
 [<ffffffff814edb75>] schedule_timeout+0x215/0x2e0
 [<ffffffff8104c9e9>] ? __wake_up_common+0x59/0x90
 [<ffffffff814ed7f3>] wait_for_common+0x123/0x180
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffff8108b741>] ? __queue_work+0x41/0x50
 [<ffffffff814ed90d>] wait_for_completion+0x1d/0x20
 [<ffffffff81089c90>] call_usermodehelper_exec+0xe0/0xf0
 [<ffffffffa04966d2>] ? lnet_startup_lndnis+0x262/0x6f0 [lnet]
 [<ffffffff81089feb>] __request_module+0x18b/0x210
 [<ffffffffa0498e00>] ? lnet_parse_networks+0x90/0x7e0 [lnet]
 [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa04966d2>] lnet_startup_lndnis+0x262/0x6f0 [lnet]
 [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa0496c85>] LNetNIInit+0x125/0x1f0 [lnet]
 [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
 [<ffffffffa05f1c89>] ptlrpc_ni_init+0x29/0x170 [ptlrpc]
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffffa05f2053>] ptlrpc_init_portals+0x13/0xd0 [ptlrpc]
 [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
 [<ffffffffa06aa21a>] init_module+0xe0/0x597 [ptlrpc]
 [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
 [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
 [<ffffffff810af4e1>] sys_init_module+0xe1/0x250
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:1688 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 1688 1687 0x00000080
 ffff88003d6d3eb8 0000000000000086 ffff88003d6d3e18 0000000000000082
 ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ac0
 ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ab8
Call Trace:
 [<ffffffff814f39dd>] ? kprobes_module_callback+0xdd/0x170
 [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
 [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
 [<ffffffff810af533>] sys_init_module+0x133/0x250
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_at_set_req_timeout
type=1305 audit(1333395888.750:31878): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
type=1305 audit(1333395888.750:31879): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
readahead-collector: starting delayed service auditd
readahead-collector: sorting
readahead-collector: finished
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_alloc_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_OPC
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_set_replen
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_MDFLD
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_req_finished
LustreError: 1679:0:(socklnd.c:2420:ksocknal_base_startup()) Can't spawn socknal scheduler[0]: -513
LustreError: 105-4: Error -100 starting up LNI tcp
LustreError: 1679:0:(events.c:728:ptlrpc_init_portals()) network initialisation failed
Attachments
Issue Links
- duplicates
  - LU-4311 Mount sometimes fails with EIO on OSS with several mounts in parallel (Closed)
- is duplicated by
  - LU-3975 Race loading ldiskfs with parallel mounts (Resolved)
  - LU-5961 Concurrent mount of ZFS targets fails when modules are not loaded (Resolved)
- is related to
  - LU-5159 Lustre MGS/MDT fails to start using initscripts using 2.4.2 based packages (Resolved)
  - LU-2456 Dynamic LNet Config Main Development Work (Resolved)
  - LU-3975 Race loading ldiskfs with parallel mounts (Resolved)
Comments
This has never (to my knowledge) been reported on SLES, but it has been reported by multiple sources on RHEL 6.x, so I think it is reasonable to mark this as resolved for 2.5.4 and 2.7 based on Li Wei's fix having landed. If this is ever seen on SLES, we can track that issue under a new ticket.