[LU-1279] failure trying to mount two targets at the same time after boot - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
Affects Version/s: Lustre 2.4.0, Lustre 2.1.2
Labels:
- llnl
Environment:
lustre-modules-2.1.1-2.6.32_220.4.2.el6_lustre.gcbb4fad.x86_64_gae03fc8.x86_64

Severity:
3
Rank (Obsolete):
4025

Description

After booting an OSS, two OSTs are mounted simultaneously. The mounts fail due to module loading failure:

Lustre: OBD class driver, http://wiki.whamcloud.com/
Lustre:         Lustre Version: 2.1.1
Lustre:         Build Version: jenkins-gae03fc8-PRISTINE-2.6.32-220.4.2.el6_lustre.gcbb4fad.x86_64
Lustre: Lustre LU module (ffffffffa0578c60).
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RQF_FLD_QUERY
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_client_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_queue_wait
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_fini
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_init
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_set
INFO: task hydra-agent:1590 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hydra-agent   D 0000000000000000     0  1590      1 0x00000080
 ffff88003db09d68 0000000000000082 ffff88003d740a88 ffff88003bad0250
 ffff88003db09d68 ffffffff8113fb78 800000002c760065 0000000000000086
 ffff880037c1c678 ffff88003db09fd8 000000000000f4e8 ffff880037c1c678
Call Trace:
 [<ffffffff8113fb78>] ? vma_adjust+0x128/0x590
 [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
 [<ffffffff810aaafd>] m_start+0x1d/0x40
 [<ffffffff81198cc0>] seq_read+0x90/0x3f0
 [<ffffffff811dae0e>] proc_reg_read+0x7e/0xc0
 [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
 [<ffffffff810d4582>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81176df1>] sys_read+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:1679 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe      D 0000000000000000     0  1679   1651 0x00000080
 ffff88002ed35aa8 0000000000000082 ffff88002ed35a58 ffffffff810097cc
 ffff88003ef260f8 0000000000000000 0000000000d35a68 ffff880002213b00
 ffff880037415a78 ffff88002ed35fd8 000000000000f4e8 ffff880037415a78
Call Trace:
 [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
 [<ffffffff814ecd0e>] ? thread_return+0x4e/0x760
 [<ffffffff814edb75>] schedule_timeout+0x215/0x2e0
 [<ffffffff8104c9e9>] ? __wake_up_common+0x59/0x90
 [<ffffffff814ed7f3>] wait_for_common+0x123/0x180
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffff8108b741>] ? __queue_work+0x41/0x50
 [<ffffffff814ed90d>] wait_for_completion+0x1d/0x20
 [<ffffffff81089c90>] call_usermodehelper_exec+0xe0/0xf0
 [<ffffffffa04966d2>] ? lnet_startup_lndnis+0x262/0x6f0 [lnet]
 [<ffffffff81089feb>] __request_module+0x18b/0x210
 [<ffffffffa0498e00>] ? lnet_parse_networks+0x90/0x7e0 [lnet]
 [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa04966d2>] lnet_startup_lndnis+0x262/0x6f0 [lnet]
 [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa0496c85>] LNetNIInit+0x125/0x1f0 [lnet]
 [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
 [<ffffffffa05f1c89>] ptlrpc_ni_init+0x29/0x170 [ptlrpc]
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffffa05f2053>] ptlrpc_init_portals+0x13/0xd0 [ptlrpc]
 [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
 [<ffffffffa06aa21a>] init_module+0xe0/0x597 [ptlrpc]
 [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
 [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
 [<ffffffff810af4e1>] sys_init_module+0xe1/0x250
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:1688 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe      D 0000000000000000     0  1688   1687 0x00000080
 ffff88003d6d3eb8 0000000000000086 ffff88003d6d3e18 0000000000000082
 ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ac0
 ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ab8
Call Trace:
 [<ffffffff814f39dd>] ? kprobes_module_callback+0xdd/0x170
 [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
 [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
 [<ffffffff810af533>] sys_init_module+0x133/0x250
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_at_set_req_timeout
type=1305 audit(1333395888.750:31878): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
type=1305 audit(1333395888.750:31879): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
readahead-collector: starting delayed service auditd
readahead-collector: sorting
readahead-collector: finished
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_alloc_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_OPC
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_set_replen
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_MDFLD
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_req_finished
LustreError: 1679:0:(socklnd.c:2420:ksocknal_base_startup()) Can't spawn socknal scheduler[0]: -513
LustreError: 105-4: Error -100 starting up LNI tcp
LustreError: 1679:0:(events.c:728:ptlrpc_init_portals()) network initialisation failed

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

modprobe_log.txt
92 kB
18/Oct/13 12:05 AM
module-init-tools-3.9.tar.gz
1.07 MB
18/Oct/13 3:46 PM

Issue Links

duplicates

LU-4311 Mount sometimes fails with EIO on OSS with several mounts in parallel

Closed

is duplicated by

LU-3975 Race loading ldiskfs with parallel mounts

Resolved

LU-5961 Concurrent mount of ZFS targets fails when modules are not loaded

Resolved

is related to

LU-5159 Lustre MGS/MDT fails to start using initscripts using 2.4.2 based packages

Resolved

LU-2456 Dynamic LNet Config Main Development Work

Resolved

is related to

LU-3975 Race loading ldiskfs with parallel mounts

Resolved

mentioned in: Page Loading...

(1 is related to , 1 mentioned in)

Activity

[LU-1279] failure trying to mount two targets at the same time after boot

Peter Jones made changes - 27/Apr/15 8:17 PM

Labels

Original: llnl lu_st

New: llnl

Andreas Dilger made changes - 28/Nov/14 6:07 PM

Link

New: This issue is duplicated by ~~LU-5961~~ [ ~~LU-5961~~ ]

Peter Jones made changes - 21/Oct/14 7:37 PM

Labels

Original: llnl lu_st mq414

New: llnl lu_st

Peter Jones made changes - 21/Oct/14 7:36 PM

Fix Version/s		New: Lustre 2.7.0 [ 10631 ]
Fix Version/s		New: Lustre 2.5.4 [ 11190 ]
Resolution		New: Fixed [ 1 ]
Status	Original: Open [ 1 ]	New: Resolved [ 5 ]

Peter Jones made changes - 29/Sep/14 7:18 PM

Labels

Original: llnl lu_st

New: llnl lu_st mq414

Jodi Levi (Inactive) made changes - 20/Aug/14 12:20 PM

Labels

Original: llnl

New: llnl lu_st

Jodi Levi (Inactive) made changes - 08/Aug/14 3:44 PM

Remote Link

New: This issue links to "Page (HPDD Community Wiki)" [ 12684 ]

Li Wei (Inactive) made changes - 25/Jul/14 9:02 AM

Affects Version/s

New: Lustre 2.4.0 [ 10154 ]

Hongchao Zhang made changes - 09/Jul/14 1:43 PM

Link

New: This issue is related to ~~LU-2456~~ [ ~~LU-2456~~ ]

Hongchao Zhang made changes - 09/Jul/14 1:38 PM

Link

New: This issue is related to ~~LU-5159~~ [ ~~LU-5159~~ ]

People

Assignee:: Hongchao Zhang

Reporter:: Brian Murrell (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 24 Start watching this issue

Dates

Created:: 02/Apr/12 5:39 PM

Updated:: 27/Apr/15 8:17 PM

Resolved:: 21/Oct/14 7:36 PM