  Lustre / LU-1279

failure trying to mount two targets at the same time after boot

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.2
    • Environment: lustre-modules-2.1.1-2.6.32_220.4.2.el6_lustre.gcbb4fad.x86_64_gae03fc8.x86_64
    • 3
    • 4025

    Description

      After booting an OSS, two OSTs are mounted simultaneously. The mounts fail due to a module loading failure; the console log follows, and a minimal reproducer sketch appears after it:

      Lustre: OBD class driver, http://wiki.whamcloud.com/
      Lustre:         Lustre Version: 2.1.1
      Lustre:         Build Version: jenkins-gae03fc8-PRISTINE-2.6.32-220.4.2.el6_lustre.gcbb4fad.x86_64
      Lustre: Lustre LU module (ffffffffa0578c60).
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RQF_FLD_QUERY
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_client_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_queue_wait
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_fini
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_init
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_set
      INFO: task hydra-agent:1590 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      hydra-agent   D 0000000000000000     0  1590      1 0x00000080
       ffff88003db09d68 0000000000000082 ffff88003d740a88 ffff88003bad0250
       ffff88003db09d68 ffffffff8113fb78 800000002c760065 0000000000000086
       ffff880037c1c678 ffff88003db09fd8 000000000000f4e8 ffff880037c1c678
      Call Trace:
       [<ffffffff8113fb78>] ? vma_adjust+0x128/0x590
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810aaafd>] m_start+0x1d/0x40
       [<ffffffff81198cc0>] seq_read+0x90/0x3f0
       [<ffffffff811dae0e>] proc_reg_read+0x7e/0xc0
       [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
       [<ffffffff810d4582>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81176df1>] sys_read+0x51/0x90
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1679 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1679   1651 0x00000080
       ffff88002ed35aa8 0000000000000082 ffff88002ed35a58 ffffffff810097cc
       ffff88003ef260f8 0000000000000000 0000000000d35a68 ffff880002213b00
       ffff880037415a78 ffff88002ed35fd8 000000000000f4e8 ffff880037415a78
      Call Trace:
       [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
       [<ffffffff814ecd0e>] ? thread_return+0x4e/0x760
       [<ffffffff814edb75>] schedule_timeout+0x215/0x2e0
       [<ffffffff8104c9e9>] ? __wake_up_common+0x59/0x90
       [<ffffffff814ed7f3>] wait_for_common+0x123/0x180
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffff8108b741>] ? __queue_work+0x41/0x50
       [<ffffffff814ed90d>] wait_for_completion+0x1d/0x20
       [<ffffffff81089c90>] call_usermodehelper_exec+0xe0/0xf0
       [<ffffffffa04966d2>] ? lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffff81089feb>] __request_module+0x18b/0x210
       [<ffffffffa0498e00>] ? lnet_parse_networks+0x90/0x7e0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa04966d2>] lnet_startup_lndnis+0x262/0x6f0 [lnet]
       [<ffffffffa041aa13>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa0496c85>] LNetNIInit+0x125/0x1f0 [lnet]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa05f1c89>] ptlrpc_ni_init+0x29/0x170 [ptlrpc]
       [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
       [<ffffffffa05f2053>] ptlrpc_init_portals+0x13/0xd0 [ptlrpc]
       [<ffffffffa06aa13a>] ? init_module+0x0/0x597 [ptlrpc]
       [<ffffffffa06aa21a>] init_module+0xe0/0x597 [ptlrpc]
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
       [<ffffffff810af4e1>] sys_init_module+0xe1/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      INFO: task modprobe:1688 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      modprobe      D 0000000000000000     0  1688   1687 0x00000080
       ffff88003d6d3eb8 0000000000000086 ffff88003d6d3e18 0000000000000082
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ac0
       ffff88003d6d1ab8 ffff88003d6d3fd8 000000000000f4e8 ffff88003d6d1ab8
      Call Trace:
       [<ffffffff814f39dd>] ? kprobes_module_callback+0xdd/0x170
       [<ffffffff814ee35e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff81096d15>] ? __blocking_notifier_call_chain+0x65/0x80
       [<ffffffff814ee1fb>] mutex_lock+0x2b/0x50
       [<ffffffff810af533>] sys_init_module+0x133/0x250
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_at_set_req_timeout
      type=1305 audit(1333395888.750:31878): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
      type=1305 audit(1333395888.750:31879): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
      readahead-collector: starting delayed service auditd
      readahead-collector: sorting
      readahead-collector: finished
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_alloc_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_OPC
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_set_replen
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_MDFLD
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_req_finished
      LustreError: 1679:0:(socklnd.c:2420:ksocknal_base_startup()) Can't spawn socknal scheduler[0]: -513
      LustreError: 105-4: Error -100 starting up LNI tcp
      LustreError: 1679:0:(events.c:728:ptlrpc_init_portals()) network initialisation failed
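
      A minimal reproducer sketch, assuming placeholder OST device and mount-point paths rather than anything from this report, is simply to issue the two mounts from concurrent processes, as the boot-time scripts do:

      #include <stdio.h>
      #include <sys/mount.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
              /* Hypothetical OST devices and mount points. */
              const char *dev[2] = { "/dev/sdb", "/dev/sdc" };
              const char *dir[2] = { "/mnt/ost0", "/mnt/ost1" };
              int i;

              for (i = 0; i < 2; i++) {
                      if (fork() == 0) {
                              /* Both children race to load the Lustre module stack. */
                              if (mount(dev[i], dir[i], "lustre", 0, NULL) != 0)
                                      perror("mount");
                              _exit(0);
                      }
              }
              while (wait(NULL) > 0)
                      ;
              return 0;
      }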
      

    Attachments

    Issue Links

    Activity

            [LU-1279] failure trying to mount two targets at the same time after boot

            If there is already a fix in the upstream kernel (http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709), we should just apply this patch to the RHEL/SLES kernels that we support, until such time as they backport the fix themselves. It doesn't make sense to do anything at the Lustre level if this is not a Lustre bug.

            adilger Andreas Dilger added a comment

            There is an issue in the kernel related to loading modules in parallel: loads issued after the first one do not wait for that module to finish initializing, which causes problems such as the filesystem type not being found because the corresponding module has not yet been initialized (please see LU-3975 for details).

            This issue could be a duplicate of it; could you please try with that patch?

            Thanks!

            hongchao.zhang Hongchao Zhang added a comment
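
            As a rough userspace workaround sketch for that race (the lock-file path and the helper name are made up here, not taken from LU-3975), the mounts that drive module loading can be serialized with an exclusive flock(), so only one module-load chain runs at a time:

            #include <fcntl.h>
            #include <sys/file.h>
            #include <sys/mount.h>
            #include <unistd.h>

            /* Hypothetical helper: take an exclusive lock so only one mount,
             * and therefore one module-load chain, runs at a time. */
            static int mount_serialized(const char *dev, const char *dir)
            {
                    int rc;
                    int lockfd = open("/var/lock/lustre-mount.lock",
                                      O_CREAT | O_RDWR, 0600);

                    if (lockfd < 0)
                            return -1;

                    flock(lockfd, LOCK_EX);
                    rc = mount(dev, dir, "lustre", 0, NULL);
                    flock(lockfd, LOCK_UN);
                    close(lockfd);
                    return rc;
            }

            Calling mount_serialized() for each target instead of issuing the mounts concurrently would keep a second modprobe from racing the first module's initialization.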

            Further thoughts... There are various other possible sources for a -1. I don't think it's EPERM here.

            From the init_module syscall in the kernel:

            /* Do all the hard work */
            mod = load_module(umod, len, uargs);
            if (IS_ERR(mod)) {
                    mutex_unlock(&module_mutex);
                    return PTR_ERR(mod);
            }

            We also have:
            ret = do_one_initcall(mod->init);

            And I suspect it's coming from one of those.

            paf Patrick Farrell (Inactive) added a comment
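
            As a sketch of how those error paths reach userspace (this is not the instrumented modprobe; the program is purely illustrative), the raw init_module(2) syscall returns -1 and leaves the actual cause, whether it came from load_module() or from the module's own init via do_one_initcall(), in errno:

            #include <errno.h>
            #include <fcntl.h>
            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            #include <sys/stat.h>
            #include <sys/syscall.h>
            #include <unistd.h>

            int main(int argc, char **argv)
            {
                    struct stat st;
                    void *image;
                    int fd;

                    if (argc != 2) {
                            fprintf(stderr, "usage: %s module.ko\n", argv[0]);
                            return 1;
                    }

                    fd = open(argv[1], O_RDONLY);
                    if (fd < 0 || fstat(fd, &st) < 0) {
                            perror(argv[1]);
                            return 1;
                    }

                    image = malloc(st.st_size);
                    if (image == NULL || read(fd, image, st.st_size) != st.st_size) {
                            perror("read");
                            return 1;
                    }

                    /* A -1 return may mean EPERM, EEXIST, ENOEXEC, ENOMEM, or the
                     * negated value that the module's init function returned. */
                    if (syscall(SYS_init_module, image, (unsigned long)st.st_size, "") != 0)
                            fprintf(stderr, "init_module: %s (errno %d)\n",
                                    strerror(errno), errno);

                    free(image);
                    close(fd);
                    return 0;
            }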

            Hongchao,

            A very good guess. In this case, no, module loading is definitely not disabled. We disable module load on the compute nodes of our mainframes, but this is not those nodes (which are heavily stripped down because their OS runs exclusively in memory). This problem is seen on an external CentOS 6.4 system running Lustre. In essence, it's just vanilla CentOS 6.4 with Lustre installed on it - No significant changes to the OS or settings.

            I'd encourage you to test target mounts in parallel, using the modified modprobe, with multiple OSTs on a test system of your own. You may not see the outright mount failure we observe, but I suspect you will see a number of -1s returned.

            paf Patrick Farrell (Inactive) added a comment

            the "-1" should be the error returned by the syscall "sys_init_module", and is -EPERM
            there are two conditions to return -EPERM,

            SYSCALL_DEFINE3(init_module, void __user *, umod,
                            unsigned long, len, const char __user *, uargs)
            {
                    struct module *mod;
                    int ret = 0;
            
                    /* Must have permission */
                    if (!capable(CAP_SYS_MODULE) || modules_disabled)
                            return -EPERM;
            

            the "modules_disabled" is controlled by "/proc/sys/modules_disabled", I remembered that you mentioned Cray will disable module load after boot completes,
            is it implemented by this way? and is there any chance ""/proc/sys/modules_disabled" is set during issuing parallel mounting commands?

            Thanks

            hongchao.zhang Hongchao Zhang added a comment
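
            For what it's worth, a quick way to rule that path out (a small sketch, not from this ticket; on stock kernels the sysctl is exposed at /proc/sys/kernel/modules_disabled) is to read the flag around the time the parallel mounts are issued:

            #include <stdio.h>

            int main(void)
            {
                    FILE *f = fopen("/proc/sys/kernel/modules_disabled", "r");
                    int val = 0;

                    /* A non-zero value makes sys_init_module() fail with -EPERM
                     * even for a process that holds CAP_SYS_MODULE. */
                    if (f != NULL) {
                            if (fscanf(f, "%d", &val) != 1)
                                    val = 0;
                            fclose(f);
                    }
                    printf("kernel.modules_disabled = %d\n", val);
                    return 0;
            }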

            James - I just tried briefly and I'm not able to replicate the issue on a test system with >2 OSTs/OSS (we're seeing it on our 'production' development systems, rather than our dedicated Lustre testing systems).

            But as Cory said, it seems obvious that #5512 would fix the problem. If the module does not exist, loading it can hardly be a problem.

            Still, I'm worried we might just be dodging this issue and leaving others waiting to be discovered, as this is not the only module load failure in parallel mounts. It's just the only one causing a noticeable problem now. Unless we can point to a clear difference in how the other modules are loaded that shows why they're not failing the mount when one of their inits fails, I think we may see problems there in the future.

            paf Patrick Farrell (Inactive) added a comment

            I'm sure that #5512 will work around this problem, but isn't there a larger issue to resolve? Patrick noted that modules like mdd and lquota also fail in the same manner. Should we investigate fixing racy module loading for all modules?

            spitzcor Cory Spitz added a comment - edited

            James:

            Possibly. I have yet to replicate those on a system where I can install arbitrary Lustre code [it's only happening on some of our systems, seems very timing dependent], but I'll give it a try. (Replication first of all...)

            paf Patrick Farrell (Inactive) added a comment

            Patrick, can you try http://review.whamcloud.com/5512 to see if the fsfilt issues go away for you?

            simmonsja James A Simmons added a comment

            It is important to note that the fsfilt code that is loading the fsfilt_ldiskfs module will be deleted in Lustre 2.6. The code is almost obsolete in Lustre 2.5, but the patch to remove it (http://review.whamcloud.com/5512) didn't get landed before the feature freeze. Any fixes made in this area (e.g. having mount.lustre load fsfilt_ldiskfs for ldiskfs servers, or putting a mutex around the probe/load/check code) do not need to be landed for Lustre 2.5 or later.

            I'd prefer to just land 5512 to b2_5 to delete the affected code to fix this problem instead. The patch may be large, but is mostly just moving code around so that lvfs and fsfilt can be deleted.

            adilger Andreas Dilger added a comment
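
            For reference, a minimal sketch of the "mutex around the probe/load/check code" idea, with made-up helper names rather than the actual fsfilt functions, could look like this:

            #include <linux/kmod.h>
            #include <linux/mutex.h>

            struct fsfilt_operations;                 /* opaque in this sketch */
            struct fsfilt_operations *fsfilt_search_type(const char *type);

            static DEFINE_MUTEX(fsfilt_load_mutex);

            /* The check, the request_module(), and the re-check all run under
             * one mutex, so parallel mounts cannot race the module load. */
            static struct fsfilt_operations *fsfilt_get_ops_serialized(const char *type)
            {
                    struct fsfilt_operations *ops;

                    mutex_lock(&fsfilt_load_mutex);
                    ops = fsfilt_search_type(type);
                    if (ops == NULL) {
                            request_module("fsfilt_%s", type);
                            ops = fsfilt_search_type(type);
                    }
                    mutex_unlock(&fsfilt_load_mutex);
                    return ops;
            }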

            I've also attached the module-init-tools source with my instrumented modprobe. See the note on the attachment for more details.

            One further note - This ticket gives the affected version as 2.1. We're seeing the related issue I described above in 2.4.1.

            paf Patrick Farrell (Inactive) added a comment - edited

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: brian Brian Murrell (Inactive)
              Votes: 0
              Watchers: 24

              Dates

                Created:
                Updated:
                Resolved: