
[LU-10525] racer test_1: Failure to initialize cl object

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5
    • Labels: None
    • Severity: 3

    Description

      racer test_1 - test_1 failed with 2
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/60989662-f935-11e7-a7cd-52540065bddc

      test_1 failed with the following error:

      test_1 failed with 2
      

      server: 2.10.3 RC1 EL7
      client: 2.10.3 RC1 EL6.9

      test log

      Lustre: Skipped 3 previous similar messages
      Lustre: lustre-MDT0000-mdc-ffff880041847c00: Connection restored to 10.9.5.195@tcp (at 10.9.5.195@tcp)
      Lustre: Skipped 3 previous similar messages
      LustreError: 16503:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2366:0x0]: -16
      LustreError: 32509:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2486:0x0]: -16
      INFO: task dir_create.sh:12824 blocked for more than 120 seconds.
            Tainted: G        W  -- ------------    2.6.32-696.18.7.el6.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      dir_create.sh D 0000000000000000     0 12824  12691 0x00000080
       ffff88007a03fb68 0000000000000086 ffff88007a03fba8 0000000000000080
       0000000000000000 ffff880059833149 0000004b00000000 ffffffffa103b393
       0000000000000098 0020000000000080 ffff88002c1a1068 ffff88007a03ffd8
      Call Trace:
       [<ffffffff8154d146>] __mutex_lock_slowpath+0x96/0x210
       [<ffffffff811b5f68>] ? __d_lookup+0xd8/0x150
       [<ffffffff8154cc6b>] mutex_lock+0x2b/0x50
       [<ffffffff811aa63b>] do_lookup+0x11b/0x230
       [<ffffffff811aace0>] __link_path_walk+0x200/0x1060
       [<ffffffff81342f5c>] ? memory_open+0x3c/0xa0
       [<ffffffff811abdfa>] path_walk+0x6a/0xe0
       [<ffffffff811ad5da>] do_filp_open+0x1fa/0xd20
       [<ffffffff8115aeca>] ? handle_mm_fault+0x2aa/0x3f0
       [<ffffffff812a97fa>] ? strncpy_from_user+0x4a/0x90
       [<ffffffff811bac52>] ? alloc_fd+0x92/0x160
       [<ffffffff81197507>] do_sys_open+0x67/0x130
       [<ffffffff8155660b>] ? system_call_after_swapgs+0x16b/0x220
       [<ffffffff81556604>] ? system_call_after_swapgs+0x164/0x220
       [<ffffffff815565fd>] ? system_call_after_swapgs+0x15d/0x220
       [<ffffffff81197610>] sys_open+0x20/0x30
       [<ffffffff815566d6>] system_call_fastpath+0x16/0x1b
       [<ffffffff8155656a>] ? system_call_after_swapgs+0xca/0x220
      INFO: task ls:18508 blocked for more than 120 seconds.
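
      For reference, the -16 in the cl_file_inode_init() messages above is a negative errno: on Linux, errno 16 is EBUSY ("Device or resource busy"). A trivial userspace check:

      #include <stdio.h>
      #include <string.h>

      /* Decode the return code from the LustreError lines above.
       * Lustre console messages report errors as negative errno values;
       * 16 is EBUSY on Linux. */
      int main(void)
      {
              int rc = -16;   /* value from the cl_file_inode_init() messages */
              printf("rc = %d: %s\n", rc, strerror(-rc));  /* prints "Device or resource busy" */
              return 0;
      }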
      
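      The "blocked for more than 120 seconds" messages come from the kernel's hung-task watchdog, which flags tasks sitting in uninterruptible sleep (D state) that were never scheduled between two scans taken hung_task_timeout_secs apart. A rough illustrative sketch of that check (not the actual khungtaskd code):

      #include <stdio.h>

      struct task {
              const char *comm;            /* command name, e.g. "dir_create.sh" */
              char state;                  /* 'D' = uninterruptible sleep */
              unsigned long switch_count;  /* context switches so far */
              unsigned long last_seen;     /* switch_count at the previous scan */
      };

      /* One watchdog pass: warn if a D-state task has not run at all
       * since the last scan (scans are timeout_secs apart, 120 by default). */
      static void check_hung_task(struct task *t, unsigned int timeout_secs)
      {
              if (t->state == 'D' && t->switch_count == t->last_seen)
                      printf("INFO: task %s blocked for more than %u seconds.\n",
                             t->comm, timeout_secs);
              t->last_seen = t->switch_count;
      }

      int main(void)
      {
              /* a task that has not been scheduled since the previous scan */
              struct task dir_create = { "dir_create.sh", 'D', 42, 42 };
              check_hung_task(&dir_create, 120);
              return 0;
      }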

      Activity

            jamesanunez James Nunez (Inactive) added a comment - racer test 1 times out with dir_create, dd, mv, lfs, etc. hung in a D state, for EL7 servers and client; logs at https://testing.whamcloud.com/test_sets/a3be8a94-9c02-11e8-a9f7-52540065bddc. We don't see the 'initialize cl object' error in this hang.
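
            As an aside, one quick way to see which processes are hung like this is to scan /proc for tasks in D (uninterruptible sleep) state. A minimal userspace sketch, assuming a Linux /proc layout:

            #include <dirent.h>
            #include <stdio.h>

            /* Walk /proc and print tasks currently in uninterruptible
             * sleep ("D" state) -- the state dir_create.sh, dd, mv,
             * etc. were stuck in here. */
            int main(void)
            {
                    DIR *proc = opendir("/proc");
                    struct dirent *de;
                    char path[64], comm[64], state;
                    int pid;
                    FILE *f;

                    if (proc == NULL)
                            return 1;
                    while ((de = readdir(proc)) != NULL) {
                            if (sscanf(de->d_name, "%d", &pid) != 1)
                                    continue;       /* not a PID directory */
                            snprintf(path, sizeof(path), "/proc/%d/stat", pid);
                            f = fopen(path, "r");
                            if (f == NULL)
                                    continue;
                            if (fscanf(f, "%d (%63[^)]) %c", &pid, comm, &state) == 3 &&
                                state == 'D')
                                    printf("%d %s\n", pid, comm);
                            fclose(f);
                    }
                    closedir(proc);
                    return 0;
            }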
            sarah Sarah Liu added a comment - +1 on b2_10 https://testing.hpdd.intel.com/test_sets/03bf84be-5c0e-11e8-b9d3-52540065bddc
            green Oleg Drokin added a comment -

            The message quoted here is just a symptom on the client that something is hogging MDS resources, so the client cannot send more than one request, and that one request is stuck in processing on the MDS.

            I looked at the logs on the MDS and it seems to be somewhat confirmed there, with all the "cannot add more time, not sending early replies" messages, but there is no definite point at what is stuck where, and since it was not a crash, we did not collect any crashdumps either, I guess. As such we don't really know all that much about what was going on here that led to the MDS being stuck.

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 6
