[LU-10525] racer test_1: Failure to initialize cl object
Created: 16/Jan/18 Updated: 14/Aug/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
|
racer test_1 - test_1 failed with 2
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/60989662-f935-11e7-a7cd-52540065bddc
test_1 failed with the following error:
test_1 failed with 2
server: 2.10.3 RC1 EL7
test log:
Lustre: Skipped 3 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff880041847c00: Connection restored to 10.9.5.195@tcp (at 10.9.5.195@tcp)
Lustre: Skipped 3 previous similar messages
LustreError: 16503:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2366:0x0]: -16
LustreError: 32509:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2486:0x0]: -16
INFO: task dir_create.sh:12824 blocked for more than 120 seconds.
Tainted: G W -- ------------ 2.6.32-696.18.7.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dir_create.sh D 0000000000000000 0 12824 12691 0x00000080
ffff88007a03fb68 0000000000000086 ffff88007a03fba8 0000000000000080
0000000000000000 ffff880059833149 0000004b00000000 ffffffffa103b393
0000000000000098 0020000000000080 ffff88002c1a1068 ffff88007a03ffd8
Call Trace:
[<ffffffff8154d146>] __mutex_lock_slowpath+0x96/0x210
[<ffffffff811b5f68>] ? __d_lookup+0xd8/0x150
[<ffffffff8154cc6b>] mutex_lock+0x2b/0x50
[<ffffffff811aa63b>] do_lookup+0x11b/0x230
[<ffffffff811aace0>] __link_path_walk+0x200/0x1060
[<ffffffff81342f5c>] ? memory_open+0x3c/0xa0
[<ffffffff811abdfa>] path_walk+0x6a/0xe0
[<ffffffff811ad5da>] do_filp_open+0x1fa/0xd20
[<ffffffff8115aeca>] ? handle_mm_fault+0x2aa/0x3f0
[<ffffffff812a97fa>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811bac52>] ? alloc_fd+0x92/0x160
[<ffffffff81197507>] do_sys_open+0x67/0x130
[<ffffffff8155660b>] ? system_call_after_swapgs+0x16b/0x220
[<ffffffff81556604>] ? system_call_after_swapgs+0x164/0x220
[<ffffffff815565fd>] ? system_call_after_swapgs+0x15d/0x220
[<ffffffff81197610>] sys_open+0x20/0x30
[<ffffffff815566d6>] system_call_fastpath+0x16/0x1b
[<ffffffff8155656a>] ? system_call_after_swapgs+0xca/0x220
INFO: task ls:18508 blocked for more than 120 seconds.
|
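When triaging logs like the one above, it can help to pull out the two kinds of events it contains: the `cl_file_inode_init()` failures and the hung-task reports. The following is a minimal sketch, not part of the Lustre test framework. The function name `summarize`, the regular expressions, and the `SAMPLE` text are illustrative assumptions based only on the log lines quoted in this ticket. It also assumes Linux errno numbering, under which the `-16` return code corresponds to `EBUSY`.

```python
import errno
import re

# Pattern for the client-side error quoted in this ticket (assumed format).
CL_INIT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\(lcommon_cl\.c:\d+:cl_file_inode_init\(\)\) "
    r"Failure to initialize cl object (?P<fid>\[[0-9a-fx:]+\]): (?P<rc>-?\d+)"
)

# Pattern for the kernel hung-task watchdog message.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize(log_text):
    """Return (failures, hung) extracted from a console log.

    failures: list of (fid, return_code, errno_name)
    hung:     list of (command, pid, seconds_blocked)
    """
    failures = []
    hung = []
    for line in log_text.splitlines():
        m = CL_INIT_RE.search(line)
        if m:
            rc = int(m.group("rc"))
            # Kernel return codes are negated errno values.
            name = errno.errorcode.get(-rc, "unknown")
            failures.append((m.group("fid"), rc, name))
            continue
        m = HUNG_RE.search(line)
        if m:
            hung.append(
                (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            )
    return failures, hung


# Sample lines copied from the test log in this ticket.
SAMPLE = """\
LustreError: 16503:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2366:0x0]: -16
LustreError: 32509:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2486:0x0]: -16
INFO: task dir_create.sh:12824 blocked for more than 120 seconds.
INFO: task ls:18508 blocked for more than 120 seconds.
"""

failures, hung = summarize(SAMPLE)
for fid, rc, name in failures:
    print(f"cl object init failed: fid={fid} rc={rc} ({name})")
for comm, pid, secs in hung:
    print(f"hung task: {comm} pid={pid} blocked>{secs}s")
```

Running this over the client console log of each affected test set would show whether the `EBUSY` failures and the hung tasks occur together, which is relevant because a later run of this hang did not show the 'initialize cl object' error.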
| Comments |
| Comment by Oleg Drokin [ 17/Jan/18 ] |
|
The message quoted here is just a client-side symptom: something is hogging MDS resources, so the client cannot send more than one request, and that one request is stuck in processing on the MDS. I looked at the logs on the MDS and they seem to somewhat confirm this, with all the "cannot add more time, not sending early replies" messages, but there is no definite indication of what is stuck where. Since it was not a crash, we did not collect any crashdumps either, I guess. As such, we don't really know much about what was going on here that led to the MDS being stuck. |
| Comment by Sarah Liu [ 21/May/18 ] |
|
+1 on b2_10 https://testing.hpdd.intel.com/test_sets/03bf84be-5c0e-11e8-b9d3-52540065bddc |
| Comment by James Nunez (Inactive) [ 14/Aug/18 ] |
|
racer test 1 times out with dir_create, dd, mv, lfs, etc. hung in a D state, for el7 servers and client, with logs at https://testing.whamcloud.com/test_sets/a3be8a94-9c02-11e8-a9f7-52540065bddc. We don't see the 'initialize cl object' error in this hang. |