[LU-10870] sanityn test 40a, 40b, 40c, 40d, 40e fail with 'create is blocked' Created: 01/Apr/18 Updated: 22/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanityn test_40a, b, c, d, e all fail with 'create is blocked'
The output for all these tests in the test_log is:
== sanityn test 40a: pdirops: create vs others ======================================================= 05:31:57 (1522474317)
CMD: trevis-50vm10 lctl set_param fail_loc=0x80000145
fail_loc=0x80000145
Conflict
sanityn test_40a: @@@@@@ FAIL: create is blocked
The most interesting output in the console and dmesg logs is in the MDS logs:
[18200.405863] Lustre: DEBUG MARKER: == sanityn test 40a: pdirops: create vs others ======================================================= 05:31:57 (1522474317)
[18200.592175] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000145
[18200.744375] LustreError: 1331:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 sleeping for 15000ms
[18200.745639] LustreError: 1331:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 2 previous similar messages
[18215.746700] LustreError: 1331:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 awake
[18215.747866] LustreError: 1331:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 2 previous similar messages
[18216.960004] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanityn test_40a: @@@@@@ FAIL: create is blocked
[18217.152621] Lustre: DEBUG MARKER: sanityn test_40a: @@@@@@ FAIL: create is blocked
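The fail_loc value set by the test can be split into a flag part and a fault-injection id. The following is a minimal sketch, assuming the libcfs bit layout (CFS_FAIL_ONCE = 0x80000000 and so on, with the low 16 bits carrying the id). These constant values are not stated in this ticket and are assumptions; `decode_fail_loc` is a hypothetical helper, not a Lustre API.

```python
# Assumed libcfs fail_loc flag bits (not stated in this ticket).
CFS_FAIL_FLAGS = {
    0x80000000: "CFS_FAIL_ONCE",
    0x20000000: "CFS_FAIL_SKIP",
    0x10000000: "CFS_FAIL_SOME",
    0x08000000: "CFS_FAIL_RAND",
}
# Assumed: the low 16 bits hold the fault-injection id.
CFS_FAIL_MASK_LOC = 0x0000FFFF


def decode_fail_loc(value):
    """Hypothetical helper: return (fail id, list of flag names set)."""
    flags = [name for bit, name in CFS_FAIL_FLAGS.items() if value & bit]
    return value & CFS_FAIL_MASK_LOC, flags


fail_id, flags = decode_fail_loc(0x80000145)
print(hex(fail_id), flags)  # prints: 0x145 ['CFS_FAIL_ONCE']
```

Under these assumptions the id 0x145 agrees with the "cfs_fail_timeout id 145" messages in the MDS log, and the 15000ms sleep is consistent with the roughly 15 seconds between the "sleeping" and "awake" timestamps.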
Logs for this failure are at:
2.11.0-RC3 Ubuntu clients - https://testing.hpdd.intel.com/test_sets/a680d104-3543-11e8-95c0-52540065bddc
2.10.1 Ubuntu clients - https://testing.hpdd.intel.com/test_sets/289b3946-ce96-11e7-9840-52540065bddc
2.10.3 el7/ZFS - https://testing.hpdd.intel.com/test_sets/e427ccbc-2cf4-11e8-b74b-52540065bddc
|
| Comments |
| Comment by Peter Jones [ 01/Apr/18 ] |
|
Yang Sheng, could you please investigate? Thanks, Peter |
| Comment by Yang Sheng [ 02/Apr/18 ] |
|
From log:
00010000:00010000:1.0:1521525055.419920:0:15546:0:(ldlm_lockd.c:1239:ldlm_handle_enqueue0()) ### server-side enqueue handler START
00010000:00010000:1.0:1521525055.419924:0:15546:0:(ldlm_lockd.c:1319:ldlm_handle_enqueue0()) ### server-side enqueue handler, new lock created ns: mdt-lustre-MDT0000_UUID lock: ffff88005f262800/0x401f5317adeb9f2e lrc: 2/0,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x0 rrc: 4 type: IBT flags: 0x40000000000000 nid: local remote: 0x4c150bb57b22c4ec expref: -99 pid: 15546 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1521525055.419941:0:15546:0:(ldlm_lock.c:743:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: mdt-lustre-MDT0000_UUID lock: ffff88005f263400/0x401f5317adeb9f35 lrc: 3/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x0 rrc: 5 type: IBT flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 15546 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1521525055.419949:0:15546:0:(ldlm_lock.c:659:ldlm_add_bl_work_item()) ### lock incompatible; sending blocking AST. ns: mdt-lustre-MDT0000_UUID lock: ffff88005f262000/0x401f5317adeb9f20 lrc: 2/0,1 mode: CW/CW res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 12416 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1521525055.419953:0:15546:0:(ldlm_resource.c:1551:ldlm_resource_add_lock()) ### About to add this lock ns: mdt-lustre-MDT0000_UUID lock: ffff88005f263400/0x401f5317adeb9f35 lrc: 4/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13 rrc: 5 type: IBT flags: 0x50210000000000 nid: local remote: 0x0 expref: -99 pid: 15546 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1521525055.419959:0:15546:0:(ldlm_request.c:357:ldlm_blocking_ast_nocheck()) ### Lock still has references, will be cancelled later ns: mdt-lustre-MDT0000_UUID lock: ffff88005f262000/0x401f5317adeb9f20 lrc: 3/0,1 mode: CW/CW res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 5 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 12416 timeout: 0 lvb_type: 0
It looks like the MDS_INODELOCK_UPDATE flag was set when 'touch DIR2/$tfile' enqueued its lock. So it is very likely a duplicate of
Thanks, |
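The conflict reported in the log can be checked by hand: the granted lock is CW with bits 0x2, and the new request is PR with bits 0x13. The sketch below assumes the usual MDS inodebits values (LOOKUP 0x1, UPDATE 0x2, OPEN 0x4, LAYOUT 0x8, PERM 0x10, XATTR 0x20) and the standard LDLM mode compatibility matrix; neither is spelled out in this ticket, so both are assumptions. `decode_bits` and `ibits_conflict` are hypothetical helpers for illustration, not Lustre functions.

```python
# Assumed MDS inodebits values (not stated in this ticket).
MDS_INODELOCK = {
    0x01: "LOOKUP",
    0x02: "UPDATE",
    0x04: "OPEN",
    0x08: "LAYOUT",
    0x10: "PERM",
    0x20: "XATTR",
}

# Assumed LDLM compatibility matrix: mode -> modes it can coexist with.
COMPAT = {
    "EX": {"NL"},
    "PW": {"NL", "CR"},
    "PR": {"NL", "CR", "PR"},
    "CW": {"NL", "CR", "CW"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
}


def decode_bits(bits):
    """Hypothetical helper: names of the inodebits set in `bits`."""
    return [name for bit, name in MDS_INODELOCK.items() if bits & bit]


def ibits_conflict(granted_mode, granted_bits, req_mode, req_bits):
    """Two IBITS locks conflict only if their modes are incompatible
    AND they share at least one inodebit."""
    modes_clash = req_mode not in COMPAT[granted_mode]
    return modes_clash and bool(granted_bits & req_bits)


print(decode_bits(0x2))    # granted CW lock
print(decode_bits(0x13))   # requested PR lock
print(ibits_conflict("CW", 0x2, "PR", 0x13))
print(ibits_conflict("CW", 0x2, "PR", 0x11))  # same request without UPDATE
```

Under these assumptions the PR request carries the UPDATE bit, overlaps the granted CW lock, and triggers the blocking AST seen in the log. The same request without UPDATE (0x11) would not conflict.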
| Comment by James Nunez (Inactive) [ 18/Apr/18 ] |
|
Some recent failures at: |
| Comment by James Nunez (Inactive) [ 16/Sep/19 ] |
|
We are seeing this issue again or something similar: https://testing.whamcloud.com/test_sets/29e74e80-d6db-11e9-a25b-52540065bddc |
| Comment by Chris Horn [ 24/Oct/19 ] |
|
+1 on master https://testing.whamcloud.com/test_sessions/3f181300-d3f8-465a-88de-95756bf58f3c |
| Comment by Jian Yu [ 10/Feb/20 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/75e0b4fa-4ba4-11ea-b69a-52540065bddc |
| Comment by Emoly Liu [ 07/Jul/20 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/abe98c66-5293-41c4-a72b-c317b11bb2e2 |
| Comment by Nikitas Angelinas [ 05/Aug/20 ] |
|
+1 on master https://testing.whamcloud.com/test_sets/7f698e8e-3861-4ff7-b36a-fdd9b23bb69f for test_40b. Is it possible that the test_40a failure in the same test run, with "link is blocked", is due to the same issue, or should I open a separate ticket? |
| Comment by Etienne Aujames [ 14/Feb/22 ] |
|
+1 on b2_12 (2.12.8 - ZFS): https://testing.whamcloud.com/test_sets/1b0b4890-5b93-49ea-92de-881cd6d714fe |