[LU-9563] LBUG ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed: Created: 26/May/17  Updated: 05/Jun/17  Resolved: 05/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: Lai Siyao
Resolution: Cannot Reproduce Votes: 0
Labels: soak
Environment:

soak performance cluster


Issue Links:
Duplicate
duplicates LU-9504 LBUG ptlrpc_handle_rs()) ASSERTION( l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The system was running normally; no fault was induced.

[25789.588202] LustreError: 4166:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
[25789.609363] LustreError: 4166:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
[25789.621115] Pid: 4166, comm: ptlrpc_hr01_005
[25789.629112]
[25789.629112] Call Trace:
[25789.639919]  [<ffffffffa084d7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
[25789.650059]  [<ffffffffa084d87c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[25789.659583]  [<ffffffffa0b5883b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
[25789.669915]  [<ffffffffa0bae96f>] ptlrpc_hr_main+0x5bf/0x910 [ptlrpc]
[25789.679492]  [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0
[25789.688107]  [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
[25789.697330]  [<ffffffffa0bae3b0>] ? ptlrpc_hr_main+0x0/0x910 [ptlrpc]
[25789.706522]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[25789.714056]  [<ffffffff810b0980>] ? kthread+0x0/0xe0
[25789.721469]  [<ffffffff81697318>] ret_from_fork+0x58/0x90
[25789.729355]  [<ffffffff810b0980>] ? kthread+0x0/0xe0
[25789.736613]
[25789.740048] Kernel panic - not syncing: LBUG

The system wedged hard at this point. Will reboot and run with full debug logging enabled.
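The assertion that fired treats lock modes as bit flags: `lock->l_granted_mode & (LCK_PW | LCK_EX)` is nonzero only when the lock is currently granted in a write mode (PW or EX), which is the only state `ldlm_lock_downgrade()` accepts. A minimal C sketch of that check follows. The `SK_LCK_*` names and the helper function are hypothetical; the bit values are assumed to mirror Lustre's `enum ldlm_mode` and do not come from this ticket.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Lustre's enum ldlm_mode. The bit values are
 * ASSUMED to mirror the real header; they are illustrative only. */
enum sketch_ldlm_mode {
	SK_LCK_MINMODE = 0,   /* no granted mode */
	SK_LCK_EX      = 1,
	SK_LCK_PW      = 2,
	SK_LCK_PR      = 4,
	SK_LCK_CW      = 8,
	SK_LCK_CR      = 16,
	SK_LCK_NL      = 32,
	SK_LCK_GROUP   = 64,
	SK_LCK_COS     = 128, /* assumed downgrade target */
};

/* True when the precondition asserted in ldlm_lock_downgrade() holds:
 * the granted mode has the PW bit or the EX bit set. Because each mode
 * is a distinct bit, a single AND tests membership in {PW, EX}. */
static bool downgrade_precondition_holds(enum sketch_ldlm_mode granted)
{
	return (granted & (SK_LCK_PW | SK_LCK_EX)) != 0;
}
```

With these assumed values, a lock granted in PW or EX passes the check, while a mode of 0 (nothing granted) or a lock that has already been downgraded (for example to COS) fails it, which is exactly the condition that trips the LBUG above.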



 Comments   
Comment by James Nunez (Inactive) [ 26/May/17 ]

Soak was running the build described at https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Soak#SoakTestingonSoak-20170525

Comment by Peter Jones [ 26/May/17 ]

Lai

Could you please advise on this one?

Thanks

Peter

Comment by Lai Siyao [ 27/May/17 ]

The patch for LU-9504, https://review.whamcloud.com/#/c/27207/, does not account for a race with mdt_steal_ack_lock(); I'll update that patch.
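The comment does not spell out the race. As a hypothetical illustration only (not the LU-9504 patch and not Lustre's actual locking code), the sketch below models the general hazard: one thread checks a lock's granted mode and then downgrades it, while another path (a cancel, or whatever `mdt_steal_ack_lock()` sets in motion) changes that mode concurrently. Doing the check and the change under the same guard that the other path also takes, and returning instead of asserting when the precondition no longer holds, closes the check-then-act window. `struct sketch_lock`, the `MODE_*` values, and both functions are invented for this example; the mutex stands in for Lustre's resource lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mode bits, assumed to mirror Lustre's enum ldlm_mode. */
#define MODE_NONE 0u
#define MODE_EX   1u
#define MODE_PW   2u
#define MODE_COS  128u

/* Hypothetical, heavily simplified stand-in for struct ldlm_lock. */
struct sketch_lock {
	pthread_mutex_t guard;     /* plays the role of the resource lock */
	unsigned int granted_mode; /* plays the role of l_granted_mode */
};

/* Models a concurrent path that drops the granted mode (e.g. a cancel).
 * It takes the same guard as the downgrade path. */
static void sketch_cancel(struct sketch_lock *lk)
{
	pthread_mutex_lock(&lk->guard);
	lk->granted_mode = MODE_NONE;
	pthread_mutex_unlock(&lk->guard);
}

/* Downgrade that re-checks its precondition while holding the guard.
 * Returns false, rather than asserting, if another path already changed
 * the mode so that it is no longer PW or EX. */
static bool sketch_downgrade(struct sketch_lock *lk)
{
	bool done = false;

	pthread_mutex_lock(&lk->guard);
	if (lk->granted_mode & (MODE_PW | MODE_EX)) {
		lk->granted_mode = MODE_COS;
		done = true;
	}
	pthread_mutex_unlock(&lk->guard);
	return done;
}
```

Whether silently skipping the downgrade is the right response in Lustre depends on what the concurrent path has already done with the lock; the sketch only shows that the re-check has to happen under the same lock that the racing path uses.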

Comment by Peter Jones [ 27/May/17 ]

OK then. As LU-9504 has not landed yet, let's close this as a duplicate of LU-9504.

Comment by Cliff White (Inactive) [ 31/May/17 ]

Soak is dead until this issue is fixed.

Comment by Cliff White (Inactive) [ 31/May/17 ]

Tested with the latest patch for LU-9504:
https://review.whamcloud.com/#/c/27207/

Soak hit this issue immediately.

May 31 17:11:18 soak-9 kernel: LustreError: 4177:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
May 31 17:11:18 soak-9 kernel: LustreError: 4177:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
May 31 17:11:18 soak-9 kernel: Pid: 4177, comm: ptlrpc_hr01_003
May 31 17:11:18 soak-9 kernel: Call Trace:
May 31 17:11:18 soak-9 kernel: [<ffffffffa08247ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
May 31 17:11:19 soak-9 kernel: [<ffffffffa082487c>] lbug_with_loc+0x4c/0xb0 [libcfs]
May 31 17:11:19 soak-9 kernel: [<ffffffffa0b6283b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9620>] ptlrpc_handle_rs+0x3f0/0x640 [ptlrpc]
May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9955>] ptlrpc_hr_main+0xe5/0x2c0 [ptlrpc]
May 31 17:11:19 soak-9 kernel: [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9870>] ? ptlrpc_hr_main+0x0/0x2c0 [ptlrpc]
May 31 17:11:19 soak-9 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
May 31 17:11:19 soak-9 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
May 31 17:11:19 soak-9 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
May 31 17:11:19 soak-9 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
May 31 17:11:19 soak-9 kernel:
May 31 17:11:19 soak-9 kernel: Kernel panic - not syncing: LBUG

Soak is dead until we see a fix for this.

Comment by Cliff White (Inactive) [ 05/Jun/17 ]

Attempted to test the latest patch for LU-9504 and hit this issue again immediately.

Jun  5 16:51:12 soak-11 kernel: LustreError: 4204:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
Jun  5 16:51:12 soak-11 kernel: LustreError: 4204:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
Jun  5 16:51:12 soak-11 kernel: Pid: 4204, comm: ptlrpc_hr01_002
Jun  5 16:51:12 soak-11 kernel: Call Trace:
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa08637ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa086387c>] lbug_with_loc+0x4c/0xb0 [libcfs]
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0b6e83b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc57f8>] ptlrpc_handle_rs+0x5c8/0x700 [ptlrpc]
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc5a15>] ptlrpc_hr_main+0xe5/0x2c0 [ptlrpc]
Jun  5 16:51:12 soak-11 kernel: [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc5930>] ? ptlrpc_hr_main+0x0/0x2c0 [ptlrpc]
Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
Jun  5 16:51:12 soak-11 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0

Comment by Cliff White (Inactive) [ 05/Jun/17 ]

Still hitting this issue with every new LU-9504 patch. It's not fixed.

Comment by Peter Jones [ 05/Jun/17 ]

Cliff

Please add the results from the debug logs to LU-9504. That patch has not landed on master yet, so it does not need a new ticket.

Peter
