[LU-15757] sanityn test_109: Oops in ll_md_blocking_ast() at umount Created: 19/Apr/22 Updated: 08/Nov/22 Resolved: 12/May/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0, Lustre 2.15.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
Trace:
PID: 622621 TASK: ffff89d05b2117c0 CPU: 1 COMMAND: "ll_imp_inval"
#0 [ffff89d04d307b70] panic at ffffffff860af786
/tmp/kernel/kernel/panic.c: 299
#1 [ffff89d04d307c00] ll_md_blocking_ast at ffffffffc1919052 [lustre]
/home/lustre/master-mine/lustre/llite/namei.c: 388
#2 [ffff89d04d307c60] ldlm_cancel_callback at ffffffffc0fc074c [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lock.c: 2445
#3 [ffff89d04d307cb0] ldlm_cli_cancel_local at ffffffffc0fd8ce6 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_request.c: 1244
#4 [ffff89d04d307cd0] ldlm_cli_cancel at ffffffffc0fde6a8 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_request.c: 1569
#5 [ffff89d04d307d30] cleanup_resource_queue at ffffffffc0fc278c [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_resource.c: 1091
#6 [ffff89d04d307d80] ldlm_resource_cleanup at ffffffffc0fc2914 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_resource.c: 1106
#7 [ffff89d04d307d98] ldlm_resource_clean_hash at ffffffffc0fc294c [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_resource.c: 1119
#8 [ffff89d04d307da8] cfs_hash_for_each_relax at ffffffffc0b39cc5 [libcfs]
/home/lustre/master-mine/libcfs/libcfs/hash.c: 1644
#9 [ffff89d04d307e20] cfs_hash_for_each_nolock at ffffffffc0b3d40f [libcfs]
/home/lustre/master-mine/libcfs/include/libcfs/libcfs_hash.h: 402
#10 [ffff89d04d307e48] ldlm_namespace_cleanup at ffffffffc0fc2ce6 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_resource.c: 1163
#11 [ffff89d04d307e60] mdc_import_event at ffffffffc13456f6 [mdc]
/home/lustre/master-mine/lustre/mdc/mdc_request.c: 2712
#12 [ffff89d04d307e90] ptlrpc_invalidate_import at ffffffffc102460d [ptlrpc]
/home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 154
After adding an additional assertion:
if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM))) {
LASSERT(inode);
LASSERT(inode->i_sb);
LASSERT(inode->i_sb->s_root);
}
the failure is:
LustreError: 622621:0:(namei.c:389:ll_lock_cancel_bits()) ASSERTION( inode->i_sb->s_root ) failed
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 |
| Comments |
| Comment by Gerrit Updater [ 19/Apr/22 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47086 |
| Comment by Alex Zhuravlev [ 19/Apr/22 ] |
|
Not sure how llite's umount can be blocked until all of the import's invalidations are done, so the patch above just checks s_root to ensure it's not NULL. |
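For illustration only, the kind of guard described in this comment can be sketched in user space as below. This is not the code from the patch: the structs are hypothetical stand-ins that keep only the fields named in the assertion (`i_sb`, `s_root`) plus `d_inode`, and `is_root_inode_checked()` is a made-up name. The sketch assumes the lock-cancel path can run after umount has already cleared `s_root`, which is what the failed assertion suggests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * Only the fields relevant to the crash are modelled. */
struct inode;
struct dentry { struct inode *d_inode; };
struct super_block { struct dentry *s_root; };
struct inode { struct super_block *i_sb; };

/* Root-inode test that tolerates a superblock whose s_root has
 * already been cleared (assumed here to happen during umount).
 * An unguarded inode->i_sb->s_root->d_inode would dereference NULL. */
bool is_root_inode_checked(const struct inode *inode)
{
	const struct dentry *root = inode->i_sb->s_root;

	/* No root dentry any more: treat the inode as "not root"
	 * instead of dereferencing a NULL pointer. */
	if (root == NULL)
		return false;

	return inode == root->d_inode;
}
```

With `s_root` set, the helper behaves like a plain comparison against the root inode; with `s_root == NULL` it returns false rather than faulting. Whether returning false is the right policy for the real cancel path is a design question this sketch does not settle.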
| Comment by John Hammond [ 10/May/22 ] |
|
Do you think this might be related to LU-15835? I looked at several recent sanityn sessions this morning. It appears that when we hit the warning in test_102 then the client crashes in unmount in test_109. And conversely if there was no warning then there is no crash. |
| Comment by Andreas Dilger [ 10/May/22 ] |
|
+2 timeouts on master in sanityn test_109 in the past few days: Not sure if this is a regression because of some recent patch landing, or just coincidence. I hadn't seen John's comment that there are also crashes in this test. That would be more of a smoking gun - most crashes are 2022-05-05 or later. There was patch https://review.whamcloud.com/46693 " |
| Comment by John Hammond [ 11/May/22 ] |
|
Seems to be reproduced for fstype=zfs, mdtcount=4 (or > 1?) by sanityn test_102 followed by umount. See https://review.whamcloud.com/#/c/47277/ and https://testing.whamcloud.com/test_sets/74867bfb-0bd6-4bec-80f4-16c5f67d8d7c |
| Comment by Gerrit Updater [ 12/May/22 ] |
|
"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47311 |
| Comment by John Hammond [ 12/May/22 ] |
|
> Seems to be reproduced for fstype=zfs, mdtcount=4 (or > 1?) by sanityn test_102 followed by umount.
Correction: zfs is just more likely to produce the timing needed for this. |
| Comment by Gerrit Updater [ 12/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47086/ |
| Comment by Andreas Dilger [ 12/May/22 ] |
|
This was introduced by patch https://review.whamcloud.com/40293 "LU-6142 lustre: use is_root_inode()" landed as commit v2_14_50-100-gfca56be02b. |
| Comment by Gerrit Updater [ 18/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47311/ |
| Comment by Gerrit Updater [ 01/Sep/22 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48406 |
| Comment by Gerrit Updater [ 08/Nov/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48406/ |