[LU-15305] sanityn test_109 crash: list_del corruption in class_del_profile()
Created: 01/Dec/21  Updated: 29/Nov/22  Resolved: 25/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.2

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Dongyang Li
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15757 sanityn test_109: Oops in ll_md_block... Resolved
Severity: 3

Description

This issue was created by maloo for James Simmons <uja.ornl@gmail.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3cb54b77-9376-469f-bb36-38fa18be5e69



Comments
Comment by Alexander Boyko [ 07/Apr/22 ]

One more crash https://testing.whamcloud.com/test_sets/06cda596-9526-4a22-9027-120c58116996

[21783.621925] Lustre: Unmounted lustre-client
[21783.622895] list_del corruption, ffff93715ef03a80->next is LIST_POISON1 (dead000000000100)
[21783.624539] ------------[ cut here ]------------
[21783.625450] kernel BUG at lib/list_debug.c:47!
[21783.626358] invalid opcode: 0000 [#1] SMP PTI
[21783.627216] CPU: 1 PID: 868331 Comm: umount Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-240.22.1.el8_3.x86_64 #1
[21783.629408] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[21783.630535] RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
[21783.631636] Code: cd ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 d8 ce 8e 91 e8 3c 4d cd ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 68 cf 8e 91 e8 28 4d cd ff <0f> 0b 48 c7 c7 18 d0 8e 91 e8 1a 4d cd ff 0f 0b 48 89 f2 48 89 fe
[21783.634997] RSP: 0018:ffffb32cc1b6bd60 EFLAGS: 00010246
[21783.635995] RAX: 000000000000004e RBX: ffff93715ef03a80 RCX: 0000000000000000
[21783.637347] RDX: 0000000000000000 RSI: ffff93717fd16a08 RDI: ffff93717fd16a08
[21783.638683] RBP: ffff9371243c380c R08: 000000000000109a R09: ffff9370c00b8f00
[21783.640018] R10: 0720072007200720 R11: 0720072007200720 R12: ffff937123b6a000
[21783.641363] R13: ffff9371243c3800 R14: ffff937123b6f800 R15: ffff93712107b230
[21783.642700] FS:  00007f93ea343080(0000) GS:ffff93717fd00000(0000) knlGS:0000000000000000
[21783.644199] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21783.645294] CR2: 000055a6811be0b8 CR3: 0000000063b32001 CR4: 00000000000606e0
[21783.646643] Call Trace:
[21783.647224]  class_del_profile+0x4c/0x1c0 [obdclass]
[21783.648254]  ll_put_super+0x250/0xef0 [lustre]
[21783.649157]  ? fsnotify_destroy_marks+0x22/0xe0
[21783.650061]  ? clear_inode+0x35/0x90
[21783.650779]  ? fsnotify_unmount_inodes+0x11c/0x1a0
[21783.651729]  ? dispose_list+0x4d/0x60
[21783.652451]  ? evict_inodes+0x160/0x1b0
[21783.653239]  generic_shutdown_super+0x6c/0x100
[21783.654130]  kill_anon_super+0x14/0x30
[21783.654876]  deactivate_locked_super+0x34/0x70
[21783.655738]  cleanup_mnt+0x3b/0x70
[21783.656438]  task_work_run+0x8a/0xb0
[21783.657172]  exit_to_usermode_loop+0xeb/0xf0
[21783.658030]  do_syscall_64+0x198/0x1a0
[21783.658793]  entry_SYSCALL_64_after_hwframe+0x65/0xca
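
The "->next is LIST_POISON1" message in the trace above is what the kernel's list debugging prints when list_del() is called on an entry that has already been removed from its list. The following is a minimal userspace sketch of that check, not kernel or Lustre code. The names list_del_checked() and demo_double_delete() are hypothetical, the LIST_POISON2 value is illustrative, a 64-bit build is assumed, and the sketch returns false where the kernel would hit BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* LIST_POISON1 matches the value printed in the trace above; the
 * LIST_POISON2 value is illustrative and differs between kernel versions. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Sanity check before unlinking. The poison test comes first so that
 * poisoned pointers are never dereferenced. */
static bool list_del_entry_valid(const struct list_head *entry)
{
    if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
        return false;   /* entry was already deleted once */
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* Unlink entry and poison its pointers; refuse if the check fails. */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return true;
}

/* Delete the same entry twice, as two racing callers might.
 * Bit 0 of the result is set if the first delete succeeded,
 * bit 1 if the second one did. */
static int demo_double_delete(void)
{
    struct list_head profiles, entry;

    list_init(&profiles);
    list_add(&entry, &profiles);

    bool first = list_del_checked(&entry);
    bool second = list_del_checked(&entry);

    /* The list is empty again after the first delete. */
    assert(profiles.next == &profiles && profiles.prev == &profiles);
    return (first ? 1 : 0) + (second ? 2 : 0);
}
```

In this sketch the second delete is rejected because the first one left LIST_POISON1 in entry->next, which is the same condition reported at lib/list_debug.c:47 in the log.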
Comment by Andreas Dilger [ 06/May/22 ]

+1 on master during sanityn test_109 unmount:
https://testing.whamcloud.com/test_sets/8a6b75c5-697f-4d48-a33f-e5a3cc07d022

Comment by Andreas Dilger [ 06/May/22 ]

First failure was 2021-10-24 but on a patch that still hasn't landed yet.

Comment by Gerrit Updater [ 07/Oct/22 ]

"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Subject: LU-15305 obdclass: fix race in class_del_profile
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b76e5c3307ae1e7487d624670318240e7748081f
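
This ticket does not show the contents of the patch. As a generic illustration of how a delete path can be made safe against a concurrent second delete, the sketch below does the lookup and the unlink inside one critical section, so only one caller can ever obtain a given entry. It is an assumption-laden userspace example, not the Lustre change: struct profile, profile_add(), profile_del() and demo_two_deleters() are hypothetical names, and a pthread mutex stands in for whatever lock the real code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named profile kept on a global list. */
struct profile {
    const char *name;
    struct profile *next;
};

static struct profile *profile_list;
static pthread_mutex_t profile_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push a profile onto the front of the list. */
static void profile_add(struct profile *p)
{
    pthread_mutex_lock(&profile_lock);
    p->next = profile_list;
    profile_list = p;
    pthread_mutex_unlock(&profile_lock);
}

/* Find the profile by name and unlink it while holding the lock the
 * whole time. Returns the unlinked profile, or NULL if no profile with
 * that name is on the list (for example because another caller already
 * removed it). */
static struct profile *profile_del(const char *name)
{
    struct profile *found = NULL;

    pthread_mutex_lock(&profile_lock);
    for (struct profile **pp = &profile_list; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            found = *pp;
            *pp = found->next;      /* unlink */
            found->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&profile_lock);
    return found;
}

/* Two delete attempts for the same name, run back to back to keep the
 * example deterministic. Returns how many of them got the entry. */
static int demo_two_deleters(void)
{
    static struct profile p = { .name = "lustre-client", .next = NULL };
    int winners = 0;

    profile_add(&p);
    if (profile_del("lustre-client") != NULL)
        winners++;
    if (profile_del("lustre-client") != NULL)
        winners++;
    assert(profile_list == NULL);
    return winners;
}
```

Because the search and the unlink share one lock hold, the second caller finds nothing and returns NULL instead of unlinking an entry that is already off the list.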

Comment by Gerrit Updater [ 25/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48802/
Subject: LU-15305 obdclass: fix race in class_del_profile
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 83d3f42118579d7fb7c3002533c047badcf41e0d

Comment by Peter Jones [ 25/Oct/22 ]

Landed for 2.16

Comment by Xing Huang [ 16/Nov/22 ]

+1 on b2_15: https://testing.whamcloud.com/test_sets/d9f13cfa-44fd-4a55-a685-690874258789.
Hi Dongyang, maybe I should port this patch to b2_15?

Comment by Dongyang Li [ 16/Nov/22 ]

Yeah, if b2_15 doesn't have the patch.

Comment by Gerrit Updater [ 17/Nov/22 ]

"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49178
Subject: LU-15305 obdclass: fix race in class_del_profile
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 1485d22e70ce9afe8b5d86e1ee13435f33255171

Comment by Gerrit Updater [ 29/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49178/
Subject: LU-15305 obdclass: fix race in class_del_profile
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 906b5d9dbe82beed41f191bd69ce1f72504a77c5
