Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.16.0
- None
- 3
Description
While testing the Lustre client code on a system running a 5.15 kernel, some fio-based tests triggered soft CPU lockups:
[Wed Aug 10 13:40:59 2022] watchdog: BUG: soft lockup - CPU#9 stuck for 48s! [ptlrpcd_04_01:1734]
[Wed Aug 10 13:40:59 2022] CPU: 9 PID: 1734 Comm: ptlrpcd_04_01 Tainted: G O L 5.15.43.hrtdev #1
[Wed Aug 10 13:40:59 2022] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.12.0-1 04/01/2014
[Wed Aug 10 13:40:59 2022] RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x30
[Wed Aug 10 13:40:59 2022] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 02 5d c3 fb 66 0f 1f 44 00 00 <5d> c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48
[Wed Aug 10 13:40:59 2022] RSP: 0018:ffffb91341a4bba8 EFLAGS: 00000206
[Wed Aug 10 13:40:59 2022] RAX: ffffe4a150371b80 RBX: ffffe4a150371b80 RCX: 00000000ffffffff
[Wed Aug 10 13:40:59 2022] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffa406f4e9b050
[Wed Aug 10 13:40:59 2022] RBP: ffffb91341a4bba8 R08: 000000000000007d R09: 00000000000b6557
[Wed Aug 10 13:40:59 2022] R10: 0000000000000009 R11: ffffb91341a4bb78 R12: ffffa406f4e9b000
[Wed Aug 10 13:40:59 2022] R13: 0000000000000002 R14: 0000000000000003 R15: 0000000000000002
[Wed Aug 10 13:40:59 2022] FS: 0000000000000000(0000) GS:ffffa40d51a40000(0000) knlGS:0000000000000000
[Wed Aug 10 13:40:59 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Aug 10 13:40:59 2022] CR2: 0000000000e319cc CR3: 00000001099dc000 CR4: 00000000003506e0
[Wed Aug 10 13:40:59 2022] Call Trace:
[Wed Aug 10 13:40:59 2022] <TASK>
[Wed Aug 10 13:40:59 2022] __page_cache_release+0x1d5/0x220
[Wed Aug 10 13:40:59 2022] __put_page+0x3a/0x90
[Wed Aug 10 13:40:59 2022] ptlrpc_release_bulk_page_pin+0x51/0x90 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ptlrpc_free_bulk+0x95/0x500 [ptlrpc]
[Wed Aug 10 13:40:59 2022] __ptlrpc_req_finished+0x350/0x730 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ptlrpc_free_request+0x65/0x70 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ptlrpc_free_committed+0x110/0x6f0 [ptlrpc]
[Wed Aug 10 13:40:59 2022] after_reply+0x8ea/0xd80 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ptlrpc_check_set+0xb29/0x1c90 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ptlrpcd_check+0x399/0x580 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ? timer_update_keys+0x40/0x40
[Wed Aug 10 13:40:59 2022] ptlrpcd+0x3c9/0x4d0 [ptlrpc]
[Wed Aug 10 13:40:59 2022] ? wait_woken+0x70/0x70
[Wed Aug 10 13:40:59 2022] ? ptlrpcd_check+0x580/0x580 [ptlrpc]
This reproduces fairly reliably just by running fio with buffered writes.
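The report does not give the exact fio job that was used, so the following is only a rough sketch of what a buffered-write run might look like. The mount point `/mnt/lustre/fio-test`, the job name, and the job count, file size, and block size are all assumptions chosen for illustration; the only detail taken from the report is that the writes are buffered, which the sketch expresses as `--direct=0`.

```python
import shlex


def build_fio_cmd(directory, jobs=4, size="1g", block_size="1m"):
    """Build an argv list for a buffered sequential-write fio run.

    All defaults are illustrative guesses, not values from the report.
    """
    return [
        "fio",
        "--name=buffered-write",      # hypothetical job name
        f"--directory={directory}",   # directory on the Lustre client mount
        "--rw=write",                 # sequential writes
        "--direct=0",                 # buffered I/O through the page cache
        "--ioengine=psync",           # plain synchronous pwrite() calls
        f"--bs={block_size}",
        f"--size={size}",
        f"--numjobs={jobs}",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; the path is an assumed mount point.
    print(shlex.join(build_fio_cmd("/mnt/lustre/fio-test")))
```

Running the printed command on a client while watching `dmesg` should show whether the soft lockup appears on a given kernel and Lustre build.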
Landed for 2.16