Details
Bug
Resolution: Unresolved
Critical
Lustre 2.16.0
None
3
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/f43b22b1-5c6c-444b-b9be-ecfe70a1c164
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4475 - 4.12.14-122.133-default
servers: https://build.whamcloud.com/job/lustre-master/4475 - 4.18.0-477.27.1.el8_lustre.x86_64
It looks like the sles12.5 client is crashing in 100% of test runs on master, right at unmount:
[ 2025.506089] Lustre: Unmounted lustre-client
[ 2025.507805] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 2025.509455] IP: wb_workfn+0x2b/0x450
[ 2025.511544] CPU: 0 PID: 282 Comm: kworker/u4:3 Tainted: G OE 4.12.14-122.133-default #1 SLE12-SP5
[ 2025.513428] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2025.514554] Workqueue: writeback wb_workfn
[ 2025.516554] RIP: 0010:wb_workfn+0x2b/0x450
[ 2025.529303] Call Trace:
[ 2025.532599] process_one_work+0x14c/0x390
[ 2025.533464] worker_thread+0x1c3/0x3e0
[ 2025.534241] kthread+0xf6/0x130
It looks like some kind of workqueue item (the trace shows wb_workfn running on the writeback workqueue) that is not flushed before unmount, or maybe it is RCU-related?
This is commit v2_15_58-183-g21295b169b (2 commits before 2.15.59).