Details
Bug
Resolution: Cannot Reproduce
Minor
None
Lustre 2.1.6
bullx supercomputer suite
3
14747
Description
OSS hits a general protection fault with the following trace:
PID: 22785  TASK: ffff880612f90830  CPU: 1  COMMAND: "ll_ost_io_1011"
 #0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
 #1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
 #2 [ffff880612fd7750] oops_end at ffffffff8149a050
 #3 [ffff880612fd7780] die at ffffffff8100714b
 #4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
 #5 [ffff880612fd77e0] general_protection at ffffffff814993b5
    [exception RIP: radix_tree_lookup_slot+5]
    RIP: ffffffff81261465  RSP: ffff880612fd7890  RFLAGS: 00010286
    RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
    RDX: 00000000000200d2  RSI: 0000000000000000  RDI: e940201000000008
    RBP: ffff880612fd78b0   R8: ffff880612fdc140   R9: 0000000000000008
    R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: e940201000000000  R15: 20105fa000080221
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
 #7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
 #8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
 #9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
#13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
#14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
#15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a
From the dump, it appears that a race corrupts the struct inode that filter_preprw_read passes to filter_get_page.
I have attached my complete dump analysis log.
I can also upload the dump on request.
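A possible sanity check on the corruption theory: RDI, RBX and R14 in the trace all carry the pattern e9402010..., which does not look like a kernel pointer. On x86-64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which would match the fault type seen here. The sketch below is only an illustration and not part of the attached analysis; the helper name is_canonical is hypothetical, and it assumes 48-bit virtual addresses (4-level paging), which is an assumption about this kernel rather than something stated in the dump.

```python
# Minimal sketch: check whether register values from the trace are
# canonical x86-64 addresses.
# Assumption: 48-bit virtual addresses (4-level paging).
CANONICAL_BITS = 48


def is_canonical(addr: int, bits: int = CANONICAL_BITS) -> bool:
    """Return True if bits 63..(bits-1) of a 64-bit address are all equal."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (bits - 1)
    all_ones = (1 << (64 - bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the exception frame above.
registers = {
    "RDI": 0xE940201000000008,
    "R14": 0xE940201000000000,
    "RSP": 0xFFFF880612FD7890,
}

for name, value in registers.items():
    print(f"{name} = {value:#018x} canonical={is_canonical(value)}")
```

Under that assumption, RDI and R14 come out non-canonical while RSP is an ordinary kernel address. This is consistent with a garbage pointer being handed down to radix_tree_lookup_slot, but it does not by itself show where the corruption happened.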