Details
Bug
Resolution: Cannot Reproduce
Minor
None
Lustre 2.1.6
bullx supercomputer suite
3
14747
Description
The OSS hits a general protection fault with the following trace:
PID: 22785 TASK: ffff880612f90830 CPU: 1 COMMAND: "ll_ost_io_1011"
#0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
#1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
#2 [ffff880612fd7750] oops_end at ffffffff8149a050
#3 [ffff880612fd7780] die at ffffffff8100714b
#4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
#5 [ffff880612fd77e0] general_protection at ffffffff814993b5
[exception RIP: radix_tree_lookup_slot+5]
RIP: ffffffff81261465 RSP: ffff880612fd7890 RFLAGS: 00010286
RAX: e940201000000010 RBX: e940201000000008 RCX: 0000000000000000
RDX: 00000000000200d2 RSI: 0000000000000000 RDI: e940201000000008
RBP: ffff880612fd78b0 R8: ffff880612fdc140 R9: 0000000000000008
R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: e940201000000000 R15: 20105fa000080221
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
#7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
#8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
#9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
#13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
#14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
#15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a
From the dump, it appears that a race corrupts the struct inode that filter_preprw_read passes to filter_get_page.
I have attached my complete dump analysis log.
I can also upload the dump on request.
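
As a quick sanity check on the register dump above, the following sketch tests whether the pointer-like values in the exception frame are canonical x86-64 addresses. It assumes 48-bit virtual addressing (4-level paging), which is not stated in the report, and the helper name is_canonical is hypothetical. RDI carries the first argument to radix_tree_lookup_slot under the System V AMD64 calling convention, so a non-canonical value there would be consistent with a general protection fault on the first dereference. It does not by itself show where the corruption came from.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is canonical for the given VA width.

    Assumption: va_bits=48 (4-level paging); the report does not state this.
    """
    # Bits 63..(va_bits - 1) must all be copies of bit (va_bits - 1),
    # so the shifted value is either all zeros or all ones.
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the exception frame in the trace.
registers = {
    "RDI": 0xE940201000000008,  # first argument to radix_tree_lookup_slot
    "RBX": 0xE940201000000008,
    "R14": 0xE940201000000000,
    "RBP": 0xFFFF880612FD78B0,  # kernel stack address, for comparison
}

for name, value in registers.items():
    print(f"{name} = {value:#018x} canonical={is_canonical(value)}")
```

Under the 48-bit assumption, RDI, RBX and R14 are reported as non-canonical while RBP is canonical, which matches the picture of a bad pointer being passed down from find_get_page.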