[LU-5284] GPF in radix_tree_lookup_slot on OSS Created: 02/Jul/14 Updated: 07/Jun/17 Resolved: 07/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Piechurski | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | p4b | | |
| Environment: | bullx supercomputer suite |
| Attachments: | |
| Severity: | 3 |
| Rank (Obsolete): | 14747 |
| Description |
|
OSS hits a general protection fault with the following trace:
PID: 22785 TASK: ffff880612f90830 CPU: 1 COMMAND: "ll_ost_io_1011"
#0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
#1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
#2 [ffff880612fd7750] oops_end at ffffffff8149a050
#3 [ffff880612fd7780] die at ffffffff8100714b
#4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
#5 [ffff880612fd77e0] general_protection at ffffffff814993b5
[exception RIP: radix_tree_lookup_slot+5]
RIP: ffffffff81261465 RSP: ffff880612fd7890 RFLAGS: 00010286
RAX: e940201000000010 RBX: e940201000000008 RCX: 0000000000000000
RDX: 00000000000200d2 RSI: 0000000000000000 RDI: e940201000000008
RBP: ffff880612fd78b0 R8: ffff880612fdc140 R9: 0000000000000008
R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: e940201000000000 R15: 20105fa000080221
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
#7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
#8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
#9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
#13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
#14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
#15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a
Taking a look at the dump, it seems there is a race leading to corruption of the struct inode that is passed from filter_preprw_read() to filter_get_page(). I have attached my complete dump analysis log, and I can also upload the dump if you request it.
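For context on why the fault lands where it does: filter_get_page() hands inode->i_mapping down to the generic page-cache lookup, and radix_tree_lookup_slot() dereferences the radix-tree root almost immediately (the exception RIP is radix_tree_lookup_slot+5, with garbage in RDI). The sketch below is an approximation of that path based on the RHEL6-era (2.6.32) kernel and Lustre 2.1 obdfilter sources, not a verbatim excerpt; names and the gfp mask are approximate.

/* Approximate sketch of the faulting path (2.6.32-era kernel,
 * Lustre 2.1 obdfilter); not a verbatim excerpt. */

/* obdfilter: map one page of the backing inode for a bulk read */
static struct page *filter_get_page(struct obd_device *obd,
                                    struct inode *inode, obd_off offset)
{
        /* A corrupted inode yields a wild inode->i_mapping pointer,
         * which is passed down unchecked. */
        return find_or_create_page(inode->i_mapping,
                                   offset >> PAGE_CACHE_SHIFT,
                                   GFP_NOFS); /* gfp mask approximate */
}

/* mm/filemap.c: find_or_create_page() -> find_lock_page() ->
 * find_get_page(), which performs the radix-tree lookup */
struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
{
        struct page *page = NULL;
        void **pagep;

        rcu_read_lock();
        /* The tree root is read right away, so a garbage "mapping"
         * faults here before any page is touched. */
        pagep = radix_tree_lookup_slot(&mapping->page_tree, offset);
        if (pagep)
                page = radix_tree_deref_slot(pagep);
        /* ... refcount/recheck handling elided ... */
        rcu_read_unlock();
        return page;
}

This matches the register state above: the lookup faults before any page is locked, with the wild pointer still visible in RDI (e940201000000008). |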
| Comments |
| Comment by Peter Jones [ 02/Jul/14 ] |
|
Bruno Could you please look into this one? Thanks Peter |
| Comment by Bruno Faccini (Inactive) [ 02/Jul/14 ] |
|
Hello Seb. Or maybe you can upload the crash-dump directly, if you want? |
| Comment by Sebastien Piechurski [ 02/Jul/14 ] |
|
As seen in the stack of filter_get_page(), the base pointer of filter_preprw_read() is really 0xffff880612fd7a90 (I think that, in your comment, you swapped this address with that of the inode struct):
#9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
ffff880612fd7940: 00000000531db489 000000001c4be826
ffff880612fd7950: ffff880612fd7c4c ffff880610f480b8
ffff880612fd7960: ffff880612fd7a90 ffffffffa0c4d64d
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
[...]
ffff880612fd7a80: ffff880091cab1c8 ffff8805235be548
ffff880612fd7a90: ffff880612fd7ad0 ffffffffa0c4dedc
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
Looking for the struct address in the slab, it would seem the ldiskfs_inode_cache slab itself is corrupted:
crash> kmem -s ffff880040aa57c0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad prev pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad inuse counter: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280 bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280 bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: address not found in cache: ffff880040aa57c0
kmem: 16 errors encountered
I am uploading the dump as well on the ftp site. It should be available tomorrow.
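(Side note for replaying this analysis: on x86_64 each stack frame stores the caller's saved %rbp immediately followed by the return address, so the chain can be cross-checked with standard crash commands; the addresses below are the ones from the extract above.)

crash> bt -f 22785
        [dumps full frame contents, locating each frame's saved registers]
crash> rd ffff880612fd7960 2
ffff880612fd7960:  ffff880612fd7a90 ffffffffa0c4d64d

The first quadword is the saved %rbp (the frame base of filter_preprw_read()) and the second is the return address back into filter_preprw_read(), matching frame #10 above. |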
| Comment by Bruno Faccini (Inactive) [ 03/Jul/14 ] |
|
Seb, yes, you are right: I should have written 0xffff880612fd7960 for the location where the %rbp value was saved at the time of the filter_get_page() call in filter_preprw_read(). In fact, since that address was not available in your crash extracts for cut+paste, I simply used 0xffff880612fd7970 and then forgot to modify it manually! And by the way, the "kmem" sub-command confirms the validity of the inode address, and its errors confirm the slab corruption. Thanks for uploading the crash-dump. |
| Comment by Bruno Faccini (Inactive) [ 28/Jul/14 ] |
|
Crash dump analysis findings:
_ There are 2 additional threads with references to the same page/slab/inode address range; they both triggered the same GPF too, with the same stack!
crash> bt 21929 22191
PID: 21929 TASK: ffff880c12206830 CPU: 2 COMMAND: "ll_ost_io_155"
#0 [ffff880c12277780] die at ffffffff8100711e
#1 [ffff880c122777b0] do_general_protection at ffffffff81499be2
#2 [ffff880c122777e0] general_protection at ffffffff814993b5
[exception RIP: radix_tree_lookup_slot+5]
RIP: ffffffff81261465 RSP: ffff880c12277890 RFLAGS: 00010286
RAX: e940201000000010 RBX: e940201000000008 RCX: 0000000000000000
RDX: 00000000000200d2 RSI: 0000000000000008 RDI: e940201000000008
RBP: ffff880c122778b0 R8: ffff880614e926c0 R9: 00000000000000f8
R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000008
R13: 0000000000000008 R14: e940201000000000 R15: 20105fa000080221
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#3 [ffff880c12277898] find_get_page at ffffffff810ffe8e
#4 [ffff880c122778b8] find_lock_page at ffffffff8110112a
#5 [ffff880c122778e8] find_or_create_page at ffffffff8110129f
#6 [ffff880c12277938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#7 [ffff880c12277968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#8 [ffff880c12277a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#9 [ffff880c12277ad8] obd_preprw at ffffffffa0c09051 [ost]
#10 [ffff880c12277b48] ost_brw_read at ffffffffa0c10091 [ost]
#11 [ffff880c12277c88] ost_handle at ffffffffa0c16423 [ost]
#12 [ffff880c12277da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#13 [ffff880c12277f48] kernel_thread at ffffffff8100412a
PID: 22191 TASK: ffff8806144fd0c0 CPU: 0 COMMAND: "ll_ost_io_417"
#0 [ffff880614617780] die at ffffffff8100711e
#1 [ffff8806146177b0] do_general_protection at ffffffff81499be2
#2 [ffff8806146177e0] general_protection at ffffffff814993b5
[exception RIP: radix_tree_lookup_slot+5]
RIP: ffffffff81261465 RSP: ffff880614617890 RFLAGS: 00010286
RAX: e940201000000010 RBX: e940201000000008 RCX: 0000000000000000
RDX: 00000000000200d2 RSI: 0000000000000100 RDI: e940201000000008
RBP: ffff8806146178b0 R8: ffff88061461c140 R9: 0000000000000008
R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000100
R13: 0000000000000100 R14: e940201000000000 R15: 20105fa000080221
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#3 [ffff880614617898] find_get_page at ffffffff810ffe8e
#4 [ffff8806146178b8] find_lock_page at ffffffff8110112a
#5 [ffff8806146178e8] find_or_create_page at ffffffff8110129f
#6 [ffff880614617938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#7 [ffff880614617968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#8 [ffff880614617a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#9 [ffff880614617ad8] obd_preprw at ffffffffa0c09051 [ost]
#10 [ffff880614617b48] ost_brw_read at ffffffffa0c10091 [ost]
#11 [ffff880614617c88] ost_handle at ffffffffa0c16423 [ost]
#12 [ffff880614617da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#13 [ffff880614617f48] kernel_thread at ffffffff8100412a
crash>
_ The ldiskfs_inode slab page corruption looks like:
crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | head
ffff880040aa5600:  0000041100000401 20025bc900000421   ........!....[.
ffff880040aa5610:  0000000000040002 d514200200000000   ............. ..
ffff880040aa5620:  0000041200000402 2010176b00000622   ........"...k..
ffff880040aa5630:  0000000000050000 3e44201000000000   ............. D>
ffff880040aa5640:  0000041300000403 20103b4500000823   ........#...E;.
ffff880040aa5650:  0000000000050000 ec37201000000000   ............. 7.
ffff880040aa5660:  0000041400000404 20102f7700000a24   ........$...w/.
ffff880040aa5670:  0000000000050000 3928201000000000   ............. (9
ffff880040aa5680:  0000041500000405 2010299500000c25   ........%....).
ffff880040aa5690:  0000000000050000 d013201000000000   ............. ..
grep: write error
crash>
crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | tail
ffff880040aa5f60:  0020001b0020000b 201080000020162b   .. ... .+. ....
ffff880040aa5f70:  0000000000070000 2a74201000000000   ............. t*
ffff880040aa5f80:  0020001c0020000c 201080000020182c   .. ... .,. ....
ffff880040aa5f90:  0000000000070000 6e46201000000000   ............. Fn
ffff880040aa5fa0:  0020001d0020000d 2010800000201a2d   .. ... .-. ....
ffff880040aa5fb0:  0000000000070000 024d201000000000   ............. M.
ffff880040aa5fc0:  0020001e0020000e 2010800000201c2e   .. ... ... ....
ffff880040aa5fd0:  0000000000070000 b650201000000000   ............. P.
ffff880040aa5fe0:  0020001f0020000f 2010800000201e2f   .. ... ./. ....
ffff880040aa5ff0:  0000000000070000 da5b201000000000   ............. [.
crash>
_ The dump is missing the 2 previous/next 4K virtual/real pages (inside the same 2M page!) because they are unmapped/free.
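(Side note: the whole cache can also be walked object-by-object with crash's "kmem -S" sub-command; this is standard crash usage, although the exact output layout varies with the crash version, and it makes it easier to bracket exactly where the corruption pattern begins.)

crash> kmem -S ldiskfs_inode_cache
        [lists every slab of the cache with its per-object addresses,
         both allocated and free]

Given that "kmem -s" already reports 16 errors on this slab, the walk would presumably break down at ffff880040aa5280. |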
| Comment by Bruno Faccini (Inactive) [ 28/Jul/14 ] |
|
Even though this will impact crash-dump timing and on-disk space, would it be possible to set the kdump config to allow full memory (all pages) to be dumped next time?
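(For reference, on RHEL6-era systems this is usually controlled by the makedumpfile dump level in /etc/kdump.conf; the snippet below is a sketch assuming the stock collector line, not this site's actual configuration.)

# /etc/kdump.conf (sketch)
# The common default "-d 31" filters out zero, cache, cache-private,
# user and free pages; "-d 0" keeps every page, so the free/unmapped
# neighbours of the corrupted slab page would be captured as well.
# "-c" compresses the vmcore to offset some of the size increase.
core_collector makedumpfile -d 0 -c

The trade-off is a much larger vmcore and a longer dump time, as noted above. |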
| Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ] |
|
Seb, any news, or a new occurrence? |
| Comment by Sebastien Piechurski [ 07/Jun/17 ] |
|
Given that we have not seen this issue for almost 3 years, and all our customers have migrated away from 2.1, I guess you can close this ticket. |
| Comment by Peter Jones [ 07/Jun/17 ] |
|
ok, thanks |