[LU-8917] OSS crash with kernel BUG at drivers/md/raid5.c:529! Created: 07/Dec/16 Updated: 05/Feb/18 Resolved: 06/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Stephane Thiell | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Not a Bug | Votes: | 1 |
| Labels: | None | ||
| Environment: | Lustre 2.8.60, kernel-3.10.0-327.3.1.el7_lustre.x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi,

During some relatively heavy IOR benchmarks on a system that is not yet in production, we hit an OSS crash in a Lustre thread (ll_ost_io01_012). So far we have not been able to reproduce the issue, but we wanted to report it in case somebody has some hints.

We're using Lustre servers based on community Lustre 2.9 (tag 2.8.60) + kernel-3.10.0-327.3.1.el7_lustre.x86_64 (CentOS 7.2), and our clients are IEEL 3.0.0.0 (Lustre 2.7) connected through a single Lustre IB FDR/FDR router. The impacted server has 3 md RAID-6 10-disk arrays as the OST backend.

[495232.536334] kernel BUG at drivers/md/raid5.c:529!
[495232.541676] invalid opcode: 0000 [#1] SMP

The stack is:

crash> bt 103442
PID: 103442 TASK: ffff883e0c564500 CPU: 23 COMMAND: "ll_ost_io01_012"
#0 [ffff883e099f3268] machine_kexec at ffffffff81051beb
#1 [ffff883e099f32c8] crash_kexec at ffffffff810f2542
#2 [ffff883e099f3398] oops_end at ffffffff8163e368
#3 [ffff883e099f33c0] die at ffffffff8101859b
#4 [ffff883e099f33f0] do_trap at ffffffff8163da20
#5 [ffff883e099f3440] do_invalid_op at ffffffff81015204
#6 [ffff883e099f34f0] invalid_op at ffffffff8164721e
[exception RIP: get_active_stripe+1838]
RIP: ffffffffa0f2721e RSP: ffff883e099f35a8 RFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff883fa118e820 RCX: dead000000200200
RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff883f40875a30
RBP: ffff883e099f3650 R8: ffff883f40875a40 R9: 0000000000000080
R10: 0000000000000007 R11: 0000000000000000 R12: ffff883fa118e800
R13: ffff883f40875a30 R14: ffff883fa118e800 R15: 00000000aff93430
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff883e099f35a0] get_active_stripe at ffffffffa0f26e2a [raid456]
#8 [ffff883e099f3658] make_request at ffffffffa0f2cbd5 [raid456]
#9 [ffff883e099f3748] md_make_request at ffffffff814b7af5
#10 [ffff883e099f37a8] generic_make_request at ffffffff812c73e2
#11 [ffff883e099f37e0] submit_bio at ffffffff812c74a1
#12 [ffff883e099f3838] osd_submit_bio at ffffffffa0eb89ac [osd_ldiskfs]
#13 [ffff883e099f3848] osd_do_bio at ffffffffa0ebade7 [osd_ldiskfs]
#14 [ffff883e099f3968] osd_write_commit at ffffffffa0ebb974 [osd_ldiskfs]
#15 [ffff883e099f3a08] ofd_commitrw_write at ffffffffa120d774 [ofd]
#16 [ffff883e099f3a80] ofd_commitrw at ffffffffa1210f2d [ofd]
#17 [ffff883e099f3b08] obd_commitrw at ffffffffa10175a5 [ptlrpc]
#18 [ffff883e099f3b70] tgt_brw_write at ffffffffa0feff81 [ptlrpc]
#19 [ffff883e099f3cd8] tgt_request_handle at ffffffffa0fec235 [ptlrpc]
#20 [ffff883e099f3d20] ptlrpc_server_handle_request at ffffffffa0f981bb [ptlrpc]
#21 [ffff883e099f3de8] ptlrpc_main at ffffffffa0f9c270 [ptlrpc]
#22 [ffff883e099f3ec8] kthread at ffffffff810a5aef
#23 [ffff883e099f3f50] ret_from_fork at ffffffff81645a58
The BUG is triggered by line 529 in raid5.c, BUG_ON(sh->batch_head):

crash> l drivers/md/raid5.c:526
521 static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
522 {
523         struct r5conf *conf = sh->raid_conf;
524         int i, seq;
525
526         BUG_ON(atomic_read(&sh->count) != 0);
527         BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
528         BUG_ON(stripe_operations_active(sh));
529         BUG_ON(sh->batch_head);

Please note that we do have a crash dump of the OSS available.

Best,
|
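A minimal userspace sketch of the failure mode may help readers without access to the crash dump: BUG_ON(sh->batch_head) in init_stripe() fires when a stripe_head being re-initialized as "idle" already belongs to a stripe batch, i.e. another thread published a batch_head for it in between. The model below only illustrates that check-then-act pattern; the struct and field names mirror raid5.c, but everything else (threads, main, the atomics) is invented for illustration and is not kernel code.

/*
 * Hypothetical userspace model of the batch_head race (illustration
 * only, NOT raid5.c code). Thread B publishes batch membership for a
 * stripe while thread A re-initializes it, assuming it is idle.
 * Build with: cc -pthread -o batch_race batch_race.c
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct stripe_head {
	struct stripe_head *batch_head;	/* non-NULL once part of a batch */
};

static struct stripe_head sh = { .batch_head = NULL };

/* Thread B: models a batching path publishing batch_head. */
static void *batcher(void *unused)
{
	(void)unused;
	__atomic_store_n(&sh.batch_head, &sh, __ATOMIC_SEQ_CST);
	return NULL;
}

/* Thread A: models init_stripe() asserting the stripe is unbatched. */
static void *initer(void *unused)
{
	(void)unused;
	/* Stand-in for BUG_ON(sh->batch_head): aborts if B won the race. */
	assert(__atomic_load_n(&sh.batch_head, __ATOMIC_SEQ_CST) == NULL);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&b, NULL, batcher, NULL);
	pthread_create(&a, NULL, initer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("initer won the race this time; rerun and it may abort\n");
	return 0;
}

Depending on scheduling, the assert (the stand-in for BUG_ON) either passes or aborts the process, which is exactly what makes races of this shape rare and hard to reproduce.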
| Comments |
| Comment by Peter Jones [ 09/Dec/16 ] |
|
Jinshan Could you please advise on this one? Thanks Peter |
| Comment by Stephane Thiell [ 09/May/17 ] |
|
Hi,

I got another occurrence of this issue on a newer el7 kernel (3.10.0-514.10.2.el7_lustre.x86_64). It now seems clear to me that this is a (rare) md RAID bug, but I would like to keep documenting the problem here because it is triggered when using Lustre. The issue results in a panic due to BUG_ON(sh->batch_head) in raid5.c.

This past weekend, the issue occurred during the weekly raid-check (which covers all volumes). This time, the crashing backtrace doesn't contain any Lustre part. The RAID volumes (12 in total) are pretty new; no mismatches or hardware errors have been detected. After failover, the filesystem is still running well.

The panic:

[535089.369357] kernel BUG at drivers/md/raid5.c:527!
[535089.374700] invalid opcode: 0000 [#1] SMP
[535089.379384] Modules linked in: ...
[535089.503509] CPU: 34 PID: 280061 Comm: md0_resync Tainted: G OE ------------ 3.10.0-514.10.2.el7_lustre.x86_64 #1

I have a crash dump; this is the backtrace:

crash> bt 280061
PID: 280061 TASK: ffff8800757cde20 CPU: 34 COMMAND: "md0_resync"
#0 [ffff88024e217830] machine_kexec at ffffffff81059bdb
#1 [ffff88024e217890] __crash_kexec at ffffffff81105382
#2 [ffff88024e217960] crash_kexec at ffffffff81105470
#3 [ffff88024e217978] oops_end at ffffffff8168f508
#4 [ffff88024e2179a0] die at ffffffff8102e93b
#5 [ffff88024e2179d0] do_trap at ffffffff8168ebc0
#6 [ffff88024e217a20] do_invalid_op at ffffffff8102b144
#7 [ffff88024e217ad0] invalid_op at ffffffff816984de
[exception RIP: raid5_get_active_stripe+1809]
RIP: ffffffffa0e4ed71 RSP: ffff88024e217b88 RFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff883fe5e40408 RCX: dead000000000200
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8823ebf45ca0
RBP: ffff88024e217c30 R8: ffff8823ebf45cb0 R9: 0000000000000080
R10: 0000000000000006 R11: 0000000000000000 R12: ffff883fe5e40400
R13: ffff8823ebf45ca0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88024e217b80] raid5_get_active_stripe at ffffffffa0e4e996 [raid456]
#9 [ffff88024e217be0] raid5_release_stripe at ffffffffa0e48f24 [raid456]
#10 [ffff88024e217c38] raid5_sync_request at ffffffffa0e53c4b [raid456]
#11 [ffff88024e217ca8] md_do_sync at ffffffff814fef9b
#12 [ffff88024e217e50] md_thread at ffffffff814fb1b5
#13 [ffff88024e217ec8] kthread at ffffffff810b06ff
#14 [ffff88024e217f50] ret_from_fork at ffffffff81696b98
It appears to be triggered by BUG_ON(sh->batch_head) in init_stripe():

crash> l drivers/md/raid5.c:524
519 static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
520 {
521         struct r5conf *conf = sh->raid_conf;
522         int i, seq;
523
524         BUG_ON(atomic_read(&sh->count) != 0);
525         BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
526         BUG_ON(stripe_operations_active(sh));
527         BUG_ON(sh->batch_head);   <<<
528

Other I/Os were processed at this time, but I am not sure how to check that they were actually on the same md:

crash> ps | grep ">"
<snip>
59684 2 28 ffff88407fdc0fb0 RU 0.0 0 0 [md0_raid6]
61479 2 17 ffff883e46e80000 UN 0.0 0 0 [ll_ost_io01_001]
220748 2 23 ffff881fb7ab8fb0 UN 0.0 0 0 [ll_ost_io01_011]
220750 2 19 ffff881fb7abce70 UN 0.0 0 0 [ll_ost_io01_013]
279158 2 14 ffff883ab46b4e70 RU 0.0 0 0 [md22_resync]
280061 2 34 ffff8800757cde20 RU 0.0 0 0 [md0_resync]
280829 2 6 ffff881c72296dd0 RU 0.0 0 0 [md6_resync]
Example of a possible concurrent writing thread:

crash> bt 61479
PID: 61479 TASK: ffff883e46e80000 CPU: 17 COMMAND: "ll_ost_io01_001"
#0 [ffff883ffc805e58] crash_nmi_callback at ffffffff8104d2e2
#1 [ffff883ffc805e68] nmi_handle at ffffffff8168f699
#2 [ffff883ffc805eb0] do_nmi at ffffffff8168f813
#3 [ffff883ffc805ef0] end_repeat_nmi at ffffffff8168ead3
[exception RIP: _raw_spin_lock_irq+63]
RIP: ffffffff8168e09f RSP: ffff883e46e3f588 RFLAGS: 00000002
RAX: 00000000000044c2 RBX: ffff883fe5e40408 RCX: 000000000000c464
RDX: 000000000000c468 RSI: 000000000000c468 RDI: ffff883fe5e40408
RBP: ffff883e46e3f588 R8: 0000000000000000 R9: 0000000000000080
R10: 0000000000000002 R11: 0000000000000000 R12: ffff883fe5e40400
R13: 0000000000000000 R14: ffff883fe0e61900 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#4 [ffff883e46e3f588] _raw_spin_lock_irq at ffffffff8168e09f
#5 [ffff883e46e3f590] raid5_get_active_stripe at ffffffffa0e4e6cb [raid456]
#6 [ffff883e46e3f648] raid5_make_request at ffffffffa0e4ef55 [raid456]
#7 [ffff883e46e3f738] md_make_request at ffffffff814f7dfc
#8 [ffff883e46e3f798] generic_make_request at ffffffff812ee939
#9 [ffff883e46e3f7e0] submit_bio at ffffffff812eea81
#10 [ffff883e46e3f838] osd_submit_bio at ffffffffa10a0bcc [osd_ldiskfs]
#11 [ffff883e46e3f848] osd_do_bio at ffffffffa10a3007 [osd_ldiskfs]
#12 [ffff883e46e3f968] osd_write_commit at ffffffffa10a3b94 [osd_ldiskfs]
#13 [ffff883e46e3fa08] ofd_commitrw_write at ffffffffa1113774 [ofd]
#14 [ffff883e46e3fa80] ofd_commitrw at ffffffffa1116f2d [ofd]
#15 [ffff883e46e3fb08] obd_commitrw at ffffffffa0c43c22 [ptlrpc]
#16 [ffff883e46e3fb70] tgt_brw_write at ffffffffa0c1bfc1 [ptlrpc]
#17 [ffff883e46e3fcd8] tgt_request_handle at ffffffffa0c18275 [ptlrpc]
#18 [ffff883e46e3fd20] ptlrpc_server_handle_request at ffffffffa0bc41fb [ptlrpc]
#19 [ffff883e46e3fde8] ptlrpc_main at ffffffffa0bc82b0 [ptlrpc]
#20 [ffff883e46e3fec8] kthread at ffffffff810b06ff
#21 [ffff883e46e3ff50] ret_from_fork at ffffffff81696b98
I found a similar occurrence of this bug at https://lkml.org/lkml/2016/12/23/205 (not with Lustre, though). I have also described the problem on the linux-raid mailing list but haven't received any reply so far.

Stephane |
| Comment by Stanford Research Computing Center [ 15/Jun/17 ] |
|
This issue hit us one more time on the same OSS. Could you assist in debugging this a little?

[143161.778606] ------------[ cut here ]------------
[143161.783857] kernel BUG at drivers/md/raid5.c:527!
[143161.789199] invalid opcode: 0000 [#1] SMP
[143161.793883] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) uas iTCO_wdt iTCO_vendor_support vfat fat dcdbas mxm_wmi intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr usb_storage mpt2sas mptctl mptbase rpcsec_gss_krb5 nfsv4 dell_rbu dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core lpc_ich mei_me mei dm_service_time ses enclosure
[143161.873461] ipmi_devintf sg ipmi_si ipmi_msghandler shpchp acpi_power_meter wmi dm_multipath dm_mod nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci drm mpt3sas tg3 raid_class mlx4_core crct10dif_pclmul crct10dif_common ptp crc32c_intel libata megaraid_sas i2c_core devlink scsi_transport_sas pps_core fjes
[143161.918022] CPU: 14 PID: 253667 Comm: ll_ost_io00_017 Tainted: G OE ------------ 3.10.0-514.16.1.el7_lustre.x86_64 #1
[143161.931213] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.3.4 11/08/2016
[143161.939660] task: ffff883fac2b0fb0 ti: ffff883912b30000 task.ti: ffff883912b30000
[143161.948107] RIP: 0010:[<ffffffffa0fffd71>] [<ffffffffa0fffd71>] raid5_get_active_stripe+0x711/0x750 [raid456]
[143161.959387] RSP: 0018:ffff883912b33598 EFLAGS: 00010082
[143161.965409] RAX: 0000000000000000 RBX: ffff883fadb7800c RCX: dead000000000200
[143161.973467] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff881e5a980f70
[143161.981525] RBP: ffff883912b33640 R08: ffff881e5a980f80 R09: 0000000000000080
[143161.989584] R10: 0000000000000000 R11: 0000000000000000 R12: ffff883fadb78000
[143161.997642] R13: ffff881e5a980f70 R14: 0000000000000000 R15: 0000000000000000
[143162.005702] FS: 0000000000000000(0000) GS:ffff881ffcfc0000(0000) knlGS:0000000000000000
[143162.014828] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[143162.021335] CR2: 00000000027b8f78 CR3: 00000000019be000 CR4: 00000000001407e0
[143162.029394] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[143162.037453] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[143162.045511] Stack:
[143162.047848] ffff883fadb78000 0000000000000000 ffff881ff9b9d900 0000000059298000
[143162.056234] ffff883fadb78004 00000000aec652c8 ffff883fadb781d8 ffff883900000001
[143162.064619] ffff883fadb78000 0000000000000000 ffff881e59298000 ffff883912b33640
[143162.073004] Call Trace:
[143162.075834] [<ffffffffa0ffff55>] raid5_make_request+0x1a5/0xd70 [raid456]
[143162.083620] [<ffffffffa0fafbbc>] ? ldiskfs_ext_insert_extent+0x47c/0x12e0 [ldiskfs]
[143162.092363] [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[143162.098971] [<ffffffff814f838c>] md_make_request+0xdc/0x230
[143162.105385] [<ffffffff812eedc9>] generic_make_request+0x109/0x1e0
[143162.112378] [<ffffffff812eef11>] submit_bio+0x71/0x150
[143162.118337] [<ffffffffa09b36b2>] ? lprocfs_oh_tally+0x32/0x40 [obdclass]
[143162.126029] [<ffffffffa10a4bcc>] osd_submit_bio+0x1c/0x60 [osd_ldiskfs]
[143162.133613] [<ffffffffa10a73c1>] osd_do_bio.isra.23+0x791/0x830 [osd_ldiskfs]
[143162.141782] [<ffffffffa0f6011c>] ? __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
[143162.149951] [<ffffffffa10a7b94>] osd_write_commit+0x344/0x940 [osd_ldiskfs]
[143162.157915] [<ffffffffa116b774>] ofd_commitrw_write+0x1014/0x1c20 [ofd]
[143162.165496] [<ffffffffa116ef2d>] ofd_commitrw+0x51d/0xa40 [ofd]
[143162.172345] [<ffffffffa0c8dc22>] obd_commitrw+0x2ec/0x32f [ptlrpc]
[143162.179479] [<ffffffffa0c65fc1>] tgt_brw_write+0xea1/0x1640 [ptlrpc]
[143162.186766] [<ffffffff810ce474>] ? update_curr+0x104/0x190
[143162.193081] [<ffffffff810c9ba8>] ? __enqueue_entity+0x78/0x80
[143162.199712] [<ffffffffa0bbc560>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
[143162.207903] [<ffffffffa0c62275>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[143162.215705] [<ffffffffa0c0e1fb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[143162.224355] [<ffffffffa08c4158>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[143162.232057] [<ffffffffa0c0bdb8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[143162.239730] [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20
[143162.246821] [<ffffffff810ba628>] ? __wake_up_common+0x58/0x90
[143162.253456] [<ffffffffa0c122b0>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[143162.260575] [<ffffffffa0c11810>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
[143162.268926] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[143162.274465] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[143162.281846] [<ffffffff81697218>] ret_from_fork+0x58/0x90
[143162.287967] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[143162.295346] Code: c4 04 75 a6 0f 0b e8 ff 5c 08 e0 0f 0b 48 8b 55 80 48 c7 c6 c8 a2 00 a1 48 c7 c7 90 c0 00 a1 31 c0 e8 34 cc 33 e0 e9 45 fc ff ff <0f> 0b 0f 0b 48 c7 c6 35 95 00 a1 48 c7 c7 68 c0 00 a1 31 c0 e8
[143162.317044] RIP [<ffffffffa0fffd71>] raid5_get_active_stripe+0x711/0x750 [raid456]
[143162.325696] RSP <ffff883912b33598>

Thanks, |
| Comment by Stephane Thiell [ 06/Sep/17 ] |
|
We fixed this issue with the linux-raid kernel folks. Please see http://www.spinics.net/lists/raid/msg59072.html for more details about the fix. Feel free to close this LU.

Best,
Stephane |
| Comment by Peter Jones [ 06/Sep/17 ] |
|
OK, thanks Stephane. Will this fix be included in an upcoming RHEL 7.x update? |
| Comment by Stephane Thiell [ 06/Sep/17 ] |
|
Hi Peter,

I don't know yet, but it clearly should be. Usually Red Hat and SUSE developers track linux-raid changes, so we'll see. We're using CentOS 7, so I can't really open a support case for it. If you have a way to encourage the Red Hat folks to backport this patch, that would be great, and it is super easy to apply since raid5.c didn't change much. The race is very rare unless you are using Lustre or another setup that adds some latency (virtualization, for instance). Another link to the patch is https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=3664847d95e60a9a943858b7800f8484669740fc

Thanks!
Stephane |
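For readers without the commit in front of them, a hedged paraphrase of its idea (the linked commit is authoritative): a stripe's batch membership has to be both published and checked under the same batch lock, so that no other path can observe a half-batched stripe. Applied to the hypothetical userspace model sketched after the description above, the fix pattern looks roughly like this; it mirrors the idea of the upstream change, not its exact code.

/*
 * Race-free ordering for the userspace model sketched earlier
 * (illustration only, not the kernel patch). Publisher and checker
 * serialize on one lock, so the check-then-act window is closed.
 * Build with: cc -pthread -o batch_fixed batch_fixed.c
 */
#include <pthread.h>
#include <stddef.h>

struct stripe_head {
	struct stripe_head *batch_head;
	pthread_mutex_t batch_lock;
};

static struct stripe_head sh = {
	.batch_head = NULL,
	.batch_lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Attach the stripe to a batch; batch_head is published under the lock. */
static void add_to_batch(struct stripe_head *head)
{
	pthread_mutex_lock(&sh.batch_lock);
	sh.batch_head = head;
	pthread_mutex_unlock(&sh.batch_lock);
}

/* Re-initialize only if unbatched; the check holds the same lock. */
static int try_init_stripe(void)
{
	int batched;

	pthread_mutex_lock(&sh.batch_lock);
	batched = (sh.batch_head != NULL);
	pthread_mutex_unlock(&sh.batch_lock);
	return batched ? -1 : 0;
}

int main(void)
{
	add_to_batch(&sh);
	return try_init_stripe() ? 1 : 0;	/* exits 1: stripe is batched */
}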
| Comment by Peter Jones [ 06/Sep/17 ] |
|
Stephane, we will pass along the details to SUSE/Red Hat in the hope that this fix can be included in a future release. Peter |
| Comment by Stephane Thiell [ 06/Sep/17 ] |
|
Thanks Peter, much appreciated! |
| Comment by Stephane Thiell [ 10/Oct/17 ] |
|
Landed in RHEL 7.5, and it should be backported in an upcoming RHEL 7.4 kernel update; see https://bugzilla.redhat.com/show_bug.cgi?id=1496836
It also landed in SLES12 SP2, even more quickly; see http://kernel.suse.com/cgit/kernel-source/commit/?h=SLE12-SP2&id=5bedfa64185da4ab9cd7b1166a0097546c928234 |
| Comment by Stephane Thiell [ 05/Feb/18 ] |
|
RHEL 7.4: fixed in kernel-3.10.0-693.18.1.el7 |