[LU-6974] RHEL 7.1 lustre-initialization-1: MDS crashed while lustre mount Created: 10/Aug/15  Updated: 12/May/17  Resolved: 31/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None
Environment:

3.10.0-229.11.1.el7_lustre.g853ea39.x86_64


Issue Links:
Related
is related to LU-5888 mount.lustre: set max_sectors_kb to 2... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for ys <yang.sheng@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/cebbc5a2-3e51-11e5-8006-5254006e85c2.

The sub-test lustre-initialization_1 failed with the following error:

Test system failed to start single suite, so abandoning all hope and giving up

Please provide additional information about the failure here.

Info required for matching: lustre-initialization-1 lustre-initialization_1

MDS crashed:

03:40:55:[ 2331.721214] ldiskfs: module verification failed: signature and/or required key missing - tainting kernel
03:40:55:[ 2331.740707] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
03:40:55:[ 2352.224410] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
03:40:55:[ 2352.319932] BUG: unable to handle kernel paging request at 0000000000001f50
03:40:55:[ 2352.320900] IP: [<ffffffff81155b02>] page_waitqueue+0x62/0x80
03:40:55:[ 2352.320900] PGD 36560067 PUD 7bb3a067 PMD 0 
03:40:55:[ 2352.320900] Oops: 0000 [#1] SMP 
03:40:55:[ 2352.320900] Modules linked in: ldiskfs(OF) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ppdev ib_cm iw_cm serio_raw pcspkr virtio_balloon i2c_piix4 parport_pc parport ib_sa ib_mad ib_core ib_addr ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect virtio_blk sysimgblt drm_kms_helper 8139too floppy ttm ata_piix drm libata 8139cp mii virtio_pci virtio_ring i2c_core virtio
03:40:55:[ 2352.320900] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-229.11.1.el7_lustre.g853ea39.x86_64 #1
03:40:55:[ 2352.320900] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:40:55:[ 2352.320900] task: ffff88007c052220 ti: ffff88007c080000 task.ti: ffff88007c080000
03:40:55:[ 2352.320900] RIP: 0010:[<ffffffff81155b02>]  [<ffffffff81155b02>] page_waitqueue+0x62/0x80
03:40:55:[ 2352.320900] RSP: 0018:ffff88007fd03d50  EFLAGS: 00010086
03:40:55:[ 2352.320900] RAX: 0000000000001800 RBX: ffff88007c163100 RCX: 0000000000000040
03:40:55:[ 2352.320900] RDX: 97fd97a7b8163100 RSI: 0000000000000000 RDI: 0000000000000000
03:40:55:[ 2352.320900] RBP: ffff88007fd03d50 R08: 8800000000000000 R09: 2001f058c4000000
03:40:55:[ 2352.320900] R10: dffd97a7b8163100 R11: ffffffffa001a265 R12: ffff880035ea3500
03:40:55:[ 2352.320900] R13: 0000000000000001 R14: 0000000000000000 R15: ffff88003631e6c0
03:40:55:[ 2352.320900] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
03:40:55:[ 2352.320900] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
03:40:55:[ 2352.320900] CR2: 0000000000001f50 CR3: 000000003652c000 CR4: 00000000000006e0
03:40:55:[ 2352.320900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
03:40:55:[ 2352.320900] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
03:40:55:[ 2352.320900] Stack:
03:40:55:[ 2352.320900]  ffff88007fd03d68 ffffffff81156f8e ffff880035ea3568 ffff88007fd03d90
03:40:55:[ 2352.320900]  ffffffff8120652e ffff880035ea3500 0000000000000000 0000000000000000
03:40:55:[ 2352.320900]  ffff88007fd03da0 ffffffff811fd9dd ffff88007fd03de0 ffffffff812adf60
03:40:55:[ 2352.320900] Call Trace:
03:40:56:[ 2352.320900]  <IRQ> 
03:40:56:[ 2352.320900]  [<ffffffff81156f8e>] unlock_page+0x1e/0x30
03:40:56:[ 2352.320900]  [<ffffffff8120652e>] mpage_end_io+0x3e/0xb0
03:40:56:[ 2352.320900]  [<ffffffff811fd9dd>] bio_endio+0x1d/0x30
03:40:56:[ 2352.320900]  [<ffffffff812adf60>] blk_update_request+0x90/0x350
03:40:56:[ 2352.320900]  [<ffffffff812b708a>] blk_mq_end_request+0x1a/0x70
03:40:56:[ 2352.320900]  [<ffffffffa01202f2>] virtblk_request_done+0x32/0x80 [virtio_blk]
03:40:56:[ 2352.320900]  [<ffffffff812b783d>] __blk_mq_complete_request+0x7d/0x100
03:40:56:[ 2352.320900]  [<ffffffff812b78e1>] blk_mq_complete_request+0x21/0x30
03:40:56:[ 2352.320900]  [<ffffffffa0120076>] virtblk_done+0x76/0x100 [virtio_blk]
03:40:56:[ 2352.320900]  [<ffffffffa001a4a8>] vring_interrupt+0x38/0x90 [virtio_ring]
03:40:56:[ 2352.320900]  [<ffffffff8110b84e>] handle_irq_event_percpu+0x3e/0x1e0
03:40:56:[ 2352.320900]  [<ffffffff8110ba2d>] handle_irq_event+0x3d/0x60
03:40:56:[ 2352.320900]  [<ffffffff8110e6c7>] handle_edge_irq+0x77/0x130
03:40:56:[ 2352.320900]  [<ffffffff81015cff>] handle_irq+0xbf/0x150
03:40:57:[ 2352.320900]  [<ffffffff81610b4a>] ? atomic_notifier_call_chain+0x1a/0x20
03:40:57:[ 2352.320900]  [<ffffffff816175ef>] do_IRQ+0x4f/0xf0
03:40:57:[ 2352.320900]  [<ffffffff8160c82d>] common_interrupt+0x6d/0x6d
03:40:57:[ 2352.320900]  <EOI> 
03:40:57:[ 2352.320900]  [<ffffffff8109b938>] ? hrtimer_start+0x18/0x20
03:40:57:[ 2352.320900]  [<ffffffff81052dd6>] ? native_safe_halt+0x6/0x10
03:40:57:[ 2352.320900]  [<ffffffff8101c93f>] default_idle+0x1f/0xc0
03:40:57:[ 2352.320900]  [<ffffffff8101d236>] arch_cpu_idle+0x26/0x30
03:40:57:[ 2352.320900]  [<ffffffff810c6985>] cpu_startup_entry+0xf5/0x290
03:40:57:[ 2352.320900]  [<ffffffff810423ca>] start_secondary+0x1ba/0x230
03:40:57:[ 2352.320900] Code: 48 03 04 d5 80 31 a2 81 48 89 fa 48 c1 e1 39 48 c1 e2 36 48 c1 e7 3f 48 89 e5 4c 01 d2 4c 29 c2 48 01 f2 48 29 ca b9 40 00 00 00 <2b> 88 50 07 00 00 48 01 d7 48 8b 80 40 07 00 00 5d 48 d3 ef 48 
03:40:57:[ 2352.320900] RIP  [<ffffffff81155b02>] page_waitqueue+0x62/0x80
03:40:57:[ 2352.320900]  RSP <ffff88007fd03d50>
03:40:57:[ 2352.320900] CR2: 0000000000001f50
03:40:57:[ 2352.320900] ------------[ cut here ]------------
03:40:57:[ 2352.320900] kernel BUG at mm/vmalloc.c:1339!
03:40:57:[ 2352.320900] invalid opcode: 0000 [#2] SMP 
03:40:57:[ 2352.320900] Modules linked in: ldiskfs(OF) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ppdev ib_cm iw_cm serio_raw pcspkr virtio_balloon i2c_piix4 parport_pc parport ib_sa ib_mad ib_core ib_addr ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect virtio_blk sysimgblt drm_kms_helper 8139too floppy ttm ata_piix drm libata 8139cp mii virtio_pci virtio_ring i2c_core virtio
03:40:57:[ 2352.320900] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-229.11.1.el7_lustre.g853ea39.x86_64 #1
03:40:57:[ 2352.320900] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:40:57:[ 2352.320900] task: ffff88007c052220 ti: ffff88007c080000 task.ti: ffff88007c080000
03:40:57:[ 2352.320900] RIP: 0010:[<ffffffff8119060e>]  [<ffffffff8119060e>] __get_vm_area_node+0x1ce/0x1d0
03:40:57:[ 2352.320900] RSP: 0018:ffff88007fd032b0  EFLAGS: 00010006
03:40:57:[ 2352.320900] RAX: ffff88007c083fd8 RBX: 00000000ffffffff RCX: ffffc90000000000
03:40:57:[ 2352.320900] RDX: 0000000000000022 RSI: 0000000000000001 RDI: 0000000000002000
03:40:57:[ 2352.320900] RBP: ffff88007fd03310 R08: ffffe8ffffffffff R09: 00000000ffffffff
03:40:57:[ 2352.320900] R10: ffff880077e6f480 R11: ffff8800362c48d0 R12: ffffffffa00d42c9
03:40:57:[ 2352.320900] R13: 0000000000001200 R14: 00000000000080d2 R15: ffffea0000d8e940
03:40:57:[ 2352.320900] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
03:40:57:[ 2352.320900] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
03:40:57:[ 2352.320900] CR2: 0000000000001f50 CR3: 000000003652c000 CR4: 00000000000006e0
03:40:57:[ 2352.320900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
03:40:57:[ 2352.320900] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
03:40:57:[ 2352.320900] Stack:
03:40:57:[ 2352.320900]  ffffffff81191d7d 00000000000080d2 ffffffffa00d42c9 8000000000000163
03:40:57:[ 2352.320900]  000080d200000000 0000000000000000 681e1a9a428ae35b ffff880077e6f480
03:40:57:[ 2352.320900]  ffff8800362c48b0 0000000000240000 0000000000000080 ffffea0000d8e940
03:40:57:[ 2352.320900] Call Trace:
03:40:57:[ 2352.320900]  <IRQ> 
03:40:57:[ 2352.320900]  [<ffffffff81191d7d>] ? __vmalloc_node_range+0x7d/0x270
03:40:57:[ 2352.320900]  [<ffffffffa00d42c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
03:40:58:[ 2352.320900]  [<ffffffff81191fb1>] __vmalloc+0x41/0x50
03:40:58:[ 2352.320900]  [<ffffffffa00d42c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
03:40:58:[ 2352.320900]  [<ffffffffa00d42c9>] ttm_tt_init+0x69/0xb0 [ttm]
03:40:58:[ 2352.320900]  [<ffffffffa013b4e8>] cirrus_ttm_tt_create+0x58/0x90 [cirrus]
03:40:58:[ 2352.320900]  [<ffffffffa00d4a7d>] ttm_bo_add_ttm+0x8d/0xc0 [ttm]
03:40:58:[ 2352.320900]  [<ffffffffa00d60f1>] ttm_bo_handle_move_mem+0x571/0x5b0 [ttm]
03:40:58:[ 2352.320900]  [<ffffffff81601ae4>] ? __slab_free+0x10e/0x277
03:40:58:[ 2352.320900]  [<ffffffffa00d674a>] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
03:40:58:[ 2352.320900]  [<ffffffffa00d6e17>] ttm_bo_validate+0x247/0x260 [ttm]
03:40:58:[ 2352.320900]  [<ffffffff81059e69>] ? iounmap+0x79/0xa0
03:40:58:[ 2352.320900]  [<ffffffff81050069>] ? kgdb_arch_late+0xe9/0x180
03:40:58:[ 2352.320900]  [<ffffffffa013bac2>] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
03:40:58:[ 2352.320900]  [<ffffffffa0139c84>] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
03:40:58:[ 2352.320900]  [<ffffffffa013a479>] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
03:40:58:[ 2352.320900]  [<ffffffffa0107939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
03:40:58:[ 2352.320900]  [<ffffffffa01086bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
03:40:58:[ 2352.320900]  [<ffffffff816092eb>] ? __ww_mutex_lock+0x1b/0xa0
03:40:58:[ 2352.320900]  [<ffffffffa0094711>] drm_mode_set_config_internal+0x61/0xe0 [drm]
03:40:58:[ 2352.320900]  [<ffffffffa0110a94>] drm_fb_helper_pan_display+0x94/0xf0 [drm_kms_helper]
03:40:58:[ 2352.320900]  [<ffffffff81326309>] fb_pan_display+0xc9/0x190
03:40:58:[ 2352.320900]  [<ffffffff81335390>] bit_update_start+0x20/0x50
03:40:58:[ 2352.320900]  [<ffffffff81334dbd>] fbcon_switch+0x39d/0x5a0
03:40:58:[ 2352.320900]  [<ffffffff813a35f9>] redraw_screen+0x1a9/0x270
03:40:58:[ 2352.320900]  [<ffffffff8132650e>] ? fb_blank+0xae/0xc0
03:40:58:[ 2352.320900]  [<ffffffff813322da>] fbcon_blank+0x22a/0x2f0
03:40:58:[ 2352.320900]  [<ffffffff81070384>] ? wake_up_klogd+0x34/0x50
04:40:23:********** Timeout by autotest system **********


 Comments   
Comment by Yang Sheng [ 10/Aug/15 ]

Hi YuJian, I found that this issue only occurs in the autotest system. Could you tell me whether I can access the test nodes to figure out what is happening?

Comment by Bob Glossman (Inactive) [ 10/Aug/15 ]

Am I correct in thinking this panic is only seen in tests of the new 3.10.0-229.11.1.el7 kernel version? It seems possible this is a TEI issue and not a lustre one. I haven't seen any such panics during local build & test of the new kernel version.

Comment by Yang Sheng [ 10/Aug/15 ]

Not only the 229.11.1 kernel; the 229.7.2 kernel as well: https://testing.hpdd.intel.com/test_sets/83040eee-3d20-11e5-9684-5254006e85c2

I agree it looks like a TEI issue rather than a Lustre issue. Please feel free to change it if appropriate.

Comment by Andreas Dilger [ 12/Aug/15 ]

Yang Sheng, if this is a kernel crash, I don't think it is correct to assign it to the TEI team. Is the crash stack always the same? It looks like a problem in the block layer. How often is this bug hit? It is currently marked minor priority.

Comment by Yang Sheng [ 13/Aug/15 ]

Hi Andreas, this issue has only occurred in the autotest system, never on my local box, so I think it may relate to the autotest environment. It is hit very frequently in recent el7.1 tests.

Comment by Bob Glossman (Inactive) [ 13/Aug/15 ]

another seen in el7 test on master:
https://testing.hpdd.intel.com/test_sets/6baa67cc-4188-11e5-9e18-5254006e85c2

Comment by Andreas Dilger [ 13/Aug/15 ]

It does appear that the oops is the same in each case. In particular:

BUG: unable to handle kernel paging request at 0000000000001f50
03:40:55:[ 2352.320900] IP: [<ffffffff81155b02>] page_waitqueue+0x62/0x80

is always the same.

Yang Sheng, can you please see what is being done in that code, and which pointer is bad? Even if the failure is happening only on autotest nodes and not on your local test system, this is something that the Lustre team needs to debug. You could reserve a node and use loadjenkinsbuild to set up a similar test environment if needed. Maybe it depends on a difference between the RHEL7 guest VM and the specific host VM version?

Comment by Yang Sheng [ 13/Aug/15 ]

Hi, Andreas,

I suspect it relates to something specific to the test environment. It looks like Onyx is fully reserved, and I have also requested an account for the Shadow cluster (TEI-3782), so I can take a closer look at this issue.

Comment by James Nunez (Inactive) [ 18/Aug/15 ]

Similar (same?) failure on master on shadow at 2015-08-17 21:09:48 - https://testing.hpdd.intel.com/test_sets/2f886b10-452d-11e5-afca-5254006e85c2

21:12:00:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
21:12:00:BUG: unable to handle kernel NULL pointer dereference at (null)
21:12:00:IP: [<ffffffff811ca1b0>] mpage_end_io_read+0x30/0x90
21:12:00:PGD 6a0eb067 PUD 5f9bd067 PMD 0 
21:12:00:Oops: 0002 [#1] SMP 
21:12:00:last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/virtio0/block/vda/queue/scheduler
21:12:00:CPU 1 
21:12:00:Modules linked in: ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
21:12:00:
21:12:00:Pid: 3109, comm: modprobe Not tainted 2.6.32-431.29.2.el6_lustre.gca29247.x86_64 #1 Red Hat KVM
21:12:00:RIP: 0010:[<ffffffff811ca1b0>]  [<ffffffff811ca1b0>] mpage_end_io_read+0x30/0x90
21:12:00:RSP: 0018:ffff880002303d58  EFLAGS: 00010002
21:12:00:RAX: ffff880057915ec0 RBX: ffff880057915ea0 RCX: 0000000000000000
21:12:00:RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
21:12:00:RBP: ffff880002303d78 R08: ffff880057947e60 R09: 0000000000000000
21:12:00:R10: ffff880037d2b640 R11: 0000000000000001 R12: ffff880065275c00
21:12:00:R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
21:12:00:FS:  00007fcb045a5700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
21:12:00:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
21:12:00:CR2: 0000000000000000 CR3: 0000000079aa3000 CR4: 00000000000006e0
21:12:00:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:12:00:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
21:12:00:Process modprobe (pid: 3109, threadinfo ffff880079454000, task ffff880079160040)
21:12:00:Stack:
21:12:00: ffff880037da2fa8 ffff880065275c00 0000000000000000 ffff880057947e60
21:12:00:<d> ffff880002303d88 ffffffff811c314d ffff880002303db8 ffffffff812661fb
21:12:00:<d> ffff880002303da8 ffff880065275c00 0000000000000000 0000000000000000
21:12:00:Call Trace:
21:12:00: <IRQ> 
21:12:00: [<ffffffff811c314d>] bio_endio+0x1d/0x40
21:12:00: [<ffffffff812661fb>] req_bio_endio+0x9b/0xe0
21:12:00: [<ffffffff81267777>] blk_update_request+0x117/0x490
21:12:00: [<ffffffff81095211>] ? __queue_work+0x41/0x50
21:12:00: [<ffffffff81267b17>] blk_update_bidi_request+0x27/0xa0
21:12:00: [<ffffffff812698be>] __blk_end_request_all+0x2e/0x60
21:12:00: [<ffffffffa005a22a>] blk_done+0x4a/0x110 [virtio_blk]
21:12:00: [<ffffffff810ecb14>] ? __rcu_process_callbacks+0x54/0x350
21:12:00: [<ffffffffa004e2ac>] vring_interrupt+0x3c/0xd0 [virtio_ring]
21:12:00: [<ffffffff810e7090>] handle_IRQ_event+0x60/0x170
21:12:00: [<ffffffff8107a64f>] ? __do_softirq+0x11f/0x1e0
21:12:00: [<ffffffff810e99ee>] handle_edge_irq+0xde/0x180
21:12:00: [<ffffffff8100faf9>] handle_irq+0x49/0xa0
21:12:00: [<ffffffff81532dbc>] do_IRQ+0x6c/0xf0
21:12:00: [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
21:12:00: <EOI> 
21:12:00: [<ffffffff8116ff8a>] ? kmem_cache_alloc+0xda/0x190
21:12:00: [<ffffffff811222e5>] ? mempool_alloc_slab+0x15/0x20
21:12:00: [<ffffffff81122483>] mempool_alloc+0x63/0x140
21:12:00: [<ffffffff811c4f3e>] bio_alloc_bioset+0x3e/0xf0
21:12:00: [<ffffffff811c5095>] bio_alloc+0x15/0x30
21:12:00: [<ffffffff811ca065>] mpage_alloc+0x35/0xa0
21:12:00: [<ffffffff811ca5de>] do_mpage_readpage+0x35e/0x5f0
21:12:00: [<ffffffff8114252e>] ? __inc_zone_page_state+0x2e/0x30
21:12:00: [<ffffffff81136b00>] ? __lru_cache_add+0x40/0x90
21:12:00: [<ffffffff811ca9c9>] mpage_readpages+0xe9/0x130
21:12:00: [<ffffffffa008c140>] ? ext3_get_block+0x0/0x120 [ext3]
21:12:00: [<ffffffffa008c140>] ? ext3_get_block+0x0/0x120 [ext3]
21:12:00: [<ffffffff81167cca>] ? alloc_pages_current+0xaa/0x110
21:12:00: [<ffffffffa0089ccd>] ext3_readpages+0x1d/0x20 [ext3]
21:12:00: [<ffffffff81135795>] __do_page_cache_readahead+0x185/0x210
21:12:00: [<ffffffff81135841>] ra_submit+0x21/0x30
21:12:00: [<ffffffff81135bb5>] ondemand_readahead+0x115/0x240
21:12:00: [<ffffffff81135dd3>] page_cache_sync_readahead+0x33/0x50
21:12:00: [<ffffffff81121868>] generic_file_aio_read+0x558/0x700
21:12:00: [<ffffffff811890ba>] do_sync_read+0xfa/0x140
21:12:00: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
21:12:00: [<ffffffff811519d5>] ? do_mmap_pgoff+0x335/0x380
21:12:00: [<ffffffff81227366>] ? security_file_permission+0x16/0x20
21:12:00: [<ffffffff81189a75>] vfs_read+0xb5/0x1a0
21:12:00: [<ffffffff81189bb1>] sys_read+0x51/0x90
21:12:00: [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
21:12:00: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
21:12:00:Code: 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 0f b7 57 28 4c 8b 6f 18 48 8b 47 48 49 89 fc 41 83 e5 01 48 c1 e2 04 48 8d 5c 02 f0 eb 13 <f0> 80 0f 08 e8 67 5a f5 ff 49 8b 44 24 48 48 39 c3 72 2e 48 8b 
21:12:00:RIP  [<ffffffff811ca1b0>] mpage_end_io_read+0x30/0x90
21:12:00: RSP <ffff880002303d58>
21:12:00:CR2: 0000000000000000

Comment by Bob Glossman (Inactive) [ 18/Aug/15 ]

another on master, on el7.1:
https://testing.hpdd.intel.com/test_sets/b89f440a-45d2-11e5-9d0e-5254006e85c2

Comment by Yang Sheng [ 19/Aug/15 ]

Hi Minh, I have reserved Shadow node shadow-57 to try to reproduce this issue, but I have gotten nothing so far. Could you please point out whether any special configuration may help to hit an instance? I just noticed that autotest uses an LVM volume as the block device. Any comment is welcome.

Comment by Yang Sheng [ 21/Aug/15 ]

I have reproduced this issue on the Shadow cluster. The problem relates to the block device type: the Shadow VMs use virtio storage, and the node crashes whether LVM or a raw disk is used as the Lustre backing storage. However, I did not encounter this issue when using a loopback device. So could you please check our VM version or choose a different storage type? I'll continue to follow up on this issue.

Comment by James Nunez (Inactive) [ 24/Aug/15 ]

We hit this issue again at:
2015-08-21 09:38:09 - https://testing.hpdd.intel.com/test_sets/3c45e686-47f1-11e5-9e1d-5254006e85c2
2015-08-25 01:45:31 - https://testing.hpdd.intel.com/test_sets/b0d4981c-4ad5-11e5-aa52-5254006e85c2
2015-08-25 04:15:05 - https://testing.hpdd.intel.com/test_sets/dd85e4d4-4ae8-11e5-9143-5254006e85c2

Comment by Sarah Liu [ 24/Aug/15 ]

More instances on the master branch:
https://testing.hpdd.intel.com/test_sessions/f507e54c-4432-11e5-979c-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/57ee80c6-4564-11e5-a472-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/8b6457b0-4568-11e5-9d0e-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/7ade0990-459f-11e5-9d0e-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/a60eab14-442e-11e5-8175-5254006e85c2

This issue blocks 2.8 testing.

Comment by Bob Glossman (Inactive) [ 25/Aug/15 ]

While not hard evidence, I see that the three test failures from James' last comment are all on different kernels. That makes me suspect a TEI cause is more likely.

Comment by Andreas Dilger [ 25/Aug/15 ]

Sarah, if you are able to hit this problem repeatedly, could you please try the patch http://review.whamcloud.com/16035 from LU-6895 to see if this resolves the problem for you?

Comment by Yang Sheng [ 25/Aug/15 ]

Of the links in James's comment, only 2015-08-25 01:45:31 - https://testing.hpdd.intel.com/test_sets/b0d4981c-4ad5-11e5-aa52-5254006e85c2 is likely the same issue. The other two are different issues.

Comment by Sarah Liu [ 25/Aug/15 ]

Andreas, sure, I will try and update the ticket.

Comment by Bob Glossman (Inactive) [ 26/Aug/15 ]

I believe noting this as RHEL 7.1 in the title is incorrect. Panics with similar stack traces can be seen in tests of sles11sp4 and sles12 too.

Comment by Andreas Dilger [ 26/Aug/15 ]

Based on Fan Yong's comments in LU-6895 it is unlikely that the 16035 patch will fix this issue. It would be useful to capture a crashdump after a failure for Yang Sheng to look at, since he is unable to reproduce this problem.

Comment by Bob Glossman (Inactive) [ 26/Aug/15 ]

Seen in past test failures on sles11sp3 too. I didn't know what to look for to recognize this problem until more recently.

Comment by Andreas Dilger [ 26/Aug/15 ]

Yang Sheng, Sarah, if you are now able to reproduce this problem easily on the Shadow VM, can you please run tests to see if this can be isolated to a specific patch that landed on master? There are a number of patches that landed on Aug 3.

The last patch with Test-Parameters: mdsdistro=el7 that passed testing appears to be http://review.whamcloud.com/15737; its testing was based on 6af80537ef30 "LU-6872 lov: avoid infinite loop in lsm_alloc_plain()" (landed July 27). Your patch http://review.whamcloud.com/15854, which was failing on Aug 9, was based on 50c79ee142be "LU-6490 gss: handle struct key_type match replacement" (landed Aug 5), and that is the first reported failure for this bug.

If this problem can be reproduced consistently via autotest, it would be easiest to submit several independent patches, each based on the commit just before a likely candidate (in git, ~1 means "one commit before the named one"), and then push them to autotest with fortestonly and specific to RHEL7.1 so they don't cause a lot of useless testing overhead, something like:

$ git checkout 50c79ee142be~1
$ echo "testing $(git describe) before 50c79ee142be" >> README
$ git commit -S -a -F -
LU-6974 test: testing RHEL7.1 before patch 50c79ee142be

Testing RHEL 7.1 before patch 50c79ee142be to see if it works.

Test-Parameters: fortestonly mdsdistro=el7 ossdistro=el7 \
  clientdistro=el7 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testgroup=review-ldiskfs

^D
$ git push ssh://ys@review/fs/lustre-release HEAD:refs/for/master

Since these patches will run only minimal tests, they can all be pushed and tested in parallel rather than doing a bisection.

It is also worthwhile to test against some previous patches to see that they are passing. The patches of interest are below, and you should test the commit just before each one to see if that resolves the problem:

bad49e39e301 LU-6096 ldiskfs: mark dir's inode dirty
71b3cc331511 LU-6544 mkfs: Improve MDT inode size calculation
af46754e1e9c LU-6904 mdd: prepare linkea before declare
bf0b93a88479 LU-6874 out: missing noop in out_update_ops
0484a16883a9 LU-5888 utils: limit max_sectors_kb tunable setting
bcef61a80ab4 LU-6768 osd: unmap reallocated blocks
11b048aaffd6 LU-6427 osd: race between destroy and load LMV

Without the LU-6096 patch (http://review.whamcloud.com/15581 landed Aug 5) there were also test failures on RHEL7.1 due to directory corruption (found by e2fsck), but I don't see a Test-Parameters: line to show that it was actually tested on RHEL7.1.

Strangely, I do see patch http://review.whamcloud.com/15130 passed RHEL7.1 testing with a master tree based on c45c8ad26004 LU-6765 obdecho: don't copy lu_site (landed Aug 11), so it is possible this isn't a 100% failure, but hopefully with multiple tests we can narrow this down to a specific patch.

Comment by Yang Sheng [ 27/Aug/15 ]

Hi, Andreas,

What I have found is that this issue can be reproduced on the Shadow cluster reliably. I have narrowed it down to the failure point, and I am sure this issue only exists on the master branch. The system crashed in the following code path, in mgs_target_reg():

        /* send back the whole mti in the reply */
        rep_mti = req_capsule_server_get(tsi->tsi_pill, &RMF_MGS_TARGET_INFO);
        *rep_mti = *mti;

        /* Flush logs to disk */
        dt_sync(tsi->tsi_env, mgs->mgs_bottom); <<<----- crashed at this point.
        RETURN(rc);
}

I'll continue to dig for the root cause, but I am limited by poor network speed; accessing the Shadow cluster from China is a painful experience. Also, this issue is only reproduced on Shadow cluster VM nodes; I cannot hit it on my local VM box.

Comment by Oleg Drokin [ 27/Aug/15 ]

Another good idea is to create a patch similar to http://review.whamcloud.com/#/c/4139/, only for RHEL 7.1, and force rhel7.1 testing; we might be able to catch the party doing the bad write red-handed.

YuJian, any chance you can do this in addition to Andreas's suggestion?

Comment by Andreas Dilger [ 27/Aug/15 ]

Unfortunately, this code path hasn't changed in several years. It seems that there is some corruption of requests in the IO path, and when dt_sync->osd_sync() is called it flushes the corrupt IO requests to disk and the crash happens in the queue. What we need to do is find where the corruption happens, which is why I'd suggested testing with different Lustre patches reverted until the problem goes away.

Unfortunately, the test system is very backed up right now, so it seems like this will go faster by manually testing the different Lustre builds on Shadow using git bisect. Oleg also suggested to enable debug options in the kernel to see if they can detect where the corruption is hit.

Comment by Jian Yu [ 27/Aug/15 ]

YuJian, any chance you can do this in addition to Andreas's suggestion?

Sure, let me do this.

Comment by Yang Sheng [ 27/Aug/15 ]

This issue relates to commit 0484a16883a951f27c30ee926c9a1c305f7cf489. It looks like 32 * 1024 - 1 is the proper value to set.

Comment by Jian Yu [ 27/Aug/15 ]

Here are three for-test-only patches to verify Yang Sheng's finding: http://review.whamcloud.com/16114, http://review.whamcloud.com/16113 and http://review.whamcloud.com/16115

Comment by Bob Glossman (Inactive) [ 27/Aug/15 ]

While I'm not doubting Yang Sheng's pinpointing of a root cause, I'm having trouble understanding why it's not showing up everywhere. The same utils code applies and is used on all builds and distros, including el6.6 and el6.7, where the problem isn't seen.

Comment by Gerrit Updater [ 27/Aug/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/16116
Subject: LU-6974 util: set max_sectors_kb properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cbdca7dc5d44776716d91196db450067ad32d862

Comment by James A Simmons [ 27/Aug/15 ]

Really, an off-by-one error. I wonder if that was the reason the LU-5888 patch failed for earlier versions of RHEL6 on the VMs. Bob, I saw this before in the earlier testing of LU-5888 as well: I never had problems with the patch, but RHEL6.3 VMs would fail. It was assumed to be a block layer bug at the time.

Comment by Andreas Dilger [ 28/Aug/15 ]

I still think this is a block layer bug. If userspace can set the max_sectors_kb=32768 then this shouldn't crash the kernel. If that value is incorrect for some reason then the kernel shouldn't allow it. Setting it one less to make it work seems like voodoo to me, and also has the side-effect of not having an IO size that is an even multiple of 1MB.

If we're going to go down that route then it would be better to make it 16384 KB so that the IO submitted to the disk is a power-of-two size to align with the RAID.
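
For illustration only, a minimal standalone sketch of the 16 MB clamp being discussed; the helper name, structure, and main() driver are illustrative assumptions, not the landed mount.lustre code:

#include <stdio.h>

/*
 * Hypothetical, standalone sketch of the 16 MB clamp discussed above: cap the
 * value mount.lustre would write to /sys/block/<dev>/queue/max_sectors_kb at
 * 16384 so the submitted IO size stays a power of two.  This is not the
 * landed patch; the helper name and structure are illustrative only.
 */
static unsigned long long clamp_max_sectors_kb(unsigned long long newval)
{
        const unsigned long long limit = 16 * 1024;     /* 16 MB, in KB units */

        return newval > limit ? limit : newval;
}

int main(void)
{
        /* Example input: the huge max_hw_sectors_kb reported on RHEL7 virtio. */
        unsigned long long newval = 2147483647ULL;
        char buf[32];

        newval = clamp_max_sectors_kb(newval);
        snprintf(buf, sizeof(buf), "%llu", newval);
        printf("would write \"%s\" to max_sectors_kb\n", buf);
        return 0;
}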

Comment by Vinayak (Inactive) [ 28/Aug/15 ]

We have also encountered this issue in our test environment in every single run on master in recent days. I would say it is 100% reproducible.

We reverted the patch for LU-5888, i.e.

commit 0484a16883a951f27c30ee926c9a1c305f7cf489
Author:     Andreas Dilger <andreas.dilger@intel.com>
AuthorDate: Wed Apr 1 12:35:29 2015 -0400
Commit:     Oleg Drokin <oleg.drokin@intel.com>
CommitDate: Mon Aug 3 01:57:17 2015 +0000

    LU-5888 utils: limit max_sectors_kb tunable setting
    
    Properly limit the max_sectors_kb tunable if larger than 32MB, as
    added in commit 73bd456e896 (http://review.whamcloud.com/12723).
    It was using the old value in "buf" instead of the updated "newval".
    
    Change write_file() to use a file descriptor directly instead of
    streams, since it only writes a single value before closing again.

and ran the test. The mount process was successful. If I am not wrong, this problem is not specific to RHEL 7.1. Please correct me if I am wrong.

Comment by Andreas Dilger [ 28/Aug/15 ]

Vinayak, is the problem also gone if the LU-5888 tunable is set to 16MB instead of reverting the patch? I've pushed a patch to that effect also, but it would be good to get independent verification.

Comment by Vinayak (Inactive) [ 29/Aug/15 ]

Hello Andreas,

We have started a build with the new patch you recently submitted through http://review.whamcloud.com/16116. We will update the ticket once we get the results.

BTW,

I am also trying to understand how LU-5888 runs on a single-node setup without any MDS crash. Am I missing anything here?

Comment by Vinayak (Inactive) [ 29/Aug/15 ]

We did not face this issue with your new patch on our test system.

Coming back to the behavior of LU-5888:
Our test system runs on KVM with Linux version 2.6.32-431.29.2.el6. I believe everyone who encountered this issue hit it on KVM, possibly with different kernel versions. Am I right here?

So I just wanted to cross-verify the LU-5888 behavior on a VirtualBox-based VM with 2 clients, 2 OSTs, and 1 MDS, and the mount was successful. My VM box runs 2.6.32.431.17.1.el6. I need to confirm this again as it looks a bit strange.

The investigation of this issue so far is making me focus on KVM virtio.

We found a possibly related link, http://sourceforge.net/p/kvm/bugs/516/, which talks about limiting overly large iovecs.

Please correct me if I am wrong.

Comment by Vinayak (Inactive) [ 29/Aug/15 ]

Mount console logs on the VirtualBox VM (mount was successful):

Checking servers environments
Checking clients cli-3,cli-4 environments
Loading modules from /usr/lib64/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
debug=vfstrace rpctrace dlmtrace neterror ha config                 ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Setup mgs, mdt, osts
Starting mgs:   /dev/sdc /mnt/mgs
cli-1: mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 32767
Started MGS
Starting mds1:   /dev/sdb /mnt/mds1
cli-1: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 32767
Started lustre-MDT0000
Starting ost1:   /dev/sdb /mnt/ost1
cli-2: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 32767
Started lustre-OST0000
Starting ost2:   /dev/sdc /mnt/ost2
cli-2: mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 32767
Started lustre-OST0001
Starting client: cli-4:  -o user_xattr,flock cli-1@tcp:/lustre /mnt/lustre
Starting client cli-3,cli-4:  -o user_xattr,flock cli-1@tcp:/lustre /mnt/lustre
Started clients cli-3,cli-4: 
cli-1@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
cli-1@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90 secs for update
Updated after 7s: wanted 'procname_uid' got 'procname_uid'
disable quota as required

Please note this line:
cli-2: mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 32767

In the above case, mount.lustre did not enter this branch:

                if (newval > 32 * 1024) {
                        newval = 32 * 1024;
                        snprintf(buf, sizeof(buf), "%llu", newval);
                }

I feel the fix by Yang Sheng is fine, i.e.

                if (newval > 32 * 1024) {
-                       newval = 32 * 1024;
+                       newval = 32 * 1024 - 1;
                        snprintf(buf, sizeof(buf), "%llu", newval);
                }

I did not go into much detail about the handling of this in the block layer, but it looks like the max value should be calculated as 32 * 1024 - 1 = 32767.

In other words, the range is 0 to 32767 (2^15 - 1). If we use 32768, it might be an overflow case.

We tested this on our test environment and it works.

The fact is that LU-5888 is about fixing the same thing, but the new value was set to 32768, which it seems should have been 32767, i.e. 32 * 1024 - 1.

mount.lustre: set max_sectors_kb to 2147483647

The unexpectedly large value (2147483647) might come from
newval = strtoull(buf, &end, 0);

but the definition of strtoull() being used notes that

/* Not currently supported: leading whitespace, sign, 0x prefix, zero base */

Please refer to strtoull().

I agree that max_sectors_kb should also be validated in the block layer, but unfortunately that check might be missing.

Please correct if I am wrong.

Comment by Vinayak (Inactive) [ 30/Aug/15 ]

When I tried to set max_sectors_kb to 32768 on the VirtualBox VM, it resulted in "Invalid argument"; it works well if we set max_sectors_kb to 32767. It looks like 32767 is the max value we can set (a limitation).

Is this behavior the same on KVM also?

# echo 32768 > /sys/block/sdb/queue/max_sectors_kb
bash: echo: write error: Invalid argument

# cat /sys/block/sdb/queue/max_sectors_kb
512

# echo 32767 > /sys/block/sdb/queue/max_sectors_kb

# cat /sys/block/sdb/queue/max_sectors_kb
32767

Comment by Yang Sheng [ 31/Aug/15 ]

Hi, Vinayak,

Please note max_hw_sectors_kb: the kernel doesn't permit max_sectors_kb to exceed max_hw_sectors_kb. That is why the issue is rarely hit on RHEL6. RHEL7 is able to handle a much bigger sector value, so max_hw_sectors_kb can be set to 2147483647 (UINT_MAX/2).
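
For illustration only, a small standalone sketch (hypothetical, not from any patch on this ticket) that reads both queue limits so this relationship can be checked on a node; the device name is just an example:

#include <stdio.h>

/*
 * Illustrative sketch only: read max_sectors_kb and max_hw_sectors_kb for a
 * block device.  The kernel refuses a max_sectors_kb value larger than
 * max_hw_sectors_kb, so on RHEL6 (small hardware limit) the oversized write
 * fails harmlessly, while RHEL7 virtio reports a huge max_hw_sectors_kb and
 * accepts 32768.  The device name below is an example, not from this ticket.
 */
static long read_queue_attr(const char *dev, const char *attr)
{
        char path[256];
        long val = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
        f = fopen(path, "r");
        if (f != NULL) {
                if (fscanf(f, "%ld", &val) != 1)
                        val = -1;
                fclose(f);
        }
        return val;
}

int main(void)
{
        const char *dev = "vda";        /* example virtio disk name */

        printf("max_sectors_kb    = %ld\n", read_queue_attr(dev, "max_sectors_kb"));
        printf("max_hw_sectors_kb = %ld\n", read_queue_attr(dev, "max_hw_sectors_kb"));
        return 0;
}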

Comment by James A Simmons [ 31/Aug/15 ]

Vinayak, does reducing max_sectors_kb to 16384 work for you?

Comment by Vinayak (Inactive) [ 31/Aug/15 ]

>> Vinayak does reducing max_sectors_kb to 16384 work for you?
Yes James. It works for me.

Comment by Gerrit Updater [ 31/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16116/
Subject: LU-6974 util: set max_sectors_kb properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ccf889d2a3de4c3abe9bf0f4ebd75ca82932ab8e

Comment by Joseph Gmitter (Inactive) [ 31/Aug/15 ]

Landed for 2.8.
