[LU-10421] mds-survey test 1: Timeout occurred after 426 mins, last suite running was mds-survey, restarting cluster to continue tests Created: 20/Dec/17 Updated: 12/Apr/18 Resolved: 06/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.3 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne, zfs | ||
| Environment: |
onyx, full DNE |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
session: https://testing.hpdd.intel.com/test_sessions/9e3f4edc-daff-4e9c-bb2c-5e501afcb7bf
From MDS console:
[22053.144258] LustreError: 13506:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0001-tests: rc = -2
[22053.145264] LustreError: 13506:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0001-tests: rc = -2
[22053.781142] LustreError: 13611:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0001-tests3: rc = -2
[22053.782164] LustreError: 13611:0:(echo_client.c:1795:echo_md_lookup()) Skipped 2 previous similar messages
[22053.783133] LustreError: 13611:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0001-tests3: rc = -2
[22053.784222] LustreError: 13611:0:(echo_client.c:2027:echo_md_destroy_internal()) Skipped 2 previous similar messages
[22055.866749] LustreError: 13891:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0003-tests: rc = -2
[22055.867931] LustreError: 13891:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0003-tests: rc = -2
[22177.268865] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[22177.270372] IP: [<ffffffffc0bdb913>] lu_object_alloc+0x73/0x310 [obdclass]
[22177.271432] PGD 48733067 PUD 3cfc0067 PMD 0
[22177.272157] Oops: 0002 [#1] SMP
[22177.272692] Modules linked in: obdecho(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd nfsd pcspkr i2c_piix4 joydev virtio_balloon parport_pc i2c_core parport nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ata_generic pata_acpi ext4 mbcache jbd2 ata_piix libata virtio_blk 8139too crct10dif_pclmul crct10dif_common floppy crc32c_intel virtio_pci virtio_ring serio_raw virtio 8139cp mii
[22177.287656] CPU: 1 PID: 19364 Comm: lctl Tainted: P OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
[22177.289215] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[22177.290055] task: ffff88003ef69fa0 ti: ffff880049e5c000 task.ti: ffff880049e5c000
[22177.291188] RIP: 0010:[<ffffffffc0bdb913>] [<ffffffffc0bdb913>] lu_object_alloc+0x73/0x310 [obdclass]
[22177.292617] RSP: 0018:ffff880049e5fb20 EFLAGS: 00010246
[22177.293373] RAX: 00000002400090a0 RBX: ffff8800528d0e40 RCX: 0000000000000000
[22177.294437] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88004885a000
[22177.295492] RBP: ffff880049e5fb68 R08: 0000000000000000 R09: ffff88004885a000
[22177.296546] R10: 000000000000000d R11: 0000000000000fff R12: ffff88004885a000
[22177.297624] R13: ffff880049e5fc08 R14: ffff88005a97a1f8 R15: 0000000000000000
[22177.298634] FS: 00007f0dfe043740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[22177.299832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22177.300688] CR2: 0000000000000008 CR3: 000000001c64d000 CR4: 00000000000406e0
[22177.301787] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[22177.302862] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[22177.303911] Stack:
[22177.304231] ffff880053233000 ffffffffc0bd91f3 0000000000000000 ffff880058793e18
[22177.305487] ffff8800528d0e40 0000000000000000 ffff880049e5fc08 ffff88005a97a1f8
[22177.306635] ffff880053233000 ffff880049e5fbd0 ffffffffc0bdbd7c ffff880058793e18
[22177.307918] Call Trace:
[22177.308341] [<ffffffffc0bd91f3>] ? htable_lookup+0x153/0x170 [obdclass]
[22177.309359] [<ffffffffc0bdbd7c>] lu_object_find_at+0x16c/0x290 [obdclass]
[22177.310377] [<ffffffffc11bfa9e>] echo_md_dir_stripe_choose.isra.43+0x26e/0x680 [obdecho]
[22177.311601] [<ffffffffc05d77eb>] ? cfs_hash_spin_unlock+0xb/0x10 [libcfs]
[22177.312625] [<ffffffffc11c0d6c>] echo_md_handler.isra.45+0xebc/0x2c20 [obdecho]
[22177.313708] [<ffffffffc11c6891>] echo_client_iocontrol+0x1091/0x1ba0 [obdecho]
[22177.314799] [<ffffffffc0bbc459>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[22177.315936] [<ffffffffc0ba714d>] class_handle_ioctl+0x18cd/0x1dd0 [obdclass]
[22177.316937] [<ffffffff811b1e81>] ? handle_mm_fault+0x691/0xfa0
[22177.317792] [<ffffffff812b1a98>] ? security_capable+0x18/0x20
[22177.318674] [<ffffffffc0b8c602>] obd_class_ioctl+0xd2/0x170 [obdclass]
[22177.319675] [<ffffffff812151bd>] do_vfs_ioctl+0x33d/0x540
[22177.320472] [<ffffffff816b0456>] ? trace_do_page_fault+0x56/0x150
[22177.321376] [<ffffffff81215461>] SyS_ioctl+0xa1/0xc0
[22177.322137] [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b |
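For readers triaging the trace above, the "Oops: 0002" field is the x86 page-fault error code, and it can be decoded bit by bit. The sketch below is only an illustration: the helper name `decode_oops` and the table `PF_BITS` are hypothetical and not part of Lustre or the kernel. It assumes the standard x86 bit layout (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch).

```python
import re

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# A None entry means the bit is only reported when it is set.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_oops(line):
    """Decode the hex error code from an 'Oops: XXXX' console line."""
    match = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
    if match is None:
        raise ValueError("no Oops error code found")
    code = int(match.group(1), 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    # Line copied from the MDS console log in the description.
    print(decode_oops("[22177.272157] Oops: 0002 [#1] SMP"))
```

Under those assumptions, code 0002 reads as a kernel-mode write to a page that is not present. That matches the CR2 value of 0000000000000008 and the zero RCX in the register dump, i.e. a store at offset 8 from a NULL pointer.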
| Comments |
| Comment by Mikhail Pershin [ 12/Jan/18 ] |
|
+1 on master |
| Comment by Jian Yu [ 19/Jan/18 ] |
|
This failure occurred at least 20 times in the last two weeks. |
| Comment by Saurabh Tandan (Inactive) [ 31/Jan/18 ] |
|
Seen for 2.10.57 "SLES 12 SP3 Server/DNE/ldiskfs SLES 12 SP3 Client" as well: https://testing.hpdd.intel.com/test_sets/ace94120-fd4e-11e7-a7cd-52540065bddc |
| Comment by Minh Diep [ 06/Feb/18 ] |
|
+1 master dne-zfs https://testing.hpdd.intel.com/test_sets/1eb3e98c-0b54-11e8-a7cd-52540065bddc |
| Comment by Jian Yu [ 08/Feb/18 ] |
|
The failure occurred more than 50 times in one week, which is affecting patch testing on the master branch: |
| Comment by nasf (Inactive) [ 09/Feb/18 ] |
|
+1 on master: |
| Comment by Mikhail Pershin [ 12/Feb/18 ] |
|
+1 on master:
[15920.683325] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[15920.684035] IP: [<ffffffffc0c50d23>] lu_object_alloc+0x73/0x310 [obdclass]
[15920.684035] PGD 800000003a856067 PUD 5fb92067 PMD 0
[15920.684035] Oops: 0002 [#1] SMP
[15920.684035] Modules linked in: obdecho(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mod ppdev nfsd pcspkr parport_pc joydev virtio_balloon parport i2c_piix4 nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk 8139too ata_piix libata virtio_pci virtio_ring serio_raw virtio 8139cp mii i2c_core floppy
[15920.684035] CPU: 0 PID: 22706 Comm: lctl Tainted: P OE ------------ 3.10.0-693.17.1.el7_lustre.x86_64 #1
[15920.684035] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[15920.684035] task: ffff8800174e4f10 ti: ffff88003e728000 task.ti: ffff88003e728000
[15920.684035] RIP: 0010:[<ffffffffc0c50d23>] [<ffffffffc0c50d23>] lu_object_alloc+0x73/0x310 [obdclass]
[15920.684035] RSP: 0018:ffff88003e72baf0 EFLAGS: 00010246
[15920.684035] RAX: 0000000240000bd3 RBX: ffff880055523180 RCX: 0000000000000000
[15920.684035] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880050313dc0
[15920.684035] RBP: ffff88003e72bb38 R08: 0000000000000000 R09: 0000000000000000
[15920.684035] R10: ffff880050313dc0 R11: 0000000000000fff R12: ffff880050313dc0
[15920.684035] R13: ffff88003e72bbd8 R14: ffff8800528dc228 R15: 0000000000000000
[15920.684035] FS: 00007f2d43084740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[15920.684035] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[15920.684035] CR2: 0000000000000008 CR3: 00000000401e0000 CR4: 00000000000006f0
[15920.684035] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15920.684035] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15920.684035] Call Trace:
[15920.684035] [<ffffffffc0c4e603>] ? htable_lookup+0x153/0x170 [obdclass]
[15920.684035] [<ffffffffc0c5118c>] lu_object_find_at+0x16c/0x290 [obdclass]
[15920.684035] [<ffffffffc12617de>] echo_md_dir_stripe_choose.isra.43+0x26e/0x680 [obdecho]
[15920.684035] [<ffffffffc126268e>] echo_md_handler.isra.45+0xa9e/0x2c20 [obdecho]
[15920.684035] [<ffffffffc12658a1>] echo_client_iocontrol+0x1091/0x1ba0 [obdecho]
[15920.684035] [<ffffffffc0c31829>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[15920.684035] [<ffffffffc0c1c63d>] class_handle_ioctl+0x18ed/0x1df0 [obdclass]
[15920.684035] [<ffffffff811af746>] ? do_read_fault.isra.44+0xe6/0x130
[15920.684035] [<ffffffff812b3ea8>] ? security_capable+0x18/0x20
[15920.684035] [<ffffffffc0c01602>] obd_class_ioctl+0xd2/0x170 [obdclass]
[15920.684035] [<ffffffff8121730d>] do_vfs_ioctl+0x33d/0x540
[15920.684035] [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[15920.684035] [<ffffffff810ec7ba>] ? __getnstimeofday64+0x3a/0xd0
[15920.684035] [<ffffffff812175b1>] SyS_ioctl+0xa1/0xc0
[15920.684035] [<ffffffff816b8930>] ? system_call_after_swapgs+0x15d/0x214
[15920.684035] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[15920.684035] [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214
[15920.684035] Code: 48 8b 42 10 ff 10 48 85 c0 49 89 c4 0f 84 3c 02 00 00 48 3d 00 f0 ff ff 0f 87 6f 02 00 00 48 8b 08 49 8b 57 08 49 8b 07 45 31 ff <48> 89 51 08 48 89 01 49 8b 04 24 4c 8d 70 40 48 89 44 24 08 48
[15920.684035] RIP [<ffffffffc0c50d23>] lu_object_alloc+0x73/0x310 [obdclass] |
| Comment by Patrick Farrell (Inactive) [ 15/Feb/18 ] |
|
+1 on master: |
| Comment by Patrick Farrell (Inactive) [ 15/Feb/18 ] |
|
One more: |
| Comment by Patrick Farrell (Inactive) [ 15/Feb/18 ] |
|
https://testing.hpdd.intel.com/test_sessions/0e8a10c7-fd99-4d2a-8443-8d42a144e1b7 |
| Comment by Gerrit Updater [ 16/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31338 |
| Comment by Minh Diep [ 23/Feb/18 ] |
|
+1 on b2_10 https://testing.hpdd.intel.com/test_sets/1c1f2c2c-11f3-11e8-bd00-52540065bddc |
| Comment by Mikhail Pershin [ 06/Mar/18 ] |
|
+1 on master |
| Comment by Gerrit Updater [ 06/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31338/ |
| Comment by Gerrit Updater [ 06/Mar/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31556 |
| Comment by Peter Jones [ 06/Mar/18 ] |
|
Landed for 2.11 |
| Comment by Saurabh Tandan (Inactive) [ 11/Apr/18 ] |
|
+1 on 2.10.3 https://testing.hpdd.intel.com/test_sets/6e7cb3b0-3d84-11e8-960d-52540065bddc
|
| Comment by Gerrit Updater [ 12/Apr/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31556/ |