[LU-9873] parallel-scale-nfsv4 no sub tests failed: test failed to respond and timed out Created: 11/Aug/17 Updated: 17/Aug/17 Resolved: 17/Aug/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Trevis2, full |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340 All subtests pass, but during clean up, a umount command is issued on the MDS. No further activity occurs after the umount command. I could not find any stuck processes on any of the nodes. From suite_log: CMD: trevis-41vm3 exportfs -u *:/mnt/lustre trevis-41vm3: exportfs: Could not find '*:/mnt/lustre' to unexport. CMD: trevis-41vm3 exportfs -v CMD: trevis-41vm3 grep -c /mnt/lustre' ' /proc/mounts Stopping client trevis-41vm3 /mnt/lustre (opts:-f) CMD: trevis-41vm3 lsof -t /mnt/lustre CMD: trevis-41vm3 umount -f /mnt/lustre 2>&1 |
| Comments |
| Comment by John Hammond [ 16/Aug/17 ] |
|
Looks like a client crashed. From parallel-scale-nfsv4.test_complete.console.trevis-41vm3.log https://testing.hpdd.intel.com/test_logs/f0b65536-799c-11e7-8e1f-5254006e85c2/show_text 03:29:00:[ 2001.634261] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test complete, duration 1826 sec ============================================= 03:28:51 (1501903731) 03:29:00:[ 2002.284696] Lustre: DEBUG MARKER: chkconfig --list nfsserver > /dev/null 2>&1 && 03:29:00:[ 2002.284696] service nfsserver stop || service nfs stop 03:29:00:[ 2002.432510] nfsd: last server has exited, flushing export cache 03:29:00:[ 2002.528874] Lustre: DEBUG MARKER: exportfs -u *:/mnt/lustre 03:29:00:[ 2002.727955] Lustre: DEBUG MARKER: exportfs -v 03:29:00:[ 2002.924616] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts 03:29:00:[ 2003.106556] Lustre: DEBUG MARKER: lsof -t /mnt/lustre 03:29:00:[ 2003.388672] Lustre: DEBUG MARKER: umount -f /mnt/lustre 2>&1 03:29:00:[ 2003.486571] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request 03:29:00:[ 2003.489389] LustreError: 6679:0:(file.c:176:ll_close_inode_openhandle()) lustre-clilmv-ffff880062f67800: inode [0x200000402:0x45de:0x0] mdc close failed: rc = -108 03:29:00:[ 2003.500346] Lustre: 3641:0:(llite_lib.c:2624:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.5.239@tcp:/lustre/fid: [0x200000402:0x4887:0x0]/ may get corrupted (rc -108) ... 03:29:11:[ 2003.841195] Lustre: 3642:0:(llite_lib.c:2624:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.5.239@tcp:/lustre/fid: [0x200000402:0x479d:0x0]/ may get corrupted (rc -108) 03:29:11:[ 2015.199627] ------------[ cut here ]------------ 03:29:11:[ 2015.202619] WARNING: at lib/kobject.c:612 kobject_put+0x50/0x60() 03:29:11:[ 2015.205602] kobject: '(null)' (ffff880062e45ee0): is not initialized, yet kobject_put() is being called. 03:29:11: 03:29:11:[ 2015.208970] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt serio_raw fb_sys_fops 8139cp mii virtio_pci virtio_ring virtio ata_piix ttm drm libata i2c_core floppy 03:29:11:[ 2015.239449] CPU: 1 PID: 6679 Comm: umount Tainted: G OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 03:29:11:[ 2015.242767] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 03:29:11:[ 2015.245722] ffff8800628139e0 000000004f95df6b ffff880062813998 ffffffff8168739f 03:29:11:[ 2015.248803] ffff8800628139d0 ffffffff81085cb0 ffff880062e45ee0 ffff880035d09800 03:29:11:[ 2015.251879] ffff880062e45430 ffff880062e44fa0 ffff880063521c00 ffff880062813a38 03:29:11:[ 2015.254919] Call Trace: 03:29:11:[ 2015.257442] [<ffffffff8168739f>] dump_stack+0x19/0x1b 03:29:11:[ 2015.260207] [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0 03:29:11:[ 2015.262967] [<ffffffff81085d4c>] warn_slowpath_fmt+0x5c/0x80 03:29:11:[ 2015.265664] [<ffffffff8131ae20>] kobject_put+0x50/0x60 03:29:11:[ 2015.268314] [<ffffffffa0867576>] lprocfs_obd_cleanup+0x56/0x70 [obdclass] 03:29:11:[ 2015.271044] [<ffffffffa121fcc7>] osc_precleanup+0xe7/0x2c0 [osc] 03:29:11:[ 2015.273699] [<ffffffffa087af71>] class_cleanup+0x2a1/0xcf0 [obdclass] 03:29:11:[ 2015.276353] [<ffffffffa087d9c2>] class_process_config+0x1992/0x23f0 [obdclass] 03:29:11:[ 2015.279025] [<ffffffffa0676b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs] 03:29:11:[ 2015.281615] [<ffffffffa087e5e6>] class_manual_cleanup+0x1c6/0x710 [obdclass] 03:29:11:[ 2015.284219] [<ffffffffa10809d2>] lov_putref+0x302/0xa80 [lov] 03:29:11:[ 2015.286676] [<ffffffffa1087d92>] lov_disconnect+0x172/0x420 [lov] 03:29:11:[ 2015.289144] [<ffffffffa114e903>] obd_disconnect+0xb3/0x330 [lustre] 03:29:11:[ 2015.291579] [<ffffffffa1151d40>] ll_put_super+0x610/0xaa0 [lustre] 03:29:11:[ 2015.293974] [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20 03:29:11:[ 2015.296288] [<ffffffffa117c40c>] ? ll_destroy_inode+0x1c/0x20 [lustre] 03:29:11:[ 2015.298636] [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60 03:29:11:[ 2015.300860] [<ffffffff8121aa26>] ? evict+0x106/0x170 03:29:11:[ 2015.302980] [<ffffffff8121aace>] ? dispose_list+0x3e/0x50 03:29:11:[ 2015.305140] [<ffffffff8121b724>] ? evict_inodes+0x114/0x140 03:29:11:[ 2015.307278] [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0 03:29:11:[ 2015.309410] [<ffffffff81201342>] kill_anon_super+0x12/0x20 03:29:11:[ 2015.311467] [<ffffffffa0880ef5>] lustre_kill_super+0x45/0x50 [obdclass] 03:29:11:[ 2015.313564] [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60 03:29:11:[ 2015.315603] [<ffffffff81201cf6>] deactivate_super+0x46/0x60 03:29:11:[ 2015.317548] [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120 03:29:11:[ 2015.319469] [<ffffffff81220280>] SyS_umount+0xa0/0x3b0 03:29:11:[ 2015.321302] [<ffffffff81697a89>] system_call_fastpath+0x16/0x1b 03:29:11:[ 2015.323160] ---[ end trace efe121d40f7934b5 ]--- 03:29:11:[ 2015.324909] BUG: unable to handle kernel NULL pointer dereference at (null) 03:29:11:[ 2015.325882] IP: [<ffffffff81333b9b>] __list_add+0x1b/0xc0 03:29:11:[ 2015.325882] PGD 0 03:29:11:[ 2015.325882] Oops: 0000 [#1] SMP 03:29:11:[ 2015.325882] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt serio_raw fb_sys_fops 8139cp mii virtio_pci virtio_ring virtio ata_piix ttm drm libata i2c_core floppy 03:29:11:[ 2015.325882] CPU: 1 PID: 6679 Comm: umount Tainted: G W OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 03:29:11:[ 2015.325882] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 03:29:11:[ 2015.325882] task: ffff880054f1ce70 ti: ffff880062810000 task.ti: ffff880062810000 03:29:11:[ 2015.325882] RIP: 0010:[<ffffffff81333b9b>] [<ffffffff81333b9b>] __list_add+0x1b/0xc0 03:29:11:[ 2015.325882] RSP: 0018:ffff8800628139d8 EFLAGS: 00010046 03:29:11:[ 2015.325882] RAX: ffff880062813a00 RBX: ffff880062813a18 RCX: ffff880054f1ce70 03:29:11:[ 2015.325882] RDX: ffff880062e45f30 RSI: 0000000000000000 RDI: ffff880062813a18 03:29:11:[ 2015.325882] RBP: ffff8800628139f0 R08: 0000000000000000 R09: 0000000000000000 03:29:11:[ 2015.325882] R10: 000000000000b542 R11: 0000000000000001 R12: ffff880062e45f30 03:29:11:[ 2015.325882] R13: 0000000000000000 R14: ffff880054f1ce70 R15: ffff880063521c00 03:29:11:[ 2015.325882] FS: 00007f9b5fa31880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 03:29:11:[ 2015.325882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 03:29:11:[ 2015.325882] CR2: 0000000000000000 CR3: 000000003c785000 CR4: 00000000000406e0 03:29:11:[ 2015.325882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 03:29:11:[ 2015.325882] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 03:29:11:[ 2015.325882] Stack: 03:29:11:[ 2015.325882] ffff880062e45f20 ffff880062e45f28 7fffffffffffffff ffff880062813a50 03:29:11:[ 2015.325882] ffffffff8168ce1b 0000000000000001 ffff880054f1ce70 ffffffff810c54e0 03:29:11:[ 2015.325882] 0000000000000000 0000000000000000 000000004f95df6b ffff880062e44f60 03:29:11:[ 2015.325882] Call Trace: 03:29:11:[ 2015.325882] [<ffffffff8168ce1b>] wait_for_completion+0xeb/0x170 03:29:11:[ 2015.325882] [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20 03:29:11:[ 2015.325882] [<ffffffffa0867582>] lprocfs_obd_cleanup+0x62/0x70 [obdclass] 03:29:11:[ 2015.325882] [<ffffffffa121fcc7>] osc_precleanup+0xe7/0x2c0 [osc] 03:29:11:[ 2015.325882] [<ffffffffa087af71>] class_cleanup+0x2a1/0xcf0 [obdclass] 03:29:11:[ 2015.325882] [<ffffffffa087d9c2>] class_process_config+0x1992/0x23f0 [obdclass] 03:29:11:[ 2015.325882] [<ffffffffa0676b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs] 03:29:11:[ 2015.325882] [<ffffffffa087e5e6>] class_manual_cleanup+0x1c6/0x710 [obdclass] 03:29:11:[ 2015.325882] [<ffffffffa10809d2>] lov_putref+0x302/0xa80 [lov] 03:29:11:[ 2015.325882] [<ffffffffa1087d92>] lov_disconnect+0x172/0x420 [lov] 03:29:11:[ 2015.325882] [<ffffffffa114e903>] obd_disconnect+0xb3/0x330 [lustre] 03:29:11:[ 2015.325882] [<ffffffffa1151d40>] ll_put_super+0x610/0xaa0 [lustre] 03:29:11:[ 2015.325882] [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20 03:29:11:[ 2015.325882] [<ffffffffa117c40c>] ? ll_destroy_inode+0x1c/0x20 [lustre] 03:29:11:[ 2015.325882] [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60 03:29:11:[ 2015.325882] [<ffffffff8121aa26>] ? evict+0x106/0x170 03:29:11:[ 2015.325882] [<ffffffff8121aace>] ? dispose_list+0x3e/0x50 03:29:11:[ 2015.325882] [<ffffffff8121b724>] ? evict_inodes+0x114/0x140 03:29:11:[ 2015.325882] [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0 03:29:11:[ 2015.325882] [<ffffffff81201342>] kill_anon_super+0x12/0x20 03:29:11:[ 2015.325882] [<ffffffffa0880ef5>] lustre_kill_super+0x45/0x50 [obdclass] 03:29:11:[ 2015.325882] [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60 03:29:11:[ 2015.325882] [<ffffffff81201cf6>] deactivate_super+0x46/0x60 03:29:11:[ 2015.325882] [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120 03:29:11:[ 2015.325882] [<ffffffff81220280>] SyS_umount+0xa0/0x3b0 03:29:11:[ 2015.325882] [<ffffffff81697a89>] system_call_fastpath+0x16/0x1b 03:29:11:[ 2015.325882] Code: ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89 03:29:11:[ 2015.325882] RIP [<ffffffff81333b9b>] __list_add+0x1b/0xc0 03:29:11:[ 2015.325882] RSP <ffff8800628139d8> 03:29:11:[ 2015.325882] CR2: 0000000000000000 |
| Comment by John Hammond [ 16/Aug/17 ] |
|
Maybe |
| Comment by James Casper [ 16/Aug/17 ] |
|
This could also be a dupe of |
| Comment by John Hammond [ 16/Aug/17 ] |
|
Jim, |
| Comment by Oleg Drokin [ 17/Aug/17 ] |
|
yeah, I vote on |