[LU-9873] parallel-scale-nfsv4 no sub tests failed: test failed to respond and timed out Created: 11/Aug/17  Updated: 17/Aug/17  Resolved: 17/Aug/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Casper Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Trevis2, full
server: RHEL 7.3, ldiskfs, branch master, v2.10.51, b3620
client: RHEL 7.3, branch master, v2.10.51, b3620


Issue Links:
Duplicate
duplicates LU-9791 When umount client, kobject_put crash... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340

All subtests pass, but during clean up, a umount command is issued on the MDS. No further activity occurs after the umount command. I could not find any stuck processes on any of the nodes.

From suite_log:

CMD: trevis-41vm3 exportfs -u *:/mnt/lustre
trevis-41vm3: exportfs: Could not find '*:/mnt/lustre' to unexport.
CMD: trevis-41vm3 exportfs -v
CMD: trevis-41vm3 grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-41vm3 /mnt/lustre (opts:-f)
CMD: trevis-41vm3 lsof -t /mnt/lustre
CMD: trevis-41vm3 umount -f /mnt/lustre 2>&1  


 Comments   
Comment by John Hammond [ 16/Aug/17 ]

Looks like a client crashed. From parallel-scale-nfsv4.test_complete.console.trevis-41vm3.log https://testing.hpdd.intel.com/test_logs/f0b65536-799c-11e7-8e1f-5254006e85c2/show_text

03:29:00:[ 2001.634261] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test complete, duration 1826 sec ============================================= 03:28:51 (1501903731)
03:29:00:[ 2002.284696] Lustre: DEBUG MARKER: chkconfig --list nfsserver > /dev/null 2>&1 &&
03:29:00:[ 2002.284696] 				 service nfsserver stop || service nfs stop
03:29:00:[ 2002.432510] nfsd: last server has exited, flushing export cache
03:29:00:[ 2002.528874] Lustre: DEBUG MARKER: exportfs -u *:/mnt/lustre
03:29:00:[ 2002.727955] Lustre: DEBUG MARKER: exportfs -v
03:29:00:[ 2002.924616] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
03:29:00:[ 2003.106556] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
03:29:00:[ 2003.388672] Lustre: DEBUG MARKER: umount -f /mnt/lustre 2>&1
03:29:00:[ 2003.486571] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
03:29:00:[ 2003.489389] LustreError: 6679:0:(file.c:176:ll_close_inode_openhandle()) lustre-clilmv-ffff880062f67800: inode [0x200000402:0x45de:0x0] mdc close failed: rc = -108
03:29:00:[ 2003.500346] Lustre: 3641:0:(llite_lib.c:2624:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.5.239@tcp:/lustre/fid: [0x200000402:0x4887:0x0]/ may get corrupted (rc -108)
...
03:29:11:[ 2003.841195] Lustre: 3642:0:(llite_lib.c:2624:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.5.239@tcp:/lustre/fid: [0x200000402:0x479d:0x0]/ may get corrupted (rc -108)
03:29:11:[ 2015.199627] ------------[ cut here ]------------
03:29:11:[ 2015.202619] WARNING: at lib/kobject.c:612 kobject_put+0x50/0x60()
03:29:11:[ 2015.205602] kobject: '(null)' (ffff880062e45ee0): is not initialized, yet kobject_put() is being called.
03:29:11:
03:29:11:[ 2015.208970] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt serio_raw fb_sys_fops 8139cp mii virtio_pci virtio_ring virtio ata_piix ttm drm libata i2c_core floppy
03:29:11:[ 2015.239449] CPU: 1 PID: 6679 Comm: umount Tainted: G           OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
03:29:11:[ 2015.242767] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:29:11:[ 2015.245722]  ffff8800628139e0 000000004f95df6b ffff880062813998 ffffffff8168739f
03:29:11:[ 2015.248803]  ffff8800628139d0 ffffffff81085cb0 ffff880062e45ee0 ffff880035d09800
03:29:11:[ 2015.251879]  ffff880062e45430 ffff880062e44fa0 ffff880063521c00 ffff880062813a38
03:29:11:[ 2015.254919] Call Trace:
03:29:11:[ 2015.257442]  [<ffffffff8168739f>] dump_stack+0x19/0x1b
03:29:11:[ 2015.260207]  [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0
03:29:11:[ 2015.262967]  [<ffffffff81085d4c>] warn_slowpath_fmt+0x5c/0x80
03:29:11:[ 2015.265664]  [<ffffffff8131ae20>] kobject_put+0x50/0x60
03:29:11:[ 2015.268314]  [<ffffffffa0867576>] lprocfs_obd_cleanup+0x56/0x70 [obdclass]
03:29:11:[ 2015.271044]  [<ffffffffa121fcc7>] osc_precleanup+0xe7/0x2c0 [osc]
03:29:11:[ 2015.273699]  [<ffffffffa087af71>] class_cleanup+0x2a1/0xcf0 [obdclass]
03:29:11:[ 2015.276353]  [<ffffffffa087d9c2>] class_process_config+0x1992/0x23f0 [obdclass]
03:29:11:[ 2015.279025]  [<ffffffffa0676b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
03:29:11:[ 2015.281615]  [<ffffffffa087e5e6>] class_manual_cleanup+0x1c6/0x710 [obdclass]
03:29:11:[ 2015.284219]  [<ffffffffa10809d2>] lov_putref+0x302/0xa80 [lov]
03:29:11:[ 2015.286676]  [<ffffffffa1087d92>] lov_disconnect+0x172/0x420 [lov]
03:29:11:[ 2015.289144]  [<ffffffffa114e903>] obd_disconnect+0xb3/0x330 [lustre]
03:29:11:[ 2015.291579]  [<ffffffffa1151d40>] ll_put_super+0x610/0xaa0 [lustre]
03:29:11:[ 2015.293974]  [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20
03:29:11:[ 2015.296288]  [<ffffffffa117c40c>] ? ll_destroy_inode+0x1c/0x20 [lustre]
03:29:11:[ 2015.298636]  [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60
03:29:11:[ 2015.300860]  [<ffffffff8121aa26>] ? evict+0x106/0x170
03:29:11:[ 2015.302980]  [<ffffffff8121aace>] ? dispose_list+0x3e/0x50
03:29:11:[ 2015.305140]  [<ffffffff8121b724>] ? evict_inodes+0x114/0x140
03:29:11:[ 2015.307278]  [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0
03:29:11:[ 2015.309410]  [<ffffffff81201342>] kill_anon_super+0x12/0x20
03:29:11:[ 2015.311467]  [<ffffffffa0880ef5>] lustre_kill_super+0x45/0x50 [obdclass]
03:29:11:[ 2015.313564]  [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60
03:29:11:[ 2015.315603]  [<ffffffff81201cf6>] deactivate_super+0x46/0x60
03:29:11:[ 2015.317548]  [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120
03:29:11:[ 2015.319469]  [<ffffffff81220280>] SyS_umount+0xa0/0x3b0
03:29:11:[ 2015.321302]  [<ffffffff81697a89>] system_call_fastpath+0x16/0x1b
03:29:11:[ 2015.323160] ---[ end trace efe121d40f7934b5 ]---
03:29:11:[ 2015.324909] BUG: unable to handle kernel NULL pointer dereference at           (null)
03:29:11:[ 2015.325882] IP: [<ffffffff81333b9b>] __list_add+0x1b/0xc0
03:29:11:[ 2015.325882] PGD 0 
03:29:11:[ 2015.325882] Oops: 0000 [#1] SMP 
03:29:11:[ 2015.325882] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt serio_raw fb_sys_fops 8139cp mii virtio_pci virtio_ring virtio ata_piix ttm drm libata i2c_core floppy
03:29:11:[ 2015.325882] CPU: 1 PID: 6679 Comm: umount Tainted: G        W  OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
03:29:11:[ 2015.325882] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:29:11:[ 2015.325882] task: ffff880054f1ce70 ti: ffff880062810000 task.ti: ffff880062810000
03:29:11:[ 2015.325882] RIP: 0010:[<ffffffff81333b9b>]  [<ffffffff81333b9b>] __list_add+0x1b/0xc0
03:29:11:[ 2015.325882] RSP: 0018:ffff8800628139d8  EFLAGS: 00010046
03:29:11:[ 2015.325882] RAX: ffff880062813a00 RBX: ffff880062813a18 RCX: ffff880054f1ce70
03:29:11:[ 2015.325882] RDX: ffff880062e45f30 RSI: 0000000000000000 RDI: ffff880062813a18
03:29:11:[ 2015.325882] RBP: ffff8800628139f0 R08: 0000000000000000 R09: 0000000000000000
03:29:11:[ 2015.325882] R10: 000000000000b542 R11: 0000000000000001 R12: ffff880062e45f30
03:29:11:[ 2015.325882] R13: 0000000000000000 R14: ffff880054f1ce70 R15: ffff880063521c00
03:29:11:[ 2015.325882] FS:  00007f9b5fa31880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
03:29:11:[ 2015.325882] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
03:29:11:[ 2015.325882] CR2: 0000000000000000 CR3: 000000003c785000 CR4: 00000000000406e0
03:29:11:[ 2015.325882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
03:29:11:[ 2015.325882] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
03:29:11:[ 2015.325882] Stack:
03:29:11:[ 2015.325882]  ffff880062e45f20 ffff880062e45f28 7fffffffffffffff ffff880062813a50
03:29:11:[ 2015.325882]  ffffffff8168ce1b 0000000000000001 ffff880054f1ce70 ffffffff810c54e0
03:29:11:[ 2015.325882]  0000000000000000 0000000000000000 000000004f95df6b ffff880062e44f60
03:29:11:[ 2015.325882] Call Trace:
03:29:11:[ 2015.325882]  [<ffffffff8168ce1b>] wait_for_completion+0xeb/0x170
03:29:11:[ 2015.325882]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
03:29:11:[ 2015.325882]  [<ffffffffa0867582>] lprocfs_obd_cleanup+0x62/0x70 [obdclass]
03:29:11:[ 2015.325882]  [<ffffffffa121fcc7>] osc_precleanup+0xe7/0x2c0 [osc]
03:29:11:[ 2015.325882]  [<ffffffffa087af71>] class_cleanup+0x2a1/0xcf0 [obdclass]
03:29:11:[ 2015.325882]  [<ffffffffa087d9c2>] class_process_config+0x1992/0x23f0 [obdclass]
03:29:11:[ 2015.325882]  [<ffffffffa0676b97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
03:29:11:[ 2015.325882]  [<ffffffffa087e5e6>] class_manual_cleanup+0x1c6/0x710 [obdclass]
03:29:11:[ 2015.325882]  [<ffffffffa10809d2>] lov_putref+0x302/0xa80 [lov]
03:29:11:[ 2015.325882]  [<ffffffffa1087d92>] lov_disconnect+0x172/0x420 [lov]
03:29:11:[ 2015.325882]  [<ffffffffa114e903>] obd_disconnect+0xb3/0x330 [lustre]
03:29:11:[ 2015.325882]  [<ffffffffa1151d40>] ll_put_super+0x610/0xaa0 [lustre]
03:29:11:[ 2015.325882]  [<ffffffff81138fcd>] ? call_rcu_sched+0x1d/0x20
03:29:11:[ 2015.325882]  [<ffffffffa117c40c>] ? ll_destroy_inode+0x1c/0x20 [lustre]
03:29:11:[ 2015.325882]  [<ffffffff8121a8f8>] ? destroy_inode+0x38/0x60
03:29:11:[ 2015.325882]  [<ffffffff8121aa26>] ? evict+0x106/0x170
03:29:11:[ 2015.325882]  [<ffffffff8121aace>] ? dispose_list+0x3e/0x50
03:29:11:[ 2015.325882]  [<ffffffff8121b724>] ? evict_inodes+0x114/0x140
03:29:11:[ 2015.325882]  [<ffffffff81200f72>] generic_shutdown_super+0x72/0xf0
03:29:11:[ 2015.325882]  [<ffffffff81201342>] kill_anon_super+0x12/0x20
03:29:11:[ 2015.325882]  [<ffffffffa0880ef5>] lustre_kill_super+0x45/0x50 [obdclass]
03:29:11:[ 2015.325882]  [<ffffffff812016f9>] deactivate_locked_super+0x49/0x60
03:29:11:[ 2015.325882]  [<ffffffff81201cf6>] deactivate_super+0x46/0x60
03:29:11:[ 2015.325882]  [<ffffffff8121f145>] mntput_no_expire+0xc5/0x120
03:29:11:[ 2015.325882]  [<ffffffff81220280>] SyS_umount+0xa0/0x3b0
03:29:11:[ 2015.325882]  [<ffffffff81697a89>] system_call_fastpath+0x16/0x1b
03:29:11:[ 2015.325882] Code: ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89 
03:29:11:[ 2015.325882] RIP  [<ffffffff81333b9b>] __list_add+0x1b/0xc0
03:29:11:[ 2015.325882]  RSP <ffff8800628139d8>
03:29:11:[ 2015.325882] CR2: 0000000000000000
Comment by John Hammond [ 16/Aug/17 ]

Maybe LU-9791.

Comment by James Casper [ 16/Aug/17 ]

This could also be a dupe of LU-9469.

Comment by John Hammond [ 16/Aug/17 ]

Jim, LU-9469 is an unrelated issue. Compare the backtraces leading up to the crash in this issue with those in LU-9791 and LU-9469.

Comment by Oleg Drokin [ 17/Aug/17 ]

yeah, I vote on LU-9791

Generated at Sat Feb 10 02:30:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.