Details
Bug
Resolution: Won't Fix
Minor
None
Lustre 2.12.0
ZFS
3
Description
parallel-scale-nfsv3 test_racer_on_nfs crashes the MDS. In the MDS1, 3 (vm4) console log for https://testing.whamcloud.com/test_sets/2f3b07b8-fd9d-11e8-b837-52540065bddc, we see the following stack trace:
[103333.477207] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 22:10:23 (1544566223)
[103335.041254] LustreError: 7416:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x175ef:0x0] get parent: rc = -2
[103344.816370] LustreError: 7419:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x1760c:0x0] get parent: rc = -2
…
[103545.019234] LustreError: 7421:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x17ec8:0x0] get parent: rc = -2
[103545.020769] LustreError: 7421:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 25 previous similar messages
[103605.968995] BUG: unable to handle kernel paging request at ffffffc0ad8da0ff
[103605.970132] IP: [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
[103605.970689] PGD 5a414067 PUD 0
[103605.971142] Oops: 0010 [#1] SMP
[103605.971605] Modules linked in: nfsd nfs_acl lustre(OE) obdecho(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) ptlrpc_gss(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod zunicode(POE) zavl(POE) icp(POE) ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel zcommon(POE) znvpair(POE) spl(OE) aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr parport_pc parport virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix libata 8139too virtio_pci virtio_ring virtio 8139cp mii [last unloaded: lnet_selftest]
[103605.986800] CPU: 1 PID: 26259 Comm: mdt00_002 Kdump: loaded Tainted: P W OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[103605.987936] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[103605.988497] task: ffffa0575e598000 ti: ffffa0574d64c000 task.ti: ffffa0574d64c000
[103605.989218] RIP: 0010:[<ffffffc0ad8da0ff>] [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
[103605.990063] RSP: 0018:ffffa0577fd03eb8 EFLAGS: 00010286
[103605.990616] RAX: ffffffc0ad8da0ff RBX: ffffffffb12784c0 RCX: ffffa0577fd162a0
[103605.991311] RDX: ffffa05719918027 RSI: ffffa0577ad01b28 RDI: ffffa05719918027
[103605.992010] RBP: ffffa0577fd03f10 R08: ffffa0575d255210 R09: ffffa0577fd162a0
[103605.992700] R10: ffffffffb12784c0 R11: ffffa0577fd03de8 R12: 000000000000000a
[103605.993395] R13: 0000000000000000 R14: ffa057140e16a8ff R15: ffffa0577fd162c0
[103605.994084] FS: 0000000000000000(0000) GS:ffffa0577fd00000(0000) knlGS:0000000000000000
[103605.994857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[103605.995424] CR2: ffffffc0ad8da0ff CR3: 000000007a856000 CR4: 00000000000606e0
[103605.996116] Call Trace:
[103605.996383] <IRQ>
[103605.996618] [<ffffffffb0754940>] ? rcu_process_callbacks+0x1e0/0x580
[103605.997304] [<ffffffffb06a0f05>] __do_softirq+0xf5/0x280
[103605.997854] [<ffffffffb0d7832c>] call_softirq+0x1c/0x30
[103605.998397] [<ffffffffb062e675>] do_softirq+0x65/0xa0
[103605.998942] [<ffffffffb06a1285>] irq_exit+0x105/0x110
[103605.999453] [<ffffffffb0d796c8>] smp_apic_timer_interrupt+0x48/0x60
[103606.000087] [<ffffffffb0d75df2>] apic_timer_interrupt+0x162/0x170
[103606.000749] <EOI>
[103606.001073] [<ffffffffc15eda95>] ? dbuf_find+0x1d5/0x1e0 [zfs]
[103606.001710] [<ffffffffb09866cd>] ? memcpy+0xd/0x110
[103606.002210] [<ffffffffb0982b84>] ? vsnprintf+0x234/0x6a0
[103606.002778] [<ffffffffc0343675>] libcfs_debug_vmsg2+0x2f5/0xb30 [libcfs]
[103606.003467] [<ffffffffc15f0111>] ? dbuf_rele_and_unlock+0x371/0x4b0 [zfs]
[103606.004154] [<ffffffffb0d66e72>] ? down_read+0x12/0x40
[103606.004668] [<ffffffffb0d65e12>] ? mutex_lock+0x12/0x2f
[103606.005363] [<ffffffffc0c8b84b>] _ldlm_lock_debug+0x52b/0x750 [ptlrpc]
[103606.006047] [<ffffffffc0c8e6c0>] ldlm_lock_addref_internal_nolock+0x80/0x100 [ptlrpc]
[103606.006853] [<ffffffffc0caccbc>] ldlm_cli_enqueue_local+0x12c/0x870 [ptlrpc]
[103606.007578] [<ffffffffc0caba80>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[103606.008396] [<ffffffffc0f6d7d0>] ? mdt_object_alloc+0x2c0/0x2c0 [mdt]
[103606.009055] [<ffffffffc0f7d4ab>] mdt_object_local_lock+0x50b/0xb20 [mdt]
[103606.009728] [<ffffffffc0f6d7d0>] ? mdt_object_alloc+0x2c0/0x2c0 [mdt]
[103606.010385] [<ffffffffc0caba80>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[103606.011159] [<ffffffffc0348a90>] ? cfs_hash_nl_unlock+0x10/0x10 [libcfs]
[103606.011829] [<ffffffffc0f7db30>] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[103606.012526] [<ffffffffc0f7df57>] mdt_object_lock_try+0x27/0xb0 [mdt]
[103606.013169] [<ffffffffc0f7f697>] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
[103606.013914] [<ffffffffc0cdf2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[103606.014594] [<ffffffffc0d06d9f>] ? __req_capsule_get+0x15f/0x740 [ptlrpc]
[103606.015295] [<ffffffffc0cdf5ac>] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc]
[103606.015984] [<ffffffffc0f86bb5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
[103606.016623] [<ffffffffc0f83a18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[103606.017263] [<ffffffffc0f86900>] ? mdt_intent_layout+0xcc0/0xcc0 [mdt]
[103606.017925] [<ffffffffc0c92ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[103606.018582] [<ffffffffc0348fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[103606.019292] [<ffffffffc034c72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[103606.019934] [<ffffffffc0cbb8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[103606.020643] [<ffffffffc0ce37f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[103606.021425] [<ffffffffc0d42302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[103606.022109] [<ffffffffc0d4935a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[103606.022784] [<ffffffffc0343f07>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[103606.023486] [<ffffffffc0ced92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[103606.024325] [<ffffffffb06cba9b>] ? __wake_up_common+0x5b/0x90
[103606.024927] [<ffffffffc0cf125c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[103606.025569] [<ffffffffc0cf0760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[103606.026298] [<ffffffffb06c1c31>] kthread+0xd1/0xe0
[103606.026782] [<ffffffffb06c1b60>] ? insert_kthread_work+0x40/0x40
[103606.027389] [<ffffffffb0d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[103606.028026] [<ffffffffb06c1b60>] ? insert_kthread_work+0x40/0x40
[103606.028611] Code: Bad RIP value.
[103606.029005] RIP [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
[103606.029550] RSP <ffffa0577fd03eb8>
[103606.029903] CR2: ffffffc0ad8da0ff
There are several tickets open for racer_on_nfs with similar, but not identical, stack traces.
I can’t find another racer_on_nfs crash like this in the past month.
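When comparing this trace against the ones in other racer_on_nfs tickets, it can help to reduce each console log to its reliable frames, i.e. the call-trace entries the kernel did not mark with "?". The following is only a rough sketch of that idea in Python. The names FRAME_RE and reliable_frames are hypothetical helpers made up for illustration, not part of any Lustre or test-framework tooling, and the regular expression assumes the el7-style frame format seen in the log above.

```python
import re

# One call-trace frame in the format seen above, for example:
#   [<ffffffffc0f86bb5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
#   [<ffffffffb0754940>] ? rcu_process_callbacks+0x1e0/0x580
# Group 1: optional "?" marker, group 2: function, group 3: optional module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def reliable_frames(console_log):
    """Return (function, module) pairs for frames not marked with '?'.

    Frames without a trailing [module] tag are reported as "kernel".
    Works on both line-per-entry and flattened (single-line) logs.
    """
    frames = []
    for match in FRAME_RE.finditer(console_log):
        unreliable, func, module = match.groups()
        if unreliable is None:
            frames.append((func, module or "kernel"))
    return frames
```

Two tickets can then be compared by checking whether reliable_frames() returns the same list for both console logs, or by diffing the two lists to see where the call paths diverge.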