Lustre / LU-11766

parallel-scale-nfsv3 test racer_on_nfs crashes the MDS

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.12.0
    • Fix Version/s: None
    • Labels:
    • Environment: ZFS
    • Severity: 3

      Description

      parallel-scale-nfsv3 test_racer_on_nfs crashes the MDS. The MDS1, 3 (vm4) console log for https://testing.whamcloud.com/test_sets/2f3b07b8-fd9d-11e8-b837-52540065bddc shows the following stack trace:

      [103333.477207] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 22:10:23 (1544566223)
      [103335.041254] LustreError: 7416:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x175ef:0x0] get parent: rc = -2
      [103344.816370] LustreError: 7419:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x1760c:0x0] get parent: rc = -2
      …
      [103545.019234] LustreError: 7421:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200022ac9:0x17ec8:0x0] get parent: rc = -2
      [103545.020769] LustreError: 7421:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 25 previous similar messages
      [103605.968995] BUG: unable to handle kernel paging request at ffffffc0ad8da0ff
      [103605.970132] IP: [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
      [103605.970689] PGD 5a414067 PUD 0 
      [103605.971142] Oops: 0010 [#1] SMP 
      [103605.971605] Modules linked in: nfsd nfs_acl lustre(OE) obdecho(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) ptlrpc_gss(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod zunicode(POE) zavl(POE) icp(POE) ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel zcommon(POE) znvpair(POE) spl(OE) aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr parport_pc parport virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix libata 8139too virtio_pci virtio_ring virtio 8139cp mii [last unloaded: lnet_selftest]
      [103605.986800] CPU: 1 PID: 26259 Comm: mdt00_002 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-957.el7_lustre.x86_64 #1
      [103605.987936] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [103605.988497] task: ffffa0575e598000 ti: ffffa0574d64c000 task.ti: ffffa0574d64c000
      [103605.989218] RIP: 0010:[<ffffffc0ad8da0ff>]  [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
      [103605.990063] RSP: 0018:ffffa0577fd03eb8  EFLAGS: 00010286
      [103605.990616] RAX: ffffffc0ad8da0ff RBX: ffffffffb12784c0 RCX: ffffa0577fd162a0
      [103605.991311] RDX: ffffa05719918027 RSI: ffffa0577ad01b28 RDI: ffffa05719918027
      [103605.992010] RBP: ffffa0577fd03f10 R08: ffffa0575d255210 R09: ffffa0577fd162a0
      [103605.992700] R10: ffffffffb12784c0 R11: ffffa0577fd03de8 R12: 000000000000000a
      [103605.993395] R13: 0000000000000000 R14: ffa057140e16a8ff R15: ffffa0577fd162c0
      [103605.994084] FS:  0000000000000000(0000) GS:ffffa0577fd00000(0000) knlGS:0000000000000000
      [103605.994857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [103605.995424] CR2: ffffffc0ad8da0ff CR3: 000000007a856000 CR4: 00000000000606e0
      [103605.996116] Call Trace:
      [103605.996383]  <IRQ> 
      [103605.996618]  [<ffffffffb0754940>] ? rcu_process_callbacks+0x1e0/0x580
      [103605.997304]  [<ffffffffb06a0f05>] __do_softirq+0xf5/0x280
      [103605.997854]  [<ffffffffb0d7832c>] call_softirq+0x1c/0x30
      [103605.998397]  [<ffffffffb062e675>] do_softirq+0x65/0xa0
      [103605.998942]  [<ffffffffb06a1285>] irq_exit+0x105/0x110
      [103605.999453]  [<ffffffffb0d796c8>] smp_apic_timer_interrupt+0x48/0x60
      [103606.000087]  [<ffffffffb0d75df2>] apic_timer_interrupt+0x162/0x170
      [103606.000749]  <EOI> 
      [103606.001073]  [<ffffffffc15eda95>] ? dbuf_find+0x1d5/0x1e0 [zfs]
      [103606.001710]  [<ffffffffb09866cd>] ? memcpy+0xd/0x110
      [103606.002210]  [<ffffffffb0982b84>] ? vsnprintf+0x234/0x6a0
      [103606.002778]  [<ffffffffc0343675>] libcfs_debug_vmsg2+0x2f5/0xb30 [libcfs]
      [103606.003467]  [<ffffffffc15f0111>] ? dbuf_rele_and_unlock+0x371/0x4b0 [zfs]
      [103606.004154]  [<ffffffffb0d66e72>] ? down_read+0x12/0x40
      [103606.004668]  [<ffffffffb0d65e12>] ? mutex_lock+0x12/0x2f
      [103606.005363]  [<ffffffffc0c8b84b>] _ldlm_lock_debug+0x52b/0x750 [ptlrpc]
      [103606.006047]  [<ffffffffc0c8e6c0>] ldlm_lock_addref_internal_nolock+0x80/0x100 [ptlrpc]
      [103606.006853]  [<ffffffffc0caccbc>] ldlm_cli_enqueue_local+0x12c/0x870 [ptlrpc]
      [103606.007578]  [<ffffffffc0caba80>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
      [103606.008396]  [<ffffffffc0f6d7d0>] ? mdt_object_alloc+0x2c0/0x2c0 [mdt]
      [103606.009055]  [<ffffffffc0f7d4ab>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      [103606.009728]  [<ffffffffc0f6d7d0>] ? mdt_object_alloc+0x2c0/0x2c0 [mdt]
      [103606.010385]  [<ffffffffc0caba80>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
      [103606.011159]  [<ffffffffc0348a90>] ? cfs_hash_nl_unlock+0x10/0x10 [libcfs]
      [103606.011829]  [<ffffffffc0f7db30>] mdt_object_lock_internal+0x70/0x3e0 [mdt]
      [103606.012526]  [<ffffffffc0f7df57>] mdt_object_lock_try+0x27/0xb0 [mdt]
      [103606.013169]  [<ffffffffc0f7f697>] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
      [103606.013914]  [<ffffffffc0cdf2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [103606.014594]  [<ffffffffc0d06d9f>] ? __req_capsule_get+0x15f/0x740 [ptlrpc]
      [103606.015295]  [<ffffffffc0cdf5ac>] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc]
      [103606.015984]  [<ffffffffc0f86bb5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
      [103606.016623]  [<ffffffffc0f83a18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [103606.017263]  [<ffffffffc0f86900>] ? mdt_intent_layout+0xcc0/0xcc0 [mdt]
      [103606.017925]  [<ffffffffc0c92ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [103606.018582]  [<ffffffffc0348fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
      [103606.019292]  [<ffffffffc034c72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [103606.019934]  [<ffffffffc0cbb8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [103606.020643]  [<ffffffffc0ce37f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [103606.021425]  [<ffffffffc0d42302>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [103606.022109]  [<ffffffffc0d4935a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [103606.022784]  [<ffffffffc0343f07>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [103606.023486]  [<ffffffffc0ced92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [103606.024325]  [<ffffffffb06cba9b>] ? __wake_up_common+0x5b/0x90
      [103606.024927]  [<ffffffffc0cf125c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [103606.025569]  [<ffffffffc0cf0760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [103606.026298]  [<ffffffffb06c1c31>] kthread+0xd1/0xe0
      [103606.026782]  [<ffffffffb06c1b60>] ? insert_kthread_work+0x40/0x40
      [103606.027389]  [<ffffffffb0d74c37>] ret_from_fork_nospec_begin+0x21/0x21
      [103606.028026]  [<ffffffffb06c1b60>] ? insert_kthread_work+0x40/0x40
      [103606.028611] Code:  Bad RIP value.
      [103606.029005] RIP  [<ffffffc0ad8da0ff>] 0xffffffc0ad8da0ff
      [103606.029550]  RSP <ffffa0577fd03eb8>
      [103606.029903] CR2: ffffffc0ad8da0ff
      

      Several tickets are open for racer_on_nfs with similar, but not identical, stack traces.

      I can’t find another racer_on_nfs crash like this in the past month.
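
When comparing this trace against the similar tickets, it may help to reduce each console log to its reliable frames only. In kernel stack dumps of this style, a `?` before a symbol marks an address that was found on the stack but not confirmed as part of the active call chain. The following is a minimal sketch, not a Lustre or kernel tool: the names `FRAME_RE` and `reliable_frames` are hypothetical, and the pattern assumes the `[<address>] symbol+0xoff/0xlen [module]` frame format seen in the 3.10 console log above.

```python
import re

# One call-trace frame in the format shown in the console log above, e.g.
#   [<ffffffffc0f7d4ab>] mdt_object_local_lock+0x50b/0xb20 [mdt]
# The optional "? " marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'.

    Frames without a trailing [module] tag are reported as 'kernel'.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames


# Three lines copied from the trace in this ticket.
sample = """\
[103606.008396]  [<ffffffffc0f6d7d0>] ? mdt_object_alloc+0x2c0/0x2c0 [mdt]
[103606.009055]  [<ffffffffc0f7d4ab>] mdt_object_local_lock+0x50b/0xb20 [mdt]
[103606.026298]  [<ffffffffb06c1c31>] kthread+0xd1/0xe0
"""

print(reliable_frames(sample))
# → [('mdt_object_local_lock', 'mdt'), ('kthread', 'kernel')]
```

Running the same function over the console logs from the other racer_on_nfs tickets would give frame lists that can be compared directly, which is easier than reading the raw traces side by side.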

              People

              • Assignee: WC Triage (wc-triage)
              • Reporter: James Nunez (jamesanunez)