  Lustre / LU-11222

parallel-scale-nfsv3 test racer_on_nfs crashes with ‘BUG: unable to handle kernel paging request at ffffffc09d0c20ff’

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • None
    • Lustre 2.10.5, Lustre 2.10.7, Lustre 2.15.0
    • 3
    • 9223372036854775807

    Description

      parallel-scale-nfsv3 test_racer_on_nfs crashes. The MDS console log and the kernel-crash log at https://testing.whamcloud.com/test_sets/2d3ea616-98f5-11e8-b0aa-52540065bddc have the same stack trace:

      [30173.104517] LustreError: 1490:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200024df2:0x17c44:0x0] get parent: rc = -2
      [30173.105932] LustreError: 1490:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
      [30196.764494] LustreError: 1488:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200024df2:0x17e63:0x0] get parent: rc = -2
      [30196.766036] LustreError: 1488:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
      [30207.660367] BUG: unable to handle kernel paging request at ffffffc09d0c20ff
      [30207.661195] IP: [<ffffffc09d0c20ff>] 0xffffffc09d0c20ff
      [30207.661746] PGD 32a12067 PUD 0 
      [30207.662121] Oops: 0010 [#1] SMP 
      [30207.665969] Modules linked in: nfsd nfs_acl osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi i2c_piix4 ppdev crc32_pclmul ghash_clmulni_intel virtio_balloon aesni_intel i2c_core lrw gf128mul joydev pcspkr glue_helper ablk_helper cryptd parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too crct10dif_pclmul crct10dif_common ata_piix crc32c_intel libata serio_raw 8139cp mii virtio_pci virtio_ring virtio floppy [last unloaded: lnet_selftest]
      [30207.675994] CPU: 0 PID: 28926 Comm: mdt00_000 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.9.1.el7_lustre.x86_64 #1
      [30207.677274] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [30207.677887] task: ffff90e06779bf40 ti: ffff90e035688000 task.ti: ffff90e035688000
      [30207.678612] RIP: 0010:[<ffffffc09d0c20ff>]  [<ffffffc09d0c20ff>] 0xffffffc09d0c20ff
      [30207.679379] RSP: 0018:ffff90e07fc03eb8  EFLAGS: 00010286
      [30207.679905] RAX: ffffffc09d0c20ff RBX: ffffffff9d273000 RCX: 0000000003b861f2
      [30207.680591] RDX: ffff90e05c09fc27 RSI: ffff90e05fea8200 RDI: ffff90e05c09fc27
      [30207.681286] RBP: ffff90e07fc03f10 R08: 000000000001ba80 R09: ffffffff9c74b2fc
      [30207.681976] R10: ffff90e07fc1ba80 R11: ffffd7c3c1a28f40 R12: 000000000000000a
      [30207.682663] R13: 0000000000000005 R14: ff90e05fea9e28ff R15: ffff90e07fc14340
      [30207.683355] FS:  0000000000000000(0000) GS:ffff90e07fc00000(0000) knlGS:0000000000000000
      [30207.684139] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [30207.684696] CR2: ffffffc09d0c20ff CR3: 0000000077eb8000 CR4: 00000000000606f0
      [30207.685398] Call Trace:
      [30207.685677]  <IRQ> 
      [30207.716757]  [<ffffffff9c74b2b0>] ? rcu_process_callbacks+0x1e0/0x580
      [30207.718073]  [<ffffffff9c69b085>] __do_softirq+0xf5/0x280
      [30207.719573]  [<ffffffff9cd23cec>] call_softirq+0x1c/0x30
      [30207.720706]  [<ffffffff9c62d625>] do_softirq+0x65/0xa0
      [30207.721233]  [<ffffffff9c69b405>] irq_exit+0x105/0x110
      [30207.721743]  [<ffffffff9cd25068>] smp_apic_timer_interrupt+0x48/0x60
      [30207.722396]  [<ffffffff9cd217b2>] apic_timer_interrupt+0x162/0x170
      [30207.723013]  <EOI> 
      [30207.726126]  [<ffffffff9c95ae40>] ? memcpy+0x10/0x110
      [30207.726693]  [<ffffffff9c958194>] ? vsnprintf+0x234/0x6a0
      [30207.727282]  [<ffffffffc0763305>] libcfs_debug_vmsg2+0x2f5/0xb40 [libcfs]
      [30207.727973]  [<ffffffffc0763ba7>] libcfs_debug_msg+0x57/0x80 [libcfs]
      [30207.728607]  [<ffffffff9c74a39d>] ? call_rcu_sched+0x1d/0x20
      [30207.757212]  [<ffffffffc0bf0432>] ldlm_handle_enqueue0+0x2b2/0x16a0 [ptlrpc]
      [30207.758514]  [<ffffffffc0c18e00>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [30207.765482]  [<ffffffffc0c76452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [30207.766199]  [<ffffffffc0c7a38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [30207.766932]  [<ffffffffc0c22e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [30207.767745]  [<ffffffff9c6c52ab>] ? __wake_up_common+0x5b/0x90
      [30207.768357]  [<ffffffffc0c26592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [30207.769008]  [<ffffffffc0c25b00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      [30207.769743]  [<ffffffff9c6bb621>] kthread+0xd1/0xe0
      [30207.770236]  [<ffffffff9c6bb550>] ? insert_kthread_work+0x40/0x40
      [30207.770840]  [<ffffffff9cd205f7>] ret_from_fork_nospec_begin+0x21/0x21
      [30207.771487]  [<ffffffff9c6bb550>] ? insert_kthread_work+0x40/0x40
      [30207.772094] Code:  Bad RIP value.
      [30207.772483] RIP  [<ffffffc09d0c20ff>] 0xffffffc09d0c20ff
      [30207.773039]  RSP <ffff90e07fc03eb8>
      [30207.773390] CR2: ffffffc09d0c20ff
      

      racer_on_nfs has crashed many times recently, but I haven’t been able to find another crash with a matching stack trace.
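
      Checking other racer_on_nfs crash logs for a matching stack trace can be scripted. What follows is a minimal Python sketch, not part of the Lustre test framework. The helper names `trace_signature` and `same_trace` are hypothetical, and the regular expression assumes the console frame format shown in the log above (`[<address>] symbol+0xoff/0xlen`).

```python
import re

# Matches kernel call-trace frames such as
#   [30207.718073]  [<ffffffff9c69b085>] __do_softirq+0xf5/0x280
# An optional "? " before the symbol marks a frame the unwinder could not verify.
# Assumption: frames follow the RHEL 7 console format seen in this ticket.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_signature(log_text, include_unreliable=False):
    """Return the ordered list of function names found in a console log.

    Lines without a symbol+offset frame (for example the raw RIP line)
    are ignored.
    """
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        # Skip "?" frames unless the caller explicitly asks for them.
        if match.group(1) and not include_unreliable:
            continue
        names.append(match.group(2))
    return names


def same_trace(log_a, log_b):
    """True when both logs yield the same sequence of verified frames."""
    return trace_signature(log_a) == trace_signature(log_b)
```

      Frames marked with `?` are skipped by default because they were found on the stack but not verified as part of the call chain, so they can differ between otherwise identical crashes. Pass `include_unreliable=True` to compare them as well.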

      This crash occurred while testing RHEL 7.5 servers with ldiskfs targets and SLES 12 SP3 clients.

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)