LU-14434: parallel-scale-nfsv4 test compilebench crashes in qmt_id_lock_cb

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.14.0
    • 3

    Description

      We have seen this crash only twice, both times in parallel-scale-nfsv4 test compilebench:
      2021-01-22: x86_64 clients - https://testing.whamcloud.com/test_sets/68ed07f9-eb1d-459c-b327-269bd996d449
      2021-02-11: ARM clients - https://testing.whamcloud.com/test_sets/6aae8467-c5e3-4547-aefa-04f220cf4042

      Looking at the first failure above (x86_64 clients), the kernel crash log shows:

      [47446.646161] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench ============================================== 19:11:27 (1611342687)
      [47447.194275] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.1394887 -i 2         -r 2 --makej
      [47447.620295] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.1394887 -i 2 -r 2 --makej
      [48153.651976] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [48153.653435] PGD 0 P4D 0 
      [48153.653864] Oops: 0000 [#1] SMP PTI
      [48153.654527] CPU: 0 PID: 1485996 Comm: qmt_reba_lustre Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-240.1.1.el8_lustre.x86_64 #1
      [48153.656584] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [48153.657600] RIP: 0010:qmt_id_lock_cb+0x69/0x100 [lquota]
      [48153.658462] Code: 48 8b 53 20 8b 4a 0c 85 c9 74 74 89 c1 48 8b 42 18 83 78 10 02 75 0a 83 e1 01 b8 01 00 00 00 74 17 48 63 44 24 04 48 c1 e0 04 <48> 03 45 00 f6 40 08 0c 0f 95 c0 0f b6 c0 48 8b 4c 24 08 65 48 33
      [48153.661475] RSP: 0018:ffffbf43c0c5bde8 EFLAGS: 00010246
      [48153.662317] RAX: 0000000000000000 RBX: ffff9fbe4b55e000 RCX: 0000000000000000
      [48153.663454] RDX: ffff9fbe71e8f7a0 RSI: 0000000000000000 RDI: ffff9fbe47c2e862
      [48153.664587] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
      [48153.665717] R10: 0000000000000010 R11: f000000000000000 R12: ffff9fbe4b55e000
      [48153.666855] R13: ffff9fbe3133be60 R14: ffff9fbe4ebacb98 R15: ffff9fbe4ebacb40
      [48153.667999] FS:  0000000000000000(0000) GS:ffff9fbe7fc00000(0000) knlGS:0000000000000000
      [48153.669283] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [48153.670208] CR2: 0000000000000000 CR3: 0000000059c0a002 CR4: 00000000000606f0
      [48153.671348] Call Trace:
      [48153.671787]  ? cfs_cdebug_show.part.2.constprop.22+0x20/0x20 [lquota]
      [48153.672831]  qmt_glimpse_lock.isra.19+0x27e/0xfb0 [lquota]
      [48153.673726]  qmt_reba_thread+0x5da/0x9b0 [lquota]
      [48153.674503]  ? qmt_glimpse_lock.isra.19+0xfb0/0xfb0 [lquota]
      [48153.675454]  kthread+0x112/0x130
      [48153.676009]  ? kthread_flush_work_fn+0x10/0x10
      [48153.676745]  ret_from_fork+0x35/0x40
      [48153.677349] Modules linked in: nfsd nfs_acl lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic 8139too ata_piix crc32c_intel libata serio_raw 8139cp mii virtio_blk [last unloaded: dm_flakey]
      [48153.688461] CR2: 0000000000000000
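
      As a reading aid for the oops above: the byte wrapped in <> on the Code: line marks the start of the faulting instruction. Here the bytes 48 03 45 00 decode on x86-64 as add (%rbp),%rax, and the dump shows RBP = 0, which is consistent with the fault address CR2 = 0. The sketch below pulls both pieces out of the log text. It assumes the standard Linux oops layout; the helper names faulting_bytes and register are hypothetical and are not part of any Lustre or kernel tooling.

```python
import re

# Lines copied from the oops above (timestamps dropped).
CODE_LINE = ("Code: 48 8b 53 20 8b 4a 0c 85 c9 74 74 89 c1 48 8b 42 18 83 78 10 "
             "02 75 0a 83 e1 01 b8 01 00 00 00 74 17 48 63 44 24 04 48 c1 e0 04 "
             "<48> 03 45 00 f6 40 08 0c 0f 95 c0 0f b6 c0 48 8b 4c 24 08 65 48 33")
REG_LINE = "RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004"


def faulting_bytes(code_line, count=4):
    """Return `count` hex bytes starting at the <xx> marker of a Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [tok.strip("<>")] + tokens[i + 1:i + count]
    return []  # no marker present


def register(dump, name):
    """Return the value of register `name` from oops text, or None if absent."""
    m = re.search(r"\b%s: ([0-9a-f]{16})" % re.escape(name), dump)
    return int(m.group(1), 16) if m else None


if __name__ == "__main__":
    print(" ".join(faulting_bytes(CODE_LINE)))  # bytes of the faulting instruction
    print(hex(register(REG_LINE, "RBP")))       # base register it dereferences
```

      This only locates the faulting instruction and its base register; it does not identify which pointer in qmt_id_lock_cb was NULL. That would need the lquota module with debug info.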
      

            People

              scherementsev Sergey Cheremencev
              jamesanunez James Nunez (Inactive)