LU-11812 (project: Lustre)

parallel-scale-nfsv4 test racer_on_nfs crashes with “BUG: unable to handle kernel NULL pointer dereference”

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2
    • Labels: None
    • Environment: 2.11.0 servers with 2.12.0 RC2 clients;
      2.12.53.1 servers with 2.12.1 clients
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      In parallel-scale-nfsv4 test racer_on_nfs, the client crashes when running 2.11.0 servers with 2.12.0 RC2 clients.

      In the logs at https://testing.whamcloud.com/test_sets/d535d716-fd79-11e8-a97c-52540065bddc, the console of client 1 (vm10) shows the following stack trace:

       [47194.303686] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 18:06:33 (1544551593)
      [47194.487648] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      [47277.681873] 2[25283]: segfault at 8 ip 00007f3fedceb718 sp 00007ffdbca831f0 error 4 in ld-2.17.so[7f3fedce0000+22000]
      [47414.009897] 15[3823]: segfault at 0 ip 00000000004043e0 sp 00007ffca5fa64e8 error 6 in 15[400000+6000]
      [47419.350008] 14[17578]: segfault at 8 ip 00007ff5d6baa718 sp 00007ffecd59c0f0 error 4 in ld-2.17.so[7ff5d6b9f000+22000]
      [47427.166966] 9[4070]: segfault at 8 ip 00007fe542938718 sp 00007ffeb0050000 error 4 in ld-2.17.so[7fe54292d000+22000]
      [47450.889695] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [47450.890808] IP: [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
      [47450.891637] PGD 800000005bd1c067 PUD 7c171067 PMD 0 
      [47450.892238] Oops: 0000 [#1] SMP 
      [47450.892634] Modules linked in: nfsv3 nfs_acl mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel sunrpc ppdev aesni_intel pcspkr joydev lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix libata 8139too virtio_pci virtio_ring virtio 8139cp mii [last unloaded: lnet_selftest]
      [47450.901754] CPU: 0 PID: 20258 Comm: rm Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.el7.x86_64 #1
      [47450.902787] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [47450.903355] task: ffff8d5a828f0000 ti: ffff8d5a8ca50000 task.ti: ffff8d5a8ca50000
      [47450.904097] RIP: 0010:[<ffffffffc073d6a6>]  [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
      [47450.905052] RSP: 0018:ffff8d5a8ca53df8  EFLAGS: 00010246
      [47450.905560] RAX: ffff8d5aeca26000 RBX: ffff8d5acf009640 RCX: ffffff8000000000
      [47450.906225] RDX: ffffff8100000000 RSI: ffffff8100000000 RDI: 0000000000000000
      [47450.906903] RBP: ffff8d5a8ca53e40 R08: 0000000000000001 R09: 0000000000000000
      [47450.907579] R10: 00007ffdb2737ca0 R11: 0000000000000246 R12: ffff8d5aeca26000
      [47450.908233] R13: ffff8d5a80c236c0 R14: ffff8d5afa56d6a0 R15: ffff8d5a8ca53ec0
      [47450.908910] FS:  00007f35ee41e740(0000) GS:ffff8d5affc00000(0000) knlGS:0000000000000000
      [47450.909721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [47450.910284] CR2: 0000000000000028 CR3: 000000007b4f6000 CR4: 00000000000606f0
      [47450.911063] Call Trace:
      [47450.911352]  [<ffffffffc0743d19>] ? nfs_getattr+0xf9/0x250 [nfs]
      [47450.911987]  [<ffffffffa2246aa9>] vfs_getattr+0x49/0x80
      [47450.912539]  [<ffffffffa2246b25>] vfs_fstat+0x45/0x80
      [47450.913081]  [<ffffffffa2247094>] SYSC_newfstat+0x24/0x60
      [47450.913687]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.914387]  [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
      [47450.915054]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.915724]  [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
      [47450.916373]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.917022]  [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
      [47450.917664]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.918279]  [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
      [47450.918908]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.919551]  [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
      [47450.920171]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.920798]  [<ffffffffa224746e>] SyS_newfstat+0xe/0x10
      [47450.921293]  [<ffffffffa2774ddb>] system_call_fastpath+0x22/0x27
      [47450.921879]  [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
      [47450.922527] Code: 89 8d 60 ff ff ff e8 81 df 01 e2 8b 8d 60 ff ff ff e9 02 fe ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 <48> 8b 47 28 48 89 e5 48 8b 80 50 03 00 00 f6 80 fc 02 00 00 01 
      [47450.925633] RIP  [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
      [47450.926354]  RSP <ffff8d5a8ca53df8>
      [47450.926704] CR2: 0000000000000028
      

      A similar issue appears at https://testing.whamcloud.com/test_sets/56a97530-beb1-11e8-b143-52540065bddc, with the following in the client 1 (vm5) console:

      [ 4879.403401] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 21:00:54 (1537650054)
      [ 4879.583288] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      [ 4951.358555] 13[22550]: segfault at 8 ip 00007f1a4c2b6958 sp 00007fff9954d570 error 4 in ld-2.17.so[7f1a4c2ab000+22000]
      [ 4964.483018] 1[23397]: segfault at 8 ip 00007f9b1355a958 sp 00007ffdc00ef9f0 error 4 in ld-2.17.so[7f9b1354f000+22000]
      [ 5088.407853] 5[30393]: segfault at 8 ip 00007f310575e958 sp 00007ffdbc27b700 error 4 in ld-2.17.so[7f3105753000+22000]
      [ 5103.960350] ------------[ cut here ]------------
      [ 5103.960967] kernel BUG at fs/dcache.c:661!
      [ 5103.961381] invalid opcode: 0000 [#1] SMP 
      [ 5103.961939] Modules linked in: nfsv3 nfs_acl lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc ppdev iosf_mbi i2c_piix4 i2c_core crc32_pclmul pcspkr joydev ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd parport_pc virtio_balloon parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common 8139too crc32c_intel serio_raw virtio_pci 8139cp virtio_ring virtio mii floppy
      [ 5103.970783] CPU: 1 PID: 2704 Comm: cp Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
      [ 5103.971789] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 5103.972336] task: ffff8c90f9fd8000 ti: ffff8c90b5e9c000 task.ti: ffff8c90b5e9c000
      [ 5103.973032] RIP: 0010:[<ffffffffb5033bd2>]  [<ffffffffb5033bd2>] dget_parent+0x72/0x80
      [ 5103.973808] RSP: 0018:ffff8c90b5e9fde0  EFLAGS: 00010246
      [ 5103.974305] RAX: 0000000000000000 RBX: ffff8c90f859acc0 RCX: 0000000000000000
      [ 5103.974968] RDX: 0000000000000000 RSI: 0000000100000000 RDI: ffff8c90f859ad18
      [ 5103.975633] RBP: ffff8c90b5e9fdf8 R08: 0000000000000000 R09: 90e45f4528000000
      [ 5103.976302] R10: 00007ffeb1207460 R11: 0000000000000246 R12: ffff8c90f873e240
      [ 5103.976973] R13: ffff8c90f859ad18 R14: ffff8c90fbdfb9a0 R15: ffff8c90b5e9fec0
      [ 5103.977640] FS:  00007fb39631c840(0000) GS:ffff8c90ffd00000(0000) knlGS:0000000000000000
      [ 5103.978391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5103.978926] CR2: 0000000000406538 CR3: 000000007a4b2000 CR4: 00000000000606e0
      [ 5103.979600] Call Trace:
      [ 5103.979885]  [<ffffffffc07bbcad>] nfs_getattr+0xed/0x250 [nfs]
      [ 5103.980453]  [<ffffffffb5020e09>] vfs_getattr+0x49/0x80
      [ 5103.980943]  [<ffffffffb5020e85>] vfs_fstat+0x45/0x80
      [ 5103.981423]  [<ffffffffb50215a4>] SYSC_newfstat+0x24/0x60
      [ 5103.981927]  [<ffffffffb502ccdd>] ? putname+0x3d/0x60
      [ 5103.982528]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.983146]  [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 5103.983775]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.984418]  [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 5103.985048]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.985668]  [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 5103.986277]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.986895]  [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 5103.987518]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.988137]  [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 5103.988759]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.989382]  [<ffffffffb502179e>] SyS_newfstat+0xe/0x10
      [ 5103.989880]  [<ffffffffb5520795>] system_call_fastpath+0x1c/0x21
      [ 5103.990451]  [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 5103.991062] Code: 4c 89 ef e8 71 2c 4e 00 49 3b 5c 24 18 75 1e 8b 53 5c 85 d2 74 15 83 c2 01 4c 89 ef 89 53 5c ff 14 25 d0 07 a3 b5 48 89 d8 eb bd <0f> 0b 4c 89 ef ff 14 25 d0 07 a3 b5 eb be 66 66 66 66 90 55 48 
      [ 5103.994142] RIP  [<ffffffffb5033bd2>] dget_parent+0x72/0x80
      [ 5103.994694]  RSP <ffff8c90b5e9fde0>
      

      Similar crashes at
      https://testing.whamcloud.com/test_sets/501f81da-dc27-11e8-b46b-52540065bddc
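
      The two console logs above fault in different functions (nfs_advise_use_readdirplus and dget_parent), yet both list nfs_getattr, vfs_getattr and vfs_fstat in the call trace. When checking whether another test set hit the same problem, it may help to reduce each console log to the faulting symbol plus its call-trace frames. The following is only a minimal sketch in Python: the function name crash_signature and both regular expressions are illustrative assumptions based on the el7 (3.10) oops format shown in this ticket, and nothing here is part of the Lustre test framework.

```python
import re

# Assumed format, taken from the oops text in this ticket:
#   RIP: 0010:[<addr>]  [<addr>] symbol+0xoff/0xlen [module]
RIP_RE = re.compile(r"RIP: \w+:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x")

# Call trace frame: "[<addr>] symbol+0x..." or "[<addr>] ? symbol+0x...".
# The kernel prints "?" in front of frames it considers unreliable.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x")


def crash_signature(console_text):
    """Return (faulting_symbol, frames) for the first oops in console_text.

    frames is a list of (symbol, reliable) tuples in the order printed.
    faulting_symbol is None if no "RIP:" line is found.
    """
    rip = None
    frames = []
    in_trace = False
    for line in console_text.splitlines():
        if rip is None:
            match = RIP_RE.search(line)
            if match:
                rip = match.group(1)
                continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                # First non-frame line (e.g. "Code: ...") ends the trace.
                in_trace = False
                continue
            frames.append((match.group(2), match.group(1) is None))
    return rip, frames


if __name__ == "__main__":
    # Short excerpt copied from the vm10 console log in this ticket.
    excerpt = "\n".join([
        "[47450.904097] RIP: 0010:[<ffffffffc073d6a6>]  [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]",
        "[47450.911063] Call Trace:",
        "[47450.911352]  [<ffffffffc0743d19>] ? nfs_getattr+0xf9/0x250 [nfs]",
        "[47450.911987]  [<ffffffffa2246aa9>] vfs_getattr+0x49/0x80",
        "[47450.922527] Code: 89 8d 60 ff ff ff",
    ])
    print(crash_signature(excerpt))
```

      Frames marked with "?" are kept but flagged as unreliable rather than dropped, since nfs_getattr is flagged that way in the first trace and unflagged in the second.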

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)