Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10541

parallel-scale-nfsv3 test racer_on_nfs hangs with BUG: unable to handle kernel NULL pointer dereference

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0, Lustre 2.10.4
    • Lustre 2.11.0, Lustre 2.10.4
    • None
    • SLES12 SP3 server/client
      and SLES12 SP2 server/client
    • 3
    • 9223372036854775807

    Description

      parallel-scale-nfsv3 test_racer_on_nfs hangs for SLES12 SP3 and SLES12 SP2 server/client only.

      From the MDS console:

      [76114.715893] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 06:10:32 \(1516371032\)
      [76114.743657] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 06:10:32 (1516371032)
      [76114.921643] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [76114.922573] IP: [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre]
      [76114.923403] PGD 0 
      [76114.923653] Oops: 0000 [#1] SMP 
      [76114.924045] Modules linked in: nfsd(E) nfs_acl(E) lustre(OEN) lmv(OEN) mdc(OEN) osc(OEN) lov(OEN) osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgs(OEN) mgc(OEN) osd_ldiskfs(OEN) ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) loop(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rpcrdma(E) sunrpc(E) ib_isert(E) iscsi_target_mod(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) scsi_transport_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) ib_core(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ppdev(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8139too(E) joydev(E) 8139cp(E) pcspkr(E) virtio_balloon(E) i2c_piix4(E) mii(E) parport_pc(E) parport(E) pvpanic(E) button(E) processor(E) ata_generic(E) ext4(E) crc16(E) jbd2(E) mbcache(E) ata_piix(E) ahci(E) libahci(E) virtio_blk(E) floppy(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) usb_common(E) libata(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E) [last unloaded: lnet_selftest]
      [76114.938788] Supported: No, Unsupported modules are loaded
      [76114.939342] CPU: 0 PID: 28809 Comm: nfsd Tainted: G           OE   N  4.4.103-6.33_lustre-default #1
      [76114.940280] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [76114.940875] task: ffff88007c3254c0 ti: ffff88002c48c000 task.ti: ffff88002c48c000
      [76114.941635] RIP: 0010:[<ffffffffa13ec095>]  [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre]
      [76114.942668] RSP: 0018:ffff88002c48fd90  EFLAGS: 00010246
      [76114.943220] RAX: ffffffffa14094c0 RBX: 0000000000008000 RCX: 0000000000000000
      [76114.943945] RDX: ffffffffa140b05b RSI: 0000000000000000 RDI: ffffffffa14094c0
      [76114.944670] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
      [76114.945399] R10: 0000000000000000 R11: ffff88006a9320fb R12: ffff880065cfce50
      [76114.946129] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa140b05b
      [76114.946859] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      [76114.947677] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [76114.948266] CR2: 0000000000000030 CR3: 000000007a2e6000 CR4: 00000000000406f0
      [76114.948994] Stack:
      [76114.949215]  ffffffffa13b499e 0000000000000000 ffff880065d1f000 ffff88002c48fe00
      [76114.950059]  ffff88003b8c0000 ffff88003b8c0008 ffff880065cfce50 ffff880078199900
      [76114.950904]  ffffffffa1479d8f ffff8800442dc000 ffff8800442dc000 ffff880044086018
      [76114.951816] Call Trace:
      [76114.952142]  [<ffffffffa13b499e>] ll_set_acl+0xee/0x370 [lustre]
      [76114.952790]  [<ffffffffa1479d8f>] nfsd3_proc_setacl+0x19f/0x260 [nfsd]
      [76114.953482]  [<ffffffffa1469e23>] nfsd_dispatch+0xc3/0x260 [nfsd]
      [76114.954139]  [<ffffffffa05811f8>] svc_process_common+0x418/0x6a0 [sunrpc]
      [76114.954843]  [<ffffffffa058157d>] svc_process+0xfd/0x1b0 [sunrpc]
      [76114.955483]  [<ffffffffa146989a>] nfsd+0xea/0x160 [nfsd]
      [76114.956044]  [<ffffffff8109bb57>] kthread+0xc7/0xe0
      [76114.956552]  [<ffffffff8160c67f>] ret_from_fork+0x3f/0x70
      [76114.958488] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
      [76114.959116] 
      [76114.959290] Leftover inexact backtrace:
      [76114.959290] 
      [76114.959848]  [<ffffffff8109ba90>] ? kthread_park+0x50/0x50
      [76114.960412] Code: c0 c7 05 63 3e 04 00 00 00 04 00 e8 96 d0 3b ff 48 c7 c7 c0 fe 42 a1 e8 7a 26 3b ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> 8b 76 30 e9 32 ed ff ff 66 90 66 66 66 66 90 41 55 41 54 55 
      [76114.963792] RIP  [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre]
      [76114.964576]  RSP <ffff88002c48fd90>
      [76114.964949] CR2: 0000000000000030
      

      In a different test session, https://testing.hpdd.intel.com/test_sets/bba5459c-fc9f-11e7-a10a-52540065bddc , we see the same hang, but there is a little more output in the MDS console log:

      [32570.101935] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 10:52:35 (1516301555)
      [32570.288540] LustreError: 18012:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x240000404:0x1:0x0] without object type: valid = 0x100000001
      [32570.288547] LustreError: 18012:0:(llite_lib.c:2354:ll_prep_inode()) new_inode -fatal: rc -12
      [32570.402366] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [32570.402366] IP: [<ffffffffa1461095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre]
      

      I can confirm that this test started hanging with this NULL pointer dereference on at least 2018-01-16. This test hangs so often that I can’t review all test hangs to see exactly when this started happening.

      Logs for some of the failures are at
      https://testing.hpdd.intel.com/test_sets/ebd266f6-fac8-11e7-bd00-52540065bddc
      https://testing.hpdd.intel.com/test_sets/9dfe1932-fd36-11e7-a6ad-52540065bddc
      https://testing.hpdd.intel.com/test_sets/ce6de31a-fe15-11e7-a10a-52540065bddc

      Attachments

        Issue Links

          Activity

            [LU-10541] parallel-scale-nfsv3 test racer_on_nfs hangs with BUG: unable to handle kernel NULL pointer dereference

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31966/
            Subject: LU-10541 llite: setxattr directly in ll_set_acl
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 51774da37e27b696e506f76c82a3ae2507df1dc2

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31966/ Subject: LU-10541 llite: setxattr directly in ll_set_acl Project: fs/lustre-release Branch: b2_10 Current Patch Set: Commit: 51774da37e27b696e506f76c82a3ae2507df1dc2

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31966
            Subject: LU-10541 llite: setxattr directly in ll_set_acl
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: b95bcc9340522a04bacd62332423567040d2f553

            gerrit Gerrit Updater added a comment - Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31966 Subject: LU-10541 llite: setxattr directly in ll_set_acl Project: fs/lustre-release Branch: b2_10 Current Patch Set: 1 Commit: b95bcc9340522a04bacd62332423567040d2f553
            pjones Peter Jones added a comment -

            Landed for 2.12

            pjones Peter Jones added a comment - Landed for 2.12

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31588/
            Subject: LU-10541 llite: setxattr directly in ll_set_acl
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 682939ffb60e9afff9a7d61cc1112611b2197b78

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31588/ Subject: LU-10541 llite: setxattr directly in ll_set_acl Project: fs/lustre-release Branch: master Current Patch Set: Commit: 682939ffb60e9afff9a7d61cc1112611b2197b78

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31588
            Subject: LU-10541 llite: setxattr directly in ll_set_acl
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 45939caa6af84d80abebbeb33525c4f179774c53

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31588 Subject: LU-10541 llite: setxattr directly in ll_set_acl Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 45939caa6af84d80abebbeb33525c4f179774c53

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31486
            Subject: LU-10541 llite: use faked dentry in ll_set_acl
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a20273abded0a7b6add3b1c50ca61c0938abd71f

            gerrit Gerrit Updater added a comment - Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31486 Subject: LU-10541 llite: use faked dentry in ll_set_acl Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: a20273abded0a7b6add3b1c50ca61c0938abd71f
            jhammond John Hammond added a comment -

            It seems likely that this is caused by the following. In ll_set_acl() we do

                    /* dentry is only used for *.lov attributes so it's safe to be NULL */
                    rc = __vfs_setxattr(NULL, inode, name, value, size, XATTR_CREATE);
            

            And we have

            static int ll_xattr_set_common_4_3(const struct xattr_handler *handler,
                                               struct dentry *dentry, const char *name,
                                               const void *value, size_t size, int flags)
            {
                    return ll_xattr_set_common(handler, dentry, dentry->d_inode, name,
                                               value, size, flags);
            }
            
            jhammond John Hammond added a comment - It seems likely that this is caused by the following. In ll_set_acl() we do /* dentry is only used for *.lov attributes so it's safe to be NULL */ rc = __vfs_setxattr(NULL, inode, name, value, size, XATTR_CREATE); And we have static int ll_xattr_set_common_4_3( const struct xattr_handler *handler, struct dentry *dentry, const char *name, const void *value, size_t size, int flags) { return ll_xattr_set_common(handler, dentry, dentry->d_inode, name, value, size, flags); }
            pjones Peter Jones added a comment -

            Hongchao

            Could you please investigate?

            Thanks

            Peter

            pjones Peter Jones added a comment - Hongchao Could you please investigate? Thanks Peter
            mdiep Minh Diep added a comment - +1 on b2_10 https://testing.hpdd.intel.com/test_sets/472f2886-0b90-11e8-bd00-52540065bddc

            Hit the same error mentioned above for "RHEL 7.4 Server/DNE/ldiskfs RHEL 7.4 Client" on 2.10.57 for racer test_1
            https://testing.hpdd.intel.com/test_sets/463269c4-fd9a-11e7-bd00-52540065bddc

            [55555.700895] LustreError: 7868:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x200000407:0x12:0x0] without object type: valid = 0x100000001
            [55555.702444] LustreError: 7868:0:(llite_lib.c:2354:ll_prep_inode()) new_inode -fatal: rc -12
            [55563.234885] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x280000405:0x50:0x0], use llapi_layout_get_by_path()
            
            standan Saurabh Tandan (Inactive) added a comment - Hit the same error mentioned above for "RHEL 7.4 Server/DNE/ldiskfs RHEL 7.4 Client" on 2.10.57 for racer test_1 https://testing.hpdd.intel.com/test_sets/463269c4-fd9a-11e7-bd00-52540065bddc [55555.700895] LustreError: 7868:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x200000407:0x12:0x0] without object type: valid = 0x100000001 [55555.702444] LustreError: 7868:0:(llite_lib.c:2354:ll_prep_inode()) new_inode -fatal: rc -12 [55563.234885] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x280000405:0x50:0x0], use llapi_layout_get_by_path()

            People

              jhammond John Hammond
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: