[LU-7613] racer crash on lustre nfs mount, kernel BUG at fs/namei.c:1669

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0

    Description

      racer crashes intermittently (roughly once every few runs) with the following logs:

      <6>Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      <3>LustreError: 6426:0:(llite_nfs.c:307:ll_get_parent()) lustre: failure inode [0x200000400:0xee:0x0] get parent: rc = -2
      <4>reconnect_path: npd != pd
      <3>LustreError: 6424:0:(dir.c:429:ll_get_dir_page()) read cache page: [0x200000400:0x5d2:0x0] at 0: rc -2
      <3>LustreError: 6424:0:(dir.c:597:ll_dir_read()) error reading dir [0x200000400:0x5d2:0x0] at 0: rc -2
      <3>LustreError: 6431:0:(llite_nfs.c:307:ll_get_parent()) lustre: failure inode [0x200000400:0x5d2:0x0] get parent: rc = -2
      <3>LustreError: 6431:0:(llite_nfs.c:307:ll_get_parent()) Skipped 2 previous similar messages
      <4>------------[ cut here ]------------
      <2>kernel BUG at fs/namei.c:1669!
      <4>invalid opcode: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/dev
      <4>CPU 2 
      <4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs fscache 8021q garp stp llc rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) ptp pps_core mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) compat(U) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ext3 jbd uinput ppdev iTCO_wdt iTCO_vendor_support parport_pc parport microcode sg serio_raw i2c_i801 lpc_ich mfd_core r8169 mii snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6424, comm: nfsd Not tainted 2.6.32-431.17.1.x2.0.47.x86_64 #1                  /D525MWV
      <4>RIP: 0010:[<ffffffff81197a24>]  [<ffffffff81197a24>] may_delete+0x134/0x190
      <4>RSP: 0018:ffff8800be16bc30  EFLAGS: 00010283
      <4>RAX: ffff8800375b5c00 RBX: ffff88009b347180 RCX: ffff88009b26f3c0
      <4>RDX: 0000000000000000 RSI: ffff88009b347180 RDI: ffff880104beeb38
      <4>RBP: ffff8800be16bc50 R08: ffff88003753a980 R09: ffff88003753a980
      <4>R10: ffff880104beeb38 R11: ffff880104beeb38 R12: ffff880104beeb38
      <4>R13: 0000000000000000 R14: 0000000000000000 R15: ffff880104beeb38
      <4>FS:  0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000003a50f5a04c CR3: 00000000964e7000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process nfsd (pid: 6424, threadinfo ffff8800be16a000, task ffff880102fe2080)
      <4>Stack:
      <4> 0000000000000000 ffff88009b347180 ffff88009b26f3c0 0000000000000000
      <4><d> ffff8800be16bcd0 ffffffff81197cbc ffff8800be16bc70 ffff88003753a980
      <4><d> ffff8800a45d80ba ffff8800bfb4b040 ffff8800a45d80b8 00000000ffffffea
      <4>Call Trace:
      <4> [<ffffffff81197cbc>] vfs_rename+0x5c/0x480
      <4> [<ffffffffa03d0aca>] nfsd_rename+0x47a/0x4d0 [nfsd]
      <4> [<ffffffffa03dd585>] nfsd4_rename+0x75/0x220 [nfsd]
      <4> [<ffffffffa03df435>] ? nfsd4_encode_operation+0x75/0x180 [nfsd]
      <4> [<ffffffffa03dd458>] nfsd4_proc_compound+0x3d8/0x490 [nfsd]
      <4> [<ffffffffa03ca425>] nfsd_dispatch+0xe5/0x230 [nfsd]
      <4> [<ffffffffa035a844>] svc_process_common+0x344/0x640 [sunrpc]
      <4> [<ffffffff81061dc0>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa035ae80>] svc_process+0x110/0x160 [sunrpc]
      <4> [<ffffffffa03cab52>] nfsd+0xc2/0x160 [nfsd]
      <4> [<ffffffffa03caa90>] ? nfsd+0x0/0x160 [nfsd]
      <4> [<ffffffff8109ac66>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109abd0>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      The relevant kernel code is:

      static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
      {
              int error;
      
              if (!victim->d_inode)
                      return -ENOENT;
      
              BUG_ON(victim->d_parent->d_inode != dir);
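
The invariant behind that BUG_ON can be illustrated with a small userspace sketch. The structures and the helper name below are simplified stand-ins invented for illustration, not the kernel's definitions; the only part taken from the excerpt above is the comparison itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct sketch_inode {
	unsigned long i_ino;
};

struct sketch_dentry {
	struct sketch_dentry *d_parent;
	struct sketch_inode  *d_inode;
};

/*
 * Mirrors the condition guarded by BUG_ON() in may_delete(): the parent
 * of the victim dentry must point at the directory inode the caller
 * passed in. Returns false in the state where the kernel would BUG.
 */
static bool victim_parent_matches(const struct sketch_inode *dir,
				  const struct sketch_dentry *victim)
{
	return victim->d_parent->d_inode == dir;
}
```

If the victim dentry ends up attached under a different parent than the directory the caller resolved, the helper returns false, which is the state reported at fs/namei.c:1669 in the trace.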
      

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17732/
            Subject: LU-7613 llite: changes to avoid cache corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cf6efbdb726ceae10a9f3c770bc7af9d15571a80

            lokesh.jaliminche Lokesh Nagappa Jaliminche (Inactive) added a comment -

            ll_find_alias is responsible for finding an alias of the inode that can be reused. Directories are assumed to have a unique alias, whereas non-directories can have multiple
            aliases. In Lustre there are two types of aliases: discon_alias and invalid_alias. An invalid_alias is an alias that satisfies these conditions:

             else if (alias->d_parent == dentry->d_parent             &&
                                     alias->d_name.hash == dentry->d_name.hash       &&
                                     alias->d_name.len == dentry->d_name.len         &&
                                     memcmp(alias->d_name.name, dentry->d_name.name,
                                            dentry->d_name.len) == 0)

            Using a discon_alias for a non-directory may corrupt the dcache and lead to a kernel crash. The patch avoids using discon_alias for non-directories.
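
The selection rule described in the comment above can be sketched in userspace C. This is a minimal model, not the Lustre source: the struct, the enum, and the names pick_alias and same_parent_and_name are hypothetical, the hash and length comparisons are folded into a single strcmp, and the directory-only restriction on disconnected aliases is an assumption based on the comment's description of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a dentry; not the kernel layout. */
struct model_dentry {
	const struct model_dentry *parent;
	const char *name;
	bool disconnected;	/* stands in for DCACHE_DISCONNECTED */
};

enum alias_kind { ALIAS_NONE, ALIAS_DISCON, ALIAS_INVALID };

/* Same parent and same name, as in the invalid_alias condition. */
static bool same_parent_and_name(const struct model_dentry *alias,
				 const struct model_dentry *dentry)
{
	return alias->parent == dentry->parent &&
	       strcmp(alias->name, dentry->name) == 0;
}

/*
 * Pick an alias to reuse for `dentry` from aliases[0..n).
 * Assumption: a disconnected alias is a candidate only for directories;
 * for non-directories it is skipped unless it matches by parent and name.
 */
static const struct model_dentry *
pick_alias(const struct model_dentry *const *aliases, size_t n,
	   const struct model_dentry *dentry, bool is_dir,
	   enum alias_kind *kind)
{
	const struct model_dentry *discon = NULL;
	size_t i;

	*kind = ALIAS_NONE;
	for (i = 0; i < n; i++) {
		const struct model_dentry *a = aliases[i];

		if (a->disconnected && is_dir) {
			discon = a;
		} else if (same_parent_and_name(a, dentry)) {
			/* A parent-and-name match wins immediately. */
			*kind = ALIAS_INVALID;
			return a;
		}
	}
	if (discon != NULL)
		*kind = ALIAS_DISCON;
	return discon;
}
```

In this model, a regular file whose only alias is disconnected gets no reusable alias, so the caller would have to set up the new dentry itself, while a directory in the same situation reuses the disconnected alias.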

            gerrit Gerrit Updater added a comment -

            lokesh.jaliminche (lokesh.jaliminche@seagate.com) uploaded a new patch: http://review.whamcloud.com/17732
            Subject: LU-7613 dcache: changes made to ll_splice_inode to avoid dcache corruption.
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3e7901e9b654989502c951fb859e505ce4a4a8cb

            lokesh.jaliminche Lokesh Nagappa Jaliminche (Inactive) added a comment -

            Recreation steps:
            ===============
            1. cat /etc/exports
               /mnt/lustre *(crossmnt,rw,no_root_squash,async,no_subtree_check,insecure)
            2. mkdir /mnt/nfs_client
            3. /etc/init.d/nfs stop
            4. bash llmount.sh
            5. /etc/init.d/nfs start
            6. mount -t nfs server:/mnt/lustre /mnt/nfs_client
            7. cd racer
            8. bash racer.sh /mnt/nfs_client/
            9. umount /mnt/nfs_client
            10. /etc/init.d/nfs stop
            11. bash llmountcleanup.sh

            People

              wc-triage WC Triage
              lokesh.jaliminche Lokesh Nagappa Jaliminche (Inactive)