Lustre / LU-7613

racer crash on lustre nfs mount, kernel BUG at fs/namei.c:1669

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3

    Description

      Roughly once every few runs, racer crashes on a Lustre NFS mount with the following logs:

      <6>Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      <3>LustreError: 6426:0:(llite_nfs.c:307:ll_get_parent()) lustre: failure inode [0x200000400:0xee:0x0] get parent: rc = -2
      <4>reconnect_path: npd != pd
      <3>LustreError: 6424:0:(dir.c:429:ll_get_dir_page()) read cache page: [0x200000400:0x5d2:0x0] at 0: rc -2
      <3>LustreError: 6424:0:(dir.c:597:ll_dir_read()) error reading dir [0x200000400:0x5d2:0x0] at 0: rc -2
      <3>LustreError: 6431:0:(llite_nfs.c:307:ll_get_parent()) lustre: failure inode [0x200000400:0x5d2:0x0] get parent: rc = -2
      <3>LustreError: 6431:0:(llite_nfs.c:307:ll_get_parent()) Skipped 2 previous similar messages
      <4>------------[ cut here ]------------
      <2>kernel BUG at fs/namei.c:1669!
      <4>invalid opcode: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/dev
      <4>CPU 2 
      <4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs fscache 8021q garp stp llc rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) ptp pps_core mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) compat(U) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ext3 jbd uinput ppdev iTCO_wdt iTCO_vendor_support parport_pc parport microcode sg serio_raw i2c_i801 lpc_ich mfd_core r8169 mii snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6424, comm: nfsd Not tainted 2.6.32-431.17.1.x2.0.47.x86_64 #1                  /D525MWV
      <4>RIP: 0010:[<ffffffff81197a24>]  [<ffffffff81197a24>] may_delete+0x134/0x190
      <4>RSP: 0018:ffff8800be16bc30  EFLAGS: 00010283
      <4>RAX: ffff8800375b5c00 RBX: ffff88009b347180 RCX: ffff88009b26f3c0
      <4>RDX: 0000000000000000 RSI: ffff88009b347180 RDI: ffff880104beeb38
      <4>RBP: ffff8800be16bc50 R08: ffff88003753a980 R09: ffff88003753a980
      <4>R10: ffff880104beeb38 R11: ffff880104beeb38 R12: ffff880104beeb38
      <4>R13: 0000000000000000 R14: 0000000000000000 R15: ffff880104beeb38
      <4>FS:  0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000003a50f5a04c CR3: 00000000964e7000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process nfsd (pid: 6424, threadinfo ffff8800be16a000, task ffff880102fe2080)
      <4>Stack:
      <4> 0000000000000000 ffff88009b347180 ffff88009b26f3c0 0000000000000000
      <4><d> ffff8800be16bcd0 ffffffff81197cbc ffff8800be16bc70 ffff88003753a980
      <4><d> ffff8800a45d80ba ffff8800bfb4b040 ffff8800a45d80b8 00000000ffffffea
      <4>Call Trace:
      <4> [<ffffffff81197cbc>] vfs_rename+0x5c/0x480
      <4> [<ffffffffa03d0aca>] nfsd_rename+0x47a/0x4d0 [nfsd]
      <4> [<ffffffffa03dd585>] nfsd4_rename+0x75/0x220 [nfsd]
      <4> [<ffffffffa03df435>] ? nfsd4_encode_operation+0x75/0x180 [nfsd]
      <4> [<ffffffffa03dd458>] nfsd4_proc_compound+0x3d8/0x490 [nfsd]
      <4> [<ffffffffa03ca425>] nfsd_dispatch+0xe5/0x230 [nfsd]
      <4> [<ffffffffa035a844>] svc_process_common+0x344/0x640 [sunrpc]
      <4> [<ffffffff81061dc0>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa035ae80>] svc_process+0x110/0x160 [sunrpc]
      <4> [<ffffffffa03cab52>] nfsd+0xc2/0x160 [nfsd]
      <4> [<ffffffffa03caa90>] ? nfsd+0x0/0x160 [nfsd]
      <4> [<ffffffff8109ac66>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109abd0>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      The relevant kernel code is:

      static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
      {
              int error;
      
              if (!victim->d_inode)
                      return -ENOENT;
      
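              /* The check reported as fs/namei.c:1669 in the trace above:
               * the victim's cached parent no longer matches dir. */
              BUG_ON(victim->d_parent->d_inode != dir);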
      

            People

              Assignee: wc-triage WC Triage
              Reporter: lokesh.jaliminche Lokesh Nagappa Jaliminche (Inactive)