Lustre / LU-5468

(mdc_locks.c:130:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.7.0
    • Severity: 3
    • Rank: 15235

    Description

      Running racer with MDSCOUNT=2 and fault injection, I often see this:

      [  156.683517] LustreError: 6471:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x240000400:0x1fa: rc = -14
      [  156.709334] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode ffff8801c82f3180/198158400800950378/46137348 state 1 in lock: setting data to ffff8801d3c01180/198158400800950378/46137348
      [  156.714833] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) LBUG
      [  156.716560] Pid: 25560, comm: chmod
      [  156.717391] 
      [  156.717393] Call Trace:
      [  156.718410]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  156.720233]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  156.721706]  [<ffffffffa0918c40>] mdc_set_lock_data+0x200/0x240 [mdc]
      [  156.723274]  [<ffffffffa08ba838>] lmv_set_lock_data+0x108/0x3a0 [lmv]
      [  156.724811]  [<ffffffffa0ec4b7c>] ll_lookup_it_finish+0x93c/0x11b0 [lustre]
      [  156.726507]  [<ffffffff810b777d>] ? trace_hardirqs_on+0xd/0x10
      [  156.727900]  [<ffffffffa0ec3a40>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  156.729682]  [<ffffffffa0ec56a7>] ll_lookup_it+0x2b7/0xad0 [lustre]
      [  156.731340]  [<ffffffffa0ec5f4c>] ll_lookup_nd+0x8c/0x560 [lustre]
      [  156.733089]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
      [  156.734536]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
      [  156.736240]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
      [  156.737622]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
      [  156.738941]  [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
      [  156.740201]  [<ffffffff8104bc84>] ? __do_page_fault+0x244/0x4b0
      [  156.741611]  [<ffffffff81162d60>] ? __vma_link_rb+0x30/0x40
      [  156.742939]  [<ffffffff811a8790>] vfs_fstatat+0x50/0xa0
      [  156.744174]  [<ffffffff811a890b>] vfs_stat+0x1b/0x20
      [  156.745475]  [<ffffffff811a8934>] sys_newstat+0x24/0x50
      [  156.746467]  [<ffffffff81554298>] ? lockdep_sys_exit_thunk+0x35/0x67
      [  156.748806]  [<ffffffff810f08f7>] ? audit_syscall_entry+0x1d7/0x200
      [  156.750032]  [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [  156.751193]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [  156.752279] 
      

      Instrumenting mdc_set_lock_data() shows that in every case where the assertion fails we have:

      is_bad_inode(old_inode) || is_bad_inode(new_inode)
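
The check that fires here can be modeled in userspace roughly as follows. This is a minimal sketch, not the Lustre source: every model_* name, the struct layouts, and the MODEL_I_FREEING value are assumptions made for illustration. It shows the invariant the assertion encodes (a lock may only be re-pointed at a different inode if the old one is being freed) and the kind of instrumentation that reports whether either inode is bad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative flag value only; not taken from the kernel headers. */
#define MODEL_I_FREEING 0x20u

/* Hypothetical stand-ins for inode_operations tables. */
struct model_inode_ops { int id; };
static const struct model_inode_ops model_good_inode_ops = { 1 };
static const struct model_inode_ops model_bad_inode_ops = { 2 };

struct model_inode {
	unsigned int i_state;
	const struct model_inode_ops *i_op;
};

struct model_lock {
	struct model_inode *l_ast_data; /* inode currently attached to the lock */
};

/* Same idea as the kernel's is_bad_inode(): bad means i_op is the bad ops. */
static bool model_is_bad_inode(const struct model_inode *inode)
{
	return inode->i_op == &model_bad_inode_ops;
}

/*
 * Attach new_inode to the lock. Returns 0 on success, or -1 in the case
 * where the real code would hit the assertion: a different inode is already
 * attached and it is not being freed.
 */
static int model_set_lock_data(struct model_lock *lock,
			       struct model_inode *new_inode)
{
	struct model_inode *old_inode;

	assert(lock != NULL && new_inode != NULL);
	old_inode = lock->l_ast_data;

	if (old_inode != NULL && old_inode != new_inode &&
	    !(old_inode->i_state & MODEL_I_FREEING)) {
		/* Instrumentation: report whether either inode is bad. */
		fprintf(stderr, "would LBUG: old bad=%d new bad=%d state=%u\n",
			model_is_bad_inode(old_inode),
			model_is_bad_inode(new_inode),
			old_inode->i_state);
		return -1;
	}

	lock->l_ast_data = new_inode;
	return 0;
}
```

In this model, attaching the same inode twice is harmless. The failure needs a second, distinct inode for the same object while the first is still attached and not marked as freeing.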

      We should not call make_bad_inode() from the ll_update_inode() branch of ll_iget(), since it unhashes (and modifies) an inode that may already be associated with a lock:

      void make_bad_inode(struct inode *inode)
      {
              remove_inode_hash(inode);
      
              inode->i_mode = S_IFREG;
              inode->i_atime = inode->i_mtime = inode->i_ctime =
                             current_fs_time(inode->i_sb);
              inode->i_op = &bad_inode_ops;
              inode->i_fop = &bad_file_ops;
      }
      

      This is only observed with MDSCOUNT > 1 because, with MDSCOUNT=1, ll_update_inode() (currently) always succeeds.
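
Why unhashing matters can be sketched with a toy inode cache. This is a simplified userspace model under stated assumptions, not the ll_iget() implementation: the model_* names, the fixed-size cache, and the two boolean parameters are all hypothetical. It contrasts marking the inode bad when the update fails (the current behavior described above) with simply returning the error and leaving the inode hashed (one possible reading of the suggested change).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_INODES 8

struct model_inode {
	unsigned long fid; /* stand-in for the Lustre FID */
	bool hashed;       /* true while a lookup by FID can find it */
	bool bad;          /* set by model_make_bad_inode() */
};

struct model_icache {
	struct model_inode slots[MODEL_MAX_INODES];
	size_t used;
};

/* Look up a hashed inode by FID; unhashed inodes are invisible. */
static struct model_inode *model_find(struct model_icache *c,
				      unsigned long fid)
{
	for (size_t i = 0; i < c->used; i++)
		if (c->slots[i].hashed && c->slots[i].fid == fid)
			return &c->slots[i];
	return NULL;
}

/* Models the two relevant effects of make_bad_inode(). */
static void model_make_bad_inode(struct model_inode *inode)
{
	inode->hashed = false; /* remove_inode_hash() */
	inode->bad = true;     /* ops replaced with the bad ops */
}

/*
 * Toy iget: return the cached inode for fid, or create one if none is
 * hashed. For an existing inode, update_fails simulates a failing update;
 * mark_bad_on_failure selects whether that failure also unhashes the
 * inode. Returns NULL on a failed update.
 */
static struct model_inode *model_iget(struct model_icache *c,
				      unsigned long fid, bool update_fails,
				      bool mark_bad_on_failure)
{
	struct model_inode *inode = model_find(c, fid);

	if (inode == NULL) {
		assert(c->used < MODEL_MAX_INODES);
		inode = &c->slots[c->used++];
		inode->fid = fid;
		inode->hashed = true;
		inode->bad = false;
		return inode;
	}

	if (update_fails) {
		if (mark_bad_on_failure)
			model_make_bad_inode(inode);
		return NULL;
	}
	return inode;
}
```

With mark_bad_on_failure set, a failed update followed by another lookup of the same FID yields a second, distinct inode while the first may still be attached to a lock. With it cleared, the later lookup returns the original inode, so the lock never sees two live inodes for one FID.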

            People

              jhammond John Hammond