Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.7.0
- 3
- 15235
Description
Running racer with MDSCOUNT=2 and fault injection I see this often:
[ 156.683517] LustreError: 6471:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x240000400:0x1fa: rc = -14
[ 156.709334] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode ffff8801c82f3180/198158400800950378/46137348 state 1 in lock: setting data to ffff8801d3c01180/198158400800950378/46137348
[ 156.714833] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) LBUG
[ 156.716560] Pid: 25560, comm: chmod
[ 156.717391]
[ 156.717393] Call Trace:
[ 156.718410] [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[ 156.720233] [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[ 156.721706] [<ffffffffa0918c40>] mdc_set_lock_data+0x200/0x240 [mdc]
[ 156.723274] [<ffffffffa08ba838>] lmv_set_lock_data+0x108/0x3a0 [lmv]
[ 156.724811] [<ffffffffa0ec4b7c>] ll_lookup_it_finish+0x93c/0x11b0 [lustre]
[ 156.726507] [<ffffffff810b777d>] ? trace_hardirqs_on+0xd/0x10
[ 156.727900] [<ffffffffa0ec3a40>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
[ 156.729682] [<ffffffffa0ec56a7>] ll_lookup_it+0x2b7/0xad0 [lustre]
[ 156.731340] [<ffffffffa0ec5f4c>] ll_lookup_nd+0x8c/0x560 [lustre]
[ 156.733089] [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
[ 156.734536] [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
[ 156.736240] [<ffffffff811b398a>] path_walk+0x6a/0xe0
[ 156.737622] [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
[ 156.738941] [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
[ 156.740201] [<ffffffff8104bc84>] ? __do_page_fault+0x244/0x4b0
[ 156.741611] [<ffffffff81162d60>] ? __vma_link_rb+0x30/0x40
[ 156.742939] [<ffffffff811a8790>] vfs_fstatat+0x50/0xa0
[ 156.744174] [<ffffffff811a890b>] vfs_stat+0x1b/0x20
[ 156.745475] [<ffffffff811a8934>] sys_newstat+0x24/0x50
[ 156.746467] [<ffffffff81554298>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 156.748806] [<ffffffff810f08f7>] ? audit_syscall_entry+0x1d7/0x200
[ 156.750032] [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 156.751193] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[ 156.752279]
Instrumenting mdc_set_lock_data() shows that in every case where the assertion fails we have is_bad_inode(old_inode) || is_bad_inode(new_inode).
We should not call make_bad_inode() from the ll_update_inode() branch of ll_iget(), since it unhashes (and modifies) an inode that may already be associated with a lock:
void make_bad_inode(struct inode *inode)
{
        remove_inode_hash(inode);

        inode->i_mode = S_IFREG;
        inode->i_atime = inode->i_mtime = inode->i_ctime =
                current_fs_time(inode->i_sb);
        inode->i_op = &bad_inode_ops;
        inode->i_fop = &bad_file_ops;
}
This is only observed with MDSCOUNT > 1 because (currently) ll_update_inode() will succeed otherwise.
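The sequence described above can be modelled in user space to make the interaction easier to follow. What follows is only a toy sketch, not Lustre or kernel code: every toy_* name, the flag value, and the single-slot "hash" are hypothetical stand-ins, and the check in toy_set_lock_data() is an assumption about what the assertion at mdc_locks.c:130 enforces, inferred from the log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_I_FREEING 0x2u  /* hypothetical flag value, toy model only */

struct toy_inode {
        unsigned long fid;   /* stands in for the FID/ino shown in the log */
        unsigned int state;  /* stands in for i_state */
        bool hashed;         /* true while a lookup by FID can find it */
        bool bad;            /* set once the inode has been made "bad" */
};

struct toy_lock {
        struct toy_inode *data;  /* inode currently attached to the lock */
};

/* Models only the effects named in the report: unhash and mark bad. */
static void toy_make_bad_inode(struct toy_inode *inode)
{
        inode->hashed = false;
        inode->bad = true;
}

/*
 * Assumed invariant: a lock may be re-pointed at a different inode only
 * if the previously attached inode is being freed. Returns false where
 * the real code would hit the assertion.
 */
static bool toy_set_lock_data(struct toy_lock *lock, struct toy_inode *new_inode)
{
        struct toy_inode *old_inode = lock->data;

        if (old_inode != NULL && old_inode != new_inode &&
            !(old_inode->state & TOY_I_FREEING))
                return false;

        lock->data = new_inode;
        return true;
}

/*
 * Runs two lookups of the same FID under one lock. If the first inode is
 * made bad in between, the second lookup misses the hash and gets a fresh
 * inode, while the lock still points at the old, unhashed one.
 */
static bool toy_scenario(bool mark_bad)
{
        struct toy_inode first = { .fid = 42, .hashed = true };
        struct toy_inode second = { .fid = 42 };
        struct toy_lock lock = { .data = NULL };
        struct toy_inode *found;
        bool ok;

        ok = toy_set_lock_data(&lock, &first);  /* first attach always works */
        assert(ok);
        (void)ok;

        if (mark_bad)
                toy_make_bad_inode(&first);

        /* Single-slot "hash": reuse first only while it is still hashed. */
        found = first.hashed ? &first : &second;
        return toy_set_lock_data(&lock, found);
}
```

In this model toy_scenario(false) returns true, because the second lookup finds the same inode again, while toy_scenario(true) returns false, because the lock still references an unhashed inode that is not being freed. That is consistent with the instrumentation result that one of the two inodes is bad whenever the assertion fails.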