[LU-5468] (mdc_locks.c:130:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed Created: 09/Aug/14  Updated: 29/May/15  Resolved: 27/Apr/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: fault, llite

Severity: 3
Rank (Obsolete): 15235

 Description   

Running racer with MDSCOUNT=2 and fault injection I see this often:

[  156.683517] LustreError: 6471:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0001: lvbo_init failed for resource 0x240000400:0x1fa: rc = -14
[  156.709334] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) ASSERTION( old_inode->i_state & I_FREEING ) failed: Found existing inode ffff8801c82f3180/198158400800950378/46137348 state 1 in lock: setting data to ffff8801d3c01180/198158400800950378/46137348
[  156.714833] LustreError: 25560:0:(mdc_locks.c:130:mdc_set_lock_data()) LBUG
[  156.716560] Pid: 25560, comm: chmod
[  156.717391] 
[  156.717393] Call Trace:
[  156.718410]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[  156.720233]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[  156.721706]  [<ffffffffa0918c40>] mdc_set_lock_data+0x200/0x240 [mdc]
[  156.723274]  [<ffffffffa08ba838>] lmv_set_lock_data+0x108/0x3a0 [lmv]
[  156.724811]  [<ffffffffa0ec4b7c>] ll_lookup_it_finish+0x93c/0x11b0 [lustre]
[  156.726507]  [<ffffffff810b777d>] ? trace_hardirqs_on+0xd/0x10
[  156.727900]  [<ffffffffa0ec3a40>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
[  156.729682]  [<ffffffffa0ec56a7>] ll_lookup_it+0x2b7/0xad0 [lustre]
[  156.731340]  [<ffffffffa0ec5f4c>] ll_lookup_nd+0x8c/0x560 [lustre]
[  156.733089]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
[  156.734536]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
[  156.736240]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
[  156.737622]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
[  156.738941]  [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
[  156.740201]  [<ffffffff8104bc84>] ? __do_page_fault+0x244/0x4b0
[  156.741611]  [<ffffffff81162d60>] ? __vma_link_rb+0x30/0x40
[  156.742939]  [<ffffffff811a8790>] vfs_fstatat+0x50/0xa0
[  156.744174]  [<ffffffff811a890b>] vfs_stat+0x1b/0x20
[  156.745475]  [<ffffffff811a8934>] sys_newstat+0x24/0x50
[  156.746467]  [<ffffffff81554298>] ? lockdep_sys_exit_thunk+0x35/0x67
[  156.748806]  [<ffffffff810f08f7>] ? audit_syscall_entry+0x1d7/0x200
[  156.750032]  [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  156.751193]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[  156.752279] 

Instrumenting mdc_set_lock_data() shows that in every case that fails the assertion, the following condition holds:

is_bad_inode(old_inode) || is_bad_inode(new_inode)

We should not call make_bad_inode() from the ll_update_inode() branch of ll_iget(), since it unhashes (and modifies) an inode that may already be associated with a lock:

void make_bad_inode(struct inode *inode)
{
        remove_inode_hash(inode);

        inode->i_mode = S_IFREG;
        inode->i_atime = inode->i_mtime = inode->i_ctime =
                       current_fs_time(inode->i_sb);
        inode->i_op = &bad_inode_ops;
        inode->i_fop = &bad_file_ops;
}

This is only observed with MDSCOUNT > 1 because (currently) ll_update_inode() will succeed otherwise.
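The sequence described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not Lustre or kernel code and not the landed patch: every toy_* name, the flag value, and the FID 42 are hypothetical, and the check in toy_set_lock_data() is inferred from the assertion message in the log rather than copied from mdc_locks.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_I_FREEING 0x1u      /* toy flag value, not the kernel's I_FREEING */
#define TOY_POOL_SIZE 4

struct toy_inode {
        unsigned long fid;      /* stands in for the Lustre FID */
        unsigned int state;
        bool hashed;            /* true while toy_iget() can find this inode */
        bool bad;
};

struct toy_lock {
        struct toy_inode *data; /* inode currently attached to the lock */
};

static struct toy_inode toy_pool[TOY_POOL_SIZE];
static size_t toy_used;

/* Return the hashed inode for fid, or set up and hash a fresh one. */
static struct toy_inode *toy_iget(unsigned long fid)
{
        for (size_t i = 0; i < toy_used; i++)
                if (toy_pool[i].hashed && toy_pool[i].fid == fid)
                        return &toy_pool[i];

        assert(toy_used < TOY_POOL_SIZE);
        struct toy_inode *inode = &toy_pool[toy_used++];
        inode->fid = fid;
        inode->state = 0;
        inode->hashed = true;
        inode->bad = false;
        return inode;
}

/* Models only the unhashing side effect of make_bad_inode(). */
static void toy_make_bad_inode(struct toy_inode *inode)
{
        inode->hashed = false;
        inode->bad = true;
}

/*
 * Returns false where the assertion from the log would fire: the lock
 * already points at a different inode that is not being freed.
 */
static bool toy_set_lock_data(struct toy_lock *lock, struct toy_inode *inode)
{
        struct toy_inode *old = lock->data;

        if (old != NULL && old != inode && !(old->state & TOY_I_FREEING))
                return false;
        lock->data = inode;
        return true;
}

/* Returns 1 if the assertion condition is violated, 0 if not, -1 on setup error. */
int toy_demo_race(bool mark_bad)
{
        struct toy_lock lock = { NULL };
        struct toy_inode *first, *second;

        toy_used = 0;                   /* start from an empty inode cache */
        first = toy_iget(42);
        if (!toy_set_lock_data(&lock, first))
                return -1;
        if (mark_bad)
                toy_make_bad_inode(first);  /* failed-update path unhashes it */
        second = toy_iget(42);          /* same FID looked up again */
        return toy_set_lock_data(&lock, second) ? 0 : 1;
}
```

In this model, toy_demo_race(true) returns 1: once the first inode is unhashed, the second lookup of the same FID produces a new inode while the lock still points at the old one, which is not marked as freeing. toy_demo_race(false) returns 0, because the lookup finds the same hashed inode again and the lock data is unchanged.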



 Comments   
Comment by John Hammond [ 26/Aug/14 ]

Please see http://review.whamcloud.com/#/c/11609/.

Comment by John Hammond [ 24/Sep/14 ]

Patch landed to master.
