[LU-27] (mds_open.c:1667:mds_close()) @@@ no handle for file close ino Created: 17/Dec/10  Updated: 28/Jun/11  Resolved: 08/Feb/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: Lustre 1.8.6

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mds-messages    
Severity: 3
Rank (Obsolete): 10454

 Description   

We are hitting frequent MDS hangs at Titech due to an LBUG caused by "no handle for file close" in mds_close().
It looks similar to bugs 22104 and 22528, but there is no solution or patch yet that addresses this problem.
Could you have a look at the attachment and offer any suggestions?



 Comments   
Comment by Liang Zhen (Inactive) [ 17/Dec/10 ]

I think the LBUG occurs because the error handler in mds_verify_child() is not quite right:

switch(cleanup_phase) {
case 2:
        if (child_res_id->name[0] != 0)
                ldlm_lock_decref(child_lockh, child_mode);

We can reach this point with "child_res_id->name[0] != 0" and "child_lockh == NULL". Perhaps Niu can look into it and see whether there are any other issues, as he is also working on open/close related code for 2.x.
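
To make the hazard concrete, here is a minimal, self-contained sketch. It does not use the real Lustre types or ldlm calls: `mock_res_id`, `mock_lockh`, `mock_decref`, `cleanup_by_name` and `cleanup_by_handle` are all hypothetical stand-ins, and the handle-based guard is only one possible way to avoid the problem, not necessarily what the eventual patch does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for illustration only; these are NOT the Lustre types. */
struct mock_res_id { uint64_t name[4]; };
struct mock_lockh  { uint64_t cookie; int refs; };  /* cookie == 0: no lock held */

/* Drops one reference. Returns false when asked to release a lock
 * that was never taken (NULL handle, empty cookie, or no references). */
static bool mock_decref(struct mock_lockh *lockh)
{
        if (lockh == NULL || lockh->cookie == 0 || lockh->refs == 0)
                return false;
        lockh->refs--;
        return true;
}

/* Guard keyed on the resource name only, as in the quoted fragment.
 * Returns the number of invalid releases that were attempted. */
static int cleanup_by_name(const struct mock_res_id *res,
                           struct mock_lockh *lockh)
{
        int bad = 0;

        if (res->name[0] != 0 && !mock_decref(lockh))
                bad++;
        return bad;
}

/* Alternative guard keyed on whether a lock is actually held. */
static int cleanup_by_handle(struct mock_lockh *lockh)
{
        int bad = 0;

        if (lockh != NULL && lockh->cookie != 0 && !mock_decref(lockh))
                bad++;
        return bad;
}
```

With a non-zero resource name and no handle, `cleanup_by_name()` attempts an invalid release, while `cleanup_by_handle()` does nothing.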

Comment by Dan Ferber (Inactive) [ 18/Dec/10 ]

Assigned to Niu, per Liang's comments and suggestion.

Comment by Niu Yawei (Inactive) [ 19/Dec/10 ]

Yes, there are some defects in mds_verify_child():

  • It wrongly decrefs the child lock in the "no child lock wanted" case;
  • It wrongly decrefs the parent lock in the "reget child lock successfully" case;
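
The intended behaviour for these two cases can be sketched roughly as follows. This is an illustration under stated assumptions, not the posted patch: `mock_locks`, `verify_outcome` and `verify_cleanup` are hypothetical names, plain reference counters stand in for ldlm lock handles, and it is assumed that only the failure path should undo the locks.

```c
/* Hypothetical bookkeeping; the real code works on ldlm lock handles. */
struct mock_locks {
        int parent_refs;        /* references held on the parent lock */
        int child_refs;         /* references held on the child lock  */
};

enum verify_outcome {
        VERIFY_OK_NO_CHILD_LOCK,        /* caller did not want a child lock */
        VERIFY_OK_CHILD_REGOT,          /* child lock re-acquired successfully */
        VERIFY_FAILED                   /* error path: undo what is held */
};

/* Release only the references that the given outcome requires. */
static void verify_cleanup(struct mock_locks *l, enum verify_outcome outcome)
{
        switch (outcome) {
        case VERIFY_OK_NO_CHILD_LOCK:
                /* No child lock was taken, so there is nothing to decref. */
                break;
        case VERIFY_OK_CHILD_REGOT:
                /* Success: both locks stay held for the caller. */
                break;
        case VERIFY_FAILED:
                /* Drop only references that are actually held. */
                if (l->child_refs > 0)
                        l->child_refs--;
                if (l->parent_refs > 0)
                        l->parent_refs--;
                break;
        }
}
```

In this sketch neither success case touches a reference count, and the failure case never drops a reference that is not held.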

This bug isn't necessarily caused by the "no handle for file close in mds_close()", so I think it's not similar to bug 22104 and 22528.

Will post a patch for review soon.

Comment by Shuichi Ihara (Inactive) [ 20/Dec/10 ]

Niu, just to confirm: you filed this in Bugzilla as 24360 and are now moving forward with patch review, right?

Comment by Niu Yawei (Inactive) [ 20/Dec/10 ]

Yes, the patch has been posted on BZ and Gerrit for review.

Comment by Peter Jones [ 08/Feb/11 ]

It looks like this patch has landed on the Oracle 1.8.6. Is the same fix needed for master?

Comment by Niu Yawei (Inactive) [ 08/Feb/11 ]

No, the master doesn't have this bug.

Comment by Peter Jones [ 08/Feb/11 ]

Great! So, does any work remain or can we mark this issue as resolved?

Comment by Niu Yawei (Inactive) [ 08/Feb/11 ]

Yes, I think we can mark this as resolved. I'm not sure whether it should be marked by the reporter or the assignee.

Comment by Peter Jones [ 08/Feb/11 ]

OK, I will mark it as resolved. Ihara, please reopen if you feel that this is inappropriate.

Generated at Sat Feb 10 01:03:01 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.