[LU-1535] LustreError: 1843:0:(mds_open.c:1645:mds_close()) Created: 17/Jun/12  Updated: 22/Feb/13  Resolved: 06/Jul/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 1.8.9

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File LFS05_MDS_20120617.log    
Severity: 3
Rank (Obsolete): 6379

 Description   

On our customer's Lustre system, an MDS thread hung after the call traces. On the MDS, the following messages showed up during the call traces.

Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21989538: cookie 0x1e6d8ca7fa6bf800  req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) Skipped 1 previous similar message
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116)  req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/2928 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc -116/0
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1 previous similar message
Jun 17 05:45:09 ALPL505 kernel: LustreError: 2131:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21922157: cookie 0x1e6d8ca7fa68693c  req@ffff810237e71c00 x1401981983149371/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883115 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 06:02:54 ALPL505 kernel: Lustre: Service thread pid 981 was inactive for 710.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
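
For triage, it can help to pull the thread pid, source location, and return code out of each LustreError line. The following is a minimal sketch, assuming the field layout seen in the messages above (it is inferred from this log only, not from a Lustre format specification). The names `parse_lustre_error` and `LINUX_ERRNO_NAMES` are hypothetical helpers introduced for illustration. The mapping of 116 to ESTALE is the Linux errno value.

```python
import re

# Pattern inferred from the log lines in this ticket (an assumption,
# not an official description of the Lustre console message format).
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r" (?P<msg>.*)"
)

# Trailing "rc <a>/<b>" pair at the end of a request dump.
RC = re.compile(r"\brc (?P<rc>-?\d+)/-?\d+\s*$")

# Only the errno seen in this ticket; value is the Linux definition.
LINUX_ERRNO_NAMES = {116: "ESTALE"}


def parse_lustre_error(line):
    """Return a dict describing one LustreError line, or None if no match."""
    m = LUSTRE_ERROR.search(line)
    if m is None:
        return None
    record = {
        "pid": int(m.group("pid")),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "rc": None,          # stays None for "Skipped N previous ..." lines
        "errno_name": None,
    }
    rc = RC.search(m.group("msg"))
    if rc is not None:
        record["rc"] = int(rc.group("rc"))
        record["errno_name"] = LINUX_ERRNO_NAMES.get(-record["rc"])
    return record


# Sample copied from the log above.
SAMPLE = (
    "Jun 17 05:45:08 ALPL505 kernel: LustreError: "
    "1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116)  "
    "req@ffff810287df6400 x1401981983149299/t0 "
    "o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 "
    "lens 408/2928 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc -116/0"
)

if __name__ == "__main__":
    print(parse_lustre_error(SAMPLE))
```

Grouping the parsed records by `pid` or `func` makes it easier to see which service threads reported the close failures before the inactivity warning.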


 Comments   
Comment by Shuichi Ihara (Inactive) [ 17/Jun/12 ]

Messages attached.

Comment by Peter Jones [ 18/Jun/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Andreas Dilger [ 18/Jun/12 ]

Ihara, what version of Lustre is this?

Comment by Shuichi Ihara (Inactive) [ 18/Jun/12 ]

Lustre-1.8.6-wc1 is running on this cluster.

Comment by Lai Siyao [ 19/Jun/12 ]

This looks to be the same issue as http://jira.whamcloud.com/browse/LU-1128. I've backported the fix at: http://review.whamcloud.com/#change,3138. Could you find a way to verify it?

Comment by Shuichi Ihara (Inactive) [ 19/Jun/12 ]

Lai,
Thanks! We will try your backport patch. By the way, this problem happened on the MDS. Does LU-1128 cause this problem on the MDS as well?

Comment by Lai Siyao [ 19/Jun/12 ]

The issue fixed by LU-1128 is in the ldlm server, so it may occur on the MGS, MDS and OSS.

Comment by Lai Siyao [ 06/Jul/12 ]

Patch landed.

Comment by Cory Spitz [ 06/Jul/12 ]

Lai, which patch landed? You marked this bug fixed for 2.3.0, but I don't see any master patches beyond LU-1128 and that was marked fixed for 2.2.0 and 2.1.2. change #3138 landed to b1_8 so maybe this ticket should be marked fixed for 1.8.9 instead, if that is possible.

Comment by Lai Siyao [ 06/Jul/12 ]

Cory, thanks for your clarification.

Comment by Lai Siyao [ 06/Jul/12 ]

The actual fix version should be 1.8.9, if there is one. But that value can't be entered in the field because the version doesn't exist yet, so 1.8.8 is used.

Comment by Peter Jones [ 08/Jul/12 ]

Rather than enter an incorrect release, we can just leave the fix version field empty and populate it if/when we form concrete plans for a 1.8.9 release.

Generated at Sat Feb 10 01:17:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.