[LU-875] (mds_open.c:1645:mds_close()) @@@ no handle for file close Created: 23/Nov/11  Updated: 06/Jan/12  Resolved: 06/Jan/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File messages.1    
Severity: 3
Rank (Obsolete): 6516

 Description   

We are seeing the following error messages very often at the customer site.

Nov 17 23:36:29 ALPL506 kernel: LustreError: 1810:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 20549237: cookie 0xffea02d92441093b req@ffff810604316800 x1385446631059308/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/4896 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc 0/0
Nov 17 23:36:29 ALPL506 kernel: LustreError: 1810:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116) req@ffff810604316800 x1385446631059308/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/2928 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc -116/0
Nov 17 23:36:29 ALPL506 kernel: LustreError: 1873:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 20549177: cookie 0xffea02d9245b5b8d req@ffff810450e77800 x1385446631059335/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/4896 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc 0/0
Nov 17 23:36:29 ALPL506 kernel: LustreError: 1873:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116) req@ffff810450e77800 x1385446631059335/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/2928 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc -116/0
Nov 17 23:36:36 ALPL506 kernel: LustreError: 2012:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 20549276: cookie 0xffea02d9244de445 req@ffff8102c148c450 x1385446631059475/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/4896 e 0 to 0 dl 1321544202 ref 1 fl Interpret:/0/0 rc 0/0
Nov 17 23:36:36 ALPL506 kernel: LustreError: 2012:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116) req@ffff8102c148c450 x1385446631059475/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/2928 e 0 to 0 dl 1321544202 ref 1 fl Interpret:/0/0 rc -116/0

Please advise whether this is a problem or not.



 Comments   
Comment by Andreas Dilger [ 23/Nov/11 ]

This message is itself not a serious problem. It means that the clients were previously evicted from the MDS, and when they try to close the files, there is no record of the open on the MDS.

If the clients are being evicted on a regular basis, then this is the real problem to be investigating.

Comment by Shuichi Ihara (Inactive) [ 23/Nov/11 ]

I had a look at the log files. Some clients were evicted from the MDS, but that happened one or more hours before these messages showed up. Do those evictions still cause these error messages?

Nov 17 18:05:10 ALPL506 kernel: Lustre: LFS05-MDT0000: haven't heard from client b5553d1d-d2d7-10eb-8dd2-47d28010c55a (at 10.3.9.23@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Nov 17 18:05:10 ALPL506 kernel: Lustre: Skipped 7 previous similar messages
Nov 17 22:43:54 ALPL506 kernel: Lustre: LFS05-MDT0000: haven't heard from client 3f691b52-07fd-72a2-4901-6a11eb41c9af (at 10.3.6.28@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Nov 17 22:43:54 ALPL506 kernel: Lustre: Skipped 1 previous similar message
Nov 17 22:43:54 ALPL506 kernel: Lustre: MGS: haven't heard from client 5b087fc9-5c1b-1600-b1aa-170c245d32f5 (at 10.3.6.28@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Nov 17 23:36:29 ALPL506 kernel: LustreError: 1810:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 20549237: cookie 0xffea02d92441093b req@ffff810604316800 x1385446631059308/t0 o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 lens 408/4896 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc 0/0

Comment by Peter Jones [ 23/Nov/11 ]

Niu

Can you please look into this one?

Thanks

Peter

Comment by Andreas Dilger [ 23/Nov/11 ]

Ihara, yes these messages will only happen when the client tries to close the file, regardless of how long ago it was evicted.

Did the client(s) have a large number of files open? If there are more such messages from the same client UUID after it is clear that the application running there at the time of eviction is stopped, then it may be some new problem.
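One way to check this from the MDS syslog is to pair each "no handle for file close" error with the most recent eviction of the same client UUID. The following is only a rough sketch: the function and variable names are hypothetical, the regular expressions are modeled solely on the log lines quoted in this ticket, and the year is an assumed parameter because syslog timestamps omit it. It also cannot see evictions hidden behind "Skipped N previous similar messages".

```python
import re
from datetime import datetime

# Patterns modeled on the log lines quoted in this ticket (assumption:
# other Lustre versions may word these messages differently).
TS = r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d) .*"
EVICT_RE = re.compile(TS + r"haven't heard from client (?P<uuid>[0-9a-f-]{36}) ")
CLOSE_RE = re.compile(
    TS + r"mds_close\(\)\) @@@ no handle for file close .*"
    r"o\d+->(?P<uuid>[0-9a-f-]{36})@"
)


def parse_ts(ts, year):
    # Syslog timestamps carry no year, so the caller supplies one.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def close_errors_after_eviction(lines, year=2011):
    """Return (uuid, error_time, time_since_last_eviction) per close error.

    time_since_last_eviction is None when no earlier eviction of that
    client UUID was seen in the supplied lines.
    """
    last_evicted = {}  # client UUID -> time of most recent eviction
    report = []
    for line in lines:
        m = EVICT_RE.match(line)
        if m:
            last_evicted[m["uuid"]] = parse_ts(m["ts"], year)
            continue
        m = CLOSE_RE.match(line)
        if m:
            when = parse_ts(m["ts"], year)
            evicted = last_evicted.get(m["uuid"])
            delay = when - evicted if evicted is not None else None
            report.append((m["uuid"], when, delay))
    return report


# Two lines copied from the log excerpt in this ticket.
SAMPLE = [
    "Nov 17 22:43:54 ALPL506 kernel: Lustre: LFS05-MDT0000: haven't heard "
    "from client 3f691b52-07fd-72a2-4901-6a11eb41c9af (at 10.3.6.28@o2ib) "
    "in 227 seconds. I think it's dead, and I am evicting it.",
    "Nov 17 23:36:29 ALPL506 kernel: LustreError: 1810:0:(mds_open.c:1645:"
    "mds_close()) @@@ no handle for file close ino 20549237: cookie "
    "0xffea02d92441093b req@ffff810604316800 x1385446631059308/t0 "
    "o35->3f691b52-07fd-72a2-4901-6a11eb41c9af@NET_0x500000a03061c_UUID:0/0 "
    "lens 408/4896 e 0 to 0 dl 1321544195 ref 1 fl Interpret:/0/0 rc 0/0",
]

for uuid, when, delay in close_errors_after_eviction(SAMPLE):
    print(uuid, when, delay)
```

For the two sample lines, the close error comes about 52 minutes after the eviction of the same client UUID, which fits a late close following an earlier eviction. Errors reported with no preceding eviction, or errors that keep appearing long after the application on that client has stopped, would be the ones worth a closer look.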

Comment by Shuichi Ihara (Inactive) [ 06/Jan/12 ]

It seems that InfiniBand issues occurred occasionally at that time, so we saw many clients evicted because of them.
We have not seen these messages much since then. Please close this ticket; I will open a new one if the same messages appear again and indicate a real problem.

Comment by Peter Jones [ 06/Jan/12 ]

ok - thanks Ihara!

Generated at Sat Feb 10 01:11:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.