[LU-2025] cd'ing to a certain directory causes client eviction Created: 25/Sep/12  Updated: 18/Jul/14  Resolved: 18/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Manish Patel (Inactive) Assignee: Yang Sheng
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: File lustre_dk.log.fe7.bz2     Text File lustre_dk.log.mds-1-1.bz2     File messages-fe7    
Severity: 3
Rank (Obsolete): 4150

 Description   

The directory is not accessible. When cd is run as follows:
cd /scratch1/portfolios/NCEPDEV/jcsda/noscrub/Yong.Chen/CRTM_ODPStraining/work_ir_v12.1_Zeus/ECMWF83/iasiB3_metop-a

The following error is generated after a ~6 minute wait:
cannot access iasiB3_metop-a: Cannot send after transport endpoint shutdown

An ls of this directory hangs for more than 10 minutes. This behavior is not seen on any of the parent directories.

It appears that the MDT gets stuck and the client is eventually evicted. I have kernel messages from the client as well as debug logs for the client and MDT.



 Comments   
Comment by Kit Westneat (Inactive) [ 25/Sep/12 ]

The behavior can be reproduced on any client at will. Let me know if there is other information that could be useful, and I will try to get it.

Comment by Peter Jones [ 25/Sep/12 ]

Yangsheng

Could you please comment on this one?

Thanks

Peter

Comment by Kit Westneat (Inactive) [ 27/Sep/12 ]

Any update? This user is currently unable to access their data, so it's high priority for us.

Thanks,
Kit

Comment by Yang Sheng [ 27/Sep/12 ]

Hi Kit, I'm starting to look into this issue. Could you please collect the logs again for me? They do not appear to include the data from the point of failure: I cannot find the directory name in the fe7 log. Also, please try to stop other operations while collecting the logs. Of course, this looks like a production system and a very busy one, so I understand that is not so easy to do. TIA.

Comment by Oleg Drokin [ 27/Sep/12 ]

So it looks like some MDS processes are stuck while processing requests.
The messages log from the MDS would be useful.
Also a sysrq-t from the MDS.

Comment by Peter Jones [ 18/Jul/14 ]

As per DDN this is no longer a priority

Generated at Sat Feb 10 01:21:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.