[LU-3451] Editor closes file very slow because of fsync() Created: 11/Jun/13  Updated: 07/Aug/13  Resolved: 07/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Li Xi (Inactive) Assignee: Peter Jones
Resolution: Not a Bug Votes: 0
Labels: ldiskfs
Environment:

Kernel: 2.6.32-220.el6.x86_64
Lustre: 1.8.8


Attachments: File fsync.c    
Severity: 3
Rank (Obsolete): 8631

 Description   

When editing a file with emacs or vim and saving it, closing the file takes a couple of seconds. This problem is not limited to a specific file, client, or OST, and can be reproduced easily with any file on any client. It occurs with the root user. No errors are recorded in the MDS/OSS/client syslog.

We traced the process and collected Lustre logs. According to strace, the time is spent in fsync(), which takes several seconds. According to the client's Lustre log, mdc_sync() waits for seconds until it receives the reply. However, the log on the Lustre MDS does not contain any message about mds_sync().

We have not found any problem with other metadata operations such as file creation or directory creation. The simple test 'fsync.c' opens a file, writes to it, and then calls fsync() on it, but it cannot reproduce the problem.
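For readers without access to the attachment, the following is a minimal sketch of an open/write/fsync test of the kind described above. It is not the attached fsync.c; the function name timed_write_fsync and the timing logic are assumptions added for illustration, so that the time spent inside fsync() can be compared with what strace reports.

```c
/* Hypothetical sketch; NOT the fsync.c attached to this ticket. */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Open (create/truncate) `path`, write `len` bytes from `buf`, call
 * fsync(), then close the file. On success, store the wall-clock
 * seconds spent inside fsync() in *fsync_secs and return 0.
 * On any failure, return -1 with errno set by the failing call.
 */
int timed_write_fsync(const char *path, const char *buf, size_t len,
                      double *fsync_secs)
{
    struct timespec t0, t1;
    size_t done = 0;
    int saved, rc;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }

    /* Time only the fsync() call itself. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = fsync(fd);
    saved = errno;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (rc != 0) {
        close(fd);
        errno = saved;
        return -1;
    }

    *fsync_secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    return close(fd);
}
```

To use it, call timed_write_fsync() from a small main() with a path on the Lustre mount and print the returned duration. As noted above, a test of this shape did not reproduce the delay.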



 Comments   
Comment by Li Xi (Inactive) [ 11/Jun/13 ]

Here are the logs:
ftp://ftp.whamcloud.com/uploads/LU-3451/MMBK_log_20130605.tar.gz
ftp://ftp.whamcloud.com/uploads/LU-3451/lustre_debug_mds_cl.tar1.gz

Comment by Peter Jones [ 12/Jun/13 ]

Thanks for your submission Li Xi.

Comment by Li Xi (Inactive) [ 12/Jun/13 ]

Thank you Peter. I would be grateful for any help.

Comment by Andreas Dilger [ 17/Jun/13 ]

Is it slow on every close, or just occasionally?

I think occasional slow closes are caused by close on the MDT changing the atime, but getting stuck in the journal transaction commit. There is an old Bugzilla bug for this problem also, but I'm not able to find it (possibly because it is not public?).

Comment by Li Xi (Inactive) [ 19/Jun/13 ]

Hi Andreas,

It is slow only occasionally, but this happens very frequently. We also found that there are a lot of open/close operations on the system. Could that be the cause of the problem?

Do you know in which direction I should do more research? Or is it too complex to fix promptly?

Thanks!

Comment by Shuichi Ihara (Inactive) [ 07/Aug/13 ]

After collecting metadata stats, we found that this performance problem was caused by too many metadata operations from some applications. We identified the applications performing those metadata operations and stopped the jobs, after which the system returned to normal. So, please close the ticket now.

Comment by Peter Jones [ 07/Aug/13 ]

ok - thanks Ihara
