[LU-11844] IO pattern causing writes to hang to OST Created: 09/Jan/19  Updated: 09/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alex Parga Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File dk_client.txt     Text File dk_mds.txt     Text File dk_oss.txt     File repro.c    
Issue Links:
Related
is related to LU-11798 cur_grant goes to 0 and never increas... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We have a Fortran code that does small writes to a status file. This code worked well on Lustre 2.5.x. On Lustre 2.10.4 we are hitting a bug with the same code where one of the writes hangs for about 20 seconds. During that period all writes to the affected OST hang. IO to other OSTs still works, and IO from other nodes is unaffected.

 

The strace of the application shows that it opens a file, truncates it to 0, writes a single character, truncates to 1, continues writing 130-byte records, closes the file, and then repeats. After a few cycles, the write that crosses the 4 KB offset into the file hangs for about 30 seconds.

 

The strace looks like this:

open("short_file", O_RDWR|O_CREAT, 0700) = 3

ftruncate(3, 0) = 0

write(3, "/", 1) = 1

ftruncate(3, 1) = 0

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130

write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130 ← Hangs Here

 

The Lustre debug traces from the client show the hang and indicate multiple calls to genops.c:1990:obd_stale_export_get. Attached are a reproducer and the lctl dk output captured after issuing: echo "trace nettrace dlmtrace rpctrace vfstrace" > /proc/sys/lnet/debug

 

 



 Comments   
Comment by Alex Zhuravlev [ 09/Jan/19 ]

Do you use ldiskfs or ZFS on the OSTs? How much free space do the OSTs report?

 

Comment by Alex Parga [ 09/Jan/19 ]

We use ZFS 0.7.9-1. We can replicate the problem on all of our Lustre 2.10.4 filesystems, which are freshly formatted and up to 35% utilized.

Comment by Alex Zhuravlev [ 09/Jan/19 ]

The symptoms look like LU-11798.

Comment by Andreas Dilger [ 09/Jan/19 ]

Alex Z, why did you think LU-11798? I thought that was due to some incompatibility between 2.8 clients and 2.10 servers?

Alex P, does this issue happen immediately on a newly mounted client, or does it take some time before it hits? On a client that is having the problem, can you check "lctl get_param osc.*.cur_grant_bytes" to see if the client is running out of grant?

Comment by Alex Parga [ 09/Jan/19 ]

It does happen on a freshly mounted client. Just to confirm, we are using Lustre 2.10.4 on both the clients and the servers. Following the LU-11798 comments, we did a test with the recordsize set back to 128K and ran the reproducer while monitoring cur_grant_bytes. The problem still occurred; however, with each hang cur_grant_bytes would cycle in order through 262144, 524288, and 0, then repeat.
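
For completeness, here is a rough sketch of how cur_grant_bytes can be watched while the reproducer runs. This is a hypothetical helper; it only shells out to the lctl get_param command quoted above and prints whatever it returns.

/* Rough sketch: poll osc.*.cur_grant_bytes once per second.
 * Hypothetical helper; it only runs the lctl get_param command
 * mentioned above and prints its output. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        FILE *p = popen("lctl get_param osc.*.cur_grant_bytes", "r");
        if (!p)
            return 1;
        while (fgets(line, sizeof(line), p))
            fputs(line, stdout);
        pclose(p);
        sleep(1);
    }
    return 0;
}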
