[LU-15941] sanity test_398b: timeouts with ZFS Created: 14/Jun/22  Updated: 16/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15963 sanityn test_56b: OSS OOM with ZFS In Progress
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Alex Zhuravlev <bzzz@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/097393a8-5380-4d65-af83-5d44c963ce88

test_398b failed with the following error:

Timeout occurred after 326 minutes, last suite running was sanity

This started on June 10, after the recent landing wave.
I suspect two patches:

  • c45b8a92a3 2022-05-11 | LU-15583 build: Update ZFS version to 2.1.2 [Jian Yu]
  • b4880f3758 2021-07-15 | LU-15483 tests: Improve test 398b [Patrick Farrell]

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_398b - Timeout occurred after 326 minutes, last suite running was sanity



 Comments   
Comment by Andreas Dilger [ 01/Jul/22 ]

+1 on master: https://testing.whamcloud.com/test_sets/34d33d79-d068-4eeb-8990-9c8d06669a01

It is reporting pathetic IOPS of 1, under 8 KB/s. I guess that is contention on the single HDD on the host, possibly made worse by read-modify-write of the larger ZFS blocks? Alex, do you think your blocksize patch https://review.whamcloud.com/47768 "LU-15963 osd: use contiguous chunk to grow blocksize" might help this?

Comment by Alex Zhuravlev [ 24/Aug/22 ]

I profiled 398b: dt_trans_stop() in ofd_commitrw_write() takes 50 usec on average with ldiskfs and 512831 usec with ZFS.
The majority of OST_WRITE requests were missing OBD_BRW_ASYNC, which is why dt_trans_stop() was taking that long.
Changing the maximum blocksize doesn't improve the situation significantly, locally at least, but I'm going to try that with AT.
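For context, a rough C sketch of the mechanism being described. The function name sketch_commitrw_write() is made up and the real ofd_commitrw_write() logic is more involved, but dt_trans_start()/dt_trans_stop(), th_sync and OBD_BRW_ASYNC are the actual Lustre identifiers; treat this as illustrative only. The point is that a bulk write arriving without OBD_BRW_ASYNC marks the handle synchronous, so dt_trans_stop() cannot return until the backend has committed.

/* Illustrative sketch only -- not the actual ofd_commitrw_write() body.
 * Shows why a missing OBD_BRW_ASYNC flag turns dt_trans_stop() into a
 * blocking call: the handle is marked sync, so the stop path must wait
 * for the backend transaction (a ZFS txg here) to commit. */
static int sketch_commitrw_write(const struct lu_env *env,
                                 struct dt_device *dt, struct thandle *th,
                                 unsigned int brw_flags)
{
    int rc;

    rc = dt_trans_start(env, dt, th);
    if (rc)
        return rc;

    /* ... apply the bulk pages to the object ... */

    if (!(brw_flags & OBD_BRW_ASYNC))
        th->th_sync = 1;    /* sync/direct I/O: commit before replying */

    /* With th_sync set, osd-zfs waits here for the txg to sync, which
     * is the ~512831 usec average measured above, vs ~50 usec for
     * ldiskfs. */
    return dt_trans_stop(env, dt, th);
}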

Comment by Nikitas Angelinas [ 14/Dec/22 ]

+1 on master: https://testing.whamcloud.com/test_sets/b7ffdf4c-d214-427b-95e0-379d8c837267

Comment by Nikitas Angelinas [ 17/Jan/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/cff8066f-91ae-4805-a4e2-ce35545e5bfe

Comment by Patrick Farrell [ 03/Apr/23 ]

Alex,

They are missing 'ASYNC' because they should be missing async: this is direct I/O, which expects the server to do a sync on each write.  This means DIO performance on ZFS is absolutely terrible, and I think we can't fix it except by fixing our sync behavior on ZFS, which I understand is a huge project.  So... yeah.
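To make the I/O pattern concrete, a minimal hypothetical userspace reproducer (the path /mnt/lustre/f398b.sanity is an assumed mount point, not taken from the test script). If the description above holds, every O_DIRECT write becomes a bulk RPC without OBD_BRW_ASYNC, so on a ZFS OST each pwrite() waits for a transaction commit before returning.

/* Minimal O_DIRECT writer (hypothetical reproducer, not the test script).
 * On ZFS each pwrite() pays roughly one transaction-commit latency. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    int fd, i;

    fd = open("/mnt/lustre/f398b.sanity",
              O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;
    if (posix_memalign(&buf, 4096, 4096))  /* O_DIRECT needs aligned I/O */
        return 1;
    memset(buf, 0, 4096);

    for (i = 0; i < 100; i++)
        if (pwrite(fd, buf, 4096, (off_t)i * 4096) != 4096)
            return 1;

    free(buf);
    close(fd);
    return 0;
}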

Comment by Andreas Dilger [ 03/Apr/23 ]

IIRC, there are two significant performance issues with ZFS sync writes:

  • one, of course, is that transaction commit itself has a lot of overhead (4x uberblock sync writes per device, plus a full Merkle tree flush each time)
  • the other is that calling "sync" on ZFS does not actually trigger a transaction commit; it just waits for one to happen by itself. That is why Alex is reporting a ~0.5 s commit time on average: the commit interval is 1 s, so a sync arriving at a random point in that interval waits about half of it.

For flash devices it would still be possible to commit thousands of times per second, and for HDD devices maybe 10/s, instead of 1/s. This would of course increase the load on the storage and CPUs, but what else are they for, and why should both the clients and the servers sit idle waiting for the 1 s ZFS transaction commit?
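As a reference point, a stripped-down sketch of the sync path on the osd-zfs side (simplified, the real osd_trans_stop() does more; dmu_tx_commit(), dmu_tx_get_txg(), dmu_objset_pool() and txg_wait_synced() are the actual DMU/txg interfaces). The key is that txg_wait_synced() only waits for the open txg to commit on its own schedule rather than pushing the commit forward, so against a ~1 s txg interval a sync caller waits ~0.5 s on average; serialized sync writes are then capped at roughly two per second, consistent with the "under 8 KB/s" reported above for small direct writes.

/* Simplified sketch of the osd-zfs sync path (not the verbatim code).
 * dmu_tx_commit() only releases our hold on the transaction; the data
 * is not on stable storage until the whole txg syncs.  txg_wait_synced()
 * merely waits for that to happen -- it does not force the txg to close
 * early -- so with the default ~1 s txg interval a sync caller waits
 * ~0.5 s on average, matching the ~512831 usec measured above. */
static void sketch_osd_trans_stop_sync(objset_t *os, dmu_tx_t *tx)
{
    uint64_t txg = dmu_tx_get_txg(tx);

    dmu_tx_commit(tx);

    /* Block until the txg containing this change has synced. */
    txg_wait_synced(dmu_objset_pool(os), txg);
}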

Comment by Nikitas Angelinas [ 16/Aug/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/2dda2437-8b99-4e36-b99d-f769947b2f6b
