Details
Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.4.0
Description
The IO performance seen with sanityn fsx is abysmal:
https://maloo.whamcloud.com/test_sets/9ac72874-392b-11e1-b15b-5254004bbbd3
Running fsx in this test environment was averaging 0.6 IOPS, doing only 2100 operations over 3600 seconds!
Granted, these are virtual machines with a single disk that is likely getting pounded, and fsx is running on 2 separate client nodes, but this performance is going to be a killer. In comparison, ldiskfs completed 2500 operations in 147 seconds (17 IOPS):
https://maloo.whamcloud.com/sub_tests/71c5c4e0-3af1-11e1-8506-5254004bbbd3
I don't think that running fsx on 2 clients should cause the IO operations to be synchronous, since clients can handle async IO recovery today. It may be, however, that fsx is forcing many of the operations to be synchronous by using mmap and/or O_DIRECT.
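A minimal sketch of the two fsx modes in question (not from this ticket; "testfile" and the 4096-byte block size are illustrative assumptions), showing why both paths end up synchronous at the client: an O_DIRECT write bypasses the page cache entirely, and an mmap'd store flushed with msync(MS_SYNC) blocks until writeback completes:

/* Minimal sketch, not from this ticket: "testfile" and the 4096-byte
 * block size are illustrative assumptions. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        const size_t blksz = 4096;
        void *buf;
        int fd;

        fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fd < 0 || posix_memalign(&buf, blksz, blksz) != 0) {
                perror("open/posix_memalign");
                return 1;
        }

        /* O_DIRECT: buffer, offset, and length must all be aligned;
         * the write bypasses the page cache, so it cannot complete
         * until the data has been handed off past the cache. */
        memset(buf, 0xab, blksz);
        if (pwrite(fd, buf, blksz, 0) != (ssize_t)blksz)
                perror("O_DIRECT pwrite");

        /* mmap: the store itself is cached, but msync(MS_SYNC)
         * blocks until the dirtied page has been written back. */
        char *map = mmap(NULL, blksz, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
                map[0] = 1;
                msync(map, blksz, MS_SYNC);
                munmap(map, blksz);
        }

        free(buf);
        close(fd);
        return 0;
}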
It may be that we have to relax the O_DIRECT semantics on ZFS to allow cached IO on the OST, instead of waiting for the data to sync to disk, since there isn't really any mechanism for "cacheless" writes on the OST. The big question is whether the O_DIRECT flag implies "uncached" behaviour on the client, "synchronous writes", or both.
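For reference, Linux already separates the two semantics with distinct open(2) flags, which suggests they need not stay coupled on the OST. A hedged sketch (the path and helper names are illustrative, not Lustre code):

/* Hedged sketch, not Lustre code: Linux open(2) flags that separate
 * the two semantics the question distinguishes. */
#define _GNU_SOURCE
#include <fcntl.h>

/* "Uncached": data bypasses the client page cache.  On the ZFS OST
 * today this also forces a wait for the data to sync to disk. */
int open_uncached(const char *path)
{
        return open(path, O_RDWR | O_DIRECT);
}

/* "Synchronous": data is cached, but each write returns only once
 * the data is durable on disk. */
int open_synchronous(const char *path)
{
        return open(path, O_RDWR | O_SYNC);
}

If O_DIRECT is taken to mean only the "uncached" property, the ZFS OST could acknowledge such writes from its own cache, as proposed above.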