[LU-2151] slow ZFS IO performance Created: 09/Jan/12  Updated: 11/Oct/12  Resolved: 17/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Andreas Dilger Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 2869

 Description   

The IO performance seen with sanityn fsx is abysmal:

https://maloo.whamcloud.com/test_sets/9ac72874-392b-11e1-b15b-5254004bbbd3

Running fsx in this test environment averaged only 0.6 IOPS: just 2100 operations completed in 3600 seconds!

Granted, these are virtual machines with a single disk that is likely being pounded, and fsx is running on 2 separate client nodes, but performance this poor is going to be a killer. By comparison, ldiskfs completed 2500 operations in 147 seconds (17 IOPS):

https://maloo.whamcloud.com/sub_tests/71c5c4e0-3af1-11e1-8506-5254004bbbd3
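The throughput figures above can be recomputed from the raw operation counts and run times. A small sketch (variable names are illustrative; the counts and durations are the ones quoted in this ticket, nothing is newly measured), which also projects how long a full 2500-operation fsx run would take at the observed ZFS rate:

```python
# Recompute the fsx throughput quoted above from operation counts and run times.
# Inputs come from the two test runs in this ticket; nothing here is measured.

FSX_TARGET_OPS = 2500  # operations in a full sanityn fsx run


def ops_per_second(operations: int, seconds: float) -> float:
    """Average fsx operation rate over a whole run."""
    return operations / seconds


zfs_iops = ops_per_second(2100, 3600)     # ZFS run: 2100 ops in 3600 s
ldiskfs_iops = ops_per_second(2500, 147)  # ldiskfs run: 2500 ops in 147 s
speedup = ldiskfs_iops / zfs_iops
# Time a full 2500-operation run would need at the observed ZFS rate.
projected_zfs_seconds = FSX_TARGET_OPS / zfs_iops

print(f"ZFS:      {zfs_iops:.2f} IOPS")            # ZFS:      0.58 IOPS
print(f"ldiskfs:  {ldiskfs_iops:.1f} IOPS")        # ldiskfs:  17.0 IOPS
print(f"ratio:    {speedup:.0f}x")                 # ratio:    29x
print(f"full run: {projected_zfs_seconds:.0f} s")  # full run: 4286 s
```

At the observed ZFS rate a full 2500-operation run needs roughly 4286 s, so it cannot finish inside a 3600 s limit; ldiskfs is about 29x faster on the same workload.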

I don't think running fsx on 2 clients should force the IO operations to be synchronous, since clients can already handle async IO recovery today. However, fsx may be forcing many of its operations to be synchronous by using mmap and/or O_DIRECT.

It may be that we have to relax the O_DIRECT semantics on ZFS to allow cached IO on the OST instead of waiting for the data to sync to disk, since there isn't really any mechanism for "cacheless" writes on the OST. The big question is whether the O_DIRECT flag implies "uncached" behaviour on the client, "synchronous writes", or both.
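On Linux, the two readings in that question correspond to separate open(2) flags. O_DIRECT asks the kernel to bypass the page cache, but the open(2) man page notes that on its own it does not give the O_SYNC guarantee that data and necessary metadata have reached stable storage; O_SYNC (or an explicit fsync()) provides durability while still going through the page cache. The sketch below, for a local POSIX filesystem rather than Lustre, contrasts the two durability styles. The helper names are hypothetical, and O_DIRECT itself is left out because it also requires aligned buffers and is not supported by every filesystem.

```python
import os
import tempfile


def write_buffered_then_fsync(path: str, data: bytes) -> None:
    """Cached write; durability is requested once, explicitly, with fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)  # may return while data is still only in the page cache
        os.fsync(fd)        # single commit point covering all prior writes
    finally:
        os.close(fd)


def write_synchronous(path: str, data: bytes) -> None:
    """O_SYNC write: each write() returns only once data is on stable storage."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)  # blocks until data and required metadata are durable
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_buffered_then_fsync(os.path.join(tmp, "cached.dat"), b"fsx data")
        write_synchronous(os.path.join(tmp, "sync.dat"), b"fsx data")
```

In these terms, the relaxation suggested above would keep the client-side "uncached" meaning of O_DIRECT while no longer making each OST write wait, O_SYNC-style, for the data to reach disk.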



 Comments   
Comment by Andreas Dilger [ 24/Jan/12 ]

This is causing almost all sanityn.sh tests to time out, because fsx cannot complete 2500 filesystem operations within 3600s. In some cases, it barely completes the operations before timing out in test_18 or shortly thereafter.

Comment by Johann Lombardi (Inactive) [ 26/Jan/12 ]

I have just pushed a grant patch which should reduce the number of sync calls.
http://review.whamcloud.com/2027

Let's see if this helps.

Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in the following lustre-dev builds, each with Result = SUCCESS:

  • lustre-dev » x86_64,client,el5,inkernel #340
  • lustre-dev » i686,client,el6,inkernel #340
  • lustre-dev » i686,server,el5,inkernel #340
  • lustre-dev » x86_64,server,el6,inkernel #340
  • lustre-dev » i686,client,el5,inkernel #340
  • lustre-dev » x86_64,server,el5,inkernel #340
  • lustre-dev » x86_64,client,el6,inkernel #340

ORI-473 grant: sync backend filesystem less aggressively (Revision 7c049d71e8594a3e153b3ae914ed89e3f999ec0b)

Mikhail Pershin : 7c049d71e8594a3e153b3ae914ed89e3f999ec0b
Files :

  • lustre/ofd/ofd_grant.c

Comment by Ian Colle (Inactive) [ 17/Sep/12 ]

Patch landed.

Generated at Sat Feb 10 01:22:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.