[LU-1231] statahead/agl is slower on sles11 client Created: 18/Mar/12  Updated: 24/Jul/12  Resolved: 24/Jul/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Minh Diep Assignee: nasf (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6431

 Description   

https://maloo.whamcloud.com/test_sessions/0a6e15b6-7126-11e1-a89e-5254004bbbd3

total: 9000 creates in 28.71 seconds: 313.44 creates/second
llite.lustre-ffff88005e131800.statahead_max=0
10001

real 0m55.864s
user 0m0.140s
sys 0m28.882s
ls 10000 files without statahead: 56 sec
llite.lustre-ffff88005e131800.statahead_max=32
32
10001

real 0m59.101s
user 0m0.180s
sys 0m24.130s
ls 10000 files with statahead: 59 sec
statahead total: 4
statahead wrong: 0
agl total: 4
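
For reference, the gap between the two runs can be worked out from the "real" times above. This is a minimal sketch that only re-uses the figures in this description; the function names are illustrative and are not part of the test suite.

```python
# Figures copied from the description above. The file count comes from the
# "ls 10000 files" lines; the times are the reported "real" values.
FILES = 10000
REAL_WITHOUT_STATAHEAD = 55.864  # seconds, statahead_max=0
REAL_WITH_STATAHEAD = 59.101     # seconds, statahead_max=32


def slowdown_percent(baseline: float, measured: float) -> float:
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def files_per_second(files: int, seconds: float) -> float:
    """Return the listing rate for one run."""
    return files / seconds


if __name__ == "__main__":
    pct = slowdown_percent(REAL_WITHOUT_STATAHEAD, REAL_WITH_STATAHEAD)
    print(f"statahead run is {pct:.1f}% slower")
    print(f"without statahead: {files_per_second(FILES, REAL_WITHOUT_STATAHEAD):.1f} files/s")
    print(f"with statahead:    {files_per_second(FILES, REAL_WITH_STATAHEAD):.1f} files/s")
```

The difference is a few seconds on a run of roughly one minute, so it is small relative to the total time but goes in the wrong direction for a prefetching feature.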



 Comments   
Comment by Peter Jones [ 19/Mar/12 ]

Fanyong

Could you please give an initial assessment of this issue?

Thanks

Peter

Comment by nasf (Inactive) [ 23/Mar/12 ]

This result is plausible, because "statahead + AGL" has to start two background threads for pre-fetching, and that requires enough CPU power to schedule those threads in time. The tests above were run on virtual machines: four VMs share one physical node that has only 4 cores, so each VM effectively gets 1 core. It is therefore possible that the pre-fetching threads are not scheduled in time, which makes the "statahead + AGL" mode slower than the run without it.
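
The argument in this comment can be put as simple arithmetic. The sketch below is only an illustration under the assumptions stated in the comment (4 VMs on a 4-core node, one foreground `ls` process, two pre-fetching threads); the helper names are hypothetical and nothing here measures a real system.

```python
def cores_per_vm(physical_cores: int, vms: int) -> float:
    """Average share of physical cores available to each VM."""
    return physical_cores / vms


def is_oversubscribed(cores: float, foreground: int = 1, prefetch: int = 2) -> bool:
    """True when runnable threads outnumber the cores available.

    foreground: the process doing the listing (assumed to be one `ls`).
    prefetch:   background threads started for statahead + AGL
                (two, per the comment above).
    """
    return foreground + prefetch > cores


if __name__ == "__main__":
    share = cores_per_vm(physical_cores=4, vms=4)
    print(f"cores per VM: {share}")
    print(f"oversubscribed: {is_oversubscribed(share)}")
```

With one core per VM, three runnable threads compete for a single CPU, so the pre-fetching threads can easily lag behind the `ls` process they are meant to stay ahead of.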

Comment by Minh Diep [ 23/Mar/12 ]

How do we explain that the issue is only seen on sles11 clients, and not on rhel5 or rhel6? It is not even seen on the rhel6 1.8.7-wc1 client.

Comment by nasf (Inactive) [ 24/Mar/12 ]

In fact, I have also hit the same issue in my local VM environment, which is RHEL6-based, with 2 cores shared by the MDS, the OSTs, and the client.

On the other hand, is it true that every test_123a run failed for lustre-2.x on sles11 clients?

Comment by nasf (Inactive) [ 24/Jul/12 ]

Statahead performance results from a VM environment are not reliable, because of the limited CPU power and the scheduling delays.

On the other hand, we found a performance regression caused by the security getxattr RPC. The related issues have been fixed in LU-549.

Generated at Sat Feb 10 01:14:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.