[LU-1277] Initial IO is slow after mount on the client Created: 02/Apr/12  Updated: 20/Aug/12  Resolved: 20/Aug/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Johann Lombardi (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File messages.t2s007057    
Severity: 3
Rank (Obsolete): 6424

 Description   

Hi, we have a customer with an existing Lustre filesystem (about 60-70% full). When they mount Lustre on the clients and start IO, the initial IO performance is always slow.
After writing files and removing them several times, the system returns to normal and IO performance recovers.
This is odd behavior, but it always happens after Lustre starts.



 Comments   
Comment by Shuichi Ihara (Inactive) [ 02/Apr/12 ]

Attached is the syslog from one of the OSSs (we ran sysrq+t). We saw these call traces on all OSSs when the problem happens.

Comment by Johann Lombardi (Inactive) [ 02/Apr/12 ]

Hi Ihara,

Once performance is back to "normal" on a given client, if you then mount a new client, is performance bad again on the new client until you repeat the test several times?
I wonder whether this bug is instead related to OST restart, like the bitmap loading problem mentioned in LU-15.

If this is really related to the Lustre client itself, then it might be due to grants. It takes several bulk writes for the client to acquire 32MB of grant space. To isolate the problem, could you please:

  • collect rpc_stats (on the client) and brw_stats (on the OSTs) for both a good and a bad run;
  • tell us whether this problem is observed only with writes or also with reads.
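A minimal Python sketch for capturing the requested statistics so that a bad run (first IO after mount) and a good run (after performance recovers) can be compared side by side. It only wraps `lctl get_param`; the wildcard patterns, the `snapshot` helper and its "bad"/"good" labels are hypothetical conveniences, not part of any Lustre tool.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# Parameter patterns for the statistics requested above. The wildcards are
# assumptions: they are meant to match every OSC on a client and every
# obdfilter (OST) device on an OSS.
CLIENT_PARAMS = ["osc.*.rpc_stats"]
OSS_PARAMS = ["obdfilter.*.brw_stats"]

Runner = Callable[[List[str]], str]


def get_param_argv(pattern: str) -> List[str]:
    """Build the argv for `lctl get_param <pattern>`."""
    return ["lctl", "get_param", pattern]


def _run_lctl(argv: List[str]) -> str:
    # Run lctl and return its stdout; raises CalledProcessError on failure.
    return subprocess.run(argv, check=True, capture_output=True,
                          text=True).stdout


def snapshot(label: str, patterns: List[str],
             run: Optional[Runner] = None) -> Dict[str, str]:
    """Collect raw lctl output for each pattern, keyed by '<label>:<pattern>'.

    `label` is a hypothetical tag such as "bad" (first run after mount) or
    "good" (after performance has recovered), so the two runs can be diffed.
    """
    runner = run or _run_lctl
    return {f"{label}:{p}": runner(get_param_argv(p)) for p in patterns}
```

Usage would be `snapshot("bad", CLIENT_PARAMS)` on the client right after the first slow run and `snapshot("bad", OSS_PARAMS)` on each OSS, then the same with `"good"` once performance is back. Since these histograms accumulate over time, comparing the difference between consecutive snapshots is probably more informative than reading a single dump.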
Comment by Shuichi Ihara (Inactive) [ 02/Apr/12 ]

Johann,

we just tried mounting a new client after performance was back to normal, and performance was fine on that client.

BTW, we saw behavior similar to what was reported in LU-15: when files are written after all OSTs are mounted, there are many small read IOs even though nothing is being read from user space.
This filesystem was formatted with 1.8.4 and was upgraded to 1.8.7 a few months ago.

Comment by Johann Lombardi (Inactive) [ 02/Apr/12 ]

If the problem only happens after OST restart, then I'm afraid the only "solution" would be to write a tool or patch to pre-load the bitmaps into memory.

Now, if the problem can also happen when the OST has not been restarted recently, it could mean that some bitmaps were evicted from memory, and it might make sense to disable the OSS read cache feature (there are actually two parameters to disable), if not done already.

Comment by Shuichi Ihara (Inactive) [ 02/Apr/12 ]

So far, the problem only happens after mounting the OSTs.
I thought LU-15 was fixed in 1.8.6. Can we still hit this problem on 1.8.7?

Comment by Oleg Drokin [ 02/Apr/12 ]

I think this is a longstanding issue related to bitmap loading and the calculation of extent tables at ext4 mount time.
I think people have used debugfs right after startup to preload the bitmaps, and it helped.

Comment by Johann Lombardi (Inactive) [ 06/Aug/12 ]

Hi Ihara, are you ok to close this bug?

Comment by Shuichi Ihara (Inactive) [ 18/Aug/12 ]

Yes, please. We will try running dumpe2fs before starting the OSTs, to get the bitmap loading and extent table calculation done ahead of time. Thanks for the advice!
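A minimal Python sketch of the dumpe2fs pre-load step discussed above, run on each OSS before mounting its OSTs. The device paths and the `preload_bitmaps` helper are hypothetical. The key assumption is that the bitmap blocks dumpe2fs reads stay in the block device's page cache and are reused when ldiskfs loads them at mount; dumpe2fs itself only reads on-disk data and cannot build ldiskfs's in-memory extent/buddy structures, so at best it makes that step cheaper.

```python
import subprocess
from typing import Callable, Iterable, List


def dumpe2fs_argv(device: str) -> List[str]:
    # Without -h, dumpe2fs prints per-group details (free block and inode
    # ranges), which requires reading each group's block and inode bitmaps.
    return ["dumpe2fs", device]


def preload_bitmaps(devices: Iterable[str],
                    run: Callable[..., object] = subprocess.run) -> List[str]:
    """Run dumpe2fs on each (not yet mounted) OST device, discarding output.

    Returns the devices processed, in order. Assumption: the bitmap blocks
    read here remain cached and are reused when the OST is mounted.
    """
    done = []
    for dev in devices:
        # check=True stops at the first device where dumpe2fs fails.
        run(dumpe2fs_argv(dev), check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        done.append(dev)
    return done
```

For example, `preload_bitmaps(["/dev/mapper/ost0", "/dev/mapper/ost1"])` (hypothetical device names) would be followed by the usual OST mounts. The loop is sequential for simplicity; on an OSS with many OSTs, running one dumpe2fs per device in parallel could shorten the pre-load, at the cost of more concurrent disk seeks.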

Generated at Sat Feb 10 01:15:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.