Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17755

sanity-compr test_101i: FAIL: expected 5 misses but got 6

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.16.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/4380b7cd-f886-4477-99d0-e68484661e31

      test_101i failed with the following error:

      == sanity test 101i: allow current readahead to exceed reservation ====== 06:24:54 (1712039094)
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB, 10 MiB) copied, 0.228838 s, 45.8 MB/s
      llite.lustre-ffff8896320bb000.max_read_ahead_per_file_mb=1
      Reset readahead stats
      llite.lustre-ffff8896320bb000.read_ahead_stats=0
      5+0 records in
      5+0 records out
      10485760 bytes (10 MB, 10 MiB) copied, 0.3471 s, 30.2 MB/s
      llite.lustre-ffff8896320bb000.read_ahead_stats=
      snapshot_time             1712039095.945972104 secs.nsecs
      start_time                1712039095.591196979 secs.nsecs
      elapsed_time              0.354775125 secs.nsecs
      hits                      2554 samples [pages]
      misses                    6 samples [pages]
      zero_size_window          2554 samples [pages]
      failed_to_fast_read       6 samples [pages]
      readahead_pages           6 samples [pages] 15 511 2554
       sanity test_101i: @@@@@@ FAIL: expected misses 5 but got 6 
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/103767 - 4.18.0-513.18.1.el8_9.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/103767 - 4.18.0-513.18.1.el8_lustre.x86_64

      This is failing with sanity-compr running with a PFL file layout compr_STRIPEPARAMS="-E 64k -c 1 -E eof" applied to the whole filesystem.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-compr test_101i - expected misses 5 but got 6

      Attachments

        Issue Links

          Activity

            [LU-17755] sanity-compr test_101i: FAIL: expected 5 misses but got 6

            Yup, that's correct - whole file readahead can't do any better.  Since readahead doesn't request locks.

            FWIW, Lustre keeps trying to do whole file readahead, so it will suck all of each stripe, etc, until it hits the next one.  But it's not ideal!

            So, readahead should request locks.  (And perhaps should've been implemented through the regular IO path rather than a separate one!  But it wasn't, and may have been done that way with good reason - I'm not sure.  And it would be an enormous lift to change it now.)

            paf Patrick Farrell (Inactive) added a comment - Yup, that's correct - whole file readahead can't do any better.  Since readahead doesn't request locks. FWIW, Lustre keeps trying to do whole file readahead, so it will suck all of each stripe, etc, until it hits the next one.  But it's not ideal! So, readahead should request locks.  (And perhaps should've been implemented through the regular IO path rather than a separate one!  But it wasn't, and may have been done that way with good reason - I'm not sure.  And it would be an enormous lift to change it now.)
            sarah Sarah Liu added a comment -

            Thanks, I will add comment link with the 2 tickets

            sarah Sarah Liu added a comment - Thanks, I will add comment link with the 2 tickets

            I found it failed when stripe count is greater than 1, either with PFL layout or just plain layout with -c2

            I would hope that the whole-file readahead would work properly when there are multiple stripes/components on a file (typical stripe_size=1MB, max_readahead_whole=2MB), but I guess if the readahead stops at the stripe boundary this would be consistent with the previous issue?

            For now it makes sense to apply a 1-stripe layout for the test, but maybe add a comment to link it to this ticket and LU-15155 so we know it needs to be fixed.

            adilger Andreas Dilger added a comment - I found it failed when stripe count is greater than 1, either with PFL layout or just plain layout with -c2 I would hope that the whole-file readahead would work properly when there are multiple stripes/components on a file (typical stripe_size=1MB, max_readahead_whole=2MB), but I guess if the readahead stops at the stripe boundary this would be consistent with the previous issue? For now it makes sense to apply a 1-stripe layout for the test, but maybe add a comment to link it to this ticket and LU-15155 so we know it needs to be fixed.
            sarah Sarah Liu added a comment -

            Is the failure of test_248c in the below link the same as 101i? I found it failed when stripe count is greater than 1, either with PFL layout or just plain layout with -c2

            https://testing.whamcloud.com/test_sets/4e99f18b-e1bf-401c-95bc-ea856029e33f

            if 248c failed as the same reason, I will also add "setstripe -c1" to the file to make it pass

            sarah Sarah Liu added a comment - Is the failure of test_248c in the below link the same as 101i? I found it failed when stripe count is greater than 1, either with PFL layout or just plain layout with -c2 https://testing.whamcloud.com/test_sets/4e99f18b-e1bf-401c-95bc-ea856029e33f if 248c failed as the same reason, I will also add "setstripe -c1" to the file to make it pass

            This can be closed after the above patch lands, otherwise we may still see this failure during other testing.

            adilger Andreas Dilger added a comment - This can be closed after the above patch lands, otherwise we may still see this failure during other testing.

            OK, thanks. I thought it was something like that, but wanted to make sure we weren't hiding a real issue. The patch https://review.whamcloud.com/54797 "LU-16904 tests: Fix sanity test 34h, 184d and 101i when PFL layout is used" will just change this subtest to use a 1-stripe layout so that it works as it always had, and it sounds like you will already be adding stripe-boundary readahead checking in other patches.

            adilger Andreas Dilger added a comment - OK, thanks. I thought it was something like that, but wanted to make sure we weren't hiding a real issue. The patch https://review.whamcloud.com/54797 " LU-16904 tests: Fix sanity test 34h, 184d and 101i when PFL layout is used " will just change this subtest to use a 1-stripe layout so that it works as it always had, and it sounds like you will already be adding stripe-boundary readahead checking in other patches.

            So you're in exactly the right area and have the right things in mind.  The exact problem is a bit different, and it has implications so I'll just explain it...

            The problem with cross-stripe readahead isn't that it isn't tried, it's that readahead does not request LDLM locks.  So when readahead is crossing to a new stripe, if there's no LDLM lock there, readahead will give up.  This does, eventually, result in a miss when the actual reads reach that stripe, this lock issue happens happens the first time you hit a stripe, so it's a thing at component boundaries too.  And yes it blows the readahead state back to zero, but our readahead is pretty aggressive when starting from a clean state so the cost of the state loss is small.  (basically the readahead window goes from 0 to (I think) RPC size to even larger... so it doesn't start small and grow, but it means we generate full size RPCs from the first readahead, which is arguably better anyway - A little wasted read in some cases in exchange for large IO for all reads after the first).

            I haven't checked 101i specifically here, but what you're describing is a perfect match for the cross-stripe readahead issue with not requesting locks.

            So I do have a patch to make readahead request locks, but it's bound up with a bunch of other stuff - the series I have to improve readahead testing, etc.
            https://review.whamcloud.com/c/fs/lustre-release/+/45317
            Doing this right requires some changes in how clio/osc uses LDLM flags (it doesn't actually require changing the meaning of any LDLM flags, but it does require changing exactly how OSC requests them and rename one... the point is it doesn't change the meaning and shouldn't have client/server interop implications):
            https://review.whamcloud.com/c/fs/lustre-release/+/45449/17

            But the point is this issue is a bit of a pain to solve and doesn't have huge performance implications.  It reduces the performance of reading the first few MiB of a striped file but no one notices that, I think...  It's probably worth fixing only if we do the other stuff in the series - currently, we have a performant but funky readahead implementation that takes extra misses for a few different kind of odd reasons.  But it's essentially performant in all the common cases.  That series tries to get us to a 'clean' readahead implementation that takes the minimum number of misses and is practical to test tightly.  It's a lot of work though.

            paf Patrick Farrell (Inactive) added a comment - - edited So you're in exactly the right area and have the right things in mind.  The exact problem is a bit different, and it has implications so I'll just explain it... The problem with cross-stripe readahead isn't that it isn't tried, it's that readahead does not request LDLM locks.  So when readahead is crossing to a new stripe, if there's no LDLM lock there, readahead will give up.  This does, eventually, result in a miss when the actual reads reach that stripe, this lock issue happens happens the first time you hit a stripe, so it's a thing at component boundaries too.  And yes it blows the readahead state back to zero, but our readahead is pretty aggressive when starting from a clean state so the cost of the state loss is small.  (basically the readahead window goes from 0 to (I think) RPC size to even larger... so it doesn't start small and grow, but it means we generate full size RPCs from the first readahead, which is arguably better anyway - A little wasted read in some cases in exchange for large IO for all reads after the first). I haven't checked 101i specifically here, but what you're describing is a perfect match for the cross-stripe readahead issue with not requesting locks. So I do have a patch to make readahead request locks, but it's bound up with a bunch of other stuff - the series I have to improve readahead testing, etc. https://review.whamcloud.com/c/fs/lustre-release/+/45317 Doing this right requires some changes in how clio/osc uses LDLM flags (it doesn't actually require changing the meaning of any LDLM flags, but it does require changing exactly how OSC requests them and rename one... the point is it doesn't change the meaning and shouldn't have client/server interop implications): https://review.whamcloud.com/c/fs/lustre-release/+/45449/17 But the point is this issue is a bit of a pain to solve and doesn't have huge performance implications.  It reduces the performance of reading the first few MiB of a striped file but no one notices that, I think...  It's probably worth fixing only if we do the other stuff in the series - currently, we have a performant but funky readahead implementation that takes extra misses for a few different kind of odd reasons.  But it's essentially performant in all the common cases.  That series tries to get us to a 'clean' readahead implementation that takes the minimum number of misses and is practical to test tightly.  It's a lot of work though.

            paf, as you know, sarah is running testing toward allowing all of the tests to run on a filesystem with default PFL layouts (so that we might be able to enable this by mkfs.lustre and save every new user some grief).

            In this particular case, it looks like sanity.sh test_101i is failing because the 64KB first component is causing an extra readahead miss. The expedient way to fix this would be to use "$LFS setstripe -c 1 $DIR/$tfile" before "dd" to keep the testing "status quo" of expecting to run on a 1-stripe file. However, I'm wondering if there is potentially a deeper issue with the readahead code that the llite level readahead is somehow confused by the presence of the short stripe at the start (which may happen with DoM, though I haven't opened that can of worms yet)? My preference, if possible, would be to make the test work properly with the PFL layout so that we don't silently introduce some readahead failure that is never seen in testing because we never test with PFL, or only test with reads exactly aligned with stripe bounaries (as is likely with developer-written tests).

            I would think a linear 2MB read of a 10MB file should need 5 read() calls, and the fact that this is crossing a stripe boundary shouldn't be considered a "miss", but maybe I'm reading into this too much? I think that currently we don't do readahead across stripe boundaries, and IIRC you have a patch to change that, but before this failure is swept under the rug I just wanted to make sure this wasn't some deeper issue with the readahead code (e.g. readahead considers any stripe boundary crossing as a "miss" and resets the readahead window as a result).

            adilger Andreas Dilger added a comment - paf , as you know, sarah is running testing toward allowing all of the tests to run on a filesystem with default PFL layouts (so that we might be able to enable this by mkfs.lustre and save every new user some grief). In this particular case, it looks like sanity.sh test_101i is failing because the 64KB first component is causing an extra readahead miss. The expedient way to fix this would be to use " $LFS setstripe -c 1 $DIR/$tfile " before " dd " to keep the testing "status quo" of expecting to run on a 1-stripe file. However, I'm wondering if there is potentially a deeper issue with the readahead code that the llite level readahead is somehow confused by the presence of the short stripe at the start (which may happen with DoM, though I haven't opened that can of worms yet)? My preference, if possible, would be to make the test work properly with the PFL layout so that we don't silently introduce some readahead failure that is never seen in testing because we never test with PFL, or only test with reads exactly aligned with stripe bounaries (as is likely with developer-written tests). I would think a linear 2MB read of a 10MB file should need 5 read() calls, and the fact that this is crossing a stripe boundary shouldn't be considered a "miss", but maybe I'm reading into this too much? I think that currently we don't do readahead across stripe boundaries, and IIRC you have a patch to change that, but before this failure is swept under the rug I just wanted to make sure this wasn't some deeper issue with the readahead code (e.g. readahead considers any stripe boundary crossing as a "miss" and resets the readahead window as a result).

            People

              core-lustre-triage Core Lustre Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: