Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-18102

sanity test_27J: FAIL: /mnt/lustre/d27J.sanity/f27J.sanity: read should fail

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • Ubuntu 24.04 client
      SLES 15 SP6 client
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fd1502a3-be70-40ce-b2d1-c2490a7980fd

      test_27J failed with the following error:

      lfs setstripe: setstripe error for '/mnt/lustre/d27J.sanity/f27J.sanity': stripe already set
      lfs setstripe: setstripe error for '/mnt/lustre/d27J.sanity/f27J.sanity2': stripe already set
       sanity test_27J: @@@@@@ FAIL: /mnt/lustre/d27J.sanity/f27J.sanity: read should fail 
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4558 - 6.8.0-35-generic
      servers: https://build.whamcloud.com/job/lustre-master/4558 - 5.14.0-427.24.1_lustre.el9.x86_64

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_27J - /mnt/lustre/d27J.sanity/f27J.sanity: read should fail

      Attachments

        Issue Links

          Activity

            [LU-18102] sanity test_27J: FAIL: /mnt/lustre/d27J.sanity/f27J.sanity: read should fail
            pjones Peter Jones added a comment -

            Merged for 2.16

            pjones Peter Jones added a comment - Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55977/
            Subject: LU-18102 tests: skip read/write in sanity/27J for some kernels
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: adb3d20f3c8d3a720b293d3d5977a1849a9e0cb7

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55977/ Subject: LU-18102 tests: skip read/write in sanity/27J for some kernels Project: fs/lustre-release Branch: master Current Patch Set: Commit: adb3d20f3c8d3a720b293d3d5977a1849a9e0cb7
            qian_wc Qian Yingjin added a comment -

            It will not affect PCC-RO as the file size will not be changed for readonly PCC.

             

            For PCC-RW, the file data will save into the copy on HSM backend with HSM released state.

            In this case, a client other than the exclusive caching the file with PCC-RO mode may see the 0 byte for the Lustre file.

            However, it will detect that file is in HSM released state in the higher call patch (generic_file_read|write), it will restore the file and then restart the I/O.

             

            So it will not affect PCC-RW also, I think.

            qian_wc Qian Yingjin added a comment - It will not affect PCC-RO as the file size will not be changed for readonly PCC.   For PCC-RW, the file data will save into the copy on HSM backend with HSM released state. In this case, a client other than the exclusive caching the file with PCC-RO mode may see the 0 byte for the Lustre file. However, it will detect that file is in HSM released state in the higher call patch (generic_file_read|write), it will restore the file and then restart the I/O.   So it will not affect PCC-RW also, I think.

            Hello Andreas,
            It has been quite a long time that I have been involved with this, but I seem to remember that the DAOS binding thru foreign layout is done at open time not read.
            I will try to fully confirm if I can get some spare time to do some testing.

            bfaccini-nvda Bruno Faccini added a comment - Hello Andreas, It has been quite a long time that I have been involved with this, but I seem to remember that the DAOS binding thru foreign layout is done at open time not read. I will try to fully confirm if I can get some spare time to do some testing.

            bfaccini-nvda, any idea on whether/how this would affect DAOS usage of Foreign file layouts, or are they trapped before the read gets to the Lustre file?

            qian_wc, will this affect PCC usage of Foreign file layouts?

            adilger Andreas Dilger added a comment - bfaccini-nvda , any idea on whether/how this would affect DAOS usage of Foreign file layouts, or are they trapped before the read gets to the Lustre file? qian_wc , will this affect PCC usage of Foreign file layouts?
            bobijam Zhenyu Xu added a comment -

            It seems like a bug in Lustre if the file size is 0 but there is data in the file?

            I guess not, a file created with foreign layout in this case is sized 0 and contains no data, and it would be reported ENODATA in Lustre I/O path, while kernel bypasses the readpage() so the test script would report R/W success while it expects them to fail the I/O.

            bobijam Zhenyu Xu added a comment - It seems like a bug in Lustre if the file size is 0 but there is data in the file? I guess not, a file created with foreign layout in this case is sized 0 and contains no data, and it would be reported ENODATA in Lustre I/O path, while kernel bypasses the readpage() so the test script would report R/W success while it expects them to fail the I/O.

            The kernel introduce this issue in fiemap_read() to skip calling readpage() when the file size is 0

            It seems like a bug in Lustre if the file size is 0 but there is data in the file?

            adilger Andreas Dilger added a comment - The kernel introduce this issue in fiemap_read() to skip calling readpage() when the file size is 0 It seems like a bug in Lustre if the file size is 0 but there is data in the file?

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55977
            Subject: LU-18012 tests: skip read/write in sanity/27J for some kernels
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 496e8bd36cbf26b1c694d32c719c7636f1975693

            adilger Andreas Dilger added a comment - "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55977 Subject: LU-18012 tests: skip read/write in sanity/27J for some kernels Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 496e8bd36cbf26b1c694d32c719c7636f1975693
            lixi_wc Li Xi added a comment -

            Bobijam thinks the kernel version check in sanity:test_27J needs to be updated due to his analysis of the kernel codes. A patch is coming.

            lixi_wc Li Xi added a comment - Bobijam thinks the kernel version check in sanity:test_27J needs to be updated due to his analysis of the kernel codes. A patch is coming.
            bobijam Zhenyu Xu added a comment -

            The kernel introduce this issue in fiemap_read() to skip calling readpage() when the file size is 0

            commit 8c8387ee3f55a38c3388882f1dd72dc526965211 (v5.15-41-g8c8387ee3f55)
                mm: stop filemap_read() from grabbing a superfluous page
                
                Under some circumstances, filemap_read() will allocate sufficient pages
                to read to the end of the file, call readahead/readpages on them and
                copy the data over - and then it will allocate another page at the EOF
                and call readpage on that and then ignore it.  This is unnecessary and a
                waste of time and resources.
                
                filemap_read() *does* check for this, but only after it has already done
                the allocation and I/O.  Fix this by checking before calling
                filemap_get_pages() also.
            
            diff --git a/mm/filemap.c b/mm/filemap.c
            index dae481293b5d..e50be519f6a4 100644
            --- a/mm/filemap.c
            +++ b/mm/filemap.c
            @@ -2625,6 +2625,9 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                            if ((iocb->ki_flags & IOCB_WAITQ) && already_read)
                                    iocb->ki_flags |= IOCB_NOWAIT;
             
            +               if (unlikely(iocb->ki_pos >= i_size_read(inode)))
            +                       break;
            +
                            error = filemap_get_pages(iocb, iter, &pvec);
                            if (error < 0)
                                    break;
            

            While Qingjin's kernel fix commit 5956592ce337330cdff0399a6f8b6a5aea397a8e(v6.2-rc4-61-g5956592ce337) only changes filemap_get_pages(), it cannot affect the skipping, and IMO sanity/27J will still fail with "read should fail".

             commit 5956592ce337330cdff0399a6f8b6a5aea397a8e(v6.2-rc4-61-g5956592ce337)
                mm/filemap: fix page end in filemap_get_read_batch
            
                I was running traces of the read code against an RAID storage system to
                understand why read requests were being misaligned against the underlying
                RAID strips.  I found that the page end offset calculation in
                filemap_get_read_batch() was off by one.
                
                When a read is submitted with end offset 1048575, then it calculates the
                end page for read of 256 when it should be 255.  "last_index" is the index
                of the page beyond the end of the read and it should be skipped when get a
                batch of pages for read in @filemap_get_read_batch().
                
                The below simple patch fixes the problem.  This code was introduced in
                kernel 5.12.
            
                Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
            
            diff --git a/mm/filemap.c b/mm/filemap.c
            index c4d4ace9cc70..0e20a8d6dd93 100644
            --- a/mm/filemap.c
            +++ b/mm/filemap.c
            @@ -2588,18 +2588,19 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
                    struct folio *folio;
                    int err = 0;
             
            +       /* "last_index" is the index of the page beyond the end of the read */
                    last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
             retry:
                    if (fatal_signal_pending(current))
                            return -EINTR;
             
            -       filemap_get_read_batch(mapping, index, last_index, fbatch);
            +       filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
                    if (!folio_batch_count(fbatch)) {
                            if (iocb->ki_flags & IOCB_NOIO)
                                    return -EAGAIN;
                            page_cache_sync_readahead(mapping, ra, filp, index,
                                            last_index - index);
            -               filemap_get_read_batch(mapping, index, last_index, fbatch);
            +               filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
                    }
                    if (!folio_batch_count(fbatch)) {
                            if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
            
            bobijam Zhenyu Xu added a comment - The kernel introduce this issue in fiemap_read() to skip calling readpage() when the file size is 0 commit 8c8387ee3f55a38c3388882f1dd72dc526965211 (v5.15-41-g8c8387ee3f55) mm: stop filemap_read() from grabbing a superfluous page Under some circumstances, filemap_read() will allocate sufficient pages to read to the end of the file, call readahead/readpages on them and copy the data over - and then it will allocate another page at the EOF and call readpage on that and then ignore it. This is unnecessary and a waste of time and resources. filemap_read() *does* check for this , but only after it has already done the allocation and I/O. Fix this by checking before calling filemap_get_pages() also. diff --git a/mm/filemap.c b/mm/filemap.c index dae481293b5d..e50be519f6a4 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2625,6 +2625,9 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter, if ((iocb->ki_flags & IOCB_WAITQ) && already_read) iocb->ki_flags |= IOCB_NOWAIT; + if (unlikely(iocb->ki_pos >= i_size_read(inode))) + break ; + error = filemap_get_pages(iocb, iter, &pvec); if (error < 0) break ; While Qingjin's kernel fix commit 5956592ce337330cdff0399a6f8b6a5aea397a8e(v6.2-rc4-61-g5956592ce337) only changes filemap_get_pages(), it cannot affect the skipping, and IMO sanity/27J will still fail with "read should fail". commit 5956592ce337330cdff0399a6f8b6a5aea397a8e(v6.2-rc4-61-g5956592ce337) mm/filemap: fix page end in filemap_get_read_batch I was running traces of the read code against an RAID storage system to understand why read requests were being misaligned against the underlying RAID strips. I found that the page end offset calculation in filemap_get_read_batch() was off by one. When a read is submitted with end offset 1048575, then it calculates the end page for read of 256 when it should be 255. "last_index" is the index of the page beyond the end of the read and it should be skipped when get a batch of pages for read in @filemap_get_read_batch(). The below simple patch fixes the problem. This code was introduced in kernel 5.12. Fixes: cbd59c48ae2b ( "mm/filemap: use head pages in generic_file_buffered_read" ) diff --git a/mm/filemap.c b/mm/filemap.c index c4d4ace9cc70..0e20a8d6dd93 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2588,18 +2588,19 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, struct folio *folio; int err = 0; + /* "last_index" is the index of the page beyond the end of the read */ last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE); retry: if (fatal_signal_pending(current)) return -EINTR; - filemap_get_read_batch(mapping, index, last_index, fbatch); + filemap_get_read_batch(mapping, index, last_index - 1, fbatch); if (!folio_batch_count(fbatch)) { if (iocb->ki_flags & IOCB_NOIO) return -EAGAIN; page_cache_sync_readahead(mapping, ra, filp, index, last_index - index); - filemap_get_read_batch(mapping, index, last_index, fbatch); + filemap_get_read_batch(mapping, index, last_index - 1, fbatch); } if (!folio_batch_count(fbatch)) { if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))

            People

              bobijam Zhenyu Xu
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: