Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16579

llite: Fix the wrong ending offset calculation

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0, Lustre 2.15.3
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      For a single-stripe file, after applied https://review.whamcloud.com/c/fs/lustre-release/+/49226
      Subject: LU-16338 readahead: clip readahead with kms,
      It cause a dead loop:
      https://testing.whamcloud.com/test_logs/8d73c4e7-0e6e-482c-b93c-bf7159706890/show_text

      00000080:00200000:1.0:1676614736.277040:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
      00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
      00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
      00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
      00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
      00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
      00000008:00000001:1.0:1676614736.277044:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
      00000008:00000001:1.0:1676614736.277045:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:1.0:1676614736.277045:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
      00000080:00000001:1.0:1676614736.277046:0:21215:0:(rw.c:1873:ll_readpage()) Process entered
      00000080:00200000:1.0:1676614736.277046:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
      00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
      00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
      00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
      00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
      00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
      00000008:00000001:1.0:1676614736.277050:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
      00000008:00000001:1.0:1676614736.277051:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:1.0:1676614736.277051:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
      

      The dead loop code lines:

      if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
                              result = cl_io_read_ahead(env, io, vmpage->index, &ra);
                              if (result < 0 || vmpage->index > ra.cra_end_idx) {
                                      cl_read_ahead_release(env, &ra);
                                      unlock_page(vmpage);
                                      RETURN(AOP_TRUNCATED_PAGE); ===>AOP_TRUNCATED_PAGE = 80001
                              }
                      }
      
      

      After investigated, I found the ending beyond offset calculation is wrong as each time finished the read of a page, it will advance the @iter->count (iter->count - read_bytes).
      The wrong ending offset results in the dead loop described above.

      Attachments

        Issue Links

          Activity

            [LU-16579] llite: Fix the wrong ending offset calculation
            qian_wc Qian Yingjin added a comment - - edited

            The deadlock problem in the previous comment was solved by LU-16579.

            qian_wc Qian Yingjin added a comment - - edited The deadlock problem in the previous comment was solved by LU-16579 .
            bobijam Zhenyu Xu made changes -
            Link New: This issue is related to DDN-5149 [ DDN-5149 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.15.3 [ 15998 ]

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50278/
            Subject: LU-16579 llite: fix the wrong beyond read end calculation
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 9c8a80bca738884e09affd66837b9e94508664d1

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50278/ Subject: LU-16579 llite: fix the wrong beyond read end calculation Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: 9c8a80bca738884e09affd66837b9e94508664d1
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.16

            pjones Peter Jones added a comment - Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50065/
            Subject: LU-16579 llite: fix the wrong beyond read end calculation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ae356dc325877bd130ad94acc5f3610898de8a8a

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50065/ Subject: LU-16579 llite: fix the wrong beyond read end calculation Project: fs/lustre-release Branch: master Current Patch Set: Commit: ae356dc325877bd130ad94acc5f3610898de8a8a

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50278
            Subject: LU-16579 llite: fix the wrong beyond read end calculation
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 88e538a9c6f7444fde284960d56ed62d4f17cb3a

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50278 Subject: LU-16579 llite: fix the wrong beyond read end calculation Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 88e538a9c6f7444fde284960d56ed62d4f17cb3a

            Hi Patrick,

            They are all not applied to ES5.2.

            qian_wc Qian Yingjin added a comment - Hi Patrick, They are all not applied to ES5.2.
            qian_wc Qian Yingjin made changes -
            Description Original: For a single-stripe file, after applied https://review.whamcloud.com/c/fs/lustre-release/+/49226
            Subject: LU-16338 readahead: clip readahead with kms,
            It cause a dead loop:
            https://testing.whamcloud.com/test_logs/8d73c4e7-0e6e-482c-b93c-bf7159706890/show_text
            00000080:00200000:1.0:1676614736.277040:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
            00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
            00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
            00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
            00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
            00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
            00000008:00000001:1.0:1676614736.277044:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
            00000008:00000001:1.0:1676614736.277045:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:1.0:1676614736.277045:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
            00000080:00000001:1.0:1676614736.277046:0:21215:0:(rw.c:1873:ll_readpage()) Process entered
            00000080:00200000:1.0:1676614736.277046:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
            00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
            00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
            00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
            00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
            00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
            00000008:00000001:1.0:1676614736.277050:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
            00000008:00000001:1.0:1676614736.277051:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:1.0:1676614736.277051:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
            The dead loop code lines:
            if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
                                    result = cl_io_read_ahead(env, io, vmpage->index, &ra);
                                    if (result < 0 || vmpage->index > ra.cra_end_idx) {
                                            cl_read_ahead_release(env, &ra);
                                            unlock_page(vmpage);
                                            RETURN(AOP_TRUNCATED_PAGE); ===>AOP_TRUNCATED_PAGE = 80001
                                    }
                            }

            After investigated, I found the ending beyond offset calculation is wrong as each time finished the read of a page, it will advance the @iter->count (iter->count - read_bytes).
            The wrong ending offset results in the dead loop described above.
            New: For a single-stripe file, after applied https://review.whamcloud.com/c/fs/lustre-release/+/49226
            Subject: LU-16338 readahead: clip readahead with kms,
            It cause a dead loop:
            https://testing.whamcloud.com/test_logs/8d73c4e7-0e6e-482c-b93c-bf7159706890/show_text

            {code:java}
            00000080:00200000:1.0:1676614736.277040:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
            00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
            00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
            00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
            00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
            00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
            00000008:00000001:1.0:1676614736.277044:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
            00000008:00000001:1.0:1676614736.277045:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:1.0:1676614736.277045:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
            00000080:00000001:1.0:1676614736.277046:0:21215:0:(rw.c:1873:ll_readpage()) Process entered
            00000080:00200000:1.0:1676614736.277046:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
            00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
            00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
            00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
            00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
            00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
            00000008:00000001:1.0:1676614736.277050:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
            00000008:00000001:1.0:1676614736.277051:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:1.0:1676614736.277051:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
            {code}

            The dead loop code lines:

            {code:java}
            if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
                                    result = cl_io_read_ahead(env, io, vmpage->index, &ra);
                                    if (result < 0 || vmpage->index > ra.cra_end_idx) {
                                            cl_read_ahead_release(env, &ra);
                                            unlock_page(vmpage);
                                            RETURN(AOP_TRUNCATED_PAGE); ===>AOP_TRUNCATED_PAGE = 80001
                                    }
                            }

            {code}

            After investigated, I found the ending beyond offset calculation is wrong as each time finished the read of a page, it will advance the @iter->count (iter->count - read_bytes).
            The wrong ending offset results in the dead loop described above.

            People

              qian_wc Qian Yingjin
              qian_wc Qian Yingjin
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: