Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17993

"ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      There's something landed recently (directio related I am sure) that has landed recently and causes regular though not very frequent failures in maloo.

       [16534.089981] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 11:35:07 (1719315307)
      [16535.909936] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
      [16535.911077] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
      [16535.911822] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.912992] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.913597] Call Trace:
      [16535.913905]  <TASK>
      [16535.914173]  dump_stack_lvl+0x34/0x48
      [16535.914628]  lbug_with_loc.cold+0x5/0x43 [libcfs]
      [16535.915184]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.915886]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.916432]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.916970]  generic_file_direct_write+0x9c/0x1e0
      [16535.917519]  __generic_file_write_iter+0x9e/0x1a0
      [16535.918029]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.918595]  cl_io_start+0x61/0x130 [obdclass]
      [16535.919327]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.919846]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.920415]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.920991]  ? __alloc_pages+0xe6/0x230
      [16535.921430]  new_sync_write+0xff/0x190
      [16535.921875]  vfs_write+0x1ef/0x280
      [16535.922270]  ksys_write+0x5f/0xe0
      [16535.922657]  do_syscall_64+0x5c/0x90
      [16535.923119]  ? handle_mm_fault+0xc5/0x2a0
      [16535.923582]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.924073]  ? exc_page_fault+0x62/0x150
      [16535.924504]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.925059] RIP: 0033:0x7f415893eb97
      [16535.925477] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.927263] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.928037] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.928773] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.929501] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.930232] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.930969] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.931695]  </TASK>
      [16535.932007] Kernel panic - not syncing: LBUG
      [16535.932480] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.933665] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.934264] Call Trace:
      [16535.934571]  <TASK>
      [16535.934840]  dump_stack_lvl+0x34/0x48
      [16535.935257]  panic+0xf4/0x2c6
      [16535.935637]  lbug_with_loc.cold+0x1a/0x43 [libcfs]
      [16535.936168]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.936750]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.937312]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.937842]  generic_file_direct_write+0x9c/0x1e0
      [16535.938395]  __generic_file_write_iter+0x9e/0x1a0
      [16535.938918]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.939479]  cl_io_start+0x61/0x130 [obdclass]
      [16535.940016]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.940542]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.941098]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.941657]  ? __alloc_pages+0xe6/0x230
      [16535.942081]  new_sync_write+0xff/0x190
      [16535.942501]  vfs_write+0x1ef/0x280
      [16535.942892]  ksys_write+0x5f/0xe0
      [16535.943279]  do_syscall_64+0x5c/0x90
      [16535.943693]  ? handle_mm_fault+0xc5/0x2a0
      [16535.944132]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.944617]  ? exc_page_fault+0x62/0x150
      [16535.945050]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.945589] RIP: 0033:0x7f415893eb97
      [16535.945992] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.947776] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.948557] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.949289] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.950032] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.950765] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.951491] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.952228]  </TASK>

      First time it hit in a seemingly unrelated review on June 11th: https://testing.whamcloud.com/test_sets/69350cd5-31d1-4654-bb37-1986fda71422

      and then shortly after hit in master proper on June 25th: https://testing.whamcloud.com/test_sets/00f3ebed-3676-4495-9fd6-308afa7df77b

      another failure from that session: https://testing.whamcloud.com/test_sets/79ab5267-752e-41f6-a27a-940ee3406679

       

      Always hits in sanity test 119h

       

      Attachments

        Issue Links

          Activity

            [LU-17993] "ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56361
            Subject: LU-17993 llite: print more logs for different page count
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cc07a35bed8931ece68152397ec2b3dd70885239

            gerrit Gerrit Updater added a comment - "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56361 Subject: LU-17993 llite: print more logs for different page count Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: cc07a35bed8931ece68152397ec2b3dd70885239
            ys Yang Sheng added a comment -

            I don't ask for usage of assert. I mean ldp_count was counted by result+start. But 'i' hasn't consider the 'start' at all. So it must fail on assert.

            ys Yang Sheng added a comment - I don't ask for usage of assert. I mean ldp_count was counted by result+start. But 'i' hasn't consider the 'start' at all. So it must fail on assert.

            "So the problem is that, Why we assert the (i==ldp_count)? The kernel really return a 2 pages then we only count 1 page base on size."

            One comment here.  The assert is ensuring we counted correctly.  If it fails, we've done something wrong and need to figure that out.

            paf Patrick Farrell (Inactive) added a comment - "So the problem is that, Why we assert the (i==ldp_count)? The kernel really return a 2 pages then we only count 1 page base on size." One comment here.  The assert is ensuring we counted correctly.  If it fails, we've done something wrong and need to figure that out.

            Hm, I don't think so - I am following along.  It seems like there is some confusion about whether certain things are being considered unaligned consistently.

            It looks like Hongchao is making excellent progress in adding debug code.

            paf Patrick Farrell (Inactive) added a comment - Hm, I don't think so - I am following along.  It seems like there is some confusion about whether certain things are being considered unaligned consistently. It looks like Hongchao is making excellent progress in adding debug code.
            pjones Peter Jones added a comment -

            paf anything to add to this discussion?

            pjones Peter Jones added a comment - paf anything to add to this discussion?
            hongchao.zhang Hongchao Zhang added a comment - - edited

            https://testing.whamcloud.com/test_sets/3572f3b3-9bc7-444e-bcbc-3e6e9dfbdb13
            In RHEL9.3 (5.14.0-362.24.1.el9_3.x86_64), will the unaligned be marked as true if the user base address (iov->iov_base)
            is not aligned?

                    /* Check that all user buffers are aligned as well */
                    if (ll_iov_iter_alignment(iter) & ~PAGE_MASK)
                            unaligned = true;
            
            static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
            {       
                    unsigned long res = 0;
                    size_t size = i->count;
                    size_t skip = i->iov_offset;
                    unsigned k;
                    
                    for (k = 0; k < i->nr_segs; k++, skip = 0) {
                            size_t len = i->iov[k].iov_len - skip;
                            if (len) {
                                    res |= (unsigned long)i->iov[k].iov_base + skip;
                                    if (len > size)
                                            len = size;
                                    res |= len;
                                    size -= len;
                                    if (!size)
                                            break;
                            }
                    }
                    return res;
            }
            

            the "ll_get_user_pages" will only be called in "aligned" path, no clear why the "ll_get_user_pages" is to be called here?

            hongchao.zhang Hongchao Zhang added a comment - - edited https://testing.whamcloud.com/test_sets/3572f3b3-9bc7-444e-bcbc-3e6e9dfbdb13 In RHEL9.3 (5.14.0-362.24.1.el9_3.x86_64), will the unaligned be marked as true if the user base address (iov->iov_base) is not aligned? /* Check that all user buffers are aligned as well */ if (ll_iov_iter_alignment(iter) & ~PAGE_MASK) unaligned = true; static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i) { unsigned long res = 0; size_t size = i->count; size_t skip = i->iov_offset; unsigned k; for (k = 0; k < i->nr_segs; k++, skip = 0) { size_t len = i->iov[k].iov_len - skip; if (len) { res |= (unsigned long)i->iov[k].iov_base + skip; if (len > size) len = size; res |= len; size -= len; if (!size) break; } } return res; } the "ll_get_user_pages" will only be called in "aligned" path, no clear why the "ll_get_user_pages" is to be called here?
            bobijam Zhenyu Xu added a comment -

            alright, ll_get_user_pages() called iov_iter_revert()

            bobijam Zhenyu Xu added a comment - alright, ll_get_user_pages() called iov_iter_revert()
            ys Yang Sheng added a comment -

            The iov_iter_get_pages_alloc doesn't advance. but alloc2 do it. So we have already revert it in own define alloc2.

            ys Yang Sheng added a comment - The iov_iter_get_pages_alloc doesn't advance. but alloc2 do it. So we have already revert it in own define alloc2.
            bobijam Zhenyu Xu added a comment -

            I found that master and es60 has different operations in ll_get_user_pages(), in master after iov_iter_get_pages_alloc2() it advance the iter, while in es60, it does not. And in master iov_iter_get_pages_alloc2(), it advance the iter again. while in es60 iov_iter_get_pages_alloc2() is redefined as iov_iter_get_pages_alloc() and in kernel iov_iter_get_pages_alloc() i don't see it advance iter as well. And in both branches, ll_direct_IO_impl() would advance the iter after ll_get_user_pages().

            Is it the master branch advance the iter too much?

            bobijam Zhenyu Xu added a comment - I found that master and es60 has different operations in ll_get_user_pages(), in master after iov_iter_get_pages_alloc2() it advance the iter, while in es60, it does not. And in master iov_iter_get_pages_alloc2(), it advance the iter again. while in es60 iov_iter_get_pages_alloc2() is redefined as iov_iter_get_pages_alloc() and in kernel iov_iter_get_pages_alloc() i don't see it advance iter as well. And in both branches, ll_direct_IO_impl() would advance the iter after ll_get_user_pages(). Is it the master branch advance the iter too much?

            Yes, for aligned IO, there are two offset (file_offset and the starting address in iovec) used for calculating the page count

            there could be some issue in the above aligned IO check of the user buffers?

                    /* Check that all user buffers are aligned as well */
                    if (ll_iov_iter_alignment(iter) & ~PAGE_MASK)
                            unaligned = true;
            
            hongchao.zhang Hongchao Zhang added a comment - Yes, for aligned IO, there are two offset (file_offset and the starting address in iovec) used for calculating the page count there could be some issue in the above aligned IO check of the user buffers? /* Check that all user buffers are aligned as well */ if (ll_iov_iter_alignment(iter) & ~PAGE_MASK) unaligned = true;
            ys Yang Sheng added a comment -

            I think the key part is the user buffer address. All of crash cases for test_119h has a extra 0x10 on the last byte of user buffer address.  It caused the crash. It is why this bug cannot be reproduced stable.

            ys Yang Sheng added a comment - I think the key part is the user buffer address. All of crash cases for test_119h has a extra 0x10 on the last byte of user buffer address.  It caused the crash. It is why this bug cannot be reproduced stable.

            People

              hongchao.zhang Hongchao Zhang
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: