Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17993

"ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      There's something landed recently (directio related I am sure) that has landed recently and causes regular though not very frequent failures in maloo.

       [16534.089981] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 11:35:07 (1719315307)
      [16535.909936] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
      [16535.911077] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
      [16535.911822] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.912992] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.913597] Call Trace:
      [16535.913905]  <TASK>
      [16535.914173]  dump_stack_lvl+0x34/0x48
      [16535.914628]  lbug_with_loc.cold+0x5/0x43 [libcfs]
      [16535.915184]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.915886]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.916432]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.916970]  generic_file_direct_write+0x9c/0x1e0
      [16535.917519]  __generic_file_write_iter+0x9e/0x1a0
      [16535.918029]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.918595]  cl_io_start+0x61/0x130 [obdclass]
      [16535.919327]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.919846]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.920415]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.920991]  ? __alloc_pages+0xe6/0x230
      [16535.921430]  new_sync_write+0xff/0x190
      [16535.921875]  vfs_write+0x1ef/0x280
      [16535.922270]  ksys_write+0x5f/0xe0
      [16535.922657]  do_syscall_64+0x5c/0x90
      [16535.923119]  ? handle_mm_fault+0xc5/0x2a0
      [16535.923582]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.924073]  ? exc_page_fault+0x62/0x150
      [16535.924504]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.925059] RIP: 0033:0x7f415893eb97
      [16535.925477] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.927263] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.928037] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.928773] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.929501] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.930232] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.930969] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.931695]  </TASK>
      [16535.932007] Kernel panic - not syncing: LBUG
      [16535.932480] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.933665] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.934264] Call Trace:
      [16535.934571]  <TASK>
      [16535.934840]  dump_stack_lvl+0x34/0x48
      [16535.935257]  panic+0xf4/0x2c6
      [16535.935637]  lbug_with_loc.cold+0x1a/0x43 [libcfs]
      [16535.936168]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.936750]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.937312]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.937842]  generic_file_direct_write+0x9c/0x1e0
      [16535.938395]  __generic_file_write_iter+0x9e/0x1a0
      [16535.938918]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.939479]  cl_io_start+0x61/0x130 [obdclass]
      [16535.940016]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.940542]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.941098]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.941657]  ? __alloc_pages+0xe6/0x230
      [16535.942081]  new_sync_write+0xff/0x190
      [16535.942501]  vfs_write+0x1ef/0x280
      [16535.942892]  ksys_write+0x5f/0xe0
      [16535.943279]  do_syscall_64+0x5c/0x90
      [16535.943693]  ? handle_mm_fault+0xc5/0x2a0
      [16535.944132]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.944617]  ? exc_page_fault+0x62/0x150
      [16535.945050]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.945589] RIP: 0033:0x7f415893eb97
      [16535.945992] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.947776] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.948557] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.949289] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.950032] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.950765] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.951491] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.952228]  </TASK>

      First time it hit in a seemingly unrelated review on June 11th: https://testing.whamcloud.com/test_sets/69350cd5-31d1-4654-bb37-1986fda71422

      and then shortly after hit in master proper on June 25th: https://testing.whamcloud.com/test_sets/00f3ebed-3676-4495-9fd6-308afa7df77b

      another failure from that session: https://testing.whamcloud.com/test_sets/79ab5267-752e-41f6-a27a-940ee3406679

       

      Always hits in sanity test 119h

       

      Attachments

        Issue Links

          Activity

            [LU-17993] "ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages
            flei Feng Lei made changes -
            Link New: This issue is related to LU-18647 [ LU-18647 ]
            jcasper James Casper (Inactive) made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 38649 ]
            pjones Peter Jones made changes -
            Labels Original: npy pivc
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Merged for 2.16

            pjones Peter Jones added a comment - Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56404/
            Subject: LU-17993 llite: add further check for user buffer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3a18a7b8cd0070523ba8383e7205f062b5eaef4f

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56404/ Subject: LU-17993 llite: add further check for user buffer Project: fs/lustre-release Branch: master Current Patch Set: Commit: 3a18a7b8cd0070523ba8383e7205f062b5eaef4f
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.16.0 [ 15190 ]
            pjones Peter Jones added a comment -

            I believe that the test has been quietened but the issue still remains - Alex reported being able to hit it still once the test was turned back on

            pjones Peter Jones added a comment - I believe that the test has been quietened but the issue still remains - Alex reported being able to hit it still once the test was turned back on

            Looks like this problem is no longer seen in master with the last occurence in July 17th

            just hit with uptodate master:

            == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152)
            [  206.046658] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152)
            unaligned writes of blocksize: 1024
            unaligned writes of blocksize: 3072
            unaligned writes of blocksize: 4095
            [  206.130997] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
            [  206.131100] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
            [  206.131140] CPU: 0 PID: 7828 Comm: multiop Tainted: G        W  O     --------- -  - 4.18.0 #42
            [  206.131192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
            [  206.131242] Call Trace:
            [  206.131259]  dump_stack+0x6e/0xa0
            [  206.131290]  lbug_with_loc.cold.4+0x5/0x63 [libcfs]
            [  206.131333]  ll_direct_IO+0xfee/0x1150 [lustre]
            [  206.131390]  generic_file_direct_write+0x8c/0x160
            [  206.131430]  __generic_file_write_iter+0xb2/0x1c0
            [  206.131462]  vvp_io_write_start+0xb2e/0xc80 [lustre]
            [  206.131518]  cl_io_start+0x59/0x120 [obdclass]
            [  206.131586]  cl_io_loop+0x99/0x220 [obdclass]
            [  206.131644]  ll_file_io_generic+0x4f9/0xe00 [lustre]
            [  206.131696]  do_file_write_iter+0x6f8/0xb10 [lustre]
            [  206.131745]  ? do_raw_spin_unlock+0x44/0xc0
            [  206.131771]  new_sync_write+0xfa/0x130
            [  206.131797]  vfs_write+0xb9/0x1c0
            [  206.131826]  ksys_write+0x3d/0xa0
            [  206.131849]  do_syscall_64+0x4b/0x1b0
            [  206.131874]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
            
            bzzz Alex Zhuravlev added a comment - Looks like this problem is no longer seen in master with the last occurence in July 17th just hit with uptodate master: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152) [ 206.046658] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152) unaligned writes of blocksize: 1024 unaligned writes of blocksize: 3072 unaligned writes of blocksize: 4095 [ 206.130997] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: [ 206.131100] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) LBUG [ 206.131140] CPU: 0 PID: 7828 Comm: multiop Tainted: G W O --------- - - 4.18.0 #42 [ 206.131192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 [ 206.131242] Call Trace: [ 206.131259] dump_stack+0x6e/0xa0 [ 206.131290] lbug_with_loc.cold.4+0x5/0x63 [libcfs] [ 206.131333] ll_direct_IO+0xfee/0x1150 [lustre] [ 206.131390] generic_file_direct_write+0x8c/0x160 [ 206.131430] __generic_file_write_iter+0xb2/0x1c0 [ 206.131462] vvp_io_write_start+0xb2e/0xc80 [lustre] [ 206.131518] cl_io_start+0x59/0x120 [obdclass] [ 206.131586] cl_io_loop+0x99/0x220 [obdclass] [ 206.131644] ll_file_io_generic+0x4f9/0xe00 [lustre] [ 206.131696] do_file_write_iter+0x6f8/0xb10 [lustre] [ 206.131745] ? do_raw_spin_unlock+0x44/0xc0 [ 206.131771] new_sync_write+0xfa/0x130 [ 206.131797] vfs_write+0xb9/0x1c0 [ 206.131826] ksys_write+0x3d/0xa0 [ 206.131849] do_syscall_64+0x4b/0x1b0 [ 206.131874] entry_SYSCALL_64_after_hwframe+0x49/0xbe
            green Oleg Drokin added a comment -

            Looks like this problem is no longer seen in master with the last occurence in July 17th

            green Oleg Drokin added a comment - Looks like this problem is no longer seen in master with the last occurence in July 17th

            People

              hongchao.zhang Hongchao Zhang
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: