Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17993

"ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      There's something landed recently (directio related I am sure) that has landed recently and causes regular though not very frequent failures in maloo.

       [16534.089981] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 11:35:07 (1719315307)
      [16535.909936] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
      [16535.911077] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
      [16535.911822] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.912992] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.913597] Call Trace:
      [16535.913905]  <TASK>
      [16535.914173]  dump_stack_lvl+0x34/0x48
      [16535.914628]  lbug_with_loc.cold+0x5/0x43 [libcfs]
      [16535.915184]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.915886]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.916432]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.916970]  generic_file_direct_write+0x9c/0x1e0
      [16535.917519]  __generic_file_write_iter+0x9e/0x1a0
      [16535.918029]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.918595]  cl_io_start+0x61/0x130 [obdclass]
      [16535.919327]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.919846]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.920415]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.920991]  ? __alloc_pages+0xe6/0x230
      [16535.921430]  new_sync_write+0xff/0x190
      [16535.921875]  vfs_write+0x1ef/0x280
      [16535.922270]  ksys_write+0x5f/0xe0
      [16535.922657]  do_syscall_64+0x5c/0x90
      [16535.923119]  ? handle_mm_fault+0xc5/0x2a0
      [16535.923582]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.924073]  ? exc_page_fault+0x62/0x150
      [16535.924504]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.925059] RIP: 0033:0x7f415893eb97
      [16535.925477] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.927263] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.928037] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.928773] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.929501] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.930232] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.930969] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.931695]  </TASK>
      [16535.932007] Kernel panic - not syncing: LBUG
      [16535.932480] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.933665] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.934264] Call Trace:
      [16535.934571]  <TASK>
      [16535.934840]  dump_stack_lvl+0x34/0x48
      [16535.935257]  panic+0xf4/0x2c6
      [16535.935637]  lbug_with_loc.cold+0x1a/0x43 [libcfs]
      [16535.936168]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.936750]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.937312]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.937842]  generic_file_direct_write+0x9c/0x1e0
      [16535.938395]  __generic_file_write_iter+0x9e/0x1a0
      [16535.938918]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.939479]  cl_io_start+0x61/0x130 [obdclass]
      [16535.940016]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.940542]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.941098]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.941657]  ? __alloc_pages+0xe6/0x230
      [16535.942081]  new_sync_write+0xff/0x190
      [16535.942501]  vfs_write+0x1ef/0x280
      [16535.942892]  ksys_write+0x5f/0xe0
      [16535.943279]  do_syscall_64+0x5c/0x90
      [16535.943693]  ? handle_mm_fault+0xc5/0x2a0
      [16535.944132]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.944617]  ? exc_page_fault+0x62/0x150
      [16535.945050]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.945589] RIP: 0033:0x7f415893eb97
      [16535.945992] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.947776] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.948557] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.949289] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.950032] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.950765] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.951491] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.952228]  </TASK>

      First time it hit in a seemingly unrelated review on June 11th: https://testing.whamcloud.com/test_sets/69350cd5-31d1-4654-bb37-1986fda71422

      and then shortly after hit in master proper on June 25th: https://testing.whamcloud.com/test_sets/00f3ebed-3676-4495-9fd6-308afa7df77b

      another failure from that session: https://testing.whamcloud.com/test_sets/79ab5267-752e-41f6-a27a-940ee3406679

       

      Always hits in sanity test 119h

       

      Attachments

        Issue Links

          Activity

            [LU-17993] "ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages
            pjones Peter Jones added a comment -

            Merged for 2.16

            pjones Peter Jones added a comment - Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56404/
            Subject: LU-17993 llite: add further check for user buffer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3a18a7b8cd0070523ba8383e7205f062b5eaef4f

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56404/ Subject: LU-17993 llite: add further check for user buffer Project: fs/lustre-release Branch: master Current Patch Set: Commit: 3a18a7b8cd0070523ba8383e7205f062b5eaef4f
            pjones Peter Jones added a comment -

            I believe that the test has been quietened but the issue still remains - Alex reported being able to hit it still once the test was turned back on

            pjones Peter Jones added a comment - I believe that the test has been quietened but the issue still remains - Alex reported being able to hit it still once the test was turned back on

            Looks like this problem is no longer seen in master with the last occurence in July 17th

            just hit with uptodate master:

            == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152)
            [  206.046658] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152)
            unaligned writes of blocksize: 1024
            unaligned writes of blocksize: 3072
            unaligned writes of blocksize: 4095
            [  206.130997] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
            [  206.131100] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
            [  206.131140] CPU: 0 PID: 7828 Comm: multiop Tainted: G        W  O     --------- -  - 4.18.0 #42
            [  206.131192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
            [  206.131242] Call Trace:
            [  206.131259]  dump_stack+0x6e/0xa0
            [  206.131290]  lbug_with_loc.cold.4+0x5/0x63 [libcfs]
            [  206.131333]  ll_direct_IO+0xfee/0x1150 [lustre]
            [  206.131390]  generic_file_direct_write+0x8c/0x160
            [  206.131430]  __generic_file_write_iter+0xb2/0x1c0
            [  206.131462]  vvp_io_write_start+0xb2e/0xc80 [lustre]
            [  206.131518]  cl_io_start+0x59/0x120 [obdclass]
            [  206.131586]  cl_io_loop+0x99/0x220 [obdclass]
            [  206.131644]  ll_file_io_generic+0x4f9/0xe00 [lustre]
            [  206.131696]  do_file_write_iter+0x6f8/0xb10 [lustre]
            [  206.131745]  ? do_raw_spin_unlock+0x44/0xc0
            [  206.131771]  new_sync_write+0xfa/0x130
            [  206.131797]  vfs_write+0xb9/0x1c0
            [  206.131826]  ksys_write+0x3d/0xa0
            [  206.131849]  do_syscall_64+0x4b/0x1b0
            [  206.131874]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
            
            bzzz Alex Zhuravlev added a comment - Looks like this problem is no longer seen in master with the last occurence in July 17th just hit with uptodate master: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152) [ 206.046658] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 14:45:52 (1727189152) unaligned writes of blocksize: 1024 unaligned writes of blocksize: 3072 unaligned writes of blocksize: 4095 [ 206.130997] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: [ 206.131100] LustreError: 7828:0:(rw26.c:426:ll_direct_rw_pages()) LBUG [ 206.131140] CPU: 0 PID: 7828 Comm: multiop Tainted: G W O --------- - - 4.18.0 #42 [ 206.131192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 [ 206.131242] Call Trace: [ 206.131259] dump_stack+0x6e/0xa0 [ 206.131290] lbug_with_loc.cold.4+0x5/0x63 [libcfs] [ 206.131333] ll_direct_IO+0xfee/0x1150 [lustre] [ 206.131390] generic_file_direct_write+0x8c/0x160 [ 206.131430] __generic_file_write_iter+0xb2/0x1c0 [ 206.131462] vvp_io_write_start+0xb2e/0xc80 [lustre] [ 206.131518] cl_io_start+0x59/0x120 [obdclass] [ 206.131586] cl_io_loop+0x99/0x220 [obdclass] [ 206.131644] ll_file_io_generic+0x4f9/0xe00 [lustre] [ 206.131696] do_file_write_iter+0x6f8/0xb10 [lustre] [ 206.131745] ? do_raw_spin_unlock+0x44/0xc0 [ 206.131771] new_sync_write+0xfa/0x130 [ 206.131797] vfs_write+0xb9/0x1c0 [ 206.131826] ksys_write+0x3d/0xa0 [ 206.131849] do_syscall_64+0x4b/0x1b0 [ 206.131874] entry_SYSCALL_64_after_hwframe+0x49/0xbe
            green Oleg Drokin added a comment -

            Looks like this problem is no longer seen in master with the last occurence in July 17th

            green Oleg Drokin added a comment - Looks like this problem is no longer seen in master with the last occurence in July 17th

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56404
            Subject: LU-17993 llite: add further check for user buffer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b5e9ff3951ce5868e898530131ae481c222628af

            gerrit Gerrit Updater added a comment - "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56404 Subject: LU-17993 llite: add further check for user buffer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: b5e9ff3951ce5868e898530131ae481c222628af
            gerrit Gerrit Updater added a comment - - edited

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56394
            Subject: LU-17993 llite: add further check for user buffer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d949cec1e7044ad379a8aedadc023db736948f68

            gerrit Gerrit Updater added a comment - - edited "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56394 Subject: LU-17993 llite: add further check for user buffer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: d949cec1e7044ad379a8aedadc023db736948f68

            People

              hongchao.zhang Hongchao Zhang
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: