Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17993

"ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      There's something landed recently (directio related I am sure) that has landed recently and causes regular though not very frequent failures in maloo.

       [16534.089981] Lustre: DEBUG MARKER: == sanity test 119h: basic tests of memory unaligned dio ========================================================== 11:35:07 (1719315307)
      [16535.909936] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) ASSERTION( i == pv->ldp_count ) failed: 
      [16535.911077] LustreError: 443178:0:(rw26.c:426:ll_direct_rw_pages()) LBUG
      [16535.911822] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.912992] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.913597] Call Trace:
      [16535.913905]  <TASK>
      [16535.914173]  dump_stack_lvl+0x34/0x48
      [16535.914628]  lbug_with_loc.cold+0x5/0x43 [libcfs]
      [16535.915184]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.915886]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.916432]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.916970]  generic_file_direct_write+0x9c/0x1e0
      [16535.917519]  __generic_file_write_iter+0x9e/0x1a0
      [16535.918029]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.918595]  cl_io_start+0x61/0x130 [obdclass]
      [16535.919327]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.919846]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.920415]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.920991]  ? __alloc_pages+0xe6/0x230
      [16535.921430]  new_sync_write+0xff/0x190
      [16535.921875]  vfs_write+0x1ef/0x280
      [16535.922270]  ksys_write+0x5f/0xe0
      [16535.922657]  do_syscall_64+0x5c/0x90
      [16535.923119]  ? handle_mm_fault+0xc5/0x2a0
      [16535.923582]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.924073]  ? exc_page_fault+0x62/0x150
      [16535.924504]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.925059] RIP: 0033:0x7f415893eb97
      [16535.925477] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.927263] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.928037] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.928773] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.929501] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.930232] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.930969] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.931695]  </TASK>
      [16535.932007] Kernel panic - not syncing: LBUG
      [16535.932480] CPU: 0 PID: 443178 Comm: multiop Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [16535.933665] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [16535.934264] Call Trace:
      [16535.934571]  <TASK>
      [16535.934840]  dump_stack_lvl+0x34/0x48
      [16535.935257]  panic+0xf4/0x2c6
      [16535.935637]  lbug_with_loc.cold+0x1a/0x43 [libcfs]
      [16535.936168]  ll_direct_rw_pages+0x46e/0x470 [lustre]
      [16535.936750]  ll_direct_IO_impl+0x462/0xc90 [lustre]
      [16535.937312]  ? bpf_lsm_inode_removexattr+0x10/0x10
      [16535.937842]  generic_file_direct_write+0x9c/0x1e0
      [16535.938395]  __generic_file_write_iter+0x9e/0x1a0
      [16535.938918]  vvp_io_write_start+0xb25/0xcb0 [lustre]
      [16535.939479]  cl_io_start+0x61/0x130 [obdclass]
      [16535.940016]  cl_io_loop+0x99/0x220 [obdclass]
      [16535.940542]  ll_file_io_generic+0x53d/0xef0 [lustre]
      [16535.941098]  do_file_write_iter+0x4e3/0x770 [lustre]
      [16535.941657]  ? __alloc_pages+0xe6/0x230
      [16535.942081]  new_sync_write+0xff/0x190
      [16535.942501]  vfs_write+0x1ef/0x280
      [16535.942892]  ksys_write+0x5f/0xe0
      [16535.943279]  do_syscall_64+0x5c/0x90
      [16535.943693]  ? handle_mm_fault+0xc5/0x2a0
      [16535.944132]  ? do_user_addr_fault+0x1dd/0x6b0
      [16535.944617]  ? exc_page_fault+0x62/0x150
      [16535.945050]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [16535.945589] RIP: 0033:0x7f415893eb97
      [16535.945992] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [16535.947776] RSP: 002b:00007ffc09d90ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [16535.948557] RAX: ffffffffffffffda RBX: 0000000000406a48 RCX: 00007f415893eb97
      [16535.949289] RDX: 0000000000000fff RSI: 00007f41585cb010 RDI: 0000000000000003
      [16535.950032] RBP: 0000000000000fff R08: 0000000000000003 R09: 0000000000000000
      [16535.950765] R10: 00007f41588070e0 R11: 0000000000000246 R12: 0000000000000fff
      [16535.951491] R13: 0000000000000003 R14: 00000000004044fc R15: 0000000000000122
      [16535.952228]  </TASK>

      First time it hit in a seemingly unrelated review on June 11th: https://testing.whamcloud.com/test_sets/69350cd5-31d1-4654-bb37-1986fda71422

      and then shortly after hit in master proper on June 25th: https://testing.whamcloud.com/test_sets/00f3ebed-3676-4495-9fd6-308afa7df77b

      another failure from that session: https://testing.whamcloud.com/test_sets/79ab5267-752e-41f6-a27a-940ee3406679

       

      Always hits in sanity test 119h

       

      Attachments

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: