Lustre / LU-19537

qmt_entry.c:1230:qmt_seed_glbe_all()) ASSERTION( idx >= 0 ) failed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Lustre 2.18.0
    • Lustre 2.17.0
    • None
    • 3

      This is a periodic assertion we hit in Maloo; the first occurrence was in Apr 2025. While most occurrences were in sanity-quota, we just hit one in sanity-lfsck too.

       [32438.240695] Lustre: DEBUG MARKER: == sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 09:07:29 (1761728849)
      [32438.289458] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 251 < left 278, rollback = 2
      [32438.289653] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) Skipped 1799 previous similar messages
      [32438.289750] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds())   create: 1/4/4, destroy: 0/0/0
      [32438.289844] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.289959] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 4/278/0
      [32438.290054] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290152] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds())   write: 1/11/0, punch: 0/0/0, quota 1/3/2
      [32438.290259] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290357] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds())   insert: 4/65/3, delete: 0/0/0
      [32438.290453] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290554] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds())   ref_add: 2/2/0, ref_del: 0/0/0
      [32438.290653] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.351208] LustreError: 1356644:0:(qmt_entry.c:1165:qmt_map_lge_idx()) qmt: cannot map ostidx 3, num_used 3: rc = -22
      [32438.351402] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) ASSERTION( idx >= 0 ) failed: idx -22 lqe_is_global 1 lqe ff4feaf57122ccb8
      [32438.351496] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) LBUG
      [32438.351590] CPU: 0 PID: 1356644 Comm: mdt_rdpg00_002 Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-503.40.1_lustre.el9.x86_64 #1
      [32438.351684] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9_5.1 04/01/2014
      [32438.351780] Call Trace:
      [32438.351899]  <TASK>
      [32438.352005]  dump_stack_lvl+0x34/0x48
      [32438.352101]  lbug_with_loc.cold+0x5/0x43 [lnet]
      [32438.352227]  qmt_seed_glbe_all+0x3d1/0x7c0 [lquota]
      [32438.352375]  qmt_setup_lqe_gd+0x14b/0x1b0 [lquota]
      [32438.352542]  qmt_lvbo_init+0x349/0x820 [lquota]
      [32438.352659]  ldlm_lvbo_init+0x62/0x1d0 [ptlrpc]
      [32438.352860]  ldlm_handle_enqueue+0x5a6/0x16d0 [ptlrpc]
      [32438.353068]  tgt_enqueue+0x60/0x240 [ptlrpc]
      [32438.353263]  tgt_handle_request0+0x147/0x770 [ptlrpc]
      [32438.353465]  tgt_request_handle+0x3fd/0xd00 [ptlrpc]
      [32438.353646]  ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
      [32438.353834]  ? srso_alias_return_thunk+0x5/0xfbef5
      [32438.353949]  ptlrpc_main+0x9bf/0xea0 [ptlrpc]
      [32438.354132]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [32438.354318]  kthread+0xdd/0x100
      [32438.354411]  ? __pfx_kthread+0x10/0x10
      [32438.354504]  ret_from_fork+0x29/0x50
      [32438.354611]  </TASK>
      [32438.354717] Kernel panic - not syncing: LBUG

      First hit: https://testing.whamcloud.com/test_sets/de9713b3-da39-4f96-9ed2-b63c8560a4ce

      Just hit in master next: https://testing.whamcloud.com/test_sets/7b58893c-f3ba-4ca0-8899-6374c5aead5d

      From the looks of it, error handling is just not there? This is relatively serious, as the assertion takes down an MDS.
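
      To make the "error handling is just not there" point concrete, here is a minimal standalone sketch of the general pattern: check the negative return from the index lookup and pass it up to the caller rather than asserting on it. This is not the Lustre code and not a proposed patch. The names idx_map, map_idx and seed_all are hypothetical stand-ins, and whether the real caller chain can tolerate an error return at this point is an open question that the sketch does not answer.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for an index map: used[] lists the target
 * indices currently known. This is NOT the Lustre data structure. */
struct idx_map {
	int used[8];
	int num_used;
};

/* Return the slot that holds target_idx, or -EINVAL if it is not mapped. */
static int map_idx(const struct idx_map *m, int target_idx)
{
	for (int i = 0; i < m->num_used; i++)
		if (m->used[i] == target_idx)
			return i;
	return -EINVAL;
}

/* Mark one slot per target as seeded. A mapping failure is logged and
 * returned to the caller instead of being asserted on. */
static int seed_all(const struct idx_map *m, const int *targets, int n,
		    int *seeded)
{
	for (int i = 0; i < n; i++) {
		int idx = map_idx(m, targets[i]);

		if (idx < 0) {
			fprintf(stderr, "cannot map idx %d, num_used %d: rc = %d\n",
				targets[i], m->num_used, idx);
			return idx;
		}
		seeded[idx] = 1;
	}
	return 0;
}
```

      With a map holding indices 0, 1 and 2, asking seed_all() to seed index 3 as well returns -EINVAL (the -22 seen in the log above on Linux), and the caller decides what to do with it.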

            Assignee: Hongchao Zhang (hongchao.zhang)
            Reporter: Oleg Drokin (green)