LU-19537

qmt_entry.c:1230:qmt_seed_glbe_all()) ASSERTION( idx >= 0 ) failed

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.17.0
    • Affects Version/s: Lustre 2.17.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This is an intermittent assertion failure we have been hitting in Maloo, first seen in Apr 2025. Most occurrences were in sanity-quota, but we just hit one in sanity-lfsck (test 16) too.

      [32438.240695] Lustre: DEBUG MARKER: == sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 09:07:29 (1761728849)
      [32438.289458] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 251 < left 278, rollback = 2
      [32438.289653] Lustre: 1352109:0:(osd_internal.h:1470:osd_trans_exec_op()) Skipped 1799 previous similar messages
      [32438.289750] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds())   create: 1/4/4, destroy: 0/0/0
      [32438.289844] Lustre: 1352109:0:(osd_handler.c:2076:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.289959] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 4/278/0
      [32438.290054] Lustre: 1352109:0:(osd_handler.c:2083:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290152] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds())   write: 1/11/0, punch: 0/0/0, quota 1/3/2
      [32438.290259] Lustre: 1352109:0:(osd_handler.c:2090:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290357] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds())   insert: 4/65/3, delete: 0/0/0
      [32438.290453] Lustre: 1352109:0:(osd_handler.c:2100:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.290554] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds())   ref_add: 2/2/0, ref_del: 0/0/0
      [32438.290653] Lustre: 1352109:0:(osd_handler.c:2107:osd_trans_dump_creds()) Skipped 1799 previous similar messages
      [32438.351208] LustreError: 1356644:0:(qmt_entry.c:1165:qmt_map_lge_idx()) qmt: cannot map ostidx 3, num_used 3: rc = -22
      [32438.351402] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) ASSERTION( idx >= 0 ) failed: idx -22 lqe_is_global 1 lqe ff4feaf57122ccb8
      [32438.351496] LustreError: 1356644:0:(qmt_entry.c:1230:qmt_seed_glbe_all()) LBUG
      [32438.351590] CPU: 0 PID: 1356644 Comm: mdt_rdpg00_002 Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-503.40.1_lustre.el9.x86_64 #1
      [32438.351684] Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9_5.1 04/01/2014
      [32438.351780] Call Trace:
      [32438.351899]  <TASK>
      [32438.352005]  dump_stack_lvl+0x34/0x48
      [32438.352101]  lbug_with_loc.cold+0x5/0x43 [lnet]
      [32438.352227]  qmt_seed_glbe_all+0x3d1/0x7c0 [lquota]
      [32438.352375]  qmt_setup_lqe_gd+0x14b/0x1b0 [lquota]
      [32438.352542]  qmt_lvbo_init+0x349/0x820 [lquota]
      [32438.352659]  ldlm_lvbo_init+0x62/0x1d0 [ptlrpc]
      [32438.352860]  ldlm_handle_enqueue+0x5a6/0x16d0 [ptlrpc]
      [32438.353068]  tgt_enqueue+0x60/0x240 [ptlrpc]
      [32438.353263]  tgt_handle_request0+0x147/0x770 [ptlrpc]
      [32438.353465]  tgt_request_handle+0x3fd/0xd00 [ptlrpc]
      [32438.353646]  ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
      [32438.353834]  ? srso_alias_return_thunk+0x5/0xfbef5
      [32438.353949]  ptlrpc_main+0x9bf/0xea0 [ptlrpc]
      [32438.354132]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [32438.354318]  kthread+0xdd/0x100
      [32438.354411]  ? __pfx_kthread+0x10/0x10
      [32438.354504]  ret_from_fork+0x29/0x50
      [32438.354611]  </TASK>
      [32438.354717] Kernel panic - not syncing: LBUG

      First hit: https://testing.whamcloud.com/test_sets/de9713b3-da39-4f96-9ed2-b63c8560a4ce

      Just hit in master next: https://testing.whamcloud.com/test_sets/7b58893c-f3ba-4ca0-8899-6374c5aead5d

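      From the looks of it, error handling is just not there? qmt_map_lge_idx() fails with rc = -22 ("cannot map ostidx 3, num_used 3") and qmt_seed_glbe_all() then asserts idx >= 0 instead of handling the error. This is relatively serious, as the assertion takes down an MDS.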
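
To illustrate the kind of error handling that seems to be missing, here is a minimal userspace sketch. It is not the actual lquota code: `struct idx_map`, `map_idx()` and `seed_all()` are hypothetical stand-ins, and the map contents {0, 1, 2} are an assumption chosen only to reproduce the "ostidx 3, num_used 3" case from the log. The idea is simply that a failed lookup is returned to the caller as -EINVAL rather than tripping an assertion; whether propagating the error is the right fix for the real race is a separate question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a table of known target indices. */
struct idx_map {
	int    used[8];   /* target indices currently present in the map */
	size_t num_used;  /* number of valid entries in used[] */
};

/* Return the slot holding target_idx, or -EINVAL if it is not mapped
 * (the same value as the "rc = -22" seen in the log). */
int map_idx(const struct idx_map *map, int target_idx)
{
	/* A NULL map would be a genuine programming error, so asserting is fine. */
	assert(map != NULL);

	for (size_t i = 0; i < map->num_used; i++)
		if (map->used[i] == target_idx)
			return (int)i;
	return -EINVAL;
}

/* Mark one slot in out[] per requested target. An unmapped target is a
 * runtime condition, so it is reported to the caller instead of asserted on. */
int seed_all(const struct idx_map *map, const int *targets, size_t n,
	     int *out, size_t out_len)
{
	for (size_t i = 0; i < n; i++) {
		int idx = map_idx(map, targets[i]);

		if (idx < 0)
			return idx;          /* propagate -EINVAL, no LBUG */
		if ((size_t)idx >= out_len)
			return -ERANGE;      /* output array too small */
		out[idx] = 1;
	}
	return 0;
}
```

With a map of {0, 1, 2}, asking `seed_all()` to process target 3 returns -EINVAL to the caller, which can then fail the single request instead of taking the whole server down.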

          People

            Assignee: Hongchao Zhang (hongchao.zhang)
            Reporter: Oleg Drokin (green)