Lustre / LU-20410

ZFS server panic on ZFS_DEBUG builds


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

      Currently, when Lustre is built against a ZFS debug build (ZFS_DEBUG defined), the MGS/MDS server panics on the first mount every time and cannot proceed.

      There appears to be a correctness problem in how Lustre dirties ZFS buffers (dbufs) and in how the affected objects are tied to the transaction's TXG.

      The goal is to fix the panics that ZFS_DEBUG builds raise when a dnode is dirtied without first being held in the enclosing transaction.
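
The invariant behind both panics can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical and is not a ZFS or Lustre API, and the model covers only the two conditions visible in the logs below (the object must have been held in the transaction, and the transaction must have been assigned to a TXG before the buffer is dirtied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_HOLDS 8

/* Hypothetical stand-in for a transaction: records which objects were held. */
struct toy_tx {
    uint64_t held[TOY_MAX_HOLDS]; /* object ids declared before assignment */
    size_t   nheld;               /* number of valid entries in held[] */
    uint64_t txg;                 /* 0 until the tx is assigned to a TXG */
};

static void toy_tx_init(struct toy_tx *tx)
{
    assert(tx != NULL);
    tx->nheld = 0;
    tx->txg = 0;
}

/* Declare the intent to modify obj. Only legal before assignment. */
static bool toy_tx_hold(struct toy_tx *tx, uint64_t obj)
{
    assert(tx != NULL);
    if (tx->txg != 0 || tx->nheld == TOY_MAX_HOLDS)
        return false;
    tx->held[tx->nheld++] = obj;
    return true;
}

/* Bind the transaction to a (non-zero) transaction group. */
static void toy_tx_assign(struct toy_tx *tx, uint64_t txg)
{
    assert(tx != NULL && txg != 0);
    tx->txg = txg;
}

/*
 * Model of the debug-build check: a write to obj is acceptable only if
 * the tx is assigned and obj was held in it. A debug build would panic
 * where this function returns false.
 */
static bool toy_write_allowed(const struct toy_tx *tx, uint64_t obj)
{
    assert(tx != NULL);
    if (tx->txg == 0)
        return false;
    for (size_t i = 0; i < tx->nheld; i++) {
        if (tx->held[i] == obj)
            return true;
    }
    return false;
}
```

In this model, writing to object 282 after holding it and assigning the transaction passes the check, while writing to an object that was never held fails it, which is the situation the "but not tx_held" message reports.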

      [81562.663604] Lustre: testfs-MDT0000: mounting server target with '-t lustre' deprecated, use '-t lustre_tgt'
      [81563.761861] Kernel panic - not syncing: dirtying dbuf obj=282 lvl=0 blkid=10 but not tx_held
      [81563.761883] CPU: 62 PID: 2324518 Comm: ll_mgs_0002 Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-570.32.1.x7.4.000.88.x86_64 #1
      [81563.761901] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325 Gen11, BIOS 3.00 02/06/2026
      [81563.761912] Call Trace:
      [81563.761920]  <TASK>
      [81563.761928]  dump_stack_lvl+0x34/0x48
      [81563.761943]  panic+0x107/0x2bb
      [81563.761953]  dmu_tx_dirty_buf+0xe9/0x420 [zfs]
      [81563.762088]  ? dmu_zfetch+0x20/0x50 [zfs]
      [81563.762184]  dbuf_dirty+0x5f/0x1660 [zfs]
      [81563.762286]  ? srso_alias_return_thunk+0x5/0xfbef5
      [81563.762298]  dmu_write_impl+0xba/0x1f0 [zfs]
      [81563.762391]  dmu_write_by_dnode.part.0+0x96/0x190 [zfs]
      [81563.762481]  osd_write+0x1f2/0x3c0 [osd_zfs]
      [81563.762501]  dt_record_write+0x2f/0x120 [obdclass]
      [81563.762552]  llog_osd_write_rec+0xce4/0x1f80 [obdclass]
      [81563.762596]  ? __entry_text_end+0x102749/0x10274d
      [81563.762607]  ? spl_kmem_alloc+0x11b/0x160 [spl]
      [81563.762624]  llog_write_rec+0x43c/0x590 [obdclass]
      [81563.762666]  llog_write_cookie+0x4f6/0x5f0 [obdclass]
      [81563.762709]  record_marker+0x19c/0x220 [mgs]
      [81563.762730]  mgs_write_log_lov.constprop.0+0x345/0x5c0 [mgs]
      [81563.762749]  mgs_write_log_mdt0+0x2ee/0x6f0 [mgs]
      [81563.762768]  mgs_write_log_mdt+0x6d/0x6e0 [mgs]
      [81563.762785]  ? srso_alias_return_thunk+0x5/0xfbef5
      [81563.762794]  ? mgs_set_index+0x396/0x740 [mgs]
      [81563.762812]  mgs_write_log_target+0x635/0x740 [mgs]
      [81563.762829]  mgs_target_reg+0x1027/0x1d10 [mgs]
      [81563.762847]  ? __entry_text_end+0x58ae/0x10274d
      [81563.762857]  tgt_handle_request0+0x147/0x770 [ptlrpc]
      [81563.762937]  tgt_request_handle+0x3fd/0xd00 [ptlrpc]
      [81563.763001]  ? srso_alias_return_thunk+0x5/0xfbef5
      [81563.763011]  ? obd_export_timed_fini+0x9c/0xb0 [obdclass]
      [81563.763054]  ? srso_alias_return_thunk+0x5/0xfbef5
      [81563.763064]  ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
      [81563.763132]  ? srso_alias_return_thunk+0x5/0xfbef5
      [81563.763364]  ptlrpc_main+0x9bf/0xea0 [ptlrpc]
      [81563.763628]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [81563.763882]  kthread+0xdd/0x100
      [81563.764095]  ? __pfx_kthread+0x10/0x10
      [81563.764306]  ret_from_fork+0x29/0x50
      [81563.764516]  </TASK>
      ----
      
      
      A second panic is also seen, this time on the MDT create path:
      
      [60533.503940] VERIFY3U(dn->dn_assigned_txg, ==, tx->tx_txg) failed (0 == 70)
      [60533.503962] PANIC at dmu_tx.c:746:dmu_tx_dirty_buf()
      [60533.503971] Kernel panic - not syncing: VERIFY3U(dn->dn_assigned_txg, ==, tx->tx_txg) failed (0 == 70)
      [60533.503984] CPU: 4 PID: 22004 Comm: mdt01_000 Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-570.32.1.x7.4.000.88.x86_64 #1
      [60533.504001] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325 Gen11, BIOS 3.00 02/06/2026
      [60533.504013] Call Trace:
      [60533.504020]  <TASK>
      [60533.504030]  dump_stack_lvl+0x34/0x48
      [60533.504044]  panic+0x107/0x2bb
      [60533.504055]  spl_panic+0xcc/0xe9 [spl]
      [60533.504074]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504085]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504094]  ? mutex_lock+0xe/0x30
      [60533.504102]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504111]  ? kmem_cache_alloc+0x4d/0x410
      [60533.504122]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504131]  ? kmem_cache_alloc+0x4d/0x410
      [60533.504140]  ? spl_kmem_cache_alloc+0x14c/0x2d0 [spl]
      [60533.504157]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504165]  ? spl_kmem_cache_alloc+0x16f/0x2d0 [spl]
      [60533.504182]  dmu_tx_dirty_buf+0x2e5/0x420 [zfs]
      [60533.504319]  dbuf_dirty+0x5f/0x1660 [zfs]
      [60533.504426]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504436]  ? arc_alloc_buf+0x87/0xd0 [zfs]
      [60533.504533]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504543]  ? dbuf_noread+0x166/0x2c0 [zfs]
      [60533.504632]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504642]  dmu_write_impl+0x113/0x1f0 [zfs]
      [60533.504741]  dmu_write_by_dnode.part.0+0x96/0x190 [zfs]
      [60533.504834]  osd_write+0x1f2/0x3c0 [osd_zfs]
      [60533.504854]  lod_sub_write+0x1f1/0x420 [lod]
      [60533.504882]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.504891]  ? kmalloc_trace+0x137/0x430
      [60533.504901]  dt_record_write+0x2f/0x120 [obdclass]
      [60533.504953]  tgt_server_data_write+0xc3/0x230 [ptlrpc]
      [60533.505031]  tgt_last_rcvd_update+0x491/0xac0 [ptlrpc]
      [60533.505095]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.505105]  dt_txn_hook_stop+0x78/0x120 [obdclass]
      [60533.505151]  osd_trans_stop+0x343/0x520 [osd_zfs]
      [60533.505169]  top_trans_stop+0x9d/0xc90 [ptlrpc]
      [60533.505237]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.505439]  ? mdd_attr_set_internal+0x95/0x2d0 [mdd]
      [60533.505644]  mdd_trans_stop+0x24/0x16c [mdd]
      [60533.505844]  mdd_create+0xb2f/0x11d0 [mdd]
      [60533.506047]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.506234]  mdt_create+0xc9d/0x1be0 [mdt]
      [60533.506440]  mdt_reint_create+0x291/0x430 [mdt]
      [60533.506637]  mdt_reint_rec+0x119/0x270 [mdt]
      [60533.506828]  mdt_reint_internal+0x4ea/0x9b0 [mdt]
      [60533.507017]  mdt_reint+0x59/0x110 [mdt]
      [60533.507201]  tgt_handle_request0+0x147/0x770 [ptlrpc]
      [60533.507428]  tgt_request_handle+0x3fd/0xd00 [ptlrpc]
      [60533.507640]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.507795]  ? obd_export_timed_fini+0x9c/0xb0 [obdclass]
      [60533.507985]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.508144]  ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
      [60533.508362]  ? srso_alias_return_thunk+0x5/0xfbef5
      [60533.508525]  ptlrpc_main+0x9bf/0xea0 [ptlrpc]
      [60533.508735]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [60533.508942]  kthread+0xdd/0x100
      [60533.509108]  ? __pfx_kthread+0x10/0x10
      [60533.509272]  ret_from_fork+0x29/0x50
      [60533.509438]  </TASK>

            akash-b Akash B