  1. Lustre
  2. LU-15716

OSD-ZFS / panic when mounting Lustre with ZFS_DEBUG enabled


Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.15.0
    • None
    • ZFS: 2.1.3
      Lustre: 2.15 (2.15.0_RC2_38_g8e8bbc0)
    • 3

    Description

      While trying to debug LU-15586, we enabled
      ZFS_DEBUG with the following set of flags:

      ZFS_DEBUG_MODIFY | ZFS_DEBUG_DBUF_VERIFY | ZFS_DEBUG_DNODE_VERIFY
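
As an illustration of how the three flags combine into one mask, here is a minimal sketch. The bit positions are an assumption, written from memory of include/sys/zfs_debug.h, and should be checked against the ZFS tree actually being built. The DEMO_ names and demo_zfs_flags() are hypothetical and are not part of ZFS.

```c
#include <stdint.h>

/*
 * Assumed bit positions, mirroring what include/sys/zfs_debug.h is
 * believed to define. Verify against the ZFS source in use.
 */
#define DEMO_ZFS_DEBUG_DBUF_VERIFY	(1u << 1)
#define DEMO_ZFS_DEBUG_DNODE_VERIFY	(1u << 2)
#define DEMO_ZFS_DEBUG_MODIFY		(1u << 4)

/* Combine the three flags named in this report into a single mask. */
static inline uint32_t
demo_zfs_flags(void)
{
	return (DEMO_ZFS_DEBUG_MODIFY |
	    DEMO_ZFS_DEBUG_DBUF_VERIFY |
	    DEMO_ZFS_DEBUG_DNODE_VERIFY);
}
```

Under these assumed values the mask works out to 22 (0x16), which is the number one would expect to see in the zfs_flags module parameter.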

      Enabling ZFS_DEBUG makes ZFS use a different set of refcount functions.
      For Lustre to build against it, the zfs module has to be modified to export the zfs_refcount_add function, which is called from osd-zfs.
      The modification in module/zfs/refcount.c:

      +#if defined(_KERNEL)
      +EXPORT_SYMBOL(zfs_refcount_add);
      +#endif
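
The reason an export becomes necessary can be sketched in user-space C. This assumes that a non-debug build defines the refcount operation as a macro that callers compile inline, while a debug build makes it a real function that an external module must link against. All demo_ names and the DEMO_DEBUG switch are hypothetical; this is a simplified model and not the ZFS implementation (no atomics, no holder tracking).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct demo_refcount {
	int64_t rc_count;
} demo_refcount_t;

#ifdef DEMO_DEBUG
/*
 * Debug flavour: a real function. A separate module calling it needs
 * the symbol to be exported by the module that defines it.
 */
int64_t
demo_refcount_add(demo_refcount_t *rc, const void *holder)
{
	(void) holder;	/* a tracking build would record the holder here */
	return (++rc->rc_count);
}
#else
/*
 * Non-debug flavour: a macro. The caller expands it inline, so no
 * symbol is referenced at link time. (Not atomic; illustration only.)
 */
#define	demo_refcount_add(rc, holder)	((void)(holder), ++(rc)->rc_count)
#endif
```

Compiling a caller with and without -DDEMO_DEBUG shows the difference: only the debug flavour leaves an undefined reference to demo_refcount_add in the caller's object file.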

       

      Unfortunately, enabling the extra checks in ZFS results in a kernel panic when mounting OST/MDT resources:

      [87796.641026] Kernel panic - not syncing: dirtying dbuf obj=c80092 lvl=0 blkid=10 but not tx_held
      
      [87796.677146] CPU: 18 PID: 16207 Comm: mount.lustre Kdump: loaded Tainted: P          IOE    --------- -  - 4.18.0-348.7.1.el8_5.x86_64 #1
      [87796.715272] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.99 03/11/2021
      [87796.735224] Call Trace:
      [87796.750373]  dump_stack+0x5c/0x80
      [87796.766219]  panic+0xe7/0x2a9
      [87796.781619]  dmu_tx_dirty_buf+0x117/0x3f0 [zfs]
      [87796.798682]  ? rrw_enter_read_impl+0x125/0x220 [zfs]
      [87796.815679]  dbuf_dirty+0x5e/0x1530 [zfs]
      [87796.831998]  ? dbuf_read+0x139/0x680 [zfs]
      [87796.847813]  dmu_write_impl+0x44/0x150 [zfs]
      [87796.863641]  dmu_write_by_dnode+0x8e/0xe0 [zfs]
      [87796.879597]  osd_write+0x118/0x3a0 [osd_zfs]
      [87796.895392]  dt_record_write+0x32/0x110 [obdclass]
      [87796.911190]  llog_osd_write_rec+0xd06/0x1ae0 [obdclass]
      [87796.927699]  llog_write_rec+0x3f6/0x530 [obdclass]
      [87796.943235]  llog_write+0x4df/0x550 [obdclass]
      [87796.958243]  llog_process_thread+0xb8e/0x1aa0 [obdclass]
      [87796.974051]  ? llog_process_or_fork+0x5e/0x560 [obdclass]
      [87796.990079]  ? kmem_cache_alloc_trace+0x131/0x270
      [87797.004891]  ? llog_write+0x550/0x550 [obdclass]
      [87797.019560]  llog_process_or_fork+0x1c1/0x560 [obdclass]
      [87797.034729]  llog_backup+0x354/0x520 [obdclass]
      [87797.048909]  mgc_llog_local_copy+0x110/0x420 [mgc]
      [87797.063504]  mgc_process_cfg_log+0x971/0xd80 [mgc]
      [87797.077634]  mgc_process_log+0x6c3/0x800 [mgc]
      [87797.091414]  ? config_log_add+0x3f5/0xa00 [mgc]
      [87797.104982]  mgc_process_config+0xb53/0xe60 [mgc]
      [87797.118529]  lustre_process_log+0x5fa/0xad0 [obdclass]
      [87797.132327]  ? server_register_mount+0x4d1/0x740 [obdclass]
      [87797.146529]  server_start_targets+0x1504/0x3010 [obdclass]
      [87797.160454]  ? strlcpy+0x2d/0x40
      [87797.171898]  ? class_config_dump_handler+0x730/0x730 [obdclass]
      [87797.186054]  ? mgc_set_info_async+0x539/0xad0 [mgc]
      [87797.198949]  ? mgc_set_info_async+0x539/0xad0 [mgc]
      [87797.211583]  ? lustre_start_mgc+0xf7c/0x27c0 [obdclass]
      [87797.224758]  server_fill_super+0x8ea/0x10d0 [obdclass]
      [87797.237408]  lustre_fill_super+0x3a1/0x3f0 [lustre]
      [87797.249568]  ? ll_inode_destroy_callback+0x120/0x120 [lustre]
      [87797.262647]  mount_nodev+0x48/0xa0
      [87797.273054]  legacy_get_tree+0x27/0x40
      [87797.283602]  vfs_get_tree+0x25/0xb0
      [87797.294051]  do_mount+0x2e2/0x950
      [87797.303806]  ksys_mount+0xb6/0xd0
      [87797.313335]  __x64_sys_mount+0x21/0x30
      [87797.323140]  do_syscall_64+0x5b/0x1a0
      [87797.332660]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [87797.343521] RIP: 0033:0x7fb3c21f892e 
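
The panic message says a dbuf was dirtied without a matching hold in the transaction. The kind of check involved can be modelled roughly as follows. This is a simplified user-space sketch: every demo_ name is hypothetical, and the real dmu_tx_dirty_buf logic in ZFS handles more cases (indirect levels, bonus and spill buffers, and so on). The object number is taken from the log above, while the block size and block ids are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_MAX_HOLDS	8

/* One declared write range on one object. */
struct demo_hold {
	uint64_t object, off, len;
};

/* A transaction that remembers which ranges were declared up front. */
struct demo_tx {
	struct demo_hold holds[DEMO_MAX_HOLDS];
	size_t nholds;
};

/* Declare phase: record that [off, off + len) of object may be written. */
static bool
demo_tx_hold_write(struct demo_tx *tx, uint64_t object, uint64_t off,
    uint64_t len)
{
	if (tx->nholds == DEMO_MAX_HOLDS)
		return (false);
	tx->holds[tx->nholds++] = (struct demo_hold){ object, off, len };
	return (true);
}

/* Execute phase: is block blkid of object covered by any declared hold? */
static bool
demo_tx_covers(const struct demo_tx *tx, uint64_t object, uint64_t blkid,
    uint64_t blksz)
{
	uint64_t start = blkid * blksz;
	uint64_t end = start + blksz;

	for (size_t i = 0; i < tx->nholds; i++) {
		const struct demo_hold *h = &tx->holds[i];

		/* Covered if the block overlaps a non-empty held range. */
		if (h->object == object && h->len != 0 &&
		    h->off < end && start < h->off + h->len)
			return (true);
	}
	return (false);
}
```

In this model, a write that lands on a block outside every declared range makes demo_tx_covers() return false, which corresponds to the condition the debug build reports as "but not tx_held".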

      For reference, the problem is tracked on the ZFS side here:

      https://github.com/openzfs/zfs/issues/13144

          People

            wc-triage WC Triage
            lflis Lukasz Flis