Details
Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.15.0
Labels: None
Environment:
ZFS: 2.1.3
Lustre: 2.15 (2.15.0_RC2_38_g8e8bbc0)
Severity: 3
Rank (Obsolete): 9223372036854775807
Description
While trying to debug LU-15586, we enabled ZFS_DEBUG with the following set of flags:
ZFS_DEBUG_MODIFY | ZFS_DEBUG_DBUF_VERIFY | ZFS_DEBUG_DNODE_VERIFY
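As a rough illustration of what this combination amounts to numerically, the sketch below ORs the three flags together. The bit positions are an assumption taken from the zfs_flags table in the OpenZFS module-parameter documentation and should be checked against include/sys/zfs_debug.h in the tree being built. The ticket does not say how the flags were applied, so treating the result as a zfs_flags value is also an assumption.

```python
# Bit values as documented for the OpenZFS zfs_flags module parameter.
# Assumption: verify against include/sys/zfs_debug.h in your ZFS tree.
ZFS_DEBUG_DBUF_VERIFY = 1 << 1   # 2
ZFS_DEBUG_DNODE_VERIFY = 1 << 2  # 4
ZFS_DEBUG_MODIFY = 1 << 4        # 16


def combined_zfs_flags() -> int:
    """Return the bitmask for the three flags named in this ticket."""
    return ZFS_DEBUG_MODIFY | ZFS_DEBUG_DBUF_VERIFY | ZFS_DEBUG_DNODE_VERIFY


mask = combined_zfs_flags()
print(f"zfs_flags = {mask} (0x{mask:x})")
```

Under those assumed bit values the mask comes out as 22 (0x16).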
Enabling ZFS_DEBUG makes ZFS use a different set of refcount functions.
For the Lustre build to succeed, the zfs module has to be modified to export the zfs_refcount_add function, which is called from osd-zfs.
Modification in module/zfs/refcount.c:
+#if defined(_KERNEL)
+EXPORT_SYMBOL(zfs_refcount_add);
+#endif
Unfortunately, enabling the extra checks in ZFS results in a kernel panic when mounting OST/MDT resources:
[87796.641026] Kernel panic - not syncing: dirtying dbuf obj=c80092 lvl=0 blkid=10 but not tx_held
[87796.677146] CPU: 18 PID: 16207 Comm: mount.lustre Kdump: loaded Tainted: P IOE --------- - - 4.18.0-348.7.1.el8_5.x86_64 #1
[87796.715272] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.99 03/11/2021
[87796.735224] Call Trace:
[87796.750373] dump_stack+0x5c/0x80
[87796.766219] panic+0xe7/0x2a9
[87796.781619] dmu_tx_dirty_buf+0x117/0x3f0 [zfs]
[87796.798682] ? rrw_enter_read_impl+0x125/0x220 [zfs]
[87796.815679] dbuf_dirty+0x5e/0x1530 [zfs]
[87796.831998] ? dbuf_read+0x139/0x680 [zfs]
[87796.847813] dmu_write_impl+0x44/0x150 [zfs]
[87796.863641] dmu_write_by_dnode+0x8e/0xe0 [zfs]
[87796.879597] osd_write+0x118/0x3a0 [osd_zfs]
[87796.895392] dt_record_write+0x32/0x110 [obdclass]
[87796.911190] llog_osd_write_rec+0xd06/0x1ae0 [obdclass]
[87796.927699] llog_write_rec+0x3f6/0x530 [obdclass]
[87796.943235] llog_write+0x4df/0x550 [obdclass]
[87796.958243] llog_process_thread+0xb8e/0x1aa0 [obdclass]
[87796.974051] ? llog_process_or_fork+0x5e/0x560 [obdclass]
[87796.990079] ? kmem_cache_alloc_trace+0x131/0x270
[87797.004891] ? llog_write+0x550/0x550 [obdclass]
[87797.019560] llog_process_or_fork+0x1c1/0x560 [obdclass]
[87797.034729] llog_backup+0x354/0x520 [obdclass]
[87797.048909] mgc_llog_local_copy+0x110/0x420 [mgc]
[87797.063504] mgc_process_cfg_log+0x971/0xd80 [mgc]
[87797.077634] mgc_process_log+0x6c3/0x800 [mgc]
[87797.091414] ? config_log_add+0x3f5/0xa00 [mgc]
[87797.104982] mgc_process_config+0xb53/0xe60 [mgc]
[87797.118529] lustre_process_log+0x5fa/0xad0 [obdclass]
[87797.132327] ? server_register_mount+0x4d1/0x740 [obdclass]
[87797.146529] server_start_targets+0x1504/0x3010 [obdclass]
[87797.160454] ? strlcpy+0x2d/0x40
[87797.171898] ? class_config_dump_handler+0x730/0x730 [obdclass]
[87797.186054] ? mgc_set_info_async+0x539/0xad0 [mgc]
[87797.198949] ? mgc_set_info_async+0x539/0xad0 [mgc]
[87797.211583] ? lustre_start_mgc+0xf7c/0x27c0 [obdclass]
[87797.224758] server_fill_super+0x8ea/0x10d0 [obdclass]
[87797.237408] lustre_fill_super+0x3a1/0x3f0 [lustre]
[87797.249568] ? ll_inode_destroy_callback+0x120/0x120 [lustre]
[87797.262647] mount_nodev+0x48/0xa0
[87797.273054] legacy_get_tree+0x27/0x40
[87797.283602] vfs_get_tree+0x25/0xb0
[87797.294051] do_mount+0x2e2/0x950
[87797.303806] ksys_mount+0xb6/0xd0
[87797.313335] __x64_sys_mount+0x21/0x30
[87797.323140] do_syscall_64+0x5b/0x1a0
[87797.332660] entry_SYSCALL_64_after_hwframe+0x65/0xca
[87797.343521] RIP: 0033:0x7fb3c21f892e
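To make the panic message easier to read, here is a toy model of the kind of check that debug builds perform in dmu_tx_dirty_buf: before a buffer is dirtied, the transaction must already carry a hold that covers that object and block. This is a simplified sketch and not ZFS code. The ToyTx and WriteHold classes, the 4096-byte block size, and the hold range are all hypothetical, and it makes no claim about which hold was actually missing in the osd_write path above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WriteHold:
    """Hypothetical stand-in for one write hold registered on a transaction."""
    obj: int
    first_blkid: int
    last_blkid: int


@dataclass
class ToyTx:
    """Hypothetical, heavily simplified transaction with a list of holds."""
    holds: List[WriteHold] = field(default_factory=list)

    def hold_write(self, obj: int, offset: int, length: int, blksz: int) -> None:
        # Record which level-0 blocks of `obj` this transaction may modify.
        # Assumes length > 0.
        first = offset // blksz
        last = (offset + length - 1) // blksz
        self.holds.append(WriteHold(obj, first, last))

    def dirty_buf(self, obj: int, blkid: int) -> None:
        # Debug-style verification: some hold must cover (obj, blkid).
        for h in self.holds:
            if h.obj == obj and h.first_blkid <= blkid <= h.last_blkid:
                return
        # The message mimics the format seen in the panic (hex obj and blkid).
        raise RuntimeError(
            f"dirtying dbuf obj={obj:x} lvl=0 blkid={blkid:x} but not tx_held"
        )


tx = ToyTx()
tx.hold_write(0xC80092, offset=0, length=4096, blksz=4096)  # covers block 0 only
tx.dirty_buf(0xC80092, 0)  # covered by the hold, passes silently
try:
    tx.dirty_buf(0xC80092, 0x10)  # block 0x10 is outside the held range
except RuntimeError as err:
    print(err)
```

In the toy model, the write to block 0x10 fails because only block 0 was declared up front. The real check only runs when the debug verification is enabled, which would explain why the mount does not panic on a non-debug build.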
For reference, the problem is tracked on the ZFS side here: