[LU-16792] dirtying dbuf but not tx_held Created: 02/May/23 Updated: 02/May/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Brian recommended to try and run Lustre against zfs built with --enable-debug as that does some extra checks and this is the first thing that cropped up right at mount time. I guess dirtying blocks outside of transaction is not very good?
[ 128.003305] Lustre: lustre-MDT0000: mounting server target with '-t lustre' deprecated, use '-t lustre_tgt'
[ 129.332151] Kernel panic - not syncing: dirtying dbuf obj=20e lvl=0 blkid=10 but not tx_held
[ 129.332151]
[ 129.333151] CPU: 3 PID: 9383 Comm: ll_mgs_0001 Kdump: loaded Tainted: G O --------- - - 4.18.0rh8.7-debug #2
[ 129.334318] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 129.334919] Call Trace:
[ 129.335196] ? dump_stack+0xf2/0x15e
[ 129.335569] ? panic+0x17a/0x4ac
[ 129.335932] ? dmu_tx_dirty_buf+0x40c/0x5b0 [zfs]
[ 129.336722] ? _raw_spin_unlock+0x3f/0x60
[ 129.337155] ? dbuf_dirty+0x6e/0x29a0 [zfs]
[ 129.337765] ? dbuf_read+0x753/0xe40 [zfs]
[ 129.338380] ? lock_release+0x343/0x770
[ 129.338808] ? __mutex_unlock_slowpath+0x49/0x430
[ 129.339330] ? dmu_buf_will_dirty_impl+0x19b/0x570 [zfs]
[ 129.340103] ? dmu_buf_will_dirty+0x1a/0x30 [zfs]
[ 129.340766] ? dmu_write_impl+0x5c/0x1d0 [zfs]
[ 129.341418] ? dmu_write_by_dnode+0xa6/0x110 [zfs]
[ 129.342162] ? osd_write+0x177/0x8d0 [osd_zfs]
[ 129.342685] ? dt_record_write+0x3b/0x180 [obdclass]
[ 129.343277] ? llog_osd_write_rec+0xe88/0x1ed0 [obdclass]
[ 129.343907] ? llog_write_rec+0x4d8/0x6c0 [obdclass]
[ 129.344490] ? llog_write+0x6be/0x760 [obdclass]
[ 129.345034] ? record_marker+0x180/0x2a0 [mgs]
[ 129.345513] ? mgs_write_log_lov.isra.7+0x2ff/0x980 [mgs]
[ 129.346119] ? mgs_write_log_mdt0+0x35e/0xa60 [mgs]
[ 129.346630] ? mgs_write_log_mdt+0x115/0x10c0 [mgs]
[ 129.347203] ? mgs_write_log_target+0x74b/0x8d0 [mgs]
[ 129.347743] ? mgs_target_reg+0xf8f/0x1a90 [mgs]
[ 129.348242] ? tgt_handle_request0+0xf9/0xa80 [ptlrpc]
[ 129.348947] ? tgt_request_handle+0x3a5/0x1c00 [ptlrpc]
[ 129.349595] ? ptlrpc_server_handle_request+0x632/0x11e0 [ptlrpc]
[ 129.350328] ? lprocfs_counter_add+0x172/0x240 [obdclass]
[ 129.350974] ? ptlrpc_main+0xd30/0x1440 [ptlrpc]
[ 129.351555] ? ptlrpc_wait_event+0x990/0x990 [ptlrpc]
[ 129.352197] ? kthread+0x197/0x1d0
[ 129.352560] ? set_kthread_struct+0x80/0x80
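For readers unfamiliar with the check behind the panic above, here is a toy model of the idea, not the ZFS code: a transaction records which object ranges it intends to modify, and a debug build refuses any write that falls outside those declared holds. The names `Hold`, `ToyTx`, `hold_write` and `write` are hypothetical and exist only for this sketch; the real check in `dmu_tx_dirty_buf` works on dnodes, levels and block ids rather than byte ranges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hold:
    """A declared intent to modify bytes [offset, offset + length) of an object."""
    obj: int
    offset: int
    length: int

    def covers(self, obj: int, offset: int, length: int) -> bool:
        # True only if the write lies entirely inside this hold's range.
        return (obj == self.obj
                and offset >= self.offset
                and offset + length <= self.offset + self.length)


@dataclass
class ToyTx:
    """Toy transaction (hypothetical): records holds, checks writes against them."""
    holds: list = field(default_factory=list)
    debug: bool = True  # stands in for a build with the extra checks enabled

    def hold_write(self, obj: int, offset: int, length: int) -> None:
        self.holds.append(Hold(obj, offset, length))

    def write(self, obj: int, offset: int, length: int) -> bool:
        held = any(h.covers(obj, offset, length) for h in self.holds)
        if self.debug and not held:
            raise AssertionError(
                f"dirtying obj={obj:x} off={offset} len={length} but not tx_held")
        return held  # a non-debug build carries on without complaint


if __name__ == "__main__":
    tx = ToyTx()
    tx.hold_write(obj=0x20E, offset=0, length=4096)
    tx.write(0x20E, 0, 512)            # inside the declared hold
    try:
        tx.write(0x20E, 8192, 512)     # outside the declared hold
    except AssertionError as err:
        print(err)
```

With `debug=False` the second write returns quietly, which is consistent with the problem only surfacing once the extra checks were compiled in.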
|
| Comments |
| Comment by Alex Zhuravlev [ 02/May/23 ] |
|
checking.. |
| Comment by Alex Zhuravlev [ 02/May/23 ] |
|
interesting, I can build with --enable-debug, but can't start:
osd_zfs: Unknown symbol zfs_refcount_add (err 0)
insmod: ERROR: could not insert module /mnt/build/lustre/tests/../osd-zfs/osd_zfs.ko: Unknown symbol in module
zfs_refcount_add is not exported in 2.1.2, only in 2.1.5+ |
| Comment by Alex Zhuravlev [ 02/May/23 ] |
|
can't build any ZFS
checking whether blk_queue_update_readahead() exists... checking whether disk_update_readahead() exists... no
checking whether blk_queue_discard() is available... configure: error:
*** None of the expected "blk_queue_discard" interfaces were detected.
*** This may be because your kernel version is newer than what is
*** supported, or you are using a patched custom kernel with
*** incompatible modifications.
***
*** ZFS Version: zfs-2.1.3-1
*** Compatible Kernels: 3.10 - 5.16
...
so the root cause is:
make: Entering directory '/home/alexey/linux-4.18.0-425.3.1.el8'
CC [M] /home/alexey/zfs/build/blk_queue_discard/blk_queue_discard.o
/home/alexey/zfs/build/blk_queue_discard/blk_queue_discard.c: In function ‘main’:
/home/alexey/zfs/build/blk_queue_discard/blk_queue_discard.c:103:1: error: the frame size of 4256 bytes is larger than 4096 bytes [-Werror=frame-larger-than=]
103 | }
| ^
cc1: all warnings being treated as errors
I guess it's kernel's debug options inflating request_queue struct:
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y |
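A rough illustration of that guess, using `ctypes` to compare struct sizes: when each embedded lock gains debug bookkeeping fields, a struct that embeds many locks can grow past the 4096-byte frame limit from the compiler error if it is placed on the stack. Every layout here is hypothetical (field names, sizes, and the count of 64 locks are made up for the sketch) and does not reflect the real `request_queue` definition.

```python
import ctypes

FRAME_LIMIT = 4096  # limit quoted in the -Werror=frame-larger-than= message


def make_lock(debug: bool):
    """Hypothetical lock type; a debug build adds extra bookkeeping fields."""
    fields = [("raw", ctypes.c_uint32)]
    if debug:
        fields += [("magic", ctypes.c_uint32),
                   ("owner", ctypes.c_void_p),
                   ("dep_map", ctypes.c_byte * 48)]
    return type("Lock", (ctypes.Structure,), {"_fields_": fields})


def make_queue(debug: bool, nlocks: int = 64):
    """Hypothetical queue struct embedding many locks plus a fixed payload."""
    lock = make_lock(debug)
    fields = [("locks", lock * nlocks),
              ("payload", ctypes.c_byte * 1024)]
    return type("Queue", (ctypes.Structure,), {"_fields_": fields})


plain_size = ctypes.sizeof(make_queue(debug=False))
debug_size = ctypes.sizeof(make_queue(debug=True))

print("plain:", plain_size, "exceeds limit:", plain_size > FRAME_LIMIT)
print("debug:", debug_size, "exceeds limit:", debug_size > FRAME_LIMIT)
```

The payload is identical in both variants; only the per-lock overhead changes, which is why turning off a single lock-debugging option can be enough to get back under the limit.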
| Comment by Alex Zhuravlev [ 02/May/23 ] |
|
disabling CONFIG_LOCK_STAT helped. |
| Comment by Alex Zhuravlev [ 02/May/23 ] |