Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
Description
This issue was created by maloo for alexey lyashkov <c17817@cray.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/34a5b227-36f5-441f-94d6-31914d7b4004
[14277.488692] WARNING: MMP writes to pool 'lustre-ost5' have not succeeded in over 60019 ms; suspending pool. Hrtime 14277488675560
[14277.490967] Kernel panic - not syncing: Pool 'lustre-ost5' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
[14277.493640] CPU: 1 PID: 519418 Comm: mmp Kdump: loaded Tainted: P OE --------- - - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[14277.495797] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[14277.496854] Call Trace:
[14277.497397] dump_stack+0x5c/0x80
[14277.498052] panic+0xe7/0x2a9
[14277.499014] zio_suspend+0x103/0x110 [zfs]
[14277.499843] mmp_thread+0x61c/0x710 [zfs]
[14277.500651] ? mmp_write_uberblock+0x700/0x700 [zfs]
[14277.501615] ? __thread_exit+0x20/0x20 [spl]
[14277.502438] thread_generic_wrapper+0x6f/0x80 [spl]
[14277.503383] kthread+0x112/0x130
[14277.504000] ? kthread_flush_work_fn+0x10/0x10
[14277.504827] ret_from_fork+0x35/0x40
Issue Links
- duplicates LU-10956 sanity-pfl test_3: Kernel panic - not syncing: Pool has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic (Open)
This happens intermittently on ZFS-based systems when the VM stalls, possibly because other VMs are doing heavy I/O on the host. It looks like the tuning that increases the fail retry count is missing on the new test cluster and needs to be applied.
It would be better if ZFS MMP handled this more gracefully, by resuming (after verifying that MMP has not been modified) once the I/O completes.
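For reference, a rough sketch of why raising the retry count helps. It assumes the "fail retry count" above is the ZFS module parameter `zfs_multihost_fail_intervals`, and that the pool is suspended once `zfs_multihost_fail_intervals * zfs_multihost_interval` milliseconds pass without a successful MMP write, as the OpenZFS module parameter documentation describes. The function names and parameter values below are hypothetical; only the 60019 ms stall comes from the log above.

```python
from typing import Optional


def mmp_suspend_timeout_ms(interval_ms: int, fail_intervals: int) -> Optional[int]:
    """Milliseconds without a successful MMP write before the pool is suspended.

    Assumes timeout = zfs_multihost_interval * zfs_multihost_fail_intervals.
    A fail_intervals value of 0 means MMP write failures are ignored, so
    there is no timeout and None is returned.
    """
    if fail_intervals == 0:
        return None
    return interval_ms * fail_intervals


def would_suspend(stall_ms: int, interval_ms: int, fail_intervals: int) -> bool:
    """True if a stall of stall_ms exceeds the MMP suspend timeout."""
    timeout = mmp_suspend_timeout_ms(interval_ms, fail_intervals)
    return timeout is not None and stall_ms > timeout


# Stall reported in the log above: 60019 ms.
# The interval and fail_intervals values are illustrative guesses only.
stall_ms = 60019
print(would_suspend(stall_ms, interval_ms=1000, fail_intervals=60))
print(would_suspend(stall_ms, interval_ms=1000, fail_intervals=120))
```

With a larger `fail_intervals`, the same VM stall stays under the timeout and the pool is not suspended, so `failmode=panic` is never triggered. The trade-off is that a genuine multihost conflict takes longer to detect.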