Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Environment: Server: RHEL8
Description
[458781.070693] LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag
[458781.136989] Aborting journal on device md65-8.
[458781.142323] LDISKFS-fs error (device md65) in ldiskfs_evict_inode:251: Journal has aborted
[458781.153243] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.155099] LustreError: 98016:0:(osd_handler.c:1783:osd_trans_commit_cb()) transaction @0x000000002c9fd616 commit error: 2
[458781.158848] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.170295] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.170297] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.175978] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.182078] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.199967] Kernel panic - not syncing: LDISKFS-fs (device md65): panic forced after error
[458781.199972] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.199979] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.200005] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.200549] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.200552] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.200840] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.201096] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.260424] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.262419] CPU: 4 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[458781.262421] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[458781.333307] Call Trace:
[458781.336774] dump_stack+0x5c/0x80
[458781.341219] panic+0xe7/0x2a9
[458781.345208] ? wake_up_q+0x54/0x80
[458781.349955] ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
[458781.356863] __ldiskfs_error+0x8b/0x100 [ldiskfs]
[458781.362710] ? ldiskfs_htree_fill_tree+0xa0/0x2d0 [ldiskfs]
[458781.369344] ldiskfs_xattr_inode_iget+0xf4/0x170 [ldiskfs]
[458781.375883] ldiskfs_xattr_inode_get+0x4c/0x1e0 [ldiskfs]
[458781.382279] ? xattr_find_entry+0x95/0x110 [ldiskfs]
[458781.388253] ldiskfs_xattr_ibody_get+0x15f/0x180 [ldiskfs]
[458781.394742] ldiskfs_xattr_get+0x85/0x2d0 [ldiskfs]
[458781.400634] __vfs_getxattr+0x53/0x70
[458781.405326] osd_xattr_get+0x167/0x650 [osd_ldiskfs]
[458781.411326] lfsck_layout_get_lovea.part.77+0x6c/0x260 [lfsck]
[458781.418171] lfsck_layout_master_exec_oit+0x1b5/0xc90 [lfsck]
[458781.424910] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[458781.432113] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[458781.438056] ? finish_wait+0x80/0x80
[458781.442568] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[458781.449177] kthread+0x116/0x130
[458781.453342] ? kthread_flush_work_fn+0x10/0x10
[458781.458686] ret_from_fork+0x1f/0x40
And many backtraces:
[456491.541627] ret_from_fork+0x1f/0x40
[456491.547490] CPU: 1 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456491.561264] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[456491.569958] Call Trace:
[456491.573363] dump_stack+0x5c/0x80
[456491.577599] lfsck_trans_create.part.58+0x63/0x70 [lfsck]
[456491.583966] lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
[456491.590650] lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
[456491.597048] ? down_write+0xe/0x40
[456491.601438] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[456491.607787] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[456491.613699] ? finish_wait+0x80/0x80
[456491.618187] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[456491.624716] kthread+0x116/0x130
[456491.628964] ? kthread_flush_work_fn+0x10/0x10
[456491.634325] ret_from_fork+0x1f/0x40
[456494.228001] CPU: 18 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456494.241276] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[456494.249695] Call Trace:
[456494.252853] dump_stack+0x5c/0x80
[456494.256885] lfsck_trans_create.part.58+0x63/0x70 [lfsck]
[456494.262955] lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
[456494.269296] lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
[456494.275275] ? down_write+0xe/0x40
[456494.279264] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[456494.285258] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[456494.290924] ? finish_wait+0x80/0x80
[456494.295116] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[456494.301388] kthread+0x116/0x130
[456494.305199] ? kthread_flush_work_fn+0x10/0x10
[456494.310227] ret_from_fork+0x1f/0x40
[456494.314569] CPU: 8 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456494.338328] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
crash> bt -l
PID: 2861532  TASK: ffff9c083c05af80  CPU: 4  COMMAND: "lfsck"
 #0 [ffffbd866a4cf8f0] machine_kexec at ffffffff9dc6156e
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffbd866a4cf948] __crash_kexec at ffffffff9dd8f94d
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kexec_core.c: 957
 #2 [ffffbd866a4cfa10] panic at ffffffff9dce0dc7
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/./arch/x86/include/asm/smp.h: 72
 #3 [ffffbd866a4cfaa0] __ldiskfs_error at ffffffffc1a9252b [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/inode.c: 4523
 #4 [ffffbd866a4cfb48] ldiskfs_xattr_inode_iget at ffffffffc1a5cf14 [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 2666
 #5 [ffffbd866a4cfb80] ldiskfs_xattr_inode_get at ffffffffc1a5fd9c [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 1775
 #6 [ffffbd866a4cfbe0] ldiskfs_xattr_ibody_get at ffffffffc1a601ef [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/ldiskfs.h: 1572
 #7 [ffffbd866a4cfc48] ldiskfs_xattr_get at ffffffffc1a60295 [ldiskfs]
    /usr/src/kernels/4.18.0-305.10.2.x6.0.24.x86_64/include/linux/quotaops.h: 19
 #8 [ffffbd866a4cfca0] __vfs_getxattr at ffffffff9df43223
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/fs/xattr.c: 374
 #9 [ffffbd866a4cfcd0] osd_xattr_get at ffffffffc1b28c07 [osd_ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/lustre_compat.h: 540
#10 [ffffbd866a4cfd18] lfsck_layout_get_lovea at ffffffffc158bd5c [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/dt_object.h: 2875
#11 [ffffbd866a4cfd50] lfsck_layout_master_exec_oit at ffffffffc1597025 [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_layout.c: 5711
#12 [ffffbd866a4cfe08] lfsck_master_oit_engine at ffffffffc1560de2 [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 531
#13 [ffffbd866a4cfe78] lfsck_master_engine at ffffffffc15619fe [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 1083
#14 [ffffbd866a4cff10] kthread at ffffffff9dd043a6
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kthread.c: 319
#15 [ffffbd866a4cff50] ret_from_fork at ffffffff9e60023f
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/entry/entry_64.S: 319
With (READ ONLY) lfsck enabled, this crash persisted after rebooting, running e2fsck, and re-syncing the RAID.
lfsck was eventually cleared by running lctl lfsck_stop on the MDT nodes as early as possible during mount (and/or failback), repeating until no more lfsck activity was observed.
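The stop-it-early workaround can be sketched as a loop that keeps issuing lctl lfsck_stop until lfsck no longer reports activity. This is a hedged sketch only: the device name (testfs-MDT0000) is an assumed placeholder, and the "scanning" string matched against lctl lfsck_query output varies between Lustre versions.

```shell
#!/bin/sh
# Sketch of the workaround: right after mounting the MDT, repeatedly stop
# lfsck until no phase reports as running. Device name is an example.
MDT_DEV="${MDT_DEV:-testfs-MDT0000}"   # assumed fsname/MDT index

if command -v lctl >/dev/null 2>&1; then
	for attempt in 1 2 3 4 5 6 7 8 9 10; do
		lctl lfsck_stop -M "$MDT_DEV" 2>/dev/null
		# lfsck_query prints per-phase status; stop looping once
		# nothing reports as scanning (format is version-dependent).
		if ! lctl lfsck_query -M "$MDT_DEV" 2>/dev/null | grep -q scanning; then
			echo "no lfsck activity on $MDT_DEV"
			break
		fi
		sleep 2
	done
else
	echo "lctl not found; run this on an MDS node" >&2
fi
```

Running it in a loop rather than once matters because, per the report above, lfsck restarts during mount/failback and a single stop can race with it.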
This bug is fallout from LU-15404. zam wrote in an internal HPE ticket: That said, it would be ideal if lfsck would handle the situation gracefully instead of crashing. Let's downgrade this issue, knowing that it won't happen (in this way) once the corruption from LU-15404 is addressed. The scope of this ticket will then focus on making lfsck handle the condition gracefully instead (for example, as with LU-14105).