Details
Type: Bug
Resolution: Unresolved
Priority: Major
Environment: Server: RHEL8
Severity: 3
Description
[458781.070693] LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag
[458781.136989] Aborting journal on device md65-8.
[458781.142323] LDISKFS-fs error (device md65) in ldiskfs_evict_inode:251: Journal has aborted
[458781.153243] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.155099] LustreError: 98016:0:(osd_handler.c:1783:osd_trans_commit_cb()) transaction @0x000000002c9fd616 commit error: 2
[458781.158848] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.170295] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.170297] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.175978] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.182078] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.199967] Kernel panic - not syncing: LDISKFS-fs (device md65): panic forced after error
[458781.199972] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.199979] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.200005] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.200549] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.200552] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.200840] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.201096] LDISKFS-fs (md65): Remounting filesystem read-only
[458781.260424] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
[458781.262419] CPU: 4 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[458781.262421] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[458781.333307] Call Trace:
[458781.336774] dump_stack+0x5c/0x80
[458781.341219] panic+0xe7/0x2a9
[458781.345208] ? wake_up_q+0x54/0x80
[458781.349955] ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
[458781.356863] __ldiskfs_error+0x8b/0x100 [ldiskfs]
[458781.362710] ? ldiskfs_htree_fill_tree+0xa0/0x2d0 [ldiskfs]
[458781.369344] ldiskfs_xattr_inode_iget+0xf4/0x170 [ldiskfs]
[458781.375883] ldiskfs_xattr_inode_get+0x4c/0x1e0 [ldiskfs]
[458781.382279] ? xattr_find_entry+0x95/0x110 [ldiskfs]
[458781.388253] ldiskfs_xattr_ibody_get+0x15f/0x180 [ldiskfs]
[458781.394742] ldiskfs_xattr_get+0x85/0x2d0 [ldiskfs]
[458781.400634] __vfs_getxattr+0x53/0x70
[458781.405326] osd_xattr_get+0x167/0x650 [osd_ldiskfs]
[458781.411326] lfsck_layout_get_lovea.part.77+0x6c/0x260 [lfsck]
[458781.418171] lfsck_layout_master_exec_oit+0x1b5/0xc90 [lfsck]
[458781.424910] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[458781.432113] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[458781.438056] ? finish_wait+0x80/0x80
[458781.442568] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[458781.449177] kthread+0x116/0x130
[458781.453342] ? kthread_flush_work_fn+0x10/0x10
[458781.458686] ret_from_fork+0x1f/0x40
Earlier in the log there were also many backtraces like these:
[456491.541627] ret_from_fork+0x1f/0x40
[456491.547490] CPU: 1 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456491.561264] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[456491.569958] Call Trace:
[456491.573363] dump_stack+0x5c/0x80
[456491.577599] lfsck_trans_create.part.58+0x63/0x70 [lfsck]
[456491.583966] lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
[456491.590650] lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
[456491.597048] ? down_write+0xe/0x40
[456491.601438] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[456491.607787] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[456491.613699] ? finish_wait+0x80/0x80
[456491.618187] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[456491.624716] kthread+0x116/0x130
[456491.628964] ? kthread_flush_work_fn+0x10/0x10
[456491.634325] ret_from_fork+0x1f/0x40
[456494.228001] CPU: 18 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456494.241276] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
[456494.249695] Call Trace:
[456494.252853] dump_stack+0x5c/0x80
[456494.256885] lfsck_trans_create.part.58+0x63/0x70 [lfsck]
[456494.262955] lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
[456494.269296] lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
[456494.275275] ? down_write+0xe/0x40
[456494.279264] lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
[456494.285258] lfsck_master_engine+0x50e/0xcd0 [lfsck]
[456494.290924] ? finish_wait+0x80/0x80
[456494.295116] ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
[456494.301388] kthread+0x116/0x130
[456494.305199] ? kthread_flush_work_fn+0x10/0x10
[456494.310227] ret_from_fork+0x1f/0x40
[456494.314569] CPU: 8 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.24.x86_64 #1
[456494.338328] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
crash> bt -l
PID: 2861532 TASK: ffff9c083c05af80 CPU: 4 COMMAND: "lfsck"
 #0 [ffffbd866a4cf8f0] machine_kexec at ffffffff9dc6156e
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffbd866a4cf948] __crash_kexec at ffffffff9dd8f94d
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kexec_core.c: 957
 #2 [ffffbd866a4cfa10] panic at ffffffff9dce0dc7
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/./arch/x86/include/asm/smp.h: 72
 #3 [ffffbd866a4cfaa0] __ldiskfs_error at ffffffffc1a9252b [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/inode.c: 4523
 #4 [ffffbd866a4cfb48] ldiskfs_xattr_inode_iget at ffffffffc1a5cf14 [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 2666
 #5 [ffffbd866a4cfb80] ldiskfs_xattr_inode_get at ffffffffc1a5fd9c [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 1775
 #6 [ffffbd866a4cfbe0] ldiskfs_xattr_ibody_get at ffffffffc1a601ef [ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/ldiskfs.h: 1572
 #7 [ffffbd866a4cfc48] ldiskfs_xattr_get at ffffffffc1a60295 [ldiskfs]
    /usr/src/kernels/4.18.0-305.10.2.x6.0.24.x86_64/include/linux/quotaops.h: 19
 #8 [ffffbd866a4cfca0] __vfs_getxattr at ffffffff9df43223
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/fs/xattr.c: 374
 #9 [ffffbd866a4cfcd0] osd_xattr_get at ffffffffc1b28c07 [osd_ldiskfs]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/lustre_compat.h: 540
#10 [ffffbd866a4cfd18] lfsck_layout_get_lovea at ffffffffc158bd5c [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/dt_object.h: 2875
#11 [ffffbd866a4cfd50] lfsck_layout_master_exec_oit at ffffffffc1597025 [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_layout.c: 5711
#12 [ffffbd866a4cfe08] lfsck_master_oit_engine at ffffffffc1560de2 [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 531
#13 [ffffbd866a4cfe78] lfsck_master_engine at ffffffffc15619fe [lfsck]
    /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 1083
#14 [ffffbd866a4cff10] kthread at ffffffff9dd043a6
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kthread.c: 319
#15 [ffffbd866a4cff50] ret_from_fork at ffffffff9e60023f
    /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/entry/entry_64.S: 319
With (READ ONLY) lfsck enabled, this crash persisted after rebooting, running e2fsck, and a RAID re-sync.
lfsck was eventually cleared by running lctl lfsck_stop on the MDT nodes as early as possible during the mount (and/or failback), repeating until no more lfsck activity was observed.
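The workaround can be scripted roughly as follows. This is only a sketch, not what was actually run on the affected system. It assumes the lfsck state can be read from the "status:" line of `lctl get_param mdd.<MDT>.lfsck_layout` and `mdd.<MDT>.lfsck_namespace`, and that any value starting with "scanning" means lfsck is still running. The function names, the polling interval, and the MDT name `myfs-MDT0000` are hypothetical.

```python
import subprocess
import time


def lfsck_active(status_text):
    """Return True if any 'status:' line reports a scanning phase.

    Assumption: lfsck status values such as 'scanning-phase1' mean lfsck is
    running; anything else (stopped, completed, ...) is treated as idle.
    """
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "status" and value.strip().startswith("scanning"):
            return True
    return False


def run_lctl(*args):
    """Run lctl and return its stdout. Failures are tolerated on purpose,
    because the MDT may not be fully mounted yet when this starts."""
    proc = subprocess.run(["lctl", *args], capture_output=True, text=True)
    return proc.stdout


def stop_lfsck(mdt, run=run_lctl, interval=1.0, quiet_polls=5,
               max_polls=600, sleep=time.sleep):
    """Issue lfsck_stop repeatedly until lfsck has been idle for
    quiet_polls consecutive polls. Returns False if max_polls is reached."""
    quiet = 0
    for _ in range(max_polls):
        run("lfsck_stop", "-M", mdt)
        status = run("get_param", "-n",
                     f"mdd.{mdt}.lfsck_layout",
                     f"mdd.{mdt}.lfsck_namespace")
        if not status.strip() or lfsck_active(status):
            # No output (device not up yet) or still scanning: keep trying.
            quiet = 0
        else:
            quiet += 1
            if quiet >= quiet_polls:
                return True
        sleep(interval)
    return False
```

Such a loop would be started on each MDS just before the mount or failback, for example with `stop_lfsck("myfs-MDT0000")`, so that the stop request lands before lfsck reaches the damaged EA inode.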