
lfsck crashes MDT LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Server: RHEL8

    Description

      [458781.070693] LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag
      [458781.136989] Aborting journal on device md65-8.
      [458781.142323] LDISKFS-fs error (device md65) in ldiskfs_evict_inode:251: Journal has aborted
      [458781.153243] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.155099] LustreError: 98016:0:(osd_handler.c:1783:osd_trans_commit_cb()) transaction @0x000000002c9fd616 commit error: 2
      [458781.158848] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.170295] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.170297] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.175978] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.182078] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.199967] Kernel panic - not syncing: LDISKFS-fs (device md65): panic forced after error
      
      [458781.199972] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.199979] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.200005] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.200549] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.200552] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.200840] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.201096] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.260424] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.262419] CPU: 4 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [458781.262421] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [458781.333307] Call Trace:
      [458781.336774]  dump_stack+0x5c/0x80
      [458781.341219]  panic+0xe7/0x2a9
      [458781.345208]  ? wake_up_q+0x54/0x80
      [458781.349955]  ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
      [458781.356863]  __ldiskfs_error+0x8b/0x100 [ldiskfs]
      [458781.362710]  ? ldiskfs_htree_fill_tree+0xa0/0x2d0 [ldiskfs]
      [458781.369344]  ldiskfs_xattr_inode_iget+0xf4/0x170 [ldiskfs]
      [458781.375883]  ldiskfs_xattr_inode_get+0x4c/0x1e0 [ldiskfs]
      [458781.382279]  ? xattr_find_entry+0x95/0x110 [ldiskfs]
      [458781.388253]  ldiskfs_xattr_ibody_get+0x15f/0x180 [ldiskfs]
      [458781.394742]  ldiskfs_xattr_get+0x85/0x2d0 [ldiskfs]
      [458781.400634]  __vfs_getxattr+0x53/0x70
      [458781.405326]  osd_xattr_get+0x167/0x650 [osd_ldiskfs]
      [458781.411326]  lfsck_layout_get_lovea.part.77+0x6c/0x260 [lfsck]
      [458781.418171]  lfsck_layout_master_exec_oit+0x1b5/0xc90 [lfsck]
      [458781.424910]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [458781.432113]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [458781.438056]  ? finish_wait+0x80/0x80
      [458781.442568]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [458781.449177]  kthread+0x116/0x130
      [458781.453342]  ? kthread_flush_work_fn+0x10/0x10
      [458781.458686]  ret_from_fork+0x1f/0x40
      

      And many backtraces:

      [456491.541627]  ret_from_fork+0x1f/0x40
      [456491.547490] CPU: 1 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456491.561264] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [456491.569958] Call Trace:
      [456491.573363]  dump_stack+0x5c/0x80
      [456491.577599]  lfsck_trans_create.part.58+0x63/0x70 [lfsck]
      [456491.583966]  lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
      [456491.590650]  lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
      [456491.597048]  ? down_write+0xe/0x40
      [456491.601438]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [456491.607787]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [456491.613699]  ? finish_wait+0x80/0x80
      [456491.618187]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [456491.624716]  kthread+0x116/0x130
      [456491.628964]  ? kthread_flush_work_fn+0x10/0x10
      [456491.634325]  ret_from_fork+0x1f/0x40
      [456494.228001] CPU: 18 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456494.241276] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [456494.249695] Call Trace:
      [456494.252853]  dump_stack+0x5c/0x80
      [456494.256885]  lfsck_trans_create.part.58+0x63/0x70 [lfsck]
      [456494.262955]  lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
      [456494.269296]  lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
      [456494.275275]  ? down_write+0xe/0x40
      [456494.279264]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [456494.285258]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [456494.290924]  ? finish_wait+0x80/0x80
      [456494.295116]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [456494.301388]  kthread+0x116/0x130
      [456494.305199]  ? kthread_flush_work_fn+0x10/0x10
      [456494.310227]  ret_from_fork+0x1f/0x40
      [456494.314569] CPU: 8 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456494.338328] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      
      crash> bt -l
      PID: 2861532 TASK: ffff9c083c05af80 CPU: 4 COMMAND: "lfsck"
      #0 [ffffbd866a4cf8f0] machine_kexec at ffffffff9dc6156e
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
      #1 [ffffbd866a4cf948] __crash_kexec at ffffffff9dd8f94d
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kexec_core.c: 957
      #2 [ffffbd866a4cfa10] panic at ffffffff9dce0dc7
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/./arch/x86/include/asm/smp.h: 72
      #3 [ffffbd866a4cfaa0] __ldiskfs_error at ffffffffc1a9252b [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/inode.c: 4523
      #4 [ffffbd866a4cfb48] ldiskfs_xattr_inode_iget at ffffffffc1a5cf14 [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 2666
      #5 [ffffbd866a4cfb80] ldiskfs_xattr_inode_get at ffffffffc1a5fd9c [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 1775
      #6 [ffffbd866a4cfbe0] ldiskfs_xattr_ibody_get at ffffffffc1a601ef [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/ldiskfs.h: 1572
      #7 [ffffbd866a4cfc48] ldiskfs_xattr_get at ffffffffc1a60295 [ldiskfs]
      /usr/src/kernels/4.18.0-305.10.2.x6.0.24.x86_64/include/linux/quotaops.h: 19
      #8 [ffffbd866a4cfca0] __vfs_getxattr at ffffffff9df43223
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/fs/xattr.c: 374
      #9 [ffffbd866a4cfcd0] osd_xattr_get at ffffffffc1b28c07 [osd_ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/lustre_compat.h: 540
      #10 [ffffbd866a4cfd18] lfsck_layout_get_lovea at ffffffffc158bd5c [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/dt_object.h: 2875
      #11 [ffffbd866a4cfd50] lfsck_layout_master_exec_oit at ffffffffc1597025 [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_layout.c: 5711
      #12 [ffffbd866a4cfe08] lfsck_master_oit_engine at ffffffffc1560de2 [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 531
      #13 [ffffbd866a4cfe78] lfsck_master_engine at ffffffffc15619fe [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 1083
      #14 [ffffbd866a4cff10] kthread at ffffffff9dd043a6
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kthread.c: 319
      #15 [ffffbd866a4cff50] ret_from_fork at ffffffff9e60023f
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/entry/entry_64.S: 319
      

      With (read-only) lfsck enabled, this crash persisted after rebooting, running e2fsck, and a RAID resync.

      lfsck was eventually cleared by running lctl lfsck_stop on the MDT nodes as early as possible during the mount (and/or failback), repeatedly, until no more lfsck activity was observed.

          Activity

            [LU-15238] lfsck crashes MDT LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag
            spitzcor Cory Spitz added a comment -

            This bug is fallout from LU-15404. zam wrote in an internal HPE ticket:

            [the] bug may cause EA pointer to point to old EA inode (freed or reused within committed transaction) with the symptoms from this ticket.

            That said, it would be ideal if lfsck would handle the situation gracefully, instead of crashing. Let's downgrade this issue knowing that it won't happen (in this way) if the corruption from LU-15404 is addressed. Then, the scope of this ticket will focus on making lfsck gracefully handle the condition instead (for example, as with LU-14105).
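A minimal userspace sketch of what "handle the condition gracefully" could look like at the scan level. This is not lfsck code: `model_scan`, `model_scan_stats`, `model_get_lovea_fn`, and `model_get_lovea_stub` are hypothetical names, and the sketch assumes the layer below reports the damaged xattr as -EIO instead of forcing a panic.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback standing in for "read this object's layout xattr". */
typedef int (*model_get_lovea_fn)(unsigned long ino);

/* Hypothetical per-scan counters. */
struct model_scan_stats {
	unsigned long checked;	/* objects whose xattr was read successfully */
	unsigned long skipped;	/* objects skipped because the read gave -EIO */
};

/*
 * Hypothetical stub used only for illustration: pretends that inode 42
 * carries a damaged large xattr and that every other inode is healthy.
 */
int model_get_lovea_stub(unsigned long ino)
{
	return ino == 42 ? -EIO : 0;
}

/*
 * Model of a scan loop that tolerates a damaged xattr: on -EIO the object
 * is counted as skipped and the scan moves on; any other negative return
 * code still aborts the scan and is passed back to the caller.
 */
int model_scan(const unsigned long *inos, size_t count,
	       model_get_lovea_fn get_lovea, struct model_scan_stats *stats)
{
	size_t i;

	stats->checked = 0;
	stats->skipped = 0;

	for (i = 0; i < count; i++) {
		int rc = get_lovea(inos[i]);

		if (rc == -EIO) {
			fprintf(stderr,
				"scan: skipping inode %lu: unreadable layout xattr\n",
				inos[i]);
			stats->skipped++;
			continue;
		}
		if (rc < 0)
			return rc;
		stats->checked++;
	}
	return 0;
}
```

Calling `model_scan()` with the stub over inodes 40 to 43 completes the scan and records one skipped object, rather than stopping at the first damaged one.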


            adilger Andreas Dilger added a comment -

            Cory mentioned that this may be fallout from LU-15404, when the large xattr has failed to be unlinked because of transaction credits, so it may be that this problem goes away when that issue is fixed (i.e. it may not leave a large xattr inode in the filesystem without LDISKFS_EA_INODE_FL set).

            It probably makes sense to change this case from ext4_error() to ext4_warning_inode() or similar, and return -EIO when accessing that large xattr so that it doesn't cause the filesystem to be remounted read-only? That would be a lot more robust, and would only affect the one inode's xattr.
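The suggestion above can be illustrated with a small userspace model. It is a sketch only, not the ldiskfs patch: `model_inode` and `model_xattr_inode_check` are hypothetical names, and `fprintf` stands in for a kernel warning helper. The flag value is assumed to match `EXT4_EA_INODE_FL` in the upstream kernel.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed to match EXT4_EA_INODE_FL (inode used for a large EA) upstream. */
#define LDISKFS_EA_INODE_FL 0x00200000U

/* Hypothetical, heavily simplified stand-in for an in-memory inode. */
struct model_inode {
	unsigned long ino;
	unsigned int flags;
};

/*
 * Hypothetical model of the flag check made when an EA inode is looked up.
 * Instead of raising a filesystem-wide error (which, with panic-on-error
 * behaviour as seen in the log above, takes the whole server down), it
 * logs a warning that names the parent inode and returns -EIO, so only
 * this one xattr lookup fails.
 */
int model_xattr_inode_check(const struct model_inode *parent,
			    const struct model_inode *ea_inode)
{
	if (!(ea_inode->flags & LDISKFS_EA_INODE_FL)) {
		fprintf(stderr,
			"warning: inode %lu: EA inode %lu does not have LDISKFS_EA_INODE_FL flag\n",
			parent->ino, ea_inode->ino);
		return -EIO;
	}
	return 0;
}
```

A caller would see 0 for an EA inode that carries the flag and -EIO for one that does not, and could then skip that xattr while the filesystem stays mounted read-write.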
            spitzcor Cory Spitz added a comment - - edited

            In case it wasn't clear, this bug is about lfsck crashing, not the on-disk defect. I think LU-15265 may be tracking the latter, but that has yet to be confirmed.


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - - edited

            >Cory Spitz, who on your team is working on this issue? Artem Blagodarenko perhaps?
            pjones, yes, I am.
            pjones Peter Jones added a comment -

            spitzcor, who on your team is working on this issue? artem_blagodarenko perhaps?


            adilger Andreas Dilger added a comment -

            My first guess here would be that there is some mismatch in how the RHEL8 kernel is implementing the "ea_inode" feature when it was ported to the upstream kernel vs. how it was patched into ldiskfs previously. That said, the on-disk xattr storage should be internal to ldiskfs, and osd-ldiskfs shouldn't even see that (let alone higher levels), so there does seem to be something wrong in ldiskfs/ext4 itself to get an inconsistency like this.
            spitzcor Cory Spitz added a comment -

            I've bumped this to blocker because once you fall into the trap you can't (easily) get out of it.


            People

              wc-triage WC Triage
              stancheff Shaun Tancheff