Lustre / LU-15238

lfsck crashes MDT: LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Environment: Server: RHEL8
    • Severity: 3

    Description

      [458781.070693] LDISKFS-fs error (device md65): ldiskfs_xattr_inode_iget:407: comm lfsck: EA inode 2047917093 does not have LDISKFS_EA_INODE_FL flag
      [458781.136989] Aborting journal on device md65-8.
      [458781.142323] LDISKFS-fs error (device md65) in ldiskfs_evict_inode:251: Journal has aborted
      [458781.153243] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.155099] LustreError: 98016:0:(osd_handler.c:1783:osd_trans_commit_cb()) transaction @0x000000002c9fd616 commit error: 2
      [458781.158848] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.170295] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.170297] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.175978] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.182078] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.199967] Kernel panic - not syncing: LDISKFS-fs (device md65): panic forced after error
      
      [458781.199972] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.199979] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.200005] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.200549] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.200552] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.200840] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.201096] LDISKFS-fs (md65): Remounting filesystem read-only
      [458781.260424] LDISKFS-fs error (device md65): ldiskfs_journal_check_start:61: Detected aborted journal
      [458781.262419] CPU: 4 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [458781.262421] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [458781.333307] Call Trace:
      [458781.336774]  dump_stack+0x5c/0x80
      [458781.341219]  panic+0xe7/0x2a9
      [458781.345208]  ? wake_up_q+0x54/0x80
      [458781.349955]  ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
      [458781.356863]  __ldiskfs_error+0x8b/0x100 [ldiskfs]
      [458781.362710]  ? ldiskfs_htree_fill_tree+0xa0/0x2d0 [ldiskfs]
      [458781.369344]  ldiskfs_xattr_inode_iget+0xf4/0x170 [ldiskfs]
      [458781.375883]  ldiskfs_xattr_inode_get+0x4c/0x1e0 [ldiskfs]
      [458781.382279]  ? xattr_find_entry+0x95/0x110 [ldiskfs]
      [458781.388253]  ldiskfs_xattr_ibody_get+0x15f/0x180 [ldiskfs]
      [458781.394742]  ldiskfs_xattr_get+0x85/0x2d0 [ldiskfs]
      [458781.400634]  __vfs_getxattr+0x53/0x70
      [458781.405326]  osd_xattr_get+0x167/0x650 [osd_ldiskfs]
      [458781.411326]  lfsck_layout_get_lovea.part.77+0x6c/0x260 [lfsck]
      [458781.418171]  lfsck_layout_master_exec_oit+0x1b5/0xc90 [lfsck]
      [458781.424910]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [458781.432113]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [458781.438056]  ? finish_wait+0x80/0x80
      [458781.442568]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [458781.449177]  kthread+0x116/0x130
      [458781.453342]  ? kthread_flush_work_fn+0x10/0x10
      [458781.458686]  ret_from_fork+0x1f/0x40
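      The first line of this log comes from the EA-inode sanity check in ldiskfs_xattr_inode_iget(): when an extended attribute's value is stored in a separate inode, that inode must carry the EA-inode flag, and inode 2047917093 did not. The C sketch below is a simplified model, not the ldiskfs code. The struct, enum and function names are hypothetical, EA_INODE_FL reuses the value of upstream ext4's EXT4_EA_INODE_FL (0x00200000), and the error policy stands in for the errors=continue|remount-ro|panic mount option. The "panic forced after error" line suggests the panic policy was in effect here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Flag marking an inode that holds a large xattr value
 * (same value as EXT4_EA_INODE_FL in upstream ext4). */
#define EA_INODE_FL 0x00200000u

/* Hypothetical stand-in for errors=continue|remount-ro|panic. */
enum error_policy { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

/* What happens to the filesystem after an error is reported. */
enum error_outcome { OUTCOME_CONTINUE, OUTCOME_READ_ONLY, OUTCOME_PANIC };

/* Hypothetical, heavily simplified in-memory inode. */
struct model_inode {
    unsigned long ino;
    uint32_t flags;
};

/* Map the mount-time error policy to its consequence. */
static enum error_outcome handle_error(enum error_policy policy)
{
    switch (policy) {
    case ERRORS_PANIC:
        return OUTCOME_PANIC;
    case ERRORS_REMOUNT_RO:
        return OUTCOME_READ_ONLY;
    case ERRORS_CONTINUE:
    default:
        return OUTCOME_CONTINUE;
    }
}

/* Validate the inode referenced by an xattr entry. Returns 0 if it is a
 * proper EA inode, -EINVAL otherwise; *outcome reports what the error
 * policy would do to the filesystem. */
static int xattr_inode_check(const struct model_inode *ea_inode,
                             enum error_policy policy,
                             enum error_outcome *outcome)
{
    assert(ea_inode != NULL && outcome != NULL);
    *outcome = OUTCOME_CONTINUE;
    if (!(ea_inode->flags & EA_INODE_FL)) {
        fprintf(stderr, "model: EA inode %lu lacks EA_INODE_FL\n",
                ea_inode->ino);
        *outcome = handle_error(policy);
        return -EINVAL;
    }
    return 0;
}
```

      In this model the bad inode yields -EINVAL under every policy. Only the aftermath differs, which is why, under a panic policy, a single inconsistent inode takes down the whole MDT rather than just failing the read.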
      

      There were also many earlier backtraces of the following form:

      [456491.541627]  ret_from_fork+0x1f/0x40
      [456491.547490] CPU: 1 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456491.561264] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [456491.569958] Call Trace:
      [456491.573363]  dump_stack+0x5c/0x80
      [456491.577599]  lfsck_trans_create.part.58+0x63/0x70 [lfsck]
      [456491.583966]  lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
      [456491.590650]  lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
      [456491.597048]  ? down_write+0xe/0x40
      [456491.601438]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [456491.607787]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [456491.613699]  ? finish_wait+0x80/0x80
      [456491.618187]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [456491.624716]  kthread+0x116/0x130
      [456491.628964]  ? kthread_flush_work_fn+0x10/0x10
      [456491.634325]  ret_from_fork+0x1f/0x40
      [456494.228001] CPU: 18 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456494.241276] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      [456494.249695] Call Trace:
      [456494.252853]  dump_stack+0x5c/0x80
      [456494.256885]  lfsck_trans_create.part.58+0x63/0x70 [lfsck]
      [456494.262955]  lfsck_namespace_trace_update+0xa3b/0xa50 [lfsck]
      [456494.269296]  lfsck_namespace_exec_oit+0x4b3/0x990 [lfsck]
      [456494.275275]  ? down_write+0xe/0x40
      [456494.279264]  lfsck_master_oit_engine+0xc52/0x1360 [lfsck]
      [456494.285258]  lfsck_master_engine+0x50e/0xcd0 [lfsck]
      [456494.290924]  ? finish_wait+0x80/0x80
      [456494.295116]  ? lfsck_master_oit_engine+0x1360/0x1360 [lfsck]
      [456494.301388]  kthread+0x116/0x130
      [456494.305199]  ? kthread_flush_work_fn+0x10/0x10
      [456494.310227]  ret_from_fork+0x1f/0x40
      [456494.314569] CPU: 8 PID: 2861532 Comm: lfsck Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.0.24.x86_64 #1
      [456494.338328] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0040 06/29/2018
      
      crash> bt -l
      PID: 2861532 TASK: ffff9c083c05af80 CPU: 4 COMMAND: "lfsck"
      #0 [ffffbd866a4cf8f0] machine_kexec at ffffffff9dc6156e
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
      #1 [ffffbd866a4cf948] __crash_kexec at ffffffff9dd8f94d
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kexec_core.c: 957
      #2 [ffffbd866a4cfa10] panic at ffffffff9dce0dc7
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/./arch/x86/include/asm/smp.h: 72
      #3 [ffffbd866a4cfaa0] __ldiskfs_error at ffffffffc1a9252b [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/inode.c: 4523
      #4 [ffffbd866a4cfb48] ldiskfs_xattr_inode_iget at ffffffffc1a5cf14 [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 2666
      #5 [ffffbd866a4cfb80] ldiskfs_xattr_inode_get at ffffffffc1a5fd9c [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/trace/events/ldiskfs.h: 1775
      #6 [ffffbd866a4cfbe0] ldiskfs_xattr_ibody_get at ffffffffc1a601ef [ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/ldiskfs/ldiskfs.h: 1572
      #7 [ffffbd866a4cfc48] ldiskfs_xattr_get at ffffffffc1a60295 [ldiskfs]
      /usr/src/kernels/4.18.0-305.10.2.x6.0.24.x86_64/include/linux/quotaops.h: 19
      #8 [ffffbd866a4cfca0] __vfs_getxattr at ffffffff9df43223
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/fs/xattr.c: 374
      #9 [ffffbd866a4cfcd0] osd_xattr_get at ffffffffc1b28c07 [osd_ldiskfs]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/lustre_compat.h: 540
      #10 [ffffbd866a4cfd18] lfsck_layout_get_lovea at ffffffffc158bd5c [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/include/dt_object.h: 2875
      #11 [ffffbd866a4cfd50] lfsck_layout_master_exec_oit at ffffffffc1597025 [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_layout.c: 5711
      #12 [ffffbd866a4cfe08] lfsck_master_oit_engine at ffffffffc1560de2 [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 531
      #13 [ffffbd866a4cfe78] lfsck_master_engine at ffffffffc15619fe [lfsck]
      /home/centos/rpmbuild/BUILD/lustre-2.14.55_81_gc26b347/lustre/lfsck/lfsck_engine.c: 1083
      #14 [ffffbd866a4cff10] kthread at ffffffff9dd043a6
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/kernel/kthread.c: 319
      #15 [ffffbd866a4cff50] ret_from_fork at ffffffff9e60023f
      /usr/src/debug/kernel-4.18.0-305.10.2.x6.0.24/linux-4.18.0-305.10.2.x6.0.24.x86_64/arch/x86/entry/entry_64.S: 319
      

      With lfsck enabled in read-only mode, the crash recurred even after rebooting, running e2fsck, and performing a RAID re-sync.

      lfsck was eventually cleared by running lctl lfsck_stop on the MDT nodes as early as possible during mount (and/or failback), repeating until no further lfsck activity was observed.
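      The workaround amounts to polling lfsck state and re-issuing lctl lfsck_stop until nothing is scanning. Below is a sketch of the "still running?" check in C. It assumes the status text comes from something like lctl get_param -n mdd.*.lfsck_layout (or lfsck_namespace) and contains lines such as "status: scanning-phase1". The function name is hypothetical, and the format should be checked against the Lustre version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if any "status:" line in an lfsck status dump reports a
 * scan in progress (scanning-phase1 or scanning-phase2). The dump may
 * concatenate the output of several MDTs or lfsck components. */
static bool lfsck_scan_active(const char *dump)
{
    assert(dump != NULL);
    for (const char *line = dump; line != NULL; ) {
        if (strncmp(line, "status:", 7) == 0) {
            const char *val = line + 7;
            while (*val == ' ' || *val == '\t')
                val++;  /* skip the separator after the key */
            if (strncmp(val, "scanning-phase", 14) == 0)
                return true;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* step past the newline to the next line */
    }
    return false;
}
```

      A caller would capture the get_param output after each lfsck_stop and stop retrying once this returns false.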


            People

              Assignee: WC Triage (wc-triage)
              Reporter: Shaun Tancheff (stancheff)
              Votes: 0
              Watchers: 5
