Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.15.0

    Description

      LDISKFS-fs error (device md6): ldiskfs_get_inode_usage:818: inode #310: comm ll_ost_io00_570: corrupted in-inode xattr
      Aborting journal on device md6-8.
      Kernel panic - not syncing: LDISKFS-fs (device md6): panic forced after error
      
      LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
      LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
      LDISKFS-fs (md6): Remounting filesystem read-only
      CPU: 9 PID: 55386 Comm: ll_ost_io00_570 4.18.0-305.10.2.x6.0.29.x86_64 #1
      Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0043 09/24/2019
      Call Trace:
        dump_stack+0x5c/0x80
        panic+0xe7/0x2a9
        ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
        __ldiskfs_error_inode+0xaf/0x130 [ldiskfs]
        __xattr_check_inode+0x4a/0x70 [ldiskfs]
        ldiskfs_get_inode_usage+0x195/0x290 [ldiskfs]
        __dquot_transfer+0x8a/0x5d0
        dquot_transfer+0x8e/0x130
        osd_quota_transfer+0x188/0x310 [osd_ldiskfs]
        osd_attr_set+0xd4/0x740 [osd_ldiskfs]
        ofd_write_attr_set+0x7bf/0x1070 [ofd]
        ofd_commitrw_write+0x222/0x1a50 [ofd]
        ofd_commitrw+0x458/0xa60 [ofd]
        tgt_brw_write+0x18de/0x2390 [ptlrpc]
        tgt_request_handle+0xc93/0x1a00 [ptlrpc]
        ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
        ptlrpc_main+0xc06/0x1550 [ptlrpc]
      

          Activity

            [LU-15171] corrupted in-inode xattr
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16082 [ LU-16082 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45424/
            Subject: LU-15171 osd-ldiskfs: xattr_sem locking is missing for dquot_transfer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e6c7fcdaf40b130c39af2e3ee8b108c6e31a8ca8

            gerrit Gerrit Updater added the comment above.
            panda Andrew Perepechko made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Andrew Perepechko [ panda ]

            "Andrew Perepechko <andrew.perepechko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45424
            Subject: LU-15171 osd-ldiskfs: xattr_sem locking is missing for dquot_transfer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fb9f97b3481214017be8de89555076262ecaa6e1

            gerrit Gerrit Updater added the comment above.

            It seems that this issue is not really a corruption; at least, I haven't found any corruption in the buffer heads from the crash dump.

            Commit 7a9ca53ae (~v4.13) added the requirement for xattr_sem locking when calling *dquot_transfer:

            commit 7a9ca53aea10ad4677a0f347ad7639c304b80194
            Author: Tahsin Erdogan <tahsin@google.com>
            Date:   Thu Jun 22 11:46:48 2017 -0400
            
                quota: add get_inode_usage callback to transfer multi-inode charges
            
            ...
            
            diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
            index 962f28a0e176..d9733aa955e9 100644
            --- a/fs/ext4/inode.c
            +++ b/fs/ext4/inode.c
            @@ -5295,7 +5295,14 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
                                    error = PTR_ERR(handle);
                                    goto err_out;
                            }
            +
            +               /* dquot_transfer() calls back ext4_get_inode_usage() which
            +                * counts xattr inode references.
            +                */
            +               down_read(&EXT4_I(inode)->xattr_sem);
                            error = dquot_transfer(inode, attr);
            +               up_read(&EXT4_I(inode)->xattr_sem);
            +
                            if (error) {
                                    ext4_journal_stop(handle);
                                    return error;
            diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
            index dde8deb11e59..42b3a73143cf 100644
            --- a/fs/ext4/ioctl.c
            +++ b/fs/ext4/ioctl.c
            @@ -373,7 +373,13 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
             
                    transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
                    if (!IS_ERR(transfer_to[PRJQUOTA])) {
            +
            +               /* __dquot_transfer() calls back ext4_get_inode_usage() which
            +                * counts xattr inode references.
            +                */
            +               down_read(&EXT4_I(inode)->xattr_sem);
                            err = __dquot_transfer(inode, transfer_to);
            +               up_read(&EXT4_I(inode)->xattr_sem);
                            dqput(transfer_to[PRJQUOTA]);
                            if (err)
                                    goto out_dirty;
            ...
            

            In Lustre, we do not take this lock. A race on first write seems possible: one thread modifies the inode xattrs while another thread performs dquot_transfer, which checks xattr consistency (and can fail if it observes the xattrs mid-update).

            It seems that we can simply wrap the *dquot_transfer() calls with xattr_sem locking on newer kernels. This should be OK performance-wise on the OST side and, hopefully, on the MDT side as well. A proof-of-concept patch will be uploaded for review.

            panda Andrew Perepechko added the comment above.
            adilger Andreas Dilger made changes -
            Labels New: rhel8.3
            adilger Andreas Dilger made changes -
            Priority Original: Minor [ 4 ] New: Major [ 3 ]
            adilger Andreas Dilger made changes -
            Description Original: {code}
            [38643.116004] LDISKFS-fs error (device md6): ldiskfs_get_inode_usage:818: inode #310: comm ll_ost_io00_570: corrupted in-inode xattr
            [38643.161869] Aborting journal on device md6-8.
            [38643.187165] Kernel panic - not syncing: LDISKFS-fs (device md6): panic forced after error

            [38643.198095] LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
            [38643.198618] LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
            [38643.198620] LDISKFS-fs (md6): Remounting filesystem read-only
            [38643.199839] CPU: 9 PID: 55386 Comm: ll_ost_io00_570 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.10.2.x6.0.29.x86_64 #1
            [38643.199840] Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0043 09/24/2019
            [38643.199840] Call Trace:
            [38643.199859] dump_stack+0x5c/0x80
            [38643.265166] panic+0xe7/0x2a9
            [38643.269992] ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
            [38643.277646] __ldiskfs_error_inode+0xaf/0x130 [ldiskfs]
            [38643.284771] __xattr_check_inode+0x4a/0x70 [ldiskfs]
            [38643.291567] ldiskfs_get_inode_usage+0x195/0x290 [ldiskfs]
            [38643.298829] __dquot_transfer+0x8a/0x5d0
            [38643.304535] ? tracing_record_taskinfo_skip+0x42/0x50
            [38643.311296] ? tracing_record_taskinfo+0xe/0xa0
            [38643.317560] ? try_to_wake_up+0x1cd/0x540
            [38643.323053] ? schedule+0x38/0xa0
            [38643.327761] ? wake_up_q+0x54/0x80
            [38643.332413] ? __mutex_unlock_slowpath.isra.19+0xbe/0x120
            [38643.339066] ? dqget+0x219/0x450
            [38643.343570] dquot_transfer+0x8e/0x130
            [38643.348604] osd_quota_transfer+0x188/0x310 [osd_ldiskfs]
            [38643.355300] ? jbd2__journal_start+0xea/0x1f0 [jbd2]
            [38643.361550] ? osd_trans_start+0x2eb/0x550 [osd_ldiskfs]
            [38643.368413] osd_attr_set+0xd4/0x740 [osd_ldiskfs]
            [38643.374772] ? ofd_attr_handle_id+0x13e/0x530 [ofd]
            [38643.381226] ofd_write_attr_set+0x7bf/0x1070 [ofd]
            [38643.387375] ofd_commitrw_write+0x222/0x1a50 [ofd]
            [38643.393361] ? try_to_del_timer_sync+0x4d/0x80
            [38643.398948] ? del_timer_sync+0x25/0x40
            [38643.404094] ? schedule_timeout+0x19b/0x2f0
            [38643.409443] ofd_commitrw+0x458/0xa60 [ofd]
            [38643.414940] ? apic_timer_interrupt+0xa/0x20
            [38643.420323] ? tgt_brw_write+0x18de/0x2390 [ptlrpc]
            [38643.426585] tgt_brw_write+0x18de/0x2390 [ptlrpc]
            [38643.432550] ? libcfs_debug_msg+0xa7d/0xb20 [libcfs]
            [38643.438639] ? string+0x40/0x50
            [38643.442838] ? _cond_resched+0x15/0x30
            [38643.447477] ? mutex_lock+0xe/0x30
            [38643.452152] tgt_request_handle+0xc93/0x1a00 [ptlrpc]
            [38643.458169] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
            [38643.464906] ptlrpc_main+0xc06/0x1550 [ptlrpc]
            [38643.470380] ? ptlrpc_wait_event+0x500/0x500 [ptlrpc]
            [38643.476396] kthread+0x116/0x130
            [38643.480582] ? kthread_flush_work_fn+0x10/0x10
            [38643.485934] ret_from_fork+0x1f/0x40
            {code}
            New: the trimmed trace shown under Description above, without timestamps, speculative "?" frames, or taint flags.

            People

              panda Andrew Perepechko
              stancheff Shaun Tancheff
              Votes: 0
              Watchers: 7
