[LU-9723] test large xattr (ea_inode) patch interoperability - Whamcloud Community JIRA

Details

Type: Task
Resolution: Fixed
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
None

Description

The large xattr (ea_inode) feature patches are being merged into the upstream 4.13 kernel, but additional changes are being built on top of the EA inode functionality that we use in the ldiskfs patch series. In particular, in the upstream patch series the EA inodes can be shared among multiple parent inodes, and instead of having a backpointer to the parent inode/generation the shared inodes have a refcount and a hash of the xattr value to verify that the correct xattr is referenced. Once we update to a newer vendor kernel that includes these changes, we can remove the patch from ldiskfs to reduce ongoing maintenance efforts, but we can't wait until that time to verify that the feature works properly with existing Lustre filesystems.

There is supposed to be interoperability functionality in the upstream feature to allow access to existing Lustre EA inodes. We need to test that the ext4 patches being landed to the upstream kernel are able to work with xattrs created by Lustre, and that the upstream e2fsprogs will not consider Lustre EA inodes to be corrupt. This testing needs to be done in the next week or two, to ensure that we can feed back any issues to the upstream ext4 maintainers before that feature is released in an upstream kernel.

For testing, something like the following process should be sufficient:

create an MDT filesystem with an existing RHEL7 kernel with --mkfsoptions="-O ea_inode"
mount filesystem as type lustre
create large user.test xattrs via setfattr (up to 64KiB) with verifiable data (e.g. filename repeated many times)
dump all xattrs via getfattr -d -m user.test testfiles > xattrs.lustre
upgrade the kernel to upstream kernel
disable the dirdata feature via debugfs -w -R "feature ^dirdata" /dev/MDT and mount it as type ext4
dump all xattrs via getfattr -d -m user.test testfiles > xattrs.ext4 and compare to xattrs.lustre to verify xattr consistency and ensure that no errors are generated by the kernel
run new e2fsck on updated filesystem to verify that it does not consider the EA inodes as corrupted; at worst, it should offer to fix the refcount and hash values of those inodes
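
The payload generation and dump comparison in the steps above could be scripted roughly as follows. This is only a sketch: make_payload and compare_dumps are hypothetical helper names, not part of Lustre or e2fsprogs. The comparison drops the "# file:" header that getfattr prints, on the assumption that the file is reached through a different path on the ext4 mount than on the Lustre client mount.

```shell
#!/bin/sh
# Sketch only: make_payload and compare_dumps are hypothetical helpers.

# Print NAME repeated and cut to exactly SIZE bytes, so that a truncated or
# corrupted xattr value is easy to recognise.
make_payload() {
    name=$1
    size=$2
    out=
    # Append whole copies of NAME until at least SIZE bytes are collected.
    while [ "${#out}" -lt "$size" ]; do
        out=$out$name
    done
    # A %s precision limits the output to SIZE bytes (ASCII names assumed).
    printf "%.${size}s" "$out"
}

# Compare two getfattr dumps while ignoring the "# file:" header lines,
# which differ when the same file is seen through different mount points.
compare_dumps() {
    a=$(sed '/^# file:/d' "$1")
    b=$(sed '/^# file:/d' "$2")
    if [ "$a" = "$b" ]; then
        echo "xattr dumps match"
    else
        echo "xattr dumps differ" >&2
        return 1
    fi
}

# Possible usage on the test systems (not run here):
#   setfattr -n user.test -v "$(make_payload testfile 65536)" testfile
#   getfattr -d -m user.test testfile > xattrs.lustre
#   ...switch kernels, mount as ext4, dump to xattrs.ext4...
#   compare_dumps xattrs.lustre xattrs.ext4
```

Building the value from the file name keeps each test file's xattr distinct, so a value returned for the wrong inode would also show up as a mismatch.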

The upstream kernel patches are included on the dev branch of https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git (all of the patches from author "Tahsin Erdogan") but could potentially also be pushed as a series to the fs/linux-staging branch (with Test-Parameters: forbuildonly since there is no benefit to Lustre testing on them) in order to have Jenkins build the patches for testing via loadjenkinsbuild.

Attachments

- e2fsck_new.log (2 kB, 10/Jul/17 8:27 AM)
- e2fsck.log (2 kB, 10/Jul/17 8:27 AM)
- xattrs_after_new_e2fs.lustre (64 kB, 10/Jul/17 8:27 AM)
- xattrs.ext4 (64 kB, 10/Jul/17 8:27 AM)
- xattrs.lustre (64 kB, 10/Jul/17 8:27 AM)

Issue Links

is related to

LU-9724 update ext4-large-eas.patch to match upstream ext4 feature

Resolved

Activity

Emoly Liu added a comment - 12/Jul/17 12:49 AM

adilger, do you mean ext4 "dev" branch? There is no "next" branch at https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git

-sh-4.1$ git branch -a
  dev
* master
  remotes/origin/3.2-punch-fix
  remotes/origin/HEAD -> origin/master
  remotes/origin/backport-to-3.10
  remotes/origin/crypto
  remotes/origin/crypto-3.14
  remotes/origin/dev
  remotes/origin/ext4-tools
  remotes/origin/for-stable
  remotes/origin/fscrypt
  remotes/origin/lazy_journal
  remotes/origin/master
  remotes/origin/origin
  remotes/origin/test
  remotes/origin/test-mb_generate_buddy-failure
  remotes/origin/unstable

Andreas Dilger added a comment - 11/Jul/17 3:34 PM

Emoly, can you please make a patch (git commit) for this on the ext4 "next" branch, with a proper commit comment with Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723 and Signed-off-by: line, and then send it to the list with "git send-email --to tytso@mit.edu --cc linux-ext4@vger.kernel.org". This should be done as soon as possible so that it will be included with the patch series going upstream.
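
The submission requested above might look roughly like the following. It is only a sketch: make_commit_msg is a hypothetical helper, the subject and body are placeholders to be written by the patch author, and the git commands in the trailing comment assume an ext4 checkout with the fix already applied to fs/ext4/xattr.c.

```shell
#!/bin/sh
# Sketch only: make_commit_msg is a hypothetical helper.

# Print a commit message with the requested Intel-bug-id trailer.
# $1 is the subject line and $2 the body; both are supplied by the caller.
make_commit_msg() {
    printf '%s\n\n%s\n\n' "$1" "$2"
    printf 'Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723\n'
}

# Possible usage inside the ext4 checkout (not run here):
#   git add fs/ext4/xattr.c
#   make_commit_msg "<subject>" "<description>" | git commit -s -F -
#   git format-patch -1
#   git send-email --to tytso@mit.edu --cc linux-ext4@vger.kernel.org 0001-*.patch
```

Here `git commit -s` appends the Signed-off-by: line from the committer's git configuration, and `-F -` reads the message from standard input.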

Emoly Liu added a comment - 10/Jul/17 8:29 AM

I just uploaded the latest testing results here.

Emoly Liu added a comment - 10/Jul/17 8:03 AM

adilger, with the following fix, all the tests pass.

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index ce12c3f..c7876c2 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -451,6 +451,7 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
                }
                /* Do not add ea_inode to the cache. */
                ea_inode_cache = NULL;
+               err = 0;
        } else if (err)
                goto out;

Andreas Dilger added a comment - 10/Jul/17 4:37 AM

So it looks like all that is needed is to set err = 0 in the case that the parent inode and generation match ("Do not add ea_inode to the cache") . That must be true or the error would have been printed.

Could you please give that a try? If that fixes the problem then please attach the patch here for review and then we can send it upstream.

Emoly Liu added a comment - 10/Jul/17 3:57 AM - edited

adilger and bzzz, the debugging information shows that the error -EFSCORRUPTED is returned from ext4_xattr_inode_verify_hashes() in fs/ext4/xattr.c:

static int
ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
                               struct ext4_xattr_entry *entry, void *buffer,
                               size_t size)
{
        u32 hash;

        /* Verify stored hash matches calculated hash. */
        hash = ext4_xattr_inode_hash(EXT4_SB(ea_inode->i_sb), buffer, size);
        if (hash != ext4_xattr_inode_get_hash(ea_inode))
                return -EFSCORRUPTED;   /* the error is returned here */

        if (entry) {
                __le32 e_hash, tmp_data;

                /* Verify entry hash. */
                tmp_data = cpu_to_le32(hash);
                e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
                                               &tmp_data, 1);
                if (e_hash != entry->e_hash)
                        return -EFSCORRUPTED;
        }
        return 0;
}

This function is called by ext4_xattr_inode_get():

        err = ext4_xattr_inode_verify_hashes(ea_inode, entry, buffer, size);
        /*
         * Compatibility check for old Lustre ea_inode implementation. Old
         * version does not have hash validation, but it has a backpointer
         * from ea_inode to the parent inode.
         */
        if (err == -EFSCORRUPTED) {
                if (EXT4_XATTR_INODE_GET_PARENT(ea_inode) != inode->i_ino ||
                    ea_inode->i_generation != inode->i_generation) {
                        ext4_warning_inode(ea_inode,
                                           "EA inode hash validation failed");
                        goto out;
                }
                /* Do not add ea_inode to the cache. */
                ea_inode_cache = NULL;
        } else if (err)
                goto out;

Emoly Liu added a comment - 07/Jul/17 10:54 AM

Could you please also run the upstream e2fsck from https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git "next" branch against the MDT filesystem after the Lustre e2fsck 1.42.13.wc5 is run to clean up the dirdata errors. That should be possible on the current filesystem if it is still available, but it may prevent debugging the -EUCLEAN error further.

Here is the output of running the new e2fsck on the MDT device after the Lustre e2fsck 1.42.13.wc5 is run to clean up the dirdata errors.

[root@centos7-4 e2fsck]# ./e2fsck -d -v -t -t -f -y /tmp/lustre-mdt1 2>&1 | tee e2fsck_new.log
e2fsck 1.43.5-WIP (17-Feb-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 272k/0k (94k/179k), time:  0.00/ 0.00/ 0.00
Pass 1: I/O read: 1MB, write: 1MB, rate: 521.10MB/s
Pass 2: Checking directory structure
Entry '..' in /ROOT/.lustre/fid (53378) has an incorrect filetype (was 18, should be 2).
Fix? yes

Entry '..' in /ROOT/.lustre/lost+found (53379) has an incorrect filetype (was 18, should be 2).
Fix? yes

Pass 2: Memory used: 272k/0k (102k/171k), time:  0.00/ 0.00/ 0.00
Pass 2: I/O read: 1MB, write: 1MB, rate: 660.07MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 272k/0k (103k/170k), time:  0.01/ 0.00/ 0.00
Pass 3A: Memory used: 272k/0k (103k/170k), time:  0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 272k/0k (101k/172k), time:  0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 6211.18MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 272k/0k (65k/208k), time:  0.00/ 0.00/ 0.00
Pass 4: I/O read: 1MB, write: 0MB, rate: 554.32MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 272k/0k (64k/209k), time:  0.00/ 0.00/ 0.00
Pass 5: I/O read: 1MB, write: 0MB, rate: 547.65MB/s

lustre-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****

         265 inodes used (0.33%, out of 79992)
           3 non-contiguous files (1.1%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 1/0/0
       24543 blocks used (49.09%, out of 50000)
           0 bad blocks
           1 large file

         140 regular files
         116 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
         255 files
Memory used: 272k/0k (63k/210k), time:  0.03/ 0.01/ 0.00
I/O read: 2MB, write: 1MB, rate: 74.95MB/s

I will upload the result in file e2fsck_new.log.

It would also be good to check if it is possible to mount the filesystem back as type Lustre on the old kernel after the new e2fsck is run, since this should keep the existing large xattrs as-is so that we can still mount and use the old ldiskfs ea_inode code.

The answer is yes. I can mount the filesystem back as type Lustre on the old kernel after the new e2fsck is run. Here is the output:

[root@centos7-4 tests]# mount -t lustre   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
[root@centos7-4 tests]# mount -t lustre   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
[root@centos7-4 tests]# mount -t lustre -o user_xattr,flock centos7-4@tcp:/lustre /mnt/lustre
[root@centos7-4 tests]# ls -al /mnt/lustre/testfile 
-rw-r--r--. 1 root root 0 Jul  7 12:28 /mnt/lustre/testfile
[root@centos7-4 tests]# getfattr -d -m user.test /mnt/lustre/testfile | tee xattrs_after_new_e2fs.lustre
getfattr: Removing leading '/' from absolute path names
# file: mnt/lustre/testfile
user.test="abcdefghijklmnopqrstuvwxyz123456abcdefghijk...

I will upload file xattrs_after_new_e2fs.lustre later too.

Next, as we discussed, I will add some printk() lines to the places where -EFSCORRUPTED is returned so that we can see why it thinks the xattr is bad.

Emoly Liu added a comment - 07/Jul/17 6:46 AM

Please check if there are any console messages from ext4 when trying to access the large xattr on the new kernel. This will help debug where the error is coming from.

I didn't see more messages on the new kernel console and didn't see other useful messages in the system log.

Could you please also run the upstream e2fsck from https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git "next" branch against the MDT filesystem after the Lustre e2fsck 1.42.13.wc5 is run to clean up the dirdata errors. That should be possible on the current filesystem if it is still available, but it may prevent debugging the -EUCLEAN error further.

It would also be good to check if it is possible to mount the filesystem back as type Lustre on the old kernel after the new e2fsck is run, since this should keep the existing large xattrs as-is so that we can still mount and use the old ldiskfs ea_inode code.

OK, I will download that upstream e2fsck and do these tests.

Andreas Dilger added a comment - 07/Jul/17 6:27 AM

Please check if there are any console messages from ext4 when trying to access the large xattr on the new kernel. This will help debug where the error is coming from.

Could you please also run the upstream e2fsck from https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git "next" branch against the MDT filesystem after the Lustre e2fsck 1.42.13.wc5 is run to clean up the dirdata errors. That should be possible on the current filesystem if it is still available, but it may prevent debugging the -EUCLEAN error further.

It would also be good to check if it is possible to mount the filesystem back as type Lustre on the old kernel after the new e2fsck is run, since this should keep the existing large xattrs as-is so that we can still mount and use the old ldiskfs ea_inode code.

Emoly Liu added a comment - 07/Jul/17 4:36 AM - edited

Andreas, sorry, the test I did yesterday was not correct. Here are the new steps and results:

env:
- lustre: top commit "f7df236 LU-9183"
- lustre kernel: 3.10.0-514.16.1.el7_lustre.x86_64
- upstream kernel: top commit "0c5f031 Add linux-next specific files for 20170705" applied with ext4 patch "6f82dfb ext4: change fast symlink test to not rely on i_blocks"; compiled version "4.12.0-lu9723-next-20170705+"
- e2fsprogs: 1.42.13.wc6-7.el7.x86_64

steps:

# mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/root/lustre-release/lustre/tests/../utils/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions="-O ea_inode" --reformat /tmp/lustre-mdt1
# mkfs.lustre --mgsnode=centos7-4@tcp --fsname=lustre --ost --index=0 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=400000 --reformat /tmp/lustre-ost1
# mount -t lustre   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
# mount -t lustre   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
# mount -t lustre -o user_xattr,flock centos7-4@tcp:/lustre /mnt/lustre
# touch /mnt/lustre/testfile
# setfattr -n user.test -v $(for i in `seq 1 2048`; do echo -n "abcdefghijklmnopqrstuvwxyz123456";done) /mnt/lustre/testfile
# getfattr -d -m user.test /mnt/lustre/testfile > xattrs.lustre

Then upgrade to the upstream kernel

# debugfs -w -R 'feature ^dirdata' /tmp/lustre-mdt1
debugfs 1.42.13.wc6 (05-Feb-2017)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype flex_bg ea_inode sparse_super large_file huge_file uninit_bg dir_nlink quota
# mount -t ext4 /tmp/lustre-mdt1 /mnt/lustre-mds1
# getfattr -d -m user.test /mnt/lustre-mds1/ROOT/testfile 2>&1 | tee xattrs.ext4
/mnt/lustre-mds1/ROOT/testfile: user.test: Structure needs cleaning
# umount /mnt/lustre-mds1
# e2fsck -d -v -t -t -f -y /tmp/lustre-mdt1 2>&1 | tee e2fsck.log
e2fsck 1.42.13.wc6 (05-Feb-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 264k/0k (83k/182k), time:  0.00/ 0.00/ 0.00
Pass 1: I/O read: 1MB, write: 1MB, rate: 492.85MB/s
Pass 2: Checking directory structure
Entry '[0x200000400:0x1:0x0]' in /update_log_dir (53375) dirdata length set incorrectly.
Clear? yes

Entry '.lustre' in /ROOT (53376) dirdata length set incorrectly.
Clear? yes

Entry 'testfile' in /ROOT (53376) dirdata length set incorrectly.
Clear? yes

Entry '..' in /ROOT/.lustre (53377) dirdata length set incorrectly.
Clear? yes

Entry 'fid' in /ROOT/.lustre (53377) dirdata length set incorrectly.
Clear? yes

Entry 'lost+found' in /ROOT/.lustre (53377) dirdata length set incorrectly.
Clear? yes

Entry '..' in /ROOT/.lustre/fid (53378) dirdata length set incorrectly.
Clear? yes

Entry '..' in /ROOT/.lustre/lost+found (53379) dirdata length set incorrectly.
Clear? yes

Pass 2: Memory used: 264k/0k (91k/174k), time:  0.00/ 0.00/ 0.00
Pass 2: I/O read: 1MB, write: 1MB, rate: 229.78MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 264k/0k (91k/174k), time:  0.01/ 0.00/ 0.00
Pass 3A: Memory used: 264k/0k (91k/174k), time:  0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 264k/0k (89k/176k), time:  0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 9803.92MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 264k/0k (61k/204k), time:  0.00/ 0.00/ 0.00
Pass 4: I/O read: 1MB, write: 0MB, rate: 512.56MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 264k/0k (60k/205k), time:  0.00/ 0.00/ 0.00
Pass 5: I/O read: 1MB, write: 0MB, rate: 396.04MB/s

lustre-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****

         265 inodes used (0.33%, out of 79992)
           3 non-contiguous files (1.1%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 1/0/0
       24543 blocks used (49.09%, out of 50000)
           0 bad blocks
           1 large file

         140 regular files
         116 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
         255 files
Memory used: 264k/0k (60k/205k), time:  0.02/ 0.01/ 0.00
I/O read: 1MB, write: 1MB, rate: 60.79MB/s

We can see that getfattr reports the error "Structure needs cleaning" on the upstream kernel. When I ran "strace getfattr xxx", it showed the following:

getxattr("/mnt/lustre-mds1/ROOT/testfile", "user.test", 0x0, 0) = 65536
getxattr("/mnt/lustre-mds1/ROOT/testfile", "user.test", 0x1685b20, 65536) = -1 EUCLEAN (Structure needs cleaning)
write(2, "/mnt/lustre-mds1/ROOT/testfile: ", 32/mnt/lustre-mds1/ROOT/testfile: ) = 32

I will upload xattrs.*, e2fsck.log and strace.log later.

Andreas Dilger added a comment - 06/Jul/17 6:54 PM

Hi Emoly, thank you for your testing. It looks like the upstream kernel ea_inode feature works properly to access the Lustre xattrs, which is great.

Could you please check whether there are any error messages on the console when accessing the large xattr on the upstream kernel? Could you also run the upstream e2fsprogs with the ea_inode feature against the MDT filesystem (possibly after the Lustre e2fsprogs runs to clean up dirdata errors) to confirm that the new e2fsprogs works properly with existing large xattrs?

People

Assignee: Emoly Liu

Reporter: Andreas Dilger

Votes: 0

Watchers: 7

Dates

Due: 07/Jul/17

Created: 29/Jun/17 10:39 PM

Updated: 10/Aug/18 3:11 AM

Resolved: 08/Aug/18 9:03 PM