[LU-6455] EL7 client replay-vbr test_4i: version changed unexpectedly Created: 10/Apr/15  Updated: 07/Jul/17  Resolved: 20/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0, Lustre 2.10.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 2983 EL7 client


Issue Links:
Blocker
Duplicate
is duplicated by LU-6456 EL7 client replay-vbr test_10b: trans... Resolved
is duplicated by LU-6651 replay-single test 28: fchmod: No suc... Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ce29134a-df8a-11e4-bf2e-5254006e85c2.

The sub-test test_4i failed with the following error:

version changed unexpectedly: pre 0x8d00000008, post 0x8d0000000d

test log

== replay-vbr test 4i: setattr of times does not change versions ===================================== 20:19:13 (1428635953)
CMD: onyx-42vm3 /usr/sbin/lctl get_param mdd.lustre-MDT0000.atime_diff
CMD: onyx-42vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0000.atime_diff=0
mdd.lustre-MDT0000.atime_diff=0
CMD: onyx-42vm6.onyx.hpdd.intel.com mcreate /mnt/lustre/f4i.replay-vbr
CMD: onyx-42vm6.onyx.hpdd.intel.com /usr/bin/lfs path2fid /mnt/lustre/f4i.replay-vbr
CMD: onyx-42vm3 /usr/sbin/lctl --device lustre-MDT0000 getobjversion \"[0x200014450:0x1:0x0]\"
CMD: onyx-42vm6.onyx.hpdd.intel.com touch /mnt/lustre/f4i.replay-vbr
CMD: onyx-42vm6.onyx.hpdd.intel.com /usr/bin/lfs path2fid /mnt/lustre/f4i.replay-vbr
CMD: onyx-42vm3 /usr/sbin/lctl --device lustre-MDT0000 getobjversion \"[0x200014450:0x1:0x0]\"
CMD: onyx-42vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0000.atime_diff=60
mdd.lustre-MDT0000.atime_diff=60
 replay-vbr test_4i: @@@@@@ FAIL: version changed unexpectedly: pre 0x8d00000008, post 0x8d0000000d 
  Trace dump:


 Comments   
Comment by Peter Jones [ 21/Apr/15 ]

Hongchao

Could you please look into this failure?

Thanks

Peter

Comment by Hongchao Zhang [ 22/Apr/15 ]

the issue has been reproduced locally, will try to create a patch soon.

Comment by Hongchao Zhang [ 24/Apr/15 ]

status update: this issue is related to the new feature "Integrity Measurement Architecture" (IMA) introduced in RHEL7.x,
the xattr "security.ima" is deleted when closing some file, which cause the version of the inode is increased.

Comment by Gerrit Updater [ 05/May/15 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/14673
Subject: LU-6455 mdt: don't update i_version for IMA
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 585c8696798e8cea214d3c86afc77ac4787c11fa

Comment by Andreas Dilger [ 06/May/15 ]

Where are these xattrs generated? The i_version update in Lustre is used by VBR to check whether the RPC has been applied or not, so this change has the potential to break VBR. I'm wondering whether there is some safe way to replay this RPC that doesn't affect the version?

Comment by Mikhail Pershin [ 06/May/15 ]

On other hand, if VBR uses/modify i_version then it doesn't make sense to use IMA as well. I have no idea now how to handle this properly, but it looks that Lustre is not capable to support IMA now. Another issue I see about these new EA is about performance, while enabled they adds extra modification even at file close time. Considering it is not compatible with Lustre now, it would be better to disable IMA at least until it will work with Lustre.

Comment by Gerrit Updater [ 25/May/15 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/14928
Subject: LU-6455 mdt: disable IMA support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 05f0b632d7a40f37bad9e7a1913f295cdae14d4c

Comment by Andreas Dilger [ 29/May/15 ]

Hongchao, can you please comment here with details about this feature (website, email of developers, mailing list) so I can learn more about it and contact the developers.

Comment by Hongchao Zhang [ 01/Jun/15 ]

the wiki of IMA is at http://sourceforge.net/p/linux-ima/wiki/Home/

the general workflow is as follows,
1, hash the file data and metadata
2, encrypt the hash value by private key
3, store the encrypted hash to the extended attributes "security.ima"

4, the user received the file and hash the file content by the same way
5, decrypt the encrypted hash value contained in "security.ima" by public key
6, compare the hash to check whether it matches.

the public/private key pair is generated by TPM (Trusted Platform Module).

Comment by Andreas Dilger [ 03/Jun/15 ]

So, I was going to send an email to the upstream IMA developers about this problem, but I want to make sure I understand the problem exactly. Can you verify that setting the security.ima xattr on a file in ext4 does NOT update the i_version field, but it does update the i_version field for Lustre? It seems to me that setting security.ima on an ext4 filesystem should also change the i_version field if I read the code correctly, unless that is being handled specially for the security xattr handler. Is the hashing of metadata and storing of security.ima done by the client kernel (e.g. through an SELinux security policy inside the kernel), or automatically by libraries linked in userspace (e.g. glibc), or by a separate tool being run on the client (though I wonder how it would be run in the middle of this test)?

Next, does this i_version update by Lustre actually break the IMA functionality? It seems that the main problem being reported here is because setting the security.ima xattr is done automatically by some userspace tool or maybe by the client kernel, and that is causing a second transaction on the file and confusing the VBR test? Is IMA still functioning correctly and only our test is broken, or is IMA broken by our i_version updates?

Comment by Hongchao Zhang [ 04/Jun/15 ]

I looked at the related codes, and the i_version will be updated whenever the "security.ima" is updated, for Lustre, there is one more xattr
"trusted.version" (XATTR_NAME_VERSION) to be updated to store the updated "version" of the inode.

I'm not clear which part of the linux to do the hashing and storing, and will look at the related code lines more deeply to check it.

the problem here is the "security.ima" broke the assumption in the subtest 4i in replay_vbr, which thought the "touch" won't affect the "version" of
the file, but in the EL7 with IMA support, in "ima_inode_post_setattr" (SyS_utimensat/do_utimes/utimes_common/notify_change/ima_inode_post_setattr)
will remove the "security.ima" if the file is not necessarily to be appraised.

 void ima_inode_post_setattr(struct dentry *dentry)
305 {
306         struct inode *inode = dentry->d_inode;
307         struct integrity_iint_cache *iint;
308         int must_appraise, rc;
309 
310         if (!ima_initialized || !ima_appraise || !S_ISREG(inode->i_mode)
311             || !inode->i_op->removexattr)
312                 return;
313 
314         must_appraise = ima_must_appraise(inode, MAY_ACCESS, POST_SETATTR);
315         iint = integrity_iint_find(inode);
316         if (iint) {
317                 iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
318                                  IMA_APPRAISE_SUBMASK | IMA_APPRAISED_SUBMASK |
319                                  IMA_ACTION_FLAGS);
320                 if (must_appraise)
321                         iint->flags |= IMA_APPRAISE;
322         }
323         if (!must_appraise)
324                 rc = inode->i_op->removexattr(dentry, XATTR_NAME_IMA);           <-- here, it will affect the "version" of file in Lustre
325         return;
326 }
Comment by Andreas Dilger [ 05/Jun/15 ]

Hongchao,
can you please push a new patch (separate from 14673 and 14928) to skip these failing tests only if IMA is enabled. That will give us more time to understand what the correct solution is.

Next, it isn't clear to me from your previous comment whether the Lustre changing of i_version is causing IMA itself to break when the security.ima xattr is set? If this does not break IMA, then it seems the only problem is in these tests. If this does break IMA to change i_version when security.ima is set then we should land one of the other patches, probably 14928, since 14673 appears to cause other failures in recovery.

Comment by Andreas Dilger [ 05/Jun/15 ]

The tests to be disabled for IMA appear to be:

  • replay-vbr.sh test_4i, test_4j, test_4k, test_10b
  • replay-single test_28
Comment by Gerrit Updater [ 05/Jun/15 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/15164
Subject: LU-6455 test: disable IMA related tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a8396032c5aace5b6218f49d04506d704c596ae9

Comment by Mikhail Pershin [ 08/Jun/15 ]

Andreas, it was my comment about i_version change may break IMA. I got it wrong that IMA uses i_version directly. Now I understand that Hongchao meant that i_version is changed by IMA just because it modifies file and our versioning mechanism is doing that actually.

Hongchao, is that right?

So, to be clear, if IMA uses only xattrs then it is OK, we need to fix tests. But from commit message it looks like IMA also depends on i_version.

Comment by Hongchao Zhang [ 09/Jun/15 ]

Yes, it won't break the IMA,

"security.ima" is used to protect the content of the file
"security.evm" is used to protect the metadata of the file, currently, it contains "inode number", "inode generation", UID, GID, file mode, security.selinux,
security.SMACK64, security.ima, security.capability. and it could also contain some additional attributes, "filesystem UUID" and extra SMACK extended attributes.

our extended attributes to store the inode version "trusted.version" is not included to calculate the EVM or IMA signature.

there is an command "evmctl" to calculate or verify the signature. (http://sourceforge.net/p/linux-ima/ima-evm-utils/ci/master/tree/)

Comment by Mikhail Pershin [ 11/Jun/15 ]

Hongchao, note that we don't store version in xattr with ldiskfs, but uses i_fs_version in inode, please, make sure it is not one protected by EVM

Comment by Andreas Dilger [ 12/Jun/15 ]

It looks like the correct solution here is to either skip these tests, or mark them error_ignore for clients with IMA enabled (not sure how that would be checked). I don't think we need to disable the xattrs on the client or try to skip the version update for the setxattr.

Comment by Gerrit Updater [ 13/Jun/15 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15164/
Subject: LU-6455 test: disable IMA related tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2f93aa4e73148b2d8daeea974a82effc5fcf0321

Comment by Andreas Dilger [ 15/Jun/15 ]

The IMA-affected tests have been disabled for RHEL7 only. Unfortunately the fact that the IMA feature sets an extra xattr automatically means that there isn't any way that the replay-vbr.sh tests can verify correct operation. At best we might do something like check if the inode version was increased only once when IMA is enabled, but that is tricky.

A small improvement would be run the tests anyway and make them error_ignore for RHEL7, so that at least they are being run, but we can address this when the next distro using IMA appears (SLES13 or whatever).

Comment by Sarah Liu [ 17/Jul/15 ]

still hit this issue on EL7 lustre-master build # 3093 ZFS

https://testing.hpdd.intel.com/test_sets/19b0e0c4-2917-11e5-b07d-5254006e85c2

Comment by Oleg Drokin [ 20/Jul/15 ]

So the patch has a flaw assuming lsb_release is present when it's not.

From the log:

20:12:25:/usr/lib64/lustre/tests/replay-vbr.sh: line 10: lsb_release: command not found
Comment by Bob Glossman (Inactive) [ 20/Jul/15 ]

I think the needed rpm is 'redhat-lsb'. should be installed on test nodes. should be installed everywhere.

Comment by Peter Jones [ 20/Jul/15 ]

This has been done now and so the this test should no longer be failing

Comment by Gerrit Updater [ 18/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14928/
Subject: LU-6455 mdt: disable IMA support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 259c1ec0cfb76b2645efa2be6ec7ad48229eb658

Comment by James Nunez (Inactive) [ 02/Feb/16 ]

replay-single test 28 and replay-vbr tests 4i, 4j, 4k and 10b are still on the ALWAYS_EXCEPT (not being run) list for el7. Since all of the patches for this ticket have landed, can we run these tests again on el7?

Comment by Jian Yu [ 02/Feb/16 ]

I think yes, James. Could you please push a patch to enable the tests and add test parameters to verify them on el7?

Comment by Gerrit Updater [ 28/Jul/16 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/21565
Subject: LU-6455 tests: Re-enable replay-vbr 4i-k and 10b
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 19ee53fc9b242e3871b426a28d390f346d9dbb5f

Comment by Gerrit Updater [ 31/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21565/
Subject: LU-6455 tests: Re-enable replay-vbr and replay-single tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8b98c8a669d8d6c36f8bd767acba5e6f912360b7

Generated at Sat Feb 10 02:00:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.