[LU-6455] EL7 client replay-vbr test_4i: version changed unexpectedly Created: 10/Apr/15 Updated: 07/Jul/17 Resolved: 20/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0, Lustre 2.10.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: lustre-master build # 2983 EL7 client |
||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ce29134a-df8a-11e4-bf2e-5254006e85c2. The sub-test test_4i failed with the following error: version changed unexpectedly: pre 0x8d00000008, post 0x8d0000000d test log == replay-vbr test 4i: setattr of times does not change versions ===================================== 20:19:13 (1428635953) CMD: onyx-42vm3 /usr/sbin/lctl get_param mdd.lustre-MDT0000.atime_diff CMD: onyx-42vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0000.atime_diff=0 mdd.lustre-MDT0000.atime_diff=0 CMD: onyx-42vm6.onyx.hpdd.intel.com mcreate /mnt/lustre/f4i.replay-vbr CMD: onyx-42vm6.onyx.hpdd.intel.com /usr/bin/lfs path2fid /mnt/lustre/f4i.replay-vbr CMD: onyx-42vm3 /usr/sbin/lctl --device lustre-MDT0000 getobjversion \"[0x200014450:0x1:0x0]\" CMD: onyx-42vm6.onyx.hpdd.intel.com touch /mnt/lustre/f4i.replay-vbr CMD: onyx-42vm6.onyx.hpdd.intel.com /usr/bin/lfs path2fid /mnt/lustre/f4i.replay-vbr CMD: onyx-42vm3 /usr/sbin/lctl --device lustre-MDT0000 getobjversion \"[0x200014450:0x1:0x0]\" CMD: onyx-42vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0000.atime_diff=60 mdd.lustre-MDT0000.atime_diff=60 replay-vbr test_4i: @@@@@@ FAIL: version changed unexpectedly: pre 0x8d00000008, post 0x8d0000000d Trace dump: |
| Comments |
| Comment by Peter Jones [ 21/Apr/15 ] |
|
Hongchao Could you please look into this failure? Thanks Peter |
| Comment by Hongchao Zhang [ 22/Apr/15 ] |
|
the issue has been reproduced locally, will try to create a patch soon. |
| Comment by Hongchao Zhang [ 24/Apr/15 ] |
|
status update: this issue is related to the new feature "Integrity Measurement Architecture" (IMA) introduced in RHEL7.x, |
| Comment by Gerrit Updater [ 05/May/15 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/14673 |
| Comment by Andreas Dilger [ 06/May/15 ] |
|
Where are these xattrs generated? The i_version update in Lustre is used by VBR to check whether the RPC has been applied or not, so this change has the potential to break VBR. I'm wondering whether there is some safe way to replay this RPC that doesn't affect the version? |
| Comment by Mikhail Pershin [ 06/May/15 ] |
|
On other hand, if VBR uses/modify i_version then it doesn't make sense to use IMA as well. I have no idea now how to handle this properly, but it looks that Lustre is not capable to support IMA now. Another issue I see about these new EA is about performance, while enabled they adds extra modification even at file close time. Considering it is not compatible with Lustre now, it would be better to disable IMA at least until it will work with Lustre. |
| Comment by Gerrit Updater [ 25/May/15 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/14928 |
| Comment by Andreas Dilger [ 29/May/15 ] |
|
Hongchao, can you please comment here with details about this feature (website, email of developers, mailing list) so I can learn more about it and contact the developers. |
| Comment by Hongchao Zhang [ 01/Jun/15 ] |
|
the wiki of IMA is at http://sourceforge.net/p/linux-ima/wiki/Home/ the general workflow is as follows, 4, the user received the file and hash the file content by the same way the public/private key pair is generated by TPM (Trusted Platform Module). |
| Comment by Andreas Dilger [ 03/Jun/15 ] |
|
So, I was going to send an email to the upstream IMA developers about this problem, but I want to make sure I understand the problem exactly. Can you verify that setting the security.ima xattr on a file in ext4 does NOT update the i_version field, but it does update the i_version field for Lustre? It seems to me that setting security.ima on an ext4 filesystem should also change the i_version field if I read the code correctly, unless that is being handled specially for the security xattr handler. Is the hashing of metadata and storing of security.ima done by the client kernel (e.g. through an SELinux security policy inside the kernel), or automatically by libraries linked in userspace (e.g. glibc), or by a separate tool being run on the client (though I wonder how it would be run in the middle of this test)? Next, does this i_version update by Lustre actually break the IMA functionality? It seems that the main problem being reported here is because setting the security.ima xattr is done automatically by some userspace tool or maybe by the client kernel, and that is causing a second transaction on the file and confusing the VBR test? Is IMA still functioning correctly and only our test is broken, or is IMA broken by our i_version updates? |
| Comment by Hongchao Zhang [ 04/Jun/15 ] |
|
I looked at the related codes, and the i_version will be updated whenever the "security.ima" is updated, for Lustre, there is one more xattr I'm not clear which part of the linux to do the hashing and storing, and will look at the related code lines more deeply to check it. the problem here is the "security.ima" broke the assumption in the subtest 4i in replay_vbr, which thought the "touch" won't affect the "version" of void ima_inode_post_setattr(struct dentry *dentry)
305 {
306 struct inode *inode = dentry->d_inode;
307 struct integrity_iint_cache *iint;
308 int must_appraise, rc;
309
310 if (!ima_initialized || !ima_appraise || !S_ISREG(inode->i_mode)
311 || !inode->i_op->removexattr)
312 return;
313
314 must_appraise = ima_must_appraise(inode, MAY_ACCESS, POST_SETATTR);
315 iint = integrity_iint_find(inode);
316 if (iint) {
317 iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
318 IMA_APPRAISE_SUBMASK | IMA_APPRAISED_SUBMASK |
319 IMA_ACTION_FLAGS);
320 if (must_appraise)
321 iint->flags |= IMA_APPRAISE;
322 }
323 if (!must_appraise)
324 rc = inode->i_op->removexattr(dentry, XATTR_NAME_IMA); <-- here, it will affect the "version" of file in Lustre
325 return;
326 }
|
| Comment by Andreas Dilger [ 05/Jun/15 ] |
|
Hongchao, Next, it isn't clear to me from your previous comment whether the Lustre changing of i_version is causing IMA itself to break when the security.ima xattr is set? If this does not break IMA, then it seems the only problem is in these tests. If this does break IMA to change i_version when security.ima is set then we should land one of the other patches, probably 14928, since 14673 appears to cause other failures in recovery. |
| Comment by Andreas Dilger [ 05/Jun/15 ] |
|
The tests to be disabled for IMA appear to be:
|
| Comment by Gerrit Updater [ 05/Jun/15 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/15164 |
| Comment by Mikhail Pershin [ 08/Jun/15 ] |
|
Andreas, it was my comment about i_version change may break IMA. I got it wrong that IMA uses i_version directly. Now I understand that Hongchao meant that i_version is changed by IMA just because it modifies file and our versioning mechanism is doing that actually. Hongchao, is that right? So, to be clear, if IMA uses only xattrs then it is OK, we need to fix tests. But from commit message it looks like IMA also depends on i_version. |
| Comment by Hongchao Zhang [ 09/Jun/15 ] |
|
Yes, it won't break the IMA, "security.ima" is used to protect the content of the file our extended attributes to store the inode version "trusted.version" is not included to calculate the EVM or IMA signature. there is an command "evmctl" to calculate or verify the signature. (http://sourceforge.net/p/linux-ima/ima-evm-utils/ci/master/tree/) |
| Comment by Mikhail Pershin [ 11/Jun/15 ] |
|
Hongchao, note that we don't store version in xattr with ldiskfs, but uses i_fs_version in inode, please, make sure it is not one protected by EVM |
| Comment by Andreas Dilger [ 12/Jun/15 ] |
|
It looks like the correct solution here is to either skip these tests, or mark them error_ignore for clients with IMA enabled (not sure how that would be checked). I don't think we need to disable the xattrs on the client or try to skip the version update for the setxattr. |
| Comment by Gerrit Updater [ 13/Jun/15 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15164/ |
| Comment by Andreas Dilger [ 15/Jun/15 ] |
|
The IMA-affected tests have been disabled for RHEL7 only. Unfortunately the fact that the IMA feature sets an extra xattr automatically means that there isn't any way that the replay-vbr.sh tests can verify correct operation. At best we might do something like check if the inode version was increased only once when IMA is enabled, but that is tricky. A small improvement would be run the tests anyway and make them error_ignore for RHEL7, so that at least they are being run, but we can address this when the next distro using IMA appears (SLES13 or whatever). |
| Comment by Sarah Liu [ 17/Jul/15 ] |
|
still hit this issue on EL7 lustre-master build # 3093 ZFS https://testing.hpdd.intel.com/test_sets/19b0e0c4-2917-11e5-b07d-5254006e85c2 |
| Comment by Oleg Drokin [ 20/Jul/15 ] |
|
So the patch has a flaw assuming lsb_release is present when it's not. From the log: 20:12:25:/usr/lib64/lustre/tests/replay-vbr.sh: line 10: lsb_release: command not found |
| Comment by Bob Glossman (Inactive) [ 20/Jul/15 ] |
|
I think the needed rpm is 'redhat-lsb'. should be installed on test nodes. should be installed everywhere. |
| Comment by Peter Jones [ 20/Jul/15 ] |
|
This has been done now and so the this test should no longer be failing |
| Comment by Gerrit Updater [ 18/Aug/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14928/ |
| Comment by James Nunez (Inactive) [ 02/Feb/16 ] |
|
replay-single test 28 and replay-vbr tests 4i, 4j, 4k and 10b are still on the ALWAYS_EXCEPT (not being run) list for el7. Since all of the patches for this ticket have landed, can we run these tests again on el7? |
| Comment by Jian Yu [ 02/Feb/16 ] |
|
I think yes, James. Could you please push a patch to enable the tests and add test parameters to verify them on el7? |
| Comment by Gerrit Updater [ 28/Jul/16 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/21565 |
| Comment by Gerrit Updater [ 31/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21565/ |