[LU-9260] posix failure: access.43 Unresolved Created: 27/Mar/17 Updated: 18/May/20 Resolved: 18/May/20 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.4, Lustre 2.13.0, Lustre 2.10.6, Lustre 2.12.1, Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Sarah Liu | Assignee: | Sarah Liu |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: EL7 |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Maloo link: https://testing.hpdd.intel.com/test_sets/461c1d7e-12fc-11e7-b742-5254006e85c2
test log
FAILURE SUMMARY:
POSIX failures: 6
Test Name Baseline Lustre Report
access.43 Succeeded Unresolved
chmod.18 Succeeded Unresolved
chown.18 Succeeded Unresolved
creat.28 Succeeded Unresolved
creat.30 Succeeded Unresolved
link.23 Succeeded Unresolved
FAILURE DESCRIPTIONS:
####################################################
Test Name: access.43 Unresolved
Test Description:
If the implementation supports a read-only file system, EROFS in errno
and a return value of -1 on a call to access(path, amode) when write
access is requested for a file on a read-only file system.
Posix Ref: Component ACCESS Assertion 5.6.3.4-48(C)
Test Information:
deletion reason: mnt_ro(/dev/loop0, access-d.43) failed |
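For reference, the assertion access.43 checks can be expressed as a small C probe. This is a minimal sketch, not the VSX implementation: the `verdict` enum, `classify_access43()` and `check_access43()` are hypothetical names, and the sketch assumes the file already lives on a client filesystem that has been mounted read-only by some other step.

```c
#include <errno.h>
#include <unistd.h>

/* Outcomes loosely mirroring the VSX report categories (hypothetical). */
enum verdict { PASS, FAIL, UNRESOLVED };

/* access.43 passes only if access() returned -1 with errno == EROFS. */
enum verdict classify_access43(int rc, int err)
{
    return (rc == -1 && err == EROFS) ? PASS : FAIL;
}

/* Probe a file that is expected to live on a read-only client mount.
 * Mounting it is the caller's job; this function only checks it. */
enum verdict check_access43(const char *path)
{
    /* A missing file means setup failed: report Unresolved, not Fail. */
    if (access(path, F_OK) != 0)
        return UNRESOLVED;

    errno = 0;
    int rc = access(path, W_OK);   /* request write access */
    return classify_access43(rc, errno);
}
```

Separating the pure classification from the system call keeps "setup failed" (Unresolved, as in this ticket) distinct from "assertion failed".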
| Comments |
| Comment by Andreas Dilger [ 27/Mar/17 ] |
|
It isn't clear why the test script is trying to mount /dev/loop0 (which might be the MDT0000 device?) read-only instead of remounting the client filesystem read-only. The right test here would definitely be to remount the client filesystem read-only. It would be worthwhile to look at some older POSIX test logs to see why this is failing now when (presumably) it didn't fail before. |
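Whichever device ends up being remounted, a harness could confirm that the client mount point really is read-only before running the EROFS assertions. A minimal sketch using the POSIX `statvfs()` call and its `ST_RDONLY` flag; `is_mounted_readonly()` is a hypothetical helper, not part of the VSX suite.

```c
#include <sys/statvfs.h>

/* Returns 1 if the filesystem containing `path` is mounted read-only,
 * 0 if it is writable, and -1 if statvfs() fails (e.g. bad path). */
int is_mounted_readonly(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

Called on the client mount point (for example an assumed /mnt/lustre) after mounting with `-o ro`, it should return 1; a result of 0 would mean the read-only tests are about to run against a writable filesystem.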
| Comment by Sarah Liu [ 12/Apr/17 ] |
|
After investigation, it looks like we have always used loop devices in the POSIX read-only tests (against Lustre), even on EL6, which is obviously wrong; the MGS/MDS information should be provided here instead. Test access.43 uses setuprofs() to set up the read-only filesystem:
test43()
{
    char *errptr;
    int pathok = 0;

    /* write access on read only file system */
    DBUG_ENTER("test43");
    testfail = 0;
    globok = 0;
    if (setuprofs(t43_dir, t43_file, 'f', (mode_t) MODEANY) != 0)
    {
        DBUG_VOID_RETURN;
    }
In setuprofs.c:
if ((rofs = tet_getvar(VSX_ROFS)) == NULL || *rofs == '\0')
{
    xx_rpt(DELETION);
    in_rpt("deletion reason: parameter %s is not set", VSX_ROFS);
    DBUG_PRINTF("return", "setuprofs = 1");
    DBUG_RETURN(1);
}
In scripts/vsx-pcts/parameterisations.sh:
echo "VSX_ROFS=\"$NOSPC_DEV\"" >> $1
In test_sets/SRC/vsxparams:
NOSPC_DEV="/dev/loop0"
EL6 seems to have passed before while EL7 fails because on EL7 the loop device cannot be cleaned up, so remounting the same device fails:
In order to re run the test suites at a later date run the rerun_tests program in vsx0's home directory as the vsx0 user
/usr/src/posix/ext4
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.25': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.16': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/mkdir/d.mkdir/mkdir-d.19': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/mkfifo/d.mkfifo/mkfifo-d.17': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/link/d.link/link-d.25': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/rmdir/d.rmdir/rmdir-d.9': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/open/d.open/open-d.46': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/rename/d.rename/rename-d.17': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/unlink/d.unlink/unlink-d.9': Device or resource busy
Install and build POSIX test suite successfully!
Run POSIX test against lustre filesystem |
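If the root cause is that a mount is still busy when the suite tries to clean up and remount it, one common mitigation is to retry the unmount for a bounded number of attempts. This is a hedged sketch only: `umount_retry()` and its retry policy are hypothetical, it uses the Linux-specific `umount2()` (root required), and nothing in this ticket says the suite was changed this way.

```c
#include <errno.h>
#include <sys/mount.h>
#include <unistd.h>

/* Try to unmount `target`, retrying while the kernel reports EBUSY.
 * Returns 0 on success, or -1 with errno from the last umount2() call. */
int umount_retry(const char *target, int attempts, unsigned int delay_s)
{
    for (int i = 0; i < attempts; i++) {
        if (umount2(target, 0) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;              /* not a "still busy" error: give up */
        if (i + 1 < attempts)
            sleep(delay_s);         /* let lingering users of the mount exit */
    }
    return -1;                      /* still busy (or attempts <= 0) */
}
```

A bounded retry avoids hanging forever; if the mount is still busy after the last attempt, the harness can report the test as Unresolved rather than silently reusing a stale mount.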
| Comment by Andreas Dilger [ 13/Apr/17 ] |
|
I don't think it is the MDS that should be mounted read-only, but rather the client. The test is running on the client. Also, I don't think that the MDS can be mounted read-only and still work. |
| Comment by Sarah Liu [ 13/Apr/17 ] |
|
Yes, I meant mounting the Lustre client read-only, not the MDS. But mounting the client requires the MGS information. Before the POSIX read-only tests, the suite remounts the filesystem read-only, but it always uses /dev/loop0 (as shown above, rofs is always /dev/loop0), so it is not testing Lustre at all. I think we need to replace this /dev/loop0 with mgs:/fsname. |
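The distinction drawn here (a local block device versus an MGS NID plus filesystem name) is easy to check mechanically, for example so a harness refuses to run the read-only tests while VSX_ROFS still points at a local device. A heuristic sketch: `is_lustre_target()` is hypothetical and only checks the `<mgsnid>:/<fsname>` shape, not whether the MGS is reachable.

```c
#include <stdbool.h>
#include <string.h>

/* Heuristic: a Lustre client mount source looks like "<mgsnid>:/<fsname>"
 * (e.g. "onyx-69@tcp:/lustre"), while a local device is an absolute path
 * such as "/dev/loop0". */
bool is_lustre_target(const char *spec)
{
    if (spec == NULL || spec[0] == '\0' || spec[0] == '/')
        return false;

    const char *sep = strstr(spec, ":/");
    /* Need a non-empty NID before ":/" and a non-empty fsname after it. */
    return sep != NULL && sep != spec && sep[2] != '\0';
}
```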
| Comment by James Nunez (Inactive) [ 20/Apr/17 ] |
|
I think I understand why we are using the loop device when mounting a read-only file system. In the configuration file TESTROOT/tetexec.cfg, we can set which file system to mount read-only:
# File system which can be mounted read only.
# Can be the same as VSX_MOUNT_DEV and VSX_NOSPC_DEV.
# Set to "unsup" if read only file systems are not supported.
VSX_ROFS=/dev/loop0
I need to look into how to change this configuration file in our POSIX configuration/setup. |
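Since the value is a plain KEY=VALUE line in TESTROOT/tetexec.cfg, a harness could read it back and check it before starting the run. A sketch under assumptions: `cfg_value()` and `rofs_supported()` are hypothetical helpers, and the line format (no quoting, no whitespace around `=`, `#` comments) is inferred from the excerpt in this comment, not from TET documentation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Parse one tetexec.cfg-style line. If it is "KEY=VALUE" for the requested
 * key, copy VALUE (without the line ending) into out and return true.
 * Comment lines, blank lines and other keys return false. */
bool cfg_value(const char *line, const char *key, char *out, size_t outlen)
{
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '#' || *line == '\0' || outlen == 0)
        return false;

    size_t klen = strlen(key);
    if (strncmp(line, key, klen) != 0 || line[klen] != '=')
        return false;

    const char *val = line + klen + 1;
    size_t vlen = strcspn(val, "\r\n");
    if (vlen >= outlen)
        vlen = outlen - 1;          /* truncate rather than overflow */
    memcpy(out, val, vlen);
    out[vlen] = '\0';
    return true;
}

/* Per the config comment, "unsup" means read-only fs are not supported. */
bool rofs_supported(const char *value)
{
    return value[0] != '\0' && strcmp(value, "unsup") != 0;
}
```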
| Comment by Sarah Liu [ 26/May/17 ] |
|
A quick update on the progress of this issue: I passed MGSNID:/FSNAME into the suite to replace "/dev/loop0" and got the following error. The error has changed: it seems the suite tries to mount read/write, while the test is supposed to exercise a read-only mount. I will add some debug info to those tests and see what happens there.
FAILURE SUMMARY:
POSIX failures: 5
Test Name Baseline Lustre Report
chmod.18 Succeeded Unresolved
chown.18 Succeeded Unresolved
creat.28 Succeeded Unresolved
creat.30 Succeeded Unresolved
link.23 Succeeded Unresolved
FAILURE DESCRIPTIONS:
####################################################
Test Name: chmod.18 Unresolved
Test Description:
If the implementation supports a read-only file system, EROFS in errno
and a return value of -1 on a call to chmod(path, mode) when the named
file resides on a read-only file system. No change to the file mode
shall occur.
Posix Ref: Component CHMOD Assertion 5.6.4.4-39(C)
Test Information:
Test Agency: Unknown System Tested: Unknown
Test Date: May 19, 2017
deletion reason: mnt_rw(onyx-69@tcp:/lustre, chmod-d.18) failed
####################################################
|
| Comment by Sarah Liu [ 04/Aug/17 ] |
|
Did more investigation and here is what I found. In that C file, the fstype is defined as ext2:
/* modify the following for different mountable filesystem types */
#define __USERINTF_FSTYPE "ext2"
So the process would be |
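To make the fstype point concrete: a helper that hard-codes ext2 cannot mount an `<mgsnid>:/<fsname>` source, whereas one that takes the type as a parameter can. A hypothetical sketch (`build_mount_cmd()` is not the suite's code, and /mnt/lustre below is an assumed mount point); `mount -t lustre -o ro <mgsnid>:/<fsname> <dir>` is the usual Lustre client mount form.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Reject arguments a shell would split or interpret, since the command
 * string is meant to be passed to system() or similar. */
static bool shell_safe(const char *s)
{
    return s != NULL && s[0] != '\0' && strpbrk(s, " \t\n;&|$`'\"<>") == NULL;
}

/* Build "mount -t <fstype> -o ro|rw <source> <dir>". fstype is a runtime
 * parameter instead of a compile-time constant like "ext2".
 * Returns 0 on success, -1 on unsafe input or a too-small buffer. */
int build_mount_cmd(char *buf, size_t len, const char *fstype,
                    const char *source, const char *dir, bool readonly)
{
    if (!shell_safe(fstype) || !shell_safe(source) || !shell_safe(dir))
        return -1;

    int n = snprintf(buf, len, "mount -t %s -o %s %s %s",
                     fstype, readonly ? "ro" : "rw", source, dir);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, `build_mount_cmd(cmd, sizeof cmd, "lustre", "onyx-69@tcp:/lustre", "/mnt/lustre", true)` yields `mount -t lustre -o ro onyx-69@tcp:/lustre /mnt/lustre`.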
| Comment by Gerrit Updater [ 23/Aug/17 ] |
|
Wei Liu (wei3.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/28661 |
| Comment by Sarah Liu [ 23/Aug/17 ] |
|
The above patch fixes the problem on the Lustre side; corresponding changes to the POSIX test suite are also needed and will be made in the toolkit. |
| Comment by Gerrit Updater [ 23/Aug/17 ] |
|
Wei Liu (wei3.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/28669 |
| Comment by Gerrit Updater [ 30/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) merged in patch https://review.whamcloud.com/28669/ |
| Comment by Gerrit Updater [ 13/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28661/ |
| Comment by Peter Jones [ 13/Sep/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 16/Oct/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29628 |
| Comment by James Casper [ 17/Oct/17 ] |
|
Seen again with 2.10.54: https://testing.hpdd.intel.com/test_sessions/77fc6fa0-eddf-4fdb-b9ad-8724d84acb75 |
| Comment by Gerrit Updater [ 26/Oct/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29628/ |
| Comment by Jian Yu [ 22/Jan/18 ] |
|
The failure occurred at least 9 times in the last week. Here is a failure instance on 2.10.57: |
| Comment by Sarah Liu [ 24/Jan/18 ] |
|
I searched Maloo for the last week and saw 4 failures across all master-branch testing and interop testing between b2_10 and master. Failures seen in interop testing between master and b2_9 (or earlier) should not count, since those clients don't have the fix. Of the 4 failures, 3 were regular configs and 1 was an interop config. All 3 regular-config failures were on ONYX, so I suspect this is an environment-related issue. |
| Comment by Sarah Liu [ 02/Feb/18 ] |
|
Cannot reproduce the problem on onyx with physical nodes or VMs; will try provisioning the nodes from a snapshot and see what I can find. |
| Comment by Sarah Liu [ 05/Feb/18 ] |
|
Cannot reproduce the problem on VMs either; the nodes were installed via snapshot: https://testing.hpdd.intel.com/test_sessions/c64e9304-0aa5-11e8-bd00-52540065bddc |
| Comment by Joseph Gmitter (Inactive) [ 27/Mar/18 ] |
|
Should we resolve this issue as fixed in 2.11.0 if it is no longer seen nor reproducible? |
| Comment by James Nunez (Inactive) [ 28/Mar/18 ] |
|
We are still seeing this issue or something very close to it. Here is an example https://testing.hpdd.intel.com/test_sets/5fbcc25c-2a4c-11e8-b6a0-52540065bddc.
|
| Comment by Jian Yu [ 14/Dec/18 ] |
|
+1 on master branch: |
| Comment by James Nunez (Inactive) [ 18/May/20 ] |
|
We will not fix this issue because we’ve replaced the POSIX test suite with pjdfstest. |