[LU-6707] EL7 client cannot find loop device for posix test Created: 31/Jan/15  Updated: 09/May/18  Resolved: 16/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: Sarah Liu Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: triage
Environment:

server: lustre-master #2770 RHEL6
client: EL7


Issue Links:
Related
is related to LU-9102 Header files are missing from EL7 whi... Resolved
is related to LU-7758 Interop master<->2.7.1: POSIX fstat.1... Closed
Severity: 3
Rank (Obsolete): 17309

 Description   

https://testing.hpdd.intel.com/test_sets/209ce6aa-7ee3-11e4-ab67-5254006e85c2

posix test_1: @@@@@@ FAIL: /dev/loop/1 and /dev/loop1 gone? 


 Comments   
Comment by Jian Yu [ 01/May/15 ]

Lustre build: https://build.hpdd.intel.com/job/lustre-b_ieel2_0/176/
Distro/Arch: RHEL6.6/x86_64 (server), RHEL7.1/x86_64 (client)

The same failure occurred:
https://testing.hpdd.intel.com/test_sets/80ca46e0-efcb-11e4-bd0b-5254006e85c2
https://testing.hpdd.intel.com/test_sets/34267458-ee9f-11e4-a2ce-5254006e85c2

On a RHEL 7.1 client:

# ls /dev/loop*
/dev/loop-control

There are no loop devices.

Comment by Minh Diep [ 01/May/15 ]

This needs to be investigated by testing manually. I can't find anything that points to TEI.

Comment by Jodi Levi (Inactive) [ 04/May/15 ]

Sarah,
Would you be able to test this one manually and post the results here?

Comment by Sarah Liu [ 04/May/15 ]

Jodi,

Sure, I will do it today and update the ticket when I have some results.

Comment by Sarah Liu [ 08/May/15 ]

Right after provisioning the EL7 client, before anything has been run, there is no loop device under /dev. I will investigate further.

[root@eagle-39vm3 ~]# ls /dev|grep loop
loop-control
[root@eagle-39vm3 ~]# rpm -a|grep lustre
[root@eagle-39vm3 ~]# rpm -qa|grep lustre
lustre-client-modules-2.7.52-3.10.0_229.1.2.el7.x86_64_gbd07c02.x86_64
lustre-iokit-2.7.52-3.10.0_229.1.2.el7.x86_64_gbd07c02.x86_64
lustre-client-2.7.52-3.10.0_229.1.2.el7.x86_64_gbd07c02.x86_64
lustre-client-tests-2.7.52-3.10.0_229.1.2.el7.x86_64_gbd07c02.x86_64
[root@eagle-39vm3 ~]# uname -a
Linux eagle-39vm3.eagle.hpdd.intel.com 3.10.0-229.1.2.el7.x86_64 #1 SMP Fri Mar 27 03:04:26 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@eagle-39vm3 ~]# 

This is what I got from EL6:

[root@eagle-39vm1 ~]# uname -a
Linux eagle-39vm1.eagle.hpdd.intel.com 2.6.32-504.16.2.el6_lustre.gec772b8.x86_64 #1 SMP Thu Apr 30 21:09:20 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@eagle-39vm1 ~]# ls /dev|grep loop
loop0
loop1
loop2
loop3
loop4
loop5
loop6
loop7

Comment by Sarah Liu [ 09/May/15 ]

On EL7, no loop devices are set up during system initialization, and I cannot find the setup commands in /etc/rc.d/init.d/functions:

[root@eagle-39vm3 init.d]# pwd
/etc/rc.d/init.d
[root@eagle-39vm3 init.d]# grep -r "losetup" .
[root@eagle-39vm3 init.d]# uname -a
Linux eagle-39vm3.eagle.hpdd.intel.com 3.10.0-229.1.2.el7.x86_64 #1 SMP Fri Mar 27 03:04:26 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@eagle-39vm3 init.d]# rpm -qf functions
initscripts-9.49.17-1.el7_0.1.x86_64
[root@eagle-39vm3 init.d]# 

On EL6, by contrast, /etc/rc.d/init.d/functions contains the following commands:

[root@eagle-39vm1 init.d]# grep -r "losetup" .
./functions:			losetup $dev > /dev/null 2>&1 && \
./functions:				losetup -d $dev
[root@eagle-39vm1 init.d]# rpm -qf functions 
initscripts-9.03.46-1.el6.centos.1.x86_64
[root@eagle-39vm1 init.d]# pwd
/etc/rc.d/init.d
[root@eagle-39vm1 init.d]# 

I think this is why EL7 doesn't have loop devices set up after booting. Maybe our provisioning script should handle this.

Comment by Minh Diep [ 11/May/15 ]

This needs more investigation, but if EL7 doesn't create the loop devices, perhaps the POSIX test suite needs to do that. I wonder whether we are missing any packages on EL7.

Comment by Sarah Liu [ 11/May/15 ]

I have verified that a loop device can be created manually with losetup. I used the same provisioning command for EL6 and EL7.

Comment by Jian Yu [ 12/May/15 ]

The same issue occurs on a SLES12 client:
https://testing.hpdd.intel.com/test_sets/1bbaa112-f8f1-11e4-94ea-5254006e85c2

Comment by Minh Diep [ 15/May/15 ]

So it looks like, starting with the new kernel, the default loop devices are not there. This means we have to create them from the POSIX test suite (Lustre QE), and this should be owned by the Lustre QE group.

Comment by Andrea Garcia (Inactive) [ 18/May/15 ]

Yu Jian will update this ticket with what needs to be done (I think it is the POSIX test suite that needs to be modified/updated).
Sarah will work with Yu Jian to get this done.
QE will own these types of changes in the future.

Comment by Jian Yu [ 18/May/15 ]

Hi Minh,
Can we add some code to the node provisioning procedure to check for and create loop devices on the node, similar to what we did for creating sanityusr, quota_usr, mpiuser, etc.?
This would avoid the posix test failing on test environment issues while it creates the loop devices itself.

Comment by Minh Diep [ 18/May/15 ]

Hi Yujian,

I think the test suite has to take care of creating what's needed to run the test (and, ideally, removing it afterward). Creating sanityusr... was not the right way to begin with.
Someone might run the posix test on a non-autotest system; it would fail, and they would have to create the loop device manually. If posix.sh creates it, then everything is there.

Comment by Jian Yu [ 18/May/15 ]

OK, so the changes will add functions to test-framework.sh to check, create, and remove loop devices, and make setup_loop_dev() and cleanup_loop_dev() in lustre/tests/posix.sh call those functions.
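The "check" step of that plan could look roughly like the following. This is a minimal sketch, not the code that eventually landed in test-framework.sh: the function name `missing_loop_devs`, its arguments, and its default of 8 devices are all hypothetical. It looks for the same two names the posix test complains about (/dev/loop/N and /dev/loopN) and prints every loop device that is absent, so a caller such as setup_loop_dev() could decide whether anything needs to be created.

```shell
# Hypothetical helper (not the code that landed in test-framework.sh).
# missing_loop_devs [DEVDIR] [COUNT]
# Prints DEVDIR/loopN for each N in 0..COUNT-1 when neither DEVDIR/loop/N
# nor DEVDIR/loopN exists as a block device. Defaults: /dev and 8.
missing_loop_devs() {
	local devdir=${1:-/dev}
	local count=${2:-8}
	local i=0

	while [ "$i" -lt "$count" ]; do
		# Accept either the devfs-style (loop/N) or the flat (loopN) name,
		# the same two names the posix test checks for
		if [ ! -b "$devdir/loop/$i" ] && [ ! -b "$devdir/loop$i" ]; then
			echo "$devdir/loop$i"
		fi
		i=$((i + 1))
	done
}
```

On a freshly provisioned EL7 client like the one shown earlier, calling `missing_loop_devs` with no arguments would be expected to print /dev/loop0 through /dev/loop7.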

Comment by Sarah Liu [ 03/Jun/15 ]

It turns out that if the loop module is loaded with the option max_loop=8, the system will have loop devices:

[root@eagle-54vm1 modprobe.d]# modprobe loop max_loop=8
[  307.399441] loop: module loaded
[root@eagle-54vm1 modprobe.d]# ls /dev/loop*
/dev/loop0  /dev/loop2	/dev/loop4  /dev/loop6	/dev/loop-control
/dev/loop1  /dev/loop3	/dev/loop5  /dev/loop7
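That finding can be wrapped in a small helper. The sketch below rests on stated assumptions: the name `load_loop_module` is hypothetical, it must run as root, and it only helps when loop is built as a module that is not yet loaded, because max_loop is read when the driver initialises and `modprobe` does nothing for an already-loaded module.

```shell
# Hypothetical helper (requires root): load the loop driver so that it
# pre-creates COUNT /dev/loopN devices (default 8).
load_loop_module() {
	local count=${1:-8}

	# Reject anything that is not a plain non-negative integer
	case $count in
	'' | *[!0-9]*)
		echo "load_loop_module: invalid count '$count'" >&2
		return 2
		;;
	esac

	# max_loop is only read when the driver initialises, so a loop
	# driver that is already loaded (or built in) is left untouched.
	if [ -d /sys/module/loop ]; then
		echo "load_loop_module: loop driver already present" >&2
		return 1
	fi

	modprobe loop max_loop="$count"
}
```

On a node where loop is not yet loaded, `load_loop_module 8` is equivalent to the manual `modprobe loop max_loop=8` above. To make the setting persist across reboots without touching the test scripts, the same option could instead be placed in a file under /etc/modprobe.d/ as `options loop max_loop=8`.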

Comment by Gerrit Updater [ 03/Jun/15 ]

Wei Liu (wei3.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/15130
Subject: TEI-3114 test: load loop module to have loop devices
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1612288b7ca7b5ac5d8585fd3d96d395c3da2859

Comment by Sarah Liu [ 18/Jun/15 ]

This issue is blocked by TEI-3627: the posix test needs stropts.h and xtitypes.h, which are missing from the current EL7 build.

Comment by Yang Sheng [ 26/Jun/15 ]

RHEL7 uses /dev/loop-control to manage loop devices, so no /dev/loopXX devices are created in advance. But you can still use losetup to manage loop devices.
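A minimal sketch of the losetup route, assuming a util-linux losetup that supports `--find` and `--show` (as on RHEL 7) and root privileges; `attach_loop_image` and `detach_loop_dev` are hypothetical names. Asking losetup for a free device lets the kernel allocate one through /dev/loop-control, so no /dev/loopN node has to exist beforehand.

```shell
# Hypothetical helpers (require root).
# attach_loop_image FILE: bind FILE to the first free loop device and
# print that device's path (for example /dev/loop0).
attach_loop_image() {
	local img=${1-}

	if [ -z "$img" ] || [ ! -f "$img" ]; then
		echo "attach_loop_image: no such file '$img'" >&2
		return 1
	fi
	# --find picks a free device (allocating one via /dev/loop-control
	# when none exists); --show prints the chosen device name.
	losetup --find --show "$img"
}

# detach_loop_dev DEV: release a loop device set up by attach_loop_image.
detach_loop_dev() {
	local dev=${1-}

	if [ ! -b "$dev" ]; then
		echo "detach_loop_dev: '$dev' is not a block device" >&2
		return 1
	fi
	losetup -d "$dev"
}
```

For example, after creating an image file with dd, `dev=$(attach_loop_image /tmp/loop.img)` yields a usable device for mkfs and mount, and `detach_loop_dev "$dev"` releases it once the test has unmounted it.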

Comment by Jian Yu [ 26/Aug/15 ]

Here is the test report of POSIX compliance testing on RHEL 7.1 client and server:
https://testing.hpdd.intel.com/test_sessions/86d3c726-4b17-11e5-9143-5254006e85c2

FAILURE SUMMARY:

POSIX failures: 6

Test Name                  Baseline       Lustre Report
access.43                  Succeeded      Unresolved
chmod.18                   Succeeded      Unresolved
chown.18                   Succeeded      Unresolved
creat.28                   Succeeded      Unresolved
creat.30                   Succeeded      Unresolved
link.23                    Succeeded      Unresolved

FAILURE DESCRIPTIONS:

####################################################
Test Name: access.43 Unresolved

	Test Description:
If the implementation supports a read-only file system, EROFS in errno
and a return value of -1 on a call to access(path, amode) when write
access is requested for a file on a read-only file system.
Posix Ref: Component ACCESS Assertion 5.6.3.4-48(C)

	Test Information:
deletion reason: mnt_ro(/dev/loop0, access-d.43) failed

####################################################
Test Name: chmod.18 Unresolved

	Test Description:
If the implementation supports a read-only file system, EROFS in errno
and a return value of -1 on a call to chmod(path, mode) when the named
file resides on a read-only file system. No change to the file mode
shall occur.
Posix Ref: Component CHMOD Assertion 5.6.4.4-39(C)

	Test Information:
deletion reason: mnt_ro(/dev/loop0, chmod-d.18) failed

####################################################
Test Name: chown.18 Unresolved

	Test Description:
If the implementation supports a read-only file system, EROFS in errno
and a return value of -1 on a call to chown(path, owner, group) when
the named file resides on a read-only file system.  No change shall be
made to the owner and group of the file.
Posix Ref: Component CHOWN Assertion 5.6.5.4-40(C)

	Test Information:
deletion reason: mnt_ro(/dev/loop0, chown-d.18) failed

####################################################
Test Name: creat.28 Unresolved

	Test Description:
EROFS in errno and a return value of -1 on a call to creat(path, mode)
when:
a. the file exists and the named file resides on a read-only file
system;
b. the named file is to reside on a read-only file system and the file
does not exist.  The time related elements st_ctime and st_mtime field
of the parent directory shall not be updated and the file shall not be
truncated.
Posix Ref: Component CREAT Assertion 5.3.2.4-55(C)
Posix Ref: Component CREAT Assertion 5.3.2.4-56(C)

	Test Information:
deletion reason: mnt_ro(/dev/loop0, ./creat-d.28) failed

####################################################
Test Name: creat.30 Unresolved

	Test Description:
ENOSPC in errno and a return value of -1 on a call to creat(path) when
the directory or file system which would contain the new file cannot
be extended.
Posix Ref: Component CREAT Assertion 5.3.2.4-53(B)

	Test Information:
File system not set up correctly for ENOSPC tests

####################################################
Test Name: link.23 Unresolved

	Test Description:
EROFS in errno and a return value of -1 on a call to link() when the
requested link requires writing in a directory on a read-only file
system.
Posix Ref: Component LINK Assertion 5.3.4.4-63(C)

	Test Information:
deletion reason: mnt_ro(/dev/loop0, link-d.23) failed

####################################################
 posix test_1: @@@@@@ FAIL: Run POSIX testsuite on /mnt/lustre failed 

Except for creat.30, all of the failures are related to /dev/loop0.

In addition, the following errors occurred before the POSIX test was run against the Lustre filesystem:

rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/mkfifo/d.mkfifo/mkfifo-d.17': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/rmdir/d.rmdir/rmdir-d.9': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/mkdir/d.mkdir/mkdir-d.19': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/link/d.link/link-d.25': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/unlink/d.unlink/unlink-d.9': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/open/d.open/open-d.46': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/files/rename/d.rename/rename-d.17': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.16': Device or resource busy
rm: cannot remove '/usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.25': Device or resource busy

lsof and fuser report nothing:

# lsof +D /usr/src/posix/ext4/TESTROOT/
# lsof +D /usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.16/
# fuser -u /usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.16/
# ls -la /usr/src/posix/ext4/TESTROOT/tset/POSIX.os/ioprim/write/d.write/write-d.16/
total 2
drwxr-xr-x 2 root root  1024 Aug 25 17:34 .
drwxrwsrwx 4 vsx0 vsxg0 1024 Aug 25 17:34 ..

Still investigating.
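Since lsof and fuser find no open files, one hypothesis worth checking (an assumption, not a confirmed diagnosis) is that these empty directories are still mount points, for example loop devices left mounted by the baseline ext4 run; on Linux, rmdir on a mount point fails with EBUSY. A minimal sketch that lists every mount at or below a directory by reading /proc/self/mounts; the function name `mounts_under` is hypothetical.

```shell
# Hypothetical helper: print every mount point equal to DIR or below it,
# using the mount table in /proc/self/mounts (field 2 is the mount point).
mounts_under() {
	local dir=${1%/}

	# "/" loses its trailing slash and becomes empty: list every mount then
	awk -v d="$dir" '$2 == d || d == "" || index($2, d "/") == 1 { print $2 }' /proc/self/mounts
}
```

Running `mounts_under /usr/src/posix/ext4/TESTROOT` would show whether any of the busy directories are still mounted; if they are, unmounting them before the cleanup would be the next thing to try.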

Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ]

master, build# 3264, 2.7.64 tag
Regression: EL7.1 Server/EL7.1 Client
https://testing.hpdd.intel.com/test_sets/7acb9552-9f37-11e5-ba94-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ]

Server: 2.5.5, b2_5_fe/62
Client: Master, Build# 3266, Tag 2.7.64, RHEL 7
https://testing.hpdd.intel.com/test_sets/27d8f730-a05f-11e5-90cc-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 19/Dec/15 ]

Another instance for EL7.1 Server/EL7.1 Client - DNE
Master, Build# 3270
https://testing.hpdd.intel.com/test_sets/74fdd71e-a26d-11e5-bdef-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ]

Another instance found for interop : 2.5.5 Server/EL7 Client
Server: 2.5.5, b2_5_fe/62
Client: master, build# 3303, RHEL 7
https://testing.hpdd.intel.com/test_sets/89223572-bb0a-11e5-87b4-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ]

Encountered another instance for tag 2.7.66 for FULL - EL7.1 Server/EL7.1 Client, master, build# 3314.
https://testing.hpdd.intel.com/test_sets/c3cee9b2-ca88-11e5-84d3-5254006e85c2

Another instance for FULL - EL7.1 Server/EL7.1 Client - DNE, master, build# 3314
https://testing.hpdd.intel.com/test_sets/b64f210c-cac5-11e5-9609-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 - 2.7.1 Server/EL7 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/9f032636-ccde-11e5-8b0e-5254006e85c2

Another instance found for interop tag 2.7.66 - 2.5.5 Server/EL7 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/6241fb0c-ccc8-11e5-b80c-5254006e85c2

Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client, build# 3314
https://testing.hpdd.intel.com/test_sets/c3cee9b2-ca88-11e5-84d3-5254006e85c2

Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - DNE, build# 3314
https://testing.hpdd.intel.com/test_sets/b64f210c-cac5-11e5-9609-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - EL7 Server/2.7.1 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/495aabae-d306-11e5-be5c-5254006e85c2
Another instance found for interop - 2.7.1 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/3b9722f8-d2f8-11e5-bf08-5254006e85c2
Another instance found for interop - 2.5.5 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/ba9d84fe-d300-11e5-be5c-5254006e85c2

Comment by Sarah Liu [ 10/Feb/17 ]

This ticket covers only the loop device issue; the missing-header problem is tracked under LU-9102.

Comment by Gerrit Updater [ 16/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/15130/
Subject: LU-6707 test: load loop module to have loop devices
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b7b080bcb0e2cac15174de44b1b822fc11feec02

Comment by Peter Jones [ 16/Mar/17 ]

Landed for 2.10

Comment by Gerrit Updater [ 09/May/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/27012
Subject: LU-6707 tests: Add ability to skip tests in POSIX
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dea19ff2b5b237e4a5d0df6162f88bbdd8ed4893

Comment by Gerrit Updater [ 24/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27012/
Subject: LU-6707 tests: Add ability to skip tests in POSIX
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 95c0d6fa814657b04375d2276e8c1a4d484d028d

Comment by James Casper [ 26/Sep/17 ]

2.10.1 b26 <--> 2.9.0 b22:
https://testing.hpdd.intel.com/test_sessions/86b672ef-d6c1-431c-a6a9-c5b9f5358da1

Generated at Sat Feb 10 02:02:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.