[LU-12231] parallel-scale-nfsv4 test racer_on_nfs fails with 'test_racer_on_nfs failed with 1' Created: 26/Apr/19 Updated: 19/May/23 Resolved: 19/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | ubuntu, ubuntu18, ubuntu20 | ||
| Environment: |
Ubuntu 18.04 clients |
||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
parallel-scale-nfsv4 test_racer_on_nfs fails only on Ubuntu 18.04 clients, with the errors shown below, and it hangs 100% of the time on those clients.

Looking at the logs for a recent failure, at https://testing.whamcloud.com/test_sets/4ca25e50-6661-11e9-aeec-52540065bddc, there are several messages that may be related to the failure.

In the suite_log, we see that the main issue is that the file system is read-only:

== parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
CMD: trevis-43vm10,trevis-43vm9.trevis.whamcloud.com MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
trevis-43vm9: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
trevis-43vm10: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system

On the console log for the MDS (vm12), we see:

[ 238.469591] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
[ 539.092142] RPC request reserved 84 but used 100
[ 545.688053] Lustre: DEBUG MARKER: /usr/sbin/lctl mark parallel-scale-nfsv4 test_racer_on_nfs: @@@@@@ FAIL: test_racer_on_nfs failed with 1

For a few of the failures, we don't see this RPC message.

In the console log for both clients, we see:

[ 228.129232] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
[ 228.311480] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
[ 265.262396] random: crng init done
[ 265.263333] random: 4 urandom warning(s) missed due to ratelimiting

There are several examples of this failure, but here are just a couple of additional links to logs |
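For triaging further occurrences, the 'Read-only file system' lines in a suite_log can be grouped by the node that reported them. The following is only a minimal sketch: the function name `find_read_only_errors` and the `(local)` placeholder are hypothetical, and the pattern assumes the `host: command: message` layout seen in the suite_log excerpt quoted in this ticket.

```python
import re
from collections import defaultdict

# Assumed line layout (taken from the suite_log excerpt in this ticket):
#   trevis-43vm9: mkdir: cannot create directory '...': Read-only file system
# The leading "host: " prefix is optional, because some logs print the
# failing command without a node name.
EROFS_LINE = re.compile(
    r"^(?:(?P<host>[\w.-]+): )?(?P<cmd>\w+): .*Read-only file system\s*$"
)


def find_read_only_errors(log_text):
    """Return {host: [command, ...]} for each 'Read-only file system' line."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = EROFS_LINE.match(line.strip())
        if match:
            # "(local)" is a placeholder used when no node name is present.
            host = match.group("host") or "(local)"
            hits[host].append(match.group("cmd"))
    return dict(hits)


if __name__ == "__main__":
    sample = (
        "trevis-43vm9: mkdir: cannot create directory "
        "'/mnt/lustre/d0.parallel-scale-nfs': Read-only file system\n"
        "trevis-43vm10: mkdir: cannot create directory "
        "'/mnt/lustre/d0.parallel-scale-nfs': Read-only file system\n"
    )
    for host, commands in find_read_only_errors(sample).items():
        print(host, commands)
```

A per-node summary like this makes it easier to tell whether every client sees the read-only mount or only some of them.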
| Comments |
| Comment by James Nunez (Inactive) [ 16/Oct/19 ] |
|
We're still seeing racer_on_nfs fail with the 'read-only' file system issue described here, but we are also seeing test metabench fail with issues related to a 'read-only' file system. See https://testing.whamcloud.com/test_sets/ead02450-eb26-11e9-b62b-52540065bddc for logs:

== parallel-scale-nfsv4 test metabench: metabench ==================================================== 06:11:52 (1570687912)
OPTIONS: METABENCH=/usr/bin/metabench clients=trevis-63vm10,trevis-63vm9.trevis.whamcloud.com mbench_NFILES=10000 mbench_THREADS=4
trevis-63vm10 trevis-63vm9.trevis.whamcloud.com
mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
+ /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S
+ chmod 0777 /mnt/lustre
chmod: changing permissions of '/mnt/lustre': Read-only file system
dr-xr-xr-x 24 root root 4096 Aug 29 20:41 /mnt/lustre
+ su mpiuser sh -c "/usr/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -machinefile /tmp/parallel-scale-nfs.machines -np 8 /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S "
Metadata Test <no-name> on 10/10/2019 at 06:11:55
Rank 0 process on node trevis-63vm10.trevis.whamcloud.com
Rank 1 process on node trevis-63vm10.trevis.whamcloud.com
Rank 2 process on node trevis-63vm10.trevis.whamcloud.com
Rank 3 process on node trevis-63vm10.trevis.whamcloud.com
Rank 4 process on node trevis-63vm9.trevis.whamcloud.com
Rank 5 process on node trevis-63vm9.trevis.whamcloud.com
Rank 6 process on node trevis-63vm9.trevis.whamcloud.com
Rank 7 process on node trevis-63vm9.trevis.whamcloud.com
[10/10/2019 06:11:56] FATAL error on process 0
Proc 0: cannot create component d0.parallel-scale-nfs in /mnt/lustre/d0.parallel-scale-nfs/d0.metabench: Read-only file system |
| Comment by James Nunez (Inactive) [ 11/Nov/19 ] |
|
We are also seeing this issue on non-Ubuntu clients. Here is an example of a RHEL ZFS test session that fails with these errors: https://testing.whamcloud.com/test_sets/0d54458c-035b-11ea-b934-52540065bddc |
| Comment by James A Simmons [ 19/May/23 ] |
|
Patch https://review.whamcloud.com/c/fs/lustre-release/+/49062 resolved this bug |