Lustre / LU-12231

parallel-scale-nfsv4 test racer_on_nfs fails with 'test_racer_on_nfs failed with 1'

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6
    • Ubuntu 18.04 clients
    • 3

    Description

      parallel-scale-nfsv4 test racer_on_nfs fails only for Ubuntu 18.04 clients. On those clients it fails with the following errors and hangs 100% of the time.

      The logs for a recent failure, at https://testing.whamcloud.com/test_sets/4ca25e50-6661-11e9-aeec-52540065bddc, contain several messages that may be related to the failure. In the suite_log, we see that the main issue is that the file system is read-only:

      == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
      CMD: trevis-43vm10,trevis-43vm9.trevis.whamcloud.com MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      trevis-43vm9: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
      trevis-43vm10: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
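
To see at a glance which nodes hit the read-only error, the suite_log text can be scanned and grouped by node. The following is only a minimal sketch and is not part of the Lustre test framework: the helper name `readonly_errors_by_node` is hypothetical, and the `<node>: <message>` line layout is an assumption based on the excerpt above.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the suite_log excerpt: "<node>: <message>".
# The helper below is hypothetical and not part of the Lustre test suite.
NODE_LINE = re.compile(
    r"^\s*(?P<node>[\w.-]+):\s+(?P<msg>.*Read-only file system.*)$"
)


def readonly_errors_by_node(log_text):
    """Group 'Read-only file system' messages by the node that reported them."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            errors[match.group("node")].append(match.group("msg").strip())
    return dict(errors)


# Sample lines copied from the suite_log excerpt in this ticket.
SAMPLE = """\
trevis-43vm9: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
trevis-43vm10: mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
"""

if __name__ == "__main__":
    for node, messages in readonly_errors_by_node(SAMPLE).items():
        print(node, len(messages))
```

For the two lines quoted above, both clients are reported with one error each.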
      

      In the console log for the MDS (vm12), we see:

      [  238.469591] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
      [  539.092142] RPC request reserved 84 but used 100
      [  545.688053] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv4 test_racer_on_nfs: @@@@@@ FAIL: test_racer_on_nfs failed with 1 
      

      For a few of the failures, we don’t see this RPC message.
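
The time between the test's start marker and its FAIL marker in the MDS console log can be read off the kernel timestamps. The sketch below assumes the `[seconds.microseconds]` prefix seen in the excerpt above; `marker_elapsed` is a hypothetical helper, not an existing tool.

```python
import re
from decimal import Decimal

# Assumed console line layout: "[  238.469591] message text".
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")


def marker_elapsed(console_text, start_pattern, end_pattern):
    """Return seconds between the first line matching start_pattern and the
    next line matching end_pattern, or None if either is missing."""
    start = end = None
    for line in console_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue
        stamp, message = Decimal(match.group(1)), match.group(2)
        if start is None:
            if re.search(start_pattern, message):
                start = stamp
        elif re.search(end_pattern, message):
            end = stamp
            break
    if start is None or end is None:
        return None
    return end - start


# Lines copied from the MDS console log excerpt in this ticket.
MDS_CONSOLE = """\
[  238.469591] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
[  539.092142] RPC request reserved 84 but used 100
[  545.688053] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv4 test_racer_on_nfs: @@@@@@ FAIL: test_racer_on_nfs failed with 1
"""

if __name__ == "__main__":
    print(marker_elapsed(MDS_CONSOLE,
                         r"== parallel-scale-nfsv4 test racer_on_nfs",
                         r"FAIL: test_racer_on_nfs"))
```

For the excerpt above this gives 307.218462 seconds between the two markers.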

      In the console log for both clients, we see:

      [  228.129232] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 07:04:44 (1556089484)
      [  228.311480] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      [  265.262396] random: crng init done
      [  265.263333] random: 4 urandom warning(s) missed due to ratelimiting
      

      There are several examples of this failure; here are a couple of additional links to logs:
      https://testing.whamcloud.com/test_sets/32607d94-6358-11e9-8bb1-52540065bddc
      https://testing.whamcloud.com/test_sets/4b6d37a0-6614-11e9-aeec-52540065bddc

          Activity

            simmonsja James A Simmons added a comment - Patch https://review.whamcloud.com/c/fs/lustre-release/+/49062 resolved this bug.

            jamesanunez James Nunez (Inactive) added a comment - We are seeing this issue for non-Ubuntu clients. Here is an example of a RHEL ZFS test session that fails with these errors: https://testing.whamcloud.com/test_sets/0d54458c-035b-11ea-b934-52540065bddc

            jamesanunez James Nunez (Inactive) added a comment - We're still seeing racer_on_nfs fail with the 'read-only' file system issue described here, but we are also seeing test metabench fail with issues related to a 'read-only' file system. See https://testing.whamcloud.com/test_sets/ead02450-eb26-11e9-b62b-52540065bddc for logs:

            == parallel-scale-nfsv4 test metabench: metabench ==================================================== 06:11:52 (1570687912)
            OPTIONS:
            METABENCH=/usr/bin/metabench
            clients=trevis-63vm10,trevis-63vm9.trevis.whamcloud.com
            mbench_NFILES=10000
            mbench_THREADS=4
            trevis-63vm10
            trevis-63vm9.trevis.whamcloud.com
            mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
            chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
            + /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S 
            + chmod 0777 /mnt/lustre
            chmod: changing permissions of '/mnt/lustre': Read-only file system
            dr-xr-xr-x 24 root root 4096 Aug 29 20:41 /mnt/lustre
            + su mpiuser sh -c "/usr/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -machinefile /tmp/parallel-scale-nfs.machines -np 8 /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S "
            Metadata Test <no-name> on 10/10/2019 at 06:11:55

            Rank   0 process on node trevis-63vm10.trevis.whamcloud.com
            Rank   1 process on node trevis-63vm10.trevis.whamcloud.com
            Rank   2 process on node trevis-63vm10.trevis.whamcloud.com
            Rank   3 process on node trevis-63vm10.trevis.whamcloud.com
            Rank   4 process on node trevis-63vm9.trevis.whamcloud.com
            Rank   5 process on node trevis-63vm9.trevis.whamcloud.com
            Rank   6 process on node trevis-63vm9.trevis.whamcloud.com
            Rank   7 process on node trevis-63vm9.trevis.whamcloud.com

            [10/10/2019 06:11:56] FATAL error on process 0
            Proc 0: cannot create component d0.parallel-scale-nfs in /mnt/lustre/d0.parallel-scale-nfs/d0.metabench: Read-only file system
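
Because both mkdir and chmod report a read-only file system on /mnt/lustre in the metabench log above, one check that may help triage is whether the client-side mount itself carries the `ro` option. The following is a sketch only, assuming the standard Linux `/proc/mounts` field order (device, mount point, type, options, ...). The helper names are hypothetical, and the sample mount lines are invented for illustration and are not taken from the logs.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted.

    Expects text in /proc/mounts format. If the mount point appears more
    than once, the last entry is used.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_mounted_readonly(mounts_text, mount_point):
    """True only if mount_point is present and its options include 'ro'."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Invented sample entries (NOT from the ticket's logs), for illustration only.
SAMPLE_RO = "server:/ /mnt/lustre nfs4 ro,relatime,vers=4.0 0 0\n"
SAMPLE_RW = "server:/ /mnt/lustre nfs4 rw,relatime,vers=4.0 0 0\n"

if __name__ == "__main__":
    print(is_mounted_readonly(SAMPLE_RO, "/mnt/lustre"))
    print(is_mounted_readonly(SAMPLE_RW, "/mnt/lustre"))
```

On a live client, the text could come from reading `/proc/mounts`. This only inspects the client-side mount options; a mount that shows `rw` here can still receive read-only errors from the server side.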

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)