Lustre / LU-10566

parallel-scale-nfsv4 test_metabench: mkdir: cannot create directory on Read-only file system

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.4, Lustre 2.12.5, Lustre 2.12.6
    • Labels: None
    • Severity: 3

    Description

      parallel-scale-nfsv4 test_metabench - metabench failed! 1
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/6e890fe6-fd53-11e7-a6ad-52540065bddc

      test_metabench failed with the following error:

      metabench failed! 1
      

      server: 2.10.57 RHEL7 ldiskfs
      client: SLES12SP3

      test log

      == parallel-scale-nfsv4 test metabench: metabench ==================================================== 11:23:59 (1516389839)
      OPTIONS:
      METABENCH=/usr/bin/metabench
      clients=onyx-33vm1,onyx-33vm2
      mbench_NFILES=10000
      mbench_THREADS=4
      onyx-33vm1
      onyx-33vm2
      mkdir: cannot create directory ‘/mnt/lustre/d0.parallel-scale-nfs’: Read-only file system
      chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
      + /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S 
      + chmod 0777 /mnt/lustre
      chmod: changing permissions of '/mnt/lustre': Read-only file system
      dr-xr-xr-x 23 root root 4096 Jan 19 00:29 /mnt/lustre
      + su mpiuser sh -c "/usr/lib64/mpi/gcc/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh -machinefile /tmp/parallel-scale-nfs.machines -np 8 /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S "
      [onyx-33vm2:14600] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14600] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14601] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14601] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09898] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09898] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09900] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09900] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09899] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09899] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14602] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14602] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14604] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm2:14604] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09902] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      [onyx-33vm1:09902] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
      Metadata Test <no-name> on 01/19/2018 at 11:23:59
      
      Rank   0 process on node onyx-33vm1
      Rank   1 process on node onyx-33vm1
      Rank   2 process on node onyx-33vm1
      Rank   3 process on node onyx-33vm1
      Rank   4 process on node onyx-33vm2
      Rank   5 process on node onyx-33vm2
      Rank   6 process on node onyx-33vm2
      Rank   7 process on node onyx-33vm2
      
      [01/19/2018 11:23:59] FATAL error on process 0
      Proc 0: cannot create component d0.parallel-scale-nfs in /mnt/lustre/d0.parallel-scale-nfs/d0.metabench: Read-only file system
      [onyx-33vm1][[7407,1],1][btl_tcp_frag.c:238:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
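
      For quick triage, a writability probe like the sketch below (illustrative only, not part of the test suite; MOUNT=/mnt/lustre is taken from the log above) distinguishes a mount that has gone read-only from a mere permission problem:

      # Hedged triage sketch: confirm the mount is writable before the
      # test creates its working directory. MOUNT is an assumption based
      # on the log above.
      MOUNT=${MOUNT:-/mnt/lustre}

      # /proc/mounts lists "ro" among the options when the kernel (or the
      # NFS re-export) holds the filesystem read-only.
      opts=$(awk -v m="$MOUNT" '$2 == m { print $4 }' /proc/mounts)
      case ",$opts," in
          *,ro,*) echo "$MOUNT is mounted read-only: $opts" >&2; exit 1 ;;
      esac

      # Belt and braces: try an actual write, since an export-level
      # restriction may not show up in the client's mount options.
      touch "$MOUNT/.rw_probe.$$" 2>/dev/null \
          || { echo "$MOUNT refuses writes (stale ro export?)" >&2; exit 1; }
      rm -f "$MOUNT/.rw_probe.$$"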
      

          Activity

            James Nunez (Inactive) added a comment (edited)

            I'm reopening this ticket because we are still seeing the read-only file system problem for 2.12.5 RC1 at https://testing.whamcloud.com/test_sets/bd833e99-d557-4ec6-a768-91440b98b55e .

            Maybe LU-12231 is the same issue and this one can be closed since LU-12231 is still open?

            Sarah Liu added a comment

            Hit this again on b2_10 2.10.6-rc2 zfs DNE: https://testing.whamcloud.com/test_sets/05a4f148-ef60-11e8-bfe1-52540065bddc

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31953/
            Subject: LU-10566 test: fix nfs exports clean up
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 741347aafb8053d02294650add007e1bf050e978
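
            The patch subject points at stale NFS exports surviving a previous run. A rough sketch of the kind of cleanup involved (illustrative only, not the landed patch; do_node/do_nodes are assumed to be the test framework's remote-execution helpers):

            # Hedged sketch -- tear down the NFS re-export of the Lustre
            # mount so a leftover (possibly read-only) export cannot leak
            # into the next test run.
            cleanup_nfs_export() {
                local server=$1 export_dir=$2 clients=$3

                # Unmount on every client first so the server can withdraw
                # the export cleanly.
                do_nodes "$clients" "umount -f $export_dir" || true

                # Withdraw the export and resync the export table; '-ra'
                # re-exports everything still listed in /etc/exports.
                do_node "$server" "exportfs -u '*:$export_dir'; exportfs -ra"
            }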


            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31953
            Subject: LU-10566 test: fix nfs exports clean up
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 5a86bc8daac771e428ad839f27ea1542a4a40f48

            Peter Jones added a comment

            Landed for 2.12


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31679/
            Subject: LU-10566 test: fix nfs exports clean up
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2cdc1ad8b86d013fdb8ffc70ee567284537eee47


            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31679
            Subject: LU-10566 test: debug
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7550568625d2732afeb36a52df36db0109bda82d

            James Nunez (Inactive) added a comment (edited)

            It looks like we are hitting this again with 2.10.59 RHEL 7 ldiskfs servers and RHEL 7 clients.

            This time, the file system is not almost full. From the suite_log, before we run any parallel-scale-nfsv4 tests,

            UUID                   1K-blocks        Used   Available Use% Mounted on
            
            lustre-MDT0000_UUID      1165900       17368     1045336   2% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID      1933276       26956     1781868   1% /mnt/lustre[OST:0]
            lustre-OST0001_UUID      1933276       26944     1785064   1% /mnt/lustre[OST:1]
            lustre-OST0002_UUID      1933276       31044     1780088   2% /mnt/lustre[OST:2]
            lustre-OST0003_UUID      1933276       26956     1784908   1% /mnt/lustre[OST:3]
            lustre-OST0004_UUID      1933276       26948     1784972   1% /mnt/lustre[OST:4]
            lustre-OST0005_UUID      1933276       26988     1784812   1% /mnt/lustre[OST:5]
            lustre-OST0006_UUID      1933276       26960     1784960   1% /mnt/lustre[OST:6]
            
            filesystem_summary:     13532932      192796    12486672   2% /mnt/lustre 
            

            We see the same output from metabench as in the description

            == parallel-scale-nfsv4 test metabench: metabench ==================================================== 08:43:41 (1521017021)
            OPTIONS:
            METABENCH=/usr/bin/metabench
            clients=onyx-30vm5.onyx.hpdd.intel.com,onyx-30vm6
            mbench_NFILES=10000
            mbench_THREADS=4
            onyx-30vm5.onyx.hpdd.intel.com
            onyx-30vm6
            mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
            chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
            + /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S
            + chmod 0777 /mnt/lustre
            chmod: changing permissions of '/mnt/lustre': Read-only file system
            dr-xr-xr-x 23 root root 4096 Mar 14 07:59 /mnt/lustre 
            

            https://testing.hpdd.intel.com/test_sets/734ea438-2773-11e8-9e0e-52540065bddc

            https://testing.hpdd.intel.com/test_sets/3b702578-2769-11e8-9e0e-52540065bddc


            Minh Diep added a comment

            I found that obdfilter-survey test_1c did not clean up properly:

            =============> Destroy 1 on 10.9.6.12:lustre-OST0000_ecc
            error: destroy: invalid objid '3'
            destroy OST object <objid> [num [verbose]]
            usage: destroy <num> objects, starting at objid <objid>
            run <command> after connecting to device <devno>
            --device <devno> <command [args ...]>
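
            Going by the usage text above, the cleanup call has this shape (all values illustrative; whether the objid must be given in some particular form, which the "invalid objid '3'" error hints at, is an assumption):

            # Shape of the call per the quoted usage:
            #   lctl --device <devno> destroy <objid> [num [verbose]]
            # e.g. destroy 1 object starting at objid 0x3 on OST device 7:
            lctl --device 7 destroy 0x3 1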


            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31231
            Subject: LU-10566 test: don't direct lfs df to /dev/null
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b9bc92bd01d9a769ddb4d8669b27fe6db8e7cf54


            People

              Assignee: WC Triage
              Reporter: Sarah Liu
              Votes: 0
              Watchers: 6
