[LU-10566] parallel-scale-nfsv4 test_metabench: mkdir: cannot create directory on Read-only file system Created: 25/Jan/18 Updated: 14/Apr/21 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.4, Lustre 2.12.5, Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
parallel-scale-nfsv4 test_metabench - metabench failed! 1

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run:

test_metabench failed with the following error: metabench failed! 1

server: 2.10.57 RHEL7 ldiskfs

test log:

== parallel-scale-nfsv4 test metabench: metabench ==================================================== 11:23:59 (1516389839)
OPTIONS:
METABENCH=/usr/bin/metabench
clients=onyx-33vm1,onyx-33vm2
mbench_NFILES=10000
mbench_THREADS=4
onyx-33vm1
onyx-33vm2
mkdir: cannot create directory ‘/mnt/lustre/d0.parallel-scale-nfs’: Read-only file system
chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
+ /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S
+ chmod 0777 /mnt/lustre
chmod: changing permissions of '/mnt/lustre': Read-only file system
dr-xr-xr-x 23 root root 4096 Jan 19 00:29 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/mpi/gcc/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh -machinefile /tmp/parallel-scale-nfs.machines -np 8 /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S "
[onyx-33vm2:14600] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_ofi: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[onyx-33vm2:14600] mca: base: component_find: unable to open /usr/lib64/mpi/gcc/openmpi/lib64/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[... the same mca_mtl_ofi/mca_mtl_psm warning pair is repeated for onyx-33vm2:14601, onyx-33vm1:09898, onyx-33vm1:09900, onyx-33vm1:09899, onyx-33vm2:14602, onyx-33vm2:14604 and onyx-33vm1:09902 ...]
Metadata Test <no-name> on 01/19/2018 at 11:23:59
Rank 0 process on node onyx-33vm1
Rank 1 process on node onyx-33vm1
Rank 2 process on node onyx-33vm1
Rank 3 process on node onyx-33vm1
Rank 4 process on node onyx-33vm2
Rank 5 process on node onyx-33vm2
Rank 6 process on node onyx-33vm2
Rank 7 process on node onyx-33vm2
[01/19/2018 11:23:59] FATAL error on process 0
Proc 0: cannot create component d0.parallel-scale-nfs in /mnt/lustre/d0.parallel-scale-nfs/d0.metabench: Read-only file system
[onyx-33vm1][[7407,1],1][btl_tcp_frag.c:238:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
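A quick way to narrow this down at failure time is to check whether the read-only state is on the NFS re-export or on the underlying Lustre client mount. The commands below are a minimal triage sketch, assuming /mnt/lustre is the Lustre mount on the re-exporting node and also the NFS mount point on the metabench clients (as in the log above); the probe file name is illustrative.

# On the Lustre client that re-exports the file system over NFS:
mount | grep lustre                                          # look for "ro" in the Lustre mount options
lfs df -h /mnt/lustre                                        # per-target space usage
dmesg | grep -iE 'read-only|remount'                         # kernel messages about a forced read-only remount
touch /mnt/lustre/.rw_probe && rm -f /mnt/lustre/.rw_probe   # direct write test on the Lustre mount

# On the NFS clients running metabench:
grep -E ' nfs4? ' /proc/mounts                               # check whether the NFS mount itself is "ro"

If the Lustre mount still accepts writes while the NFS mount reports a read-only file system, the problem is confined to the NFS re-export layer rather than to Lustre itself. |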
| Comments |
| Comment by James Nunez (Inactive) [ 26/Jan/18 ] |
|
From the top of the suite_log, we see that the file system is 99% full:

UUID                   1K-blocks      Used  Available  Use%  Mounted on
lustre-MDT0000_UUID      1165900     85376     977328    8%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276   1801184      10852   99%  /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276   1795316      16720   99%  /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276   1795232      16776   99%  /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276   1795292      16660   99%  /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276   1795180      16828   99%  /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276   1795300      16596   99%  /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276   1803440       8596  100%  /mnt/lustre[OST:6]
filesystem_summary:     13532932  12580944     103028   99%  /mnt/lustre

Does this explain why the file system changes to 'read-only'? I suspect that the NFS file system is read-only, but we should confirm that the Lustre file system is not read-only.
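If the near-full OSTs are what pushed things read-only, the evidence should be on the servers: ldiskfs remounts itself read-only when it hits metadata errors and is mounted with errors=remount-ro. A hedged server-side check, where /dev/sdb1 stands in for the actual OST backing device (the device name is an assumption, not taken from the logs):

# On each OSS:
dmesg | grep -iE 'remount.*read-only|ldiskfs.*error'   # evidence of a forced read-only remount
grep lustre /proc/mounts                               # "ro" in the OST mount options would confirm it
tune2fs -l /dev/sdb1 | grep -i 'errors behavior'       # errors= policy on the ldiskfs backend

If none of the OSTs went read-only server-side, the full file system is probably incidental and the read-only state comes from the NFS layer instead. |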
| Comment by James Nunez (Inactive) [ 06/Feb/18 ] |
|
We are seeing this issue in failover test sessions with DNE configured and ZFS servers, with both servers and clients running EL7:
| Comment by Minh Diep [ 08/Feb/18 ] |
|
+1 on b2_10 |
| Comment by Gerrit Updater [ 08/Feb/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31231 |
| Comment by Minh Diep [ 09/Feb/18 ] |
|
I found that obdfilter-survey test_1c did not clean up properly:

=============> Destroy 1 on 10.9.6.12:lustre-OST0000_ecc
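As a rough way to confirm leftovers from an earlier obdfilter-survey run, one can look for stale echo_client devices and for OST space still in use before the NFS tests start; this is only an illustrative check, not the cleanup the test scripts themselves perform:

# On the node that ran obdfilter-survey:
lctl dl | grep echo_client      # stale echo_client devices suggest an incomplete cleanup
# On any client with the file system mounted:
lfs df /mnt/lustre              # OST usage should be low before parallel-scale-nfs starts

Leftover space usage from an aborted survey would explain why the later NFS test session started on a nearly full file system. |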
| Comment by James Nunez (Inactive) [ 14/Mar/18 ] |
|
It looks like we are hitting this again with 2.10.59 RHEL 7 ldiskfs servers and RHEL 7 clients. This time, the file system is not almost full. From the suite_log, before we run any parallel-scale-nfsv4 tests:

UUID                   1K-blocks     Used  Available  Use%  Mounted on
lustre-MDT0000_UUID      1165900    17368    1045336    2%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276    26956    1781868    1%  /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276    26944    1785064    1%  /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276    31044    1780088    2%  /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276    26956    1784908    1%  /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276    26948    1784972    1%  /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276    26988    1784812    1%  /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276    26960    1784960    1%  /mnt/lustre[OST:6]
filesystem_summary:     13532932   192796   12486672    2%  /mnt/lustre

We see the same output from metabench as in the description:

== parallel-scale-nfsv4 test metabench: metabench ==================================================== 08:43:41 (1521017021)
OPTIONS:
METABENCH=/usr/bin/metabench
clients=onyx-30vm5.onyx.hpdd.intel.com,onyx-30vm6
mbench_NFILES=10000
mbench_THREADS=4
onyx-30vm5.onyx.hpdd.intel.com
onyx-30vm6
mkdir: cannot create directory '/mnt/lustre/d0.parallel-scale-nfs': Read-only file system
chmod: cannot access '/mnt/lustre/d0.parallel-scale-nfs/d0.metabench': No such file or directory
+ /usr/bin/metabench -w /mnt/lustre/d0.parallel-scale-nfs/d0.metabench -c 10000 -C -S
+ chmod 0777 /mnt/lustre
chmod: changing permissions of '/mnt/lustre': Read-only file system
dr-xr-xr-x 23 root root 4096 Mar 14 07:59 /mnt/lustre

https://testing.hpdd.intel.com/test_sets/734ea438-2773-11e8-9e0e-52540065bddc
https://testing.hpdd.intel.com/test_sets/3b702578-2769-11e8-9e0e-52540065bddc
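Because Lustre itself has plenty of free space in this run, the read-only condition more plausibly originates on the NFS side of the setup rather than from full OSTs. A minimal sketch for checking the re-export itself, assuming onyx-30vm5 is the node exporting /mnt/lustre over NFSv4 (the role assignment is an assumption; the node names come from the log above):

# On the NFS server (the Lustre client doing the re-export):
exportfs -v | grep /mnt/lustre     # export options; "ro" here would explain the mkdir failure
mount | grep ' /mnt/lustre '       # confirm the underlying Lustre mount is still "rw"
showmount -e localhost             # list what is actually exported at test time

If the export and the local Lustre mount are both rw, the remaining suspect is the set of NFS client mount options recorded by the test framework.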
|
| Comment by Gerrit Updater [ 16/Mar/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31679 |
| Comment by Gerrit Updater [ 09/Apr/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31679/ |
| Comment by Peter Jones [ 09/Apr/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 11/Apr/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31953 |
| Comment by Gerrit Updater [ 16/Apr/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31953/ |
| Comment by Sarah Liu [ 29/Nov/18 ] |
|
Hit this again on b2_10 2.10.6-rc2 with ZFS and DNE: https://testing.whamcloud.com/test_sets/05a4f148-ef60-11e8-bfe1-52540065bddc |
| Comment by James Nunez (Inactive) [ 04/Jun/20 ] |
|
I'm reopening this ticket because we are still seeing the read-only file system problem for 2.12.5 RC1 at https://testing.whamcloud.com/test_sets/bd833e99-d557-4ec6-a768-91440b98b55e . Maybe |