Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.14.0
-
None
-
3
Description
metadata-updates test_3 fails with 'mpi_run failed', starting on 2020-08-08 with the Lustre 2.13.55.9 full-patchless test session at https://testing.whamcloud.com/test_sets/abe45967-5b59-4042-88eb-ff34e5e658ad. Since then, metadata-updates test_3 has failed 27 times, with the latest failure at https://testing.whamcloud.com/test_sets/f550175e-d2d7-4ba2-a9fc-0316726010be.
Looking at the failure in the suite_log, we see that the disk quota was exceeded:
== metadata-updates test 3: write_disjoint test ====================================================== 07:33:16 (1608881596)
+ chmod 0777 /mnt/lustre
drwxrwxrwx 6 root root 65536 Dec 25 07:32 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -machinefile /tmp/auster.machines -np 2 /usr/lib64/openmpi/bin/write_disjoint -f /mnt/lustre/d0.metadata-updates/f3.metadata-updates -n 1000 "
Warning: Permanently added 'trevis-202vm2,10.9.7.130' (ECDSA) to the list of known hosts.
random seed: 1608881598
loop 0: chunk_size 15794568
rank 0, loop 8: write() returned Disk quota exceeded
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection to another process:
Local host: trevis-202vm1
Remote host: trevis-202vm2
This is usually caused by a firewall on the remote host. Please check that any firewall (e.g., iptables) has been disabled and try again.
------------------------------------------------------------
metadata-updates test_3: @@@@@@ FAIL: mpi_run failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6273:error()
= /usr/lib64/lustre/tests/metadata-updates.sh:269:test_3()
In the client1 (vm1) dmesg, we see:
[111671.914784] Lustre: DEBUG MARKER: == metadata-updates test 3: write_disjoint test ====================================================== 07:33:16 (1608881596)
[111673.143081] hugetlbfs: write_disjoint (1566296): Using mlock ulimits for SHM_HUGETLB is deprecated
[111678.973977] systemd-coredump[1566303]: Not enough arguments passed by the kernel (0, expected 7).
[111679.381003] Lustre: DEBUG MARKER: /usr/sbin/lctl mark metadata-updates test_3: @@@@@@ FAIL: mpi_run failed
There’s nothing else that indicates a problem in the console and dmesg logs.
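With 27 failures to compare, it may help to check whether each one hits the same write() error at a similar rank and loop. The following is a minimal sketch in Python, not part of the Lustre test framework: the function name find_write_errors is hypothetical, and the pattern is modeled only on the single failing line quoted in this ticket, so other write_disjoint messages may not match. It assumes the suite_log has one message per line.

```python
import re

# Pattern modeled on the one failing line quoted in this ticket
# ("rank 0, loop 8: write() returned Disk quota exceeded").
# Assumption: other write_disjoint errors follow the same shape.
WRITE_ERROR = re.compile(
    r"rank (?P<rank>\d+), loop (?P<loop>\d+): "
    r"(?P<call>\w+)\(\) returned (?P<error>.+)"
)


def find_write_errors(log_text):
    """Return a (rank, loop, call, error) tuple for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = WRITE_ERROR.search(line)
        if match:
            hits.append(
                (
                    int(match["rank"]),
                    int(match["loop"]),
                    match["call"],
                    match["error"].strip(),
                )
            )
    return hits


# Excerpt copied from the suite_log quoted above.
sample = """random seed: 1608881598
loop 0: chunk_size 15794568
rank 0, loop 8: write() returned Disk quota exceeded
"""

if __name__ == "__main__":
    for rank, loop, call, error in find_write_errors(sample):
        print(f"rank={rank} loop={loop} call={call} error={error}")
```

Running the same function over the suite_log from each failed session would show whether the failure is always EDQUOT on write() or whether other errors appear as well.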