Details
Bug
Resolution: Fixed
Major
None
Lustre 2.10.3
None
3
Description
I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file server. The problem only affects mv; a cp works fine. The mv hangs forever, and the process cannot be killed. When I ran strace on the mv, the program hung on fchown.
strace mv /tmp/simon.small.txt /mnt/lustre/projects/pMOSP/simon
<stuff>
write(4, "1\n", 2) = 2
read(3, "", 4194304) = 0
utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
fchown(4, 10001, 10025
If you look at dmesg, you see multiple errors start appearing at the same time. The errors don't stop, as we can't kill the 'mv' process:
[Thu Jul 5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
[Thu Jul 5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
[Thu Jul 5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[Thu Jul 5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
[Thu Jul 5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
[Thu Jul 5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
We have the following OFED drivers, which I believe have a known problem with connecting to Lustre servers:
ofed_info | head -1
MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
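For readers following the trace, here is a rough Python sketch of the syscall sequence a cross-file-system mv performs, as seen in the strace output: copy the data, restore timestamps, restore ownership, then remove the source. It is an illustration only, not the coreutils implementation; the function name cross_fs_move is hypothetical. The point it tries to show is that the ownership step (fchown) is an extra metadata update that a plain cp without -p does not issue, which may be why only mv is affected.

```python
import os
import shutil


def cross_fs_move(src, dst):
    """Hypothetical helper: roughly mimic mv between different file systems."""
    st = os.stat(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # 1. Copy the file contents (the read/write calls in the trace).
        shutil.copyfileobj(fin, fout)
        fout.flush()
        # 2. Preserve access/modification times (utimensat in the trace).
        os.utime(fout.fileno(), ns=(st.st_atime_ns, st.st_mtime_ns))
        # 3. Preserve owner and group (fchown in the trace). This is the
        #    call that never returned in the report above.
        os.fchown(fout.fileno(), st.st_uid, st.st_gid)
    # 4. Remove the source only after the copy has fully succeeded.
    os.unlink(src)
```

Running only steps 1 and 2 against the Lustre mount, and then step 3 on its own, would be one way to confirm that the hang is tied to the ownership change rather than to the data copy.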
OK this is better. The chgrp is failing because the MDT is not connected to OST000c. What is the status of that OST? It appears that the client and server are not handling this condition correctly.
The MDT logs you provided are not from Lustre 2.10.3. What version of Lustre is the MDT running?
The failed assertions are due to LU-8573 and possibly OFED issues.