[LU-11119] A 'mv' of a file from a local file system to a lustre file system hangs Created: 05/Jul/18 Updated: 16/Nov/20 Resolved: 16/Nov/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Monash HPC | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
uname -a; cat /etc/redhat-release; lctl lustre_build_version |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file server. This problem only affects a mv; a cp works OK. The problem is that the 'mv' hangs forever, and the process cannot be killed. When I ran strace on the mv, the program hangs on fchown:
strace mv /tmp/simon.small.txt /mnt/lustre/projects/pMOSP/simon
<stuff>
write(4, "1\n", 2) = 2
read(3, "", 4194304) = 0
utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
fchown(4, 10001, 10025
If you look at dmesg, you see these errors start appearing repeatedly at the same time.
The errors don't stop, as we can't kill the 'mv' process:
[Thu Jul 5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
[Thu Jul 5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
[Thu Jul 5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[Thu Jul 5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
[Thu Jul 5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
[Thu Jul 5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
We have the following OFED drivers, which I believe have a known problem connecting to Lustre servers (ofed_info | head -1): MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0): |
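A minimal sketch of the reproducer, based on the description above (the test file size and the extra strace flags are assumptions, not from the original report):
dd if=/dev/zero of=/tmp/simon.small.txt bs=1M count=1
strace -f -tt mv /tmp/simon.small.txt /mnt/lustre/projects/pMOSP/simon/
# mv copies the data, then blocks in fchown() while setting ownership on the destination
dmesg | tail    # shows the repeating "Connection to lustre-MDT0000 ... was lost/restored" messages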
| Comments |
| Comment by Nathaniel Clark [ 05/Jul/18 ] |
|
Lustre said it lost its connection to the MDT and needs to wait for recovery to complete. Can you attach the logs from the MDS? |
| Comment by John Hammond [ 05/Jul/18 ] |
|
Hi, Does 'mv' consistently hang in fchown()? Is the 'mv' command being run by root, by the owner of the file, or by some other user? From the strace output it appears to be by the file owner. Could you post the output of the 'id' command (in the setup used to reproduce)? I noticed that the destination directory is marked SGID. Does the hang still occur if the destination directory is not marked SGID? Does the MDS use the same Lustre and kernel versions? |
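For reference, one way to check and temporarily clear the SGID bit asked about above, using plain coreutils (the directory path is taken from the strace in the description; this is an illustrative sketch, not a command from the ticket):
stat -c '%a %A %U:%G' /mnt/lustre/projects/pMOSP/simon   # a leading 2 in the octal mode (e.g. 2775) indicates SGID
chmod g-s /mnt/lustre/projects/pMOSP/simon               # temporarily clear SGID for the test
mv /tmp/simon.small.txt /mnt/lustre/projects/pMOSP/simon/
chmod g+s /mnt/lustre/projects/pMOSP/simon               # restore SGID afterwards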
| Comment by Monash HPC [ 06/Jul/18 ] |
|
John/Nathaniel, , {1530777797, 0}], 0) = 0 ' (5) The MDS Server is (6) The client is (7) I removed the sticky bit |
| Comment by Monash HPC [ 06/Jul/18 ] |
|
PS I added the MDS dmesg output as an attachment |
| Comment by Monash HPC [ 06/Jul/18 ] |
|
Also I noticed that a 'cp -p' fails like the 'mv' does. Regards |
| Comment by John Hammond [ 06/Jul/18 ] |
|
Hi Simon, I suspect your identity upcall is not finding the supplementary group info it needs for the chown(). Could you run 'id smichnow' on the MDT and post the output? How do you manage the user and group databases? Do you use /etc/passwd and /etc/group, LDAP, AD, or something else? Are the user and group databases on the MDT kept in sync with the rest of the cluster? The identity upcall mechanism and configuration are described at http://doc.lustre.org/lustre_manual.xhtml#lustreprogramminginterfaces |
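A sketch of how the identity upcall side can be inspected on the MDS, following the manual section linked above (the MDT name and UID 10001 are taken from elsewhere in this ticket):
# On the MDS: which upcall helper does the MDT use?
lctl get_param mdt.lustre-MDT0000.identity_upcall
# What identity and group membership does the upcall resolve for the UID seen in fchown()?
l_getidentity -d 10001
# Compare against the client-side view of the same user
id smichnow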
| Comment by Monash HPC [ 09/Jul/18 ] |
|
Dear John, and on the metadata server, which shows the same number of groups but in a different order: [root@rclmddc1r14-02-e1 lustre-MDT0000]# lctl get_param mdt.lustre-MDT0000.identity_upcall and [root@rclmddc1r14-02-e1 lustre-MDT0000]# l_getidentity -d 10001 Regards |
| Comment by John Hammond [ 10/Jul/18 ] |
|
Hi Simon, l_getidentity does generate some logging on error, but it uses the authpriv facility with an error level of warning. If needed, could you reconfigure syslog on the MDT to temporarily capture any logging generated by l_getidentity when you run your reproducer, and attach it here? Second, could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here? |
| Comment by Monash HPC [ 11/Jul/18 ] |
|
John Thanks |
| Comment by John Hammond [ 12/Jul/18 ] |
|
Hi Simon, > Could you also tell me how to modify syslogger to collect l_getidentity errors Sure. Do you use rsyslog? If so, you can add a line of the following form to /etc/rsyslog.conf on the MDS: authpriv.warning /var/log/l_getidentity Then restart rsyslog, run your reproducer, and attach the contents of /var/log/l_getidentity. You should check that no site-sensitive content ended up in that file before you attach it. |
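Put together, the capture sequence on the MDS might look like this (a sketch assuming rsyslog; the facility, priority, and log path are the ones suggested above):
# Add to /etc/rsyslog.conf:
authpriv.warning    /var/log/l_getidentity
# Then:
systemctl restart rsyslog    # or 'service rsyslog restart' on older releases
# Run the mv/chown reproducer from the client, then collect:
cat /var/log/l_getidentity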
| Comment by Monash HPC [ 16/Jul/18 ] |
|
Dear John |
| Comment by Monash HPC [ 16/Jul/18 ] |
|
John regards |
| Comment by John Hammond [ 16/Jul/18 ] |
|
Could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here? |
| Comment by Monash HPC [ 17/Jul/18 ] |
|
Dear John, which goes up to.. read(4, "\1\0\0\0\0\0\0\0L'\0\0\20\0\0\0pMOSP\0*\0smichnow"..., 136) = 136 , AT_SYMLINK_NOFOLLOW) = 0 and hangs |
| Comment by John Hammond [ 17/Jul/18 ] |
|
Hi Simon, OK this is interesting. The file you attached shows that the MDT was completely idle when stack1 was run. So I'm sorry to say we may have been going in the wrong direction. Could you run it on the client after chown/chgrp hangs? Could you also enable some more debugging and run the following on the client: lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace' lctl set_param debug_mb=64 lctl clear chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 & ### Wait for chgrp to hang in fchown() lctl dk > /tmp/chgrp-dk.out stack1 > /tmp/chgrp-stack1.out And then attach /tmp/chgrp-dk.out and /tmp/chgrp-stack1.out? |
| Comment by Monash HPC [ 18/Jul/18 ] |
|
Dear John |
| Comment by Monash HPC [ 23/Jul/18 ] |
|
Dear John, |
| Comment by John Hammond [ 01/Aug/18 ] |
|
Hi Simon, The attachments from the 17th show that the client is waiting for the MDT to respond, but the previously attached stack1 output from the MDT shows that the server is idle. Can you do the following: # On the client and MDT: lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace' lctl set_param debug_mb=64 lctl clear # On the client: chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 & # Wait for chgrp to hang in fchown() lctl dk > /tmp/client-chgrp-dk.out stack1 > /tmp/client-chgrp-stack1.out # On the MDT: lctl dk > /tmp/mdt-chgrp-dk.out stack1 > /tmp/mdt-chgrp-stack1.out |
| Comment by Monash HPC [ 02/Aug/18 ] |
|
Dear John |
| Comment by John Hammond [ 02/Aug/18 ] |
|
A copy of stack1 is attached to this ticket. You used it previously. |
| Comment by Monash HPC [ 03/Aug/18 ] |
|
John, Message from syslogd@monarch-login2 at Aug 3 17:06:27 ... Message from syslogd@monarch-login2 at Aug 3 17:06:27 ... Would you be interested in any of the contents of the /var/crash directories, i.e. either the large vmcore or the vmcore-dmesg files? This may or may not be related to another bug we have in Lustre which crashes our machines regularly; I am led to believe that bug is caused by an issue with the current version of MOFED drivers. I ran the jobs and have uploaded the files for you, with the Aug4 postfix on them. |
| Comment by John Hammond [ 03/Aug/18 ] |
|
OK this is better. The chgrp is failing because the MDT is not connected to OST000c. What is the status of that OST? It appears that the client and server are not handling this condition correctly. The MDT logs you provided are not from Lustre 2.10.3. What version of Lustre is the MDT running? The failed assertions are due to |
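A sketch of how OST000c's connection state can be checked (device and parameter names are assumptions based on the target names in the logs):
# On the MDS: is the device for OST000c configured and up?
lctl dl | grep OST000c
# On a client: import state and active flag for that OST
lctl get_param osc.*OST000c*.state
lctl get_param osc.*OST000c*.active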
| Comment by Monash HPC [ 06/Aug/18 ] |
|
Dear John, and some Both versions show this problem. As for OST000c, this is an inactive OST, but the file belongs on a different OST (I ran some code I found on the internet to find this). The MDT Lustre version is Regards |
| Comment by John Hammond [ 13/Aug/18 ] |
|
Hi Simon, Is OST000c offline or just deactivated to prevent new files from being created on it? If it's only to prevent new files from being created then you should use the max_create_count parameter. See http://doc.lustre.org/lustre_manual.xhtml#section_remove_ost. |
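For reference, the manual's approach looks roughly like this when run on the MDS (the target name is assumed from the logs above; 20000 is the usual default for max_create_count):
# Stop new objects from being created on OST000c while keeping it connected:
lctl set_param osp.lustre-OST000c*.max_create_count=0
# Re-enable object creation later:
lctl set_param osp.lustre-OST000c*.max_create_count=20000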
| Comment by Julien Wallior [ 14/Aug/18 ] |
|
Simon, John, Regarding the assertion (bd_md_count == 0), what does your trace look like? Do you have the {client,server}_bulk_callback() LustreError messages like in [ 142.947358] [<ffffffff8100471d>] dump_trace+0x7d/0x2d0
|
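One way to look for those messages on the affected nodes (a generic search, not specific to this site):
dmesg | grep -E 'bulk_callback|bd_md_count'
grep -E 'client_bulk_callback|server_bulk_callback' /var/log/messages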
| Comment by Peter Jones [ 01/Sep/18 ] |
|
Simon JFYI- the above-mentioned fix for Peter |
| Comment by Malcolm Haak - NCI (Inactive) [ 06/Sep/18 ] |
|
Hi Peter, Which commit ID? I looked at the git log for the 2.10.5 tag and could not see a commit claiming to pull this fix. Was it merged into a different commit? |
| Comment by Peter Jones [ 06/Sep/18 ] |
|
Ah. I apologize - it is not actually in 2.10.5. We had discussed including it but decided against it in the end because at the time it had not had very long in testing. It should be in 2.10.6 |
| Comment by Malcolm Haak - NCI (Inactive) [ 06/Sep/18 ] |
|
Hi Peter, All good. Just thought I might have been going crazy, was all |
| Comment by Malcolm Haak - NCI (Inactive) [ 06/Sep/18 ] |
|
Also Peter, the fix is still listed in the 2.10.5 changelog:
http://wiki.lustre.org/Lustre_2.10.5_Changelog Thanks |
| Comment by Peter Jones [ 07/Sep/18 ] |
|
Not any more - thx |
| Comment by Ian Costello [ 16/Nov/20 ] |
|
Hi Peter, Problem resolved, we can close this one. Thanks, |