Lustre / LU-11119

A 'mv' of a file from a local file system to a lustre file system hangs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.10.3

    Description

      I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file system. This problem only affects a mv; a cp works fine. The 'mv' hangs forever, and the process cannot be killed. When I ran strace on the mv, the program hangs in fchown.

      strace mv /tmp/simon.small.txt  /mnt/lustre/projects/pMOSP/simon
      <stuff>
      write(4, "1\n", 2)                      = 2
      read(3, "", 4194304)                    = 0
      utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
      fchown(4, 10001, 10025 
      
      If you look at dmesg, you see multiple errors start appearing at the same time.
      The errors don't stop, as we can't kill the 'mv' process:
      
      [Thu Jul  5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
      

      We have the following OFED drivers, which I believe have a known problem connecting to Lustre servers:

      ofed_info | head -1
      MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
      

      Attachments

        1. chgrp-dk-wed18july.out
          3.44 MB
        2. chgrp-stack1-wed18July.out
          15 kB
        3. client-chgrp-dk.4aug.out
          7.37 MB
        4. client-chgrp-dk-2Aug.out
          15.78 MB
        5. client-chgrp-stack1.4aug.out
          15 kB
        6. dmesg.MDS.4.47.6july.txt
          1.10 MB
        7. dmesg.txt
          6 kB
        8. l_getidentity
          234 kB
        9. mdt-chgrp-dk.4Aug.out
          22.50 MB
        10. mdt-chgrp-dk-2Aug.out
          20.26 MB
        11. mdt-chgrp-stack1.4Aug.out
          24 kB
        12. output.Tue.17.july.18.txt
          24 kB
        13. stack1
          1 kB
        14. strace.output.txt
          14 kB


          Activity

            [LU-11119] A 'mv' of a file from a local file system to a lustre file system hangs
            jhammond John Hammond added a comment -

            Could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?
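            For reference, a rough sketch of what a stack-collection script along the lines of stack1 might do (the real script is attached to this ticket and may differ; this assumes /proc/<pid>/stack is available and that it is run as root on the MDS):

            # Hypothetical sketch only -- the attached stack1 script may work differently.
            # Dump the kernel stack of every task so blocked MDT service threads stand out.
            for t in /proc/[0-9]*/task/[0-9]*; do
                echo "== pid $(basename $t) ($(cat $t/comm 2>/dev/null))"
                cat $t/stack 2>/dev/null
                echo
            done > /tmp/mdt-stacks.txt
            # Alternative: 'echo t > /proc/sysrq-trigger' dumps all task stacks to dmesg.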

            monash-hpc Monash HPC added a comment -

            John
            the dmesg errors from the node that hangs upon a chgrp are
            [Mon Jul 16 17:14:54 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages
            [Mon Jul 16 17:14:55 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages

            regards
            Simon

            monash-hpc Monash HPC added a comment -

            Dear John
            I enabled the logging and re-ran the chgrp command, which hung.
            I could not see anything useful in the log file, but I include it here. Logging started today, so that part worked, but nothing in it relates to the issue.
            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            > Could you also tell me how to modify syslog to collect l_getidentity errors

            Sure. Do you use rsyslog? If so then you can add a line of the following form to /etc/rsyslog.conf on the MDS:

            authpriv.warning /var/log/l_getidentity
            

            Then restart rsyslog, run your reproducer, and attach the contents of /var/log/l_getidentity. You should check that no site-sensitive content ended up in that file before you attach it.
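            Put together, the collection steps on the MDS might look like this (a sketch; the chgrp reproducer path is taken from the comments further down this ticket):

            # Route authpriv.warning (used by l_getidentity) to its own file, then restart rsyslog
            echo 'authpriv.warning    /var/log/l_getidentity' >> /etc/rsyslog.conf
            systemctl restart rsyslog
            # On a client, run the reproducer as the affected user, e.g.:
            #   chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            # Back on the MDS, review the file for site-sensitive content before attaching it
            cat /var/log/l_getidentity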

            monash-hpc Monash HPC added a comment -

            John
            I'm away for a few days, so I will run the script next week. Could you also tell me how to modify syslog to collect l_getidentity errors?

            Thanks
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            l_getidentity does generate some logging on error, but it uses the authpriv facility with a level of warning. If needed, could you reconfigure syslog on the MDT to temporarily capture any logging generated by l_getidentity when you run your reproducer, and attach it here?

            Second, could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?

            monash-hpc Monash HPC added a comment -

            Dear John,
            our identity on all nodes is managed by an LDAP server. On the client (that hangs) I see
            [smichnow@monarch-login2 tmp]$ id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)

            and on the metadata server
            [root@rclmddc1r14-02-e1 lustre-MDT0000]# id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10610(icm),10167(M3EarlyAdopters),10151(gaussianmonash),10295(comsol-civil),10054(pMona0002),10057(pMona0005),10077(pMona0006),10052(monarch),10150(systems),10130(gurobi),10290(avizo),10399(cplex),10060(pMOSP),10086(pMeRC),10184(mc01),10206(wb82),10567(gauss),10218(nq46)

            which shows the same number of groups but in a different order.
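            (A quick, order-insensitive way to confirm that the two hosts really return the same supplementary group set, using the user and hostnames shown above, would be something like:)

            # On the client and on the MDS, sort the numeric group list so ordering is ignored
            id -G smichnow | tr ' ' '\n' | sort -n > /tmp/groups.$(hostname -s)
            # Copy both files to one host and compare
            diff /tmp/groups.monarch-login2 /tmp/groups.rclmddc1r14-02-e1 && echo "group sets match"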

            [root@rclmddc1r14-02-e1 lustre-MDT0000]# lctl get_param mdt.lustre-MDT0000.identity_upcall
            mdt.lustre-MDT0000.identity_upcall=/usr/sbin/l_getidentity

            and

            [root@rclmddc1r14-02-e1 lustre-MDT0000]# l_getidentity -d 10001
            uid=10001 gid=10025,10052,10054,10057,10060,10077,10086,10130,10150,10151,10167,10184,10206,10218,10290,10295,10399,10567,10610
            permissions:
            nid perm

            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            I suspect your identity upcall is not finding the supplementary group info it needs for the chown(). Could you run id smichnow on the MDT and post the output? How do you manage the user and group databases? Do you use /etc/passwd and /etc/group, LDAP, AD, or other? Are the user and group databases on the MDT kept in sync with the rest of the cluster?

            The identity upcall mechanism and configuration are described at http://doc.lustre.org/lustre_manual.xhtml#lustreprogramminginterfaces
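            For reference, the upcall can be inspected and exercised from the MDS with commands along these lines (the get_param and l_getidentity invocations are the ones shown elsewhere in this ticket; identity_flush is the standard parameter for dropping cached identity entries so a fresh upcall runs on the next request):

            # Confirm which upcall binary the MDT is configured to use
            lctl get_param mdt.lustre-MDT0000.identity_upcall
            # Ask the upcall directly for a uid's supplementary groups
            l_getidentity -d 10001
            # Drop all cached identity entries; the next RPC from that uid triggers a fresh upcall
            lctl set_param mdt.lustre-MDT0000.identity_flush=-1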

            monash-hpc Monash HPC added a comment -

            Also I noticed that a cp -p fails like the mv:
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.1
            works, but
            cp -p simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.2
            fails, i.e. it hangs.
            Drilling in further, a chmod works but a chgrp fails (when run as the user), i.e.:
            chown smichnow /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            OK
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            fails, i.e. hangs

            regards
            Simon
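            Taken together, the observations above amount to a small reproducer (a sketch using the paths and names from this comment; run as the affected user, not as root):

            cd /tmp
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.1     # completes
            cp -p simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.2  # hangs (preserves owner/group)
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3     # create a target for the ownership tests
            chown smichnow /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3         # completes
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3            # hangs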

            monash-hpc Monash HPC added a comment -

            PS I added the MDS dmesg output as an attachment

            monash-hpc Monash HPC added a comment -

            John/Nathaniel,
            (1) 'mv' works when run as root
            (2) 'mv' always seems to fail at
            utimensat(4, NULL, [{1530778053, 0}, {1530777797, 0}], 0) = 0
            fchown(4, 10001, 10060
            (3) I went to the MDS; the logs are below. I started the 'mv' after 4:41, so the errors you see should begin at "LustreError: 181962:0:(lod_dev.c:1414:lod_sync()) lustre-MDT0000-mdtlov: can't sync ost 12: -107
            [Fri Jul 6 16:51:04 2018] LustreError: 181962:0:(lod_dev.c:1414:lod_sync()) Skipped 8274044 previous similar messages
            [Fri Jul 6 16:51:04 2018] Lustre: lustre-MDT0000: Client 66471b3c-6a3e-724d-5030-ee8252fcfcd2 (at 172.16.230.87@o2ib) reconnecting
            [Fri Jul 6 16:51:04 2018] Lustre: Skipped 5378430 previous similar messages
            [Fri Jul 6 16:51:04 2018] Lustre: lustre-MDT0000: Connection restored to 8e150630-2b04-00b6-c100-b58229b19cac (at 172.16.230.87@o2ib)
            [Fri Jul 6 16:51:04 2018] Lustre: Skipped 5377677 previous similar messages
            "
            (4) The id of the user is
            id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)

            (5) The MDS server is
            [root@rclmddc1r14-02-e1 ~]# uname -a
            Linux rclmddc1r14-02-e1.erc.monash.edu.au 3.10.0-693.17.1.el7_lustre.x86_64 #1 SMP Mon Feb 26 13:33:51 AEDT 2018 x86_64 x86_64 x86_64 GNU/Linux
            [root@rclmddc1r14-02-e1 ~]# cat /etc/redhat-release
            Red Hat Enterprise Linux Server release 7.4 (Maipo)

            (6) The client is
            uname -a
            Linux monarch-login2 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

            (7) I removed the setgid bit
            ls -ld /mnt/lustre/projects/pMOSP/simon/
            drwxrwsr-x 8 smichnow pMOSP 4096 Jul 6 16:47 /mnt/lustre/projects/pMOSP/simon/
            [root@monarch-login2 etc]# chmod g-s /mnt/lustre/projects/pMOSP/simon/
            [root@monarch-login2 etc]# ls -ld /mnt/lustre/projects/pMOSP/simon/
            drwxrwxr-x 8 smichnow pMOSP 4096 Jul 6 16:47 /mnt/lustre/projects/pMOSP/simon/
            but the 'mv' still broke when run as user smichnow
            thanks
            Simon


            People

              jhammond John Hammond
              monash-hpc Monash HPC
              Votes: 0
              Watchers: 9
