Lustre / LU-11119

A 'mv' of a file from a local file system to a lustre file system hangs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.10.3

    Description

      I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file system. This problem only affects a mv; a cp works fine. The 'mv' hangs forever, and the process cannot be killed. When I ran strace on the mv, the program hangs in fchown.

      strace mv /tmp/simon.small.txt  /mnt/lustre/projects/pMOSP/simon
      <stuff>
      write(4, "1\n", 2)                      = 2
      read(3, "", 4194304)                    = 0
      utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
      fchown(4, 10001, 10025 
      
      If you look at dmesg, you see multiple errors start appearing at the same time.
      The errors don't stop, as we can't kill the 'mv' process:
      
      [Thu Jul  5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
      

      We have the following OFED drivers, which I believe have a known problem connecting to Lustre servers:

      ofed_info | head -1
      MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
      

      Attachments

        1. chgrp-dk-wed18july.out
          3.44 MB
        2. chgrp-stack1-wed18July.out
          15 kB
        3. client-chgrp-dk.4aug.out
          7.37 MB
        4. client-chgrp-dk-2Aug.out
          15.78 MB
        5. client-chgrp-stack1.4aug.out
          15 kB
        6. dmesg.MDS.4.47.6july.txt
          1.10 MB
        7. dmesg.txt
          6 kB
        8. l_getidentity
          234 kB
        9. mdt-chgrp-dk.4Aug.out
          22.50 MB
        10. mdt-chgrp-dk-2Aug.out
          20.26 MB
        11. mdt-chgrp-stack1.4Aug.out
          24 kB
        12. output.Tue.17.july.18.txt
          24 kB
        13. stack1
          1 kB
        14. strace.output.txt
          14 kB


          Activity

            [LU-11119] A 'mv' of a file from a local file system to a lustre file system hangs
            jhammond John Hammond added a comment -

            Could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?
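            For reference, a rough sketch of what a stack-collection script along the lines of stack1 might do (the real script is attached to this ticket and may differ; this assumes /proc/<pid>/stack is available and that it is run as root on the MDS):

            # Hypothetical sketch only -- the attached stack1 script may work differently.
            # Dump the kernel stack of every task so blocked MDT service threads stand out.
            for t in /proc/[0-9]*/task/[0-9]*; do
                echo "== pid $(basename $t) ($(cat $t/comm 2>/dev/null))"
                cat $t/stack 2>/dev/null
                echo
            done > /tmp/mdt-stacks.txt
            # Alternative: 'echo t > /proc/sysrq-trigger' dumps all task stacks to dmesg.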

            monash-hpc Monash HPC added a comment -

            John
            the dmesg errors from the node that hangs upon a chgrp are
            [Mon Jul 16 17:14:54 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages
            [Mon Jul 16 17:14:55 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages

            regards
            Simon

            monash-hpc Monash HPC added a comment -

            Dear John
            I enabled the logging and re-ran the chgrp command, which hung.
            I could not see anything useful in the log file, but I include it here. Logging started today, so that part worked, but nothing in it relates to the issue.
            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            > Could you also tell me how to modify syslog to collect l_getidentity errors

            Sure. Do you use rsyslog? If so then you can add a line of the following form to /etc/rsyslog.conf on the MDS:

            authpriv.warning /var/log/l_getidentity
            

            Then restart rsyslog, run your reproducer, and attach the contents of /var/log/l_getidentity. You should check that no site-sensitive content ended up in that file before you attach it.
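            Put together, the collection steps on the MDS might look like this (a sketch; the chgrp reproducer path is taken from the comments further down this ticket):

            # Route authpriv.warning (used by l_getidentity) to its own file, then restart rsyslog
            echo 'authpriv.warning    /var/log/l_getidentity' >> /etc/rsyslog.conf
            systemctl restart rsyslog
            # On a client, run the reproducer as the affected user, e.g.:
            #   chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            # Back on the MDS, review the file for site-sensitive content before attaching it
            cat /var/log/l_getidentity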

            monash-hpc Monash HPC added a comment -

            John
            I'm away for a few days, so I will run the script next week. Could you also tell me how to modify syslog to collect l_getidentity errors?

            Thanks
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            l_getidentity does generate some logging on error, but it uses the authpriv facility with a level of warning. If needed, could you reconfigure syslog on the MDT to temporarily capture any logging generated by l_getidentity when you run your reproducer, and attach it here?

            Second, could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?

            monash-hpc Monash HPC added a comment -

            Dear John,
            our identity on all nodes is managed by an LDAP server. On the client (that hangs) I see
            [smichnow@monarch-login2 tmp]$ id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)

            and on the metadata server
            [root@rclmddc1r14-02-e1 lustre-MDT0000]# id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10610(icm),10167(M3EarlyAdopters),10151(gaussianmonash),10295(comsol-civil),10054(pMona0002),10057(pMona0005),10077(pMona0006),10052(monarch),10150(systems),10130(gurobi),10290(avizo),10399(cplex),10060(pMOSP),10086(pMeRC),10184(mc01),10206(wb82),10567(gauss),10218(nq46)

            which shows the same number of groups but in a different order.
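            (A quick, order-insensitive way to confirm that the two hosts really return the same supplementary group set, using the user and hostnames shown above, would be something like:)

            # On the client and on the MDS, sort the numeric group list so ordering is ignored
            id -G smichnow | tr ' ' '\n' | sort -n > /tmp/groups.$(hostname -s)
            # Copy both files to one host and compare
            diff /tmp/groups.monarch-login2 /tmp/groups.rclmddc1r14-02-e1 && echo "group sets match"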

            [root@rclmddc1r14-02-e1 lustre-MDT0000]# lctl get_param mdt.lustre-MDT0000.identity_upcall
            mdt.lustre-MDT0000.identity_upcall=/usr/sbin/l_getidentity

            and

            [root@rclmddc1r14-02-e1 lustre-MDT0000]# l_getidentity -d 10001
            uid=10001 gid=10025,10052,10054,10057,10060,10077,10086,10130,10150,10151,10167,10184,10206,10218,10290,10295,10399,10567,10610
            permissions:
            nid perm

            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            I suspect your identity upcall is not finding the supplementary group info it needs for the chown(). Could you run id smichnow on the MDT and post the output? How do you manage the user and group databases? Do you use /etc/passwd and /etc/group, LDAP, AD, or other? Are the user and group databases on the MDT kept in sync with the rest of the cluster?

            The identity upcall mechanism and configuration are described at http://doc.lustre.org/lustre_manual.xhtml#lustreprogramminginterfaces
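            For reference, the upcall can be inspected and exercised from the MDS with commands along these lines (the get_param and l_getidentity invocations are the ones shown elsewhere in this ticket; identity_flush is the standard parameter for dropping cached identity entries so a fresh upcall runs on the next request):

            # Confirm which upcall binary the MDT is configured to use
            lctl get_param mdt.lustre-MDT0000.identity_upcall
            # Ask the upcall directly for a uid's supplementary groups
            l_getidentity -d 10001
            # Drop all cached identity entries; the next RPC from that uid triggers a fresh upcall
            lctl set_param mdt.lustre-MDT0000.identity_flush=-1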

            monash-hpc Monash HPC added a comment -

            Also I noticed that a cp -p fails like the mv:
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.1
            works, but
            cp -p simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.2
            fails, i.e. it hangs.
            Drilling in further, a chmod works but a chgrp fails (when run as the user), i.e.:
            chown smichnow /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            OK
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3
            fails, i.e. hangs

            regards
            Simon
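            Taken together, the observations above amount to a small reproducer (a sketch using the paths and names from this comment; run as the affected user, not as root):

            cd /tmp
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.1     # completes
            cp -p simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.2  # hangs (preserves owner/group)
            cp simon.small.txt /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3     # create a target for the ownership tests
            chown smichnow /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3         # completes
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3            # hangs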

            monash-hpc Monash HPC added a comment -

            PS I added the MDS dmesg output as an attachment

            monash-hpc Monash HPC added a comment -

            John/Nathaniel,
            (1) 'mv' works when run as root
            (2) 'mv' always seems to fail at
            utimensat(4, NULL, [{1530778053, 0}, {1530777797, 0}], 0) = 0
            fchown(4, 10001, 10060
            (3) I went to the MDS; the logs are below. I started the 'mv' after 4:41, so the errors you see should begin at "LustreError: 181962:0:(lod_dev.c:1414:lod_sync()) lustre-MDT0000-mdtlov: can't sync ost 12: -107
            [Fri Jul 6 16:51:04 2018] LustreError: 181962:0:(lod_dev.c:1414:lod_sync()) Skipped 8274044 previous similar messages
            [Fri Jul 6 16:51:04 2018] Lustre: lustre-MDT0000: Client 66471b3c-6a3e-724d-5030-ee8252fcfcd2 (at 172.16.230.87@o2ib) reconnecting
            [Fri Jul 6 16:51:04 2018] Lustre: Skipped 5378430 previous similar messages
            [Fri Jul 6 16:51:04 2018] Lustre: lustre-MDT0000: Connection restored to 8e150630-2b04-00b6-c100-b58229b19cac (at 172.16.230.87@o2ib)
            [Fri Jul 6 16:51:04 2018] Lustre: Skipped 5377677 previous similar messages
            "
            (4) The id of the user is
            id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)

            (5) The MDS server is
            [root@rclmddc1r14-02-e1 ~]# uname -a
            Linux rclmddc1r14-02-e1.erc.monash.edu.au 3.10.0-693.17.1.el7_lustre.x86_64 #1 SMP Mon Feb 26 13:33:51 AEDT 2018 x86_64 x86_64 x86_64 GNU/Linux
            [root@rclmddc1r14-02-e1 ~]# cat /etc/redhat-release
            Red Hat Enterprise Linux Server release 7.4 (Maipo)

            (6) The client is
            uname -a
            Linux monarch-login2 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

            (7) I removed the setgid bit
            ls -ld /mnt/lustre/projects/pMOSP/simon/
            drwxrwsr-x 8 smichnow pMOSP 4096 Jul 6 16:47 /mnt/lustre/projects/pMOSP/simon/
            [root@monarch-login2 etc]# chmod g-s /mnt/lustre/projects/pMOSP/simon/
            [root@monarch-login2 etc]# ls -ld /mnt/lustre/projects/pMOSP/simon/
            drwxrwxr-x 8 smichnow pMOSP 4096 Jul 6 16:47 /mnt/lustre/projects/pMOSP/simon/
            but the 'mv' still broke when run as user smichnow
            thanks
            Simon


            People

              jhammond John Hammond
              monash-hpc Monash HPC
              Votes: 0
              Watchers: 9
