Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11119

A 'mv' of a file from a local file system to a lustre file system hangs

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.10.3
    • None
    • 3
    • 9223372036854775807

    Description

      I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the lustre file server. This problem only affects a mv. A cp works ok. The problem is that the 'mv' hangs forever, and the process can not be a killed WHen I did a strace on the mv, the program hangs on fchown.

      strace mv /tmp/simon.small.txt  /mnt/lustre/projects/pMOSP/simon
      <stuff>
      write(4, "1\n", 2)                      = 2
      read(3, "", 4194304)                    = 0
      utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
      fchown(4, 10001, 10025 
      
      If you look at demsg, you see these multiple errors start appearing at the same time:
      The errors don't stop as we can't kill the 'mv' process
      
      Thu Jul  5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
      

      We have the following ofed drivers, which I believe have a known problem with connecting to Lustre servers

      ofed_info | head -1
      MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
      

      Attachments

        1. chgrp-dk-wed18july.out
          3.44 MB
        2. chgrp-stack1-wed18July.out
          15 kB
        3. client-chgrp-dk.4aug.out
          7.37 MB
        4. client-chgrp-dk-2Aug.out
          15.78 MB
        5. client-chgrp-stack1.4aug.out
          15 kB
        6. dmesg.MDS.4.47.6july.txt
          1.10 MB
        7. dmesg.txt
          6 kB
        8. l_getidentity
          234 kB
        9. mdt-chgrp-dk.4Aug.out
          22.50 MB
        10. mdt-chgrp-dk-2Aug.out
          20.26 MB
        11. mdt-chgrp-stack1.4Aug.out
          24 kB
        12. output.Tue.17.july.18.txt
          24 kB
        13. stack1
          1 kB
        14. strace.output.txt
          14 kB

        Issue Links

          Activity

            [LU-11119] A 'mv' of a file from a local file system to a lustre file system hangs
            monash-hpc Monash HPC added a comment -

            Dear John,
            has there been any update on the problem?
            regards
            Simon

            monash-hpc Monash HPC added a comment - Dear John, has there been any update on the problem? regards Simon
            monash-hpc Monash HPC added a comment -

            Dear John
            I have uploaded the files to you. These were run on the client monarch-dtn
            regards
            Simon

            monash-hpc Monash HPC added a comment - Dear John I have uploaded the files to you. These were run on the client monarch-dtn regards Simon
            jhammond John Hammond added a comment -

            Hi Simon,

            OK this is interesting. The file you attached shows that the MDT was completely idle when stack1 was run. So I'm sorry to say we may have been going in the wrong direction? Could you run it on the client after chown/chgrp hangs? Could you also enable some more debugging and run the following on the client:

            lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace'
            lctl set_param debug_mb=64
            lctl clear
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 &
            ### Wait for chgrp to hang in fchown()
            lctl dk > /tmp/chgrp-dk.out
            strack1 > /tmp/chgrp-stack1.out
            

            And then attach /tmp/chgrp-dk.out and /tmp/chgrp-stack1.out?

            jhammond John Hammond added a comment - Hi Simon, OK this is interesting. The file you attached shows that the MDT was completely idle when stack1 was run. So I'm sorry to say we may have been going in the wrong direction? Could you run it on the client after chown/chgrp hangs? Could you also enable some more debugging and run the following on the client: lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace' lctl set_param debug_mb=64 lctl clear chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 & ### Wait for chgrp to hang in fchown() lctl dk > /tmp/chgrp-dk.out strack1 > /tmp/chgrp-stack1.out And then attach /tmp/chgrp-dk.out and /tmp/chgrp-stack1.out ?
            monash-hpc Monash HPC added a comment -

            Dear John,
            I ran the command and attached the file.
            The client executed the command
            strace chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3

            which goes up to..

            read(4, "\1\0\0\0\0\0\0\0L'\0\0\20\0\0\0pMOSP\0*\0smichnow"..., 136) = 136
            newfstatat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3",

            {st_mode=S_IFREG|0600, st_size=2, ...}

            , AT_SYMLINK_NOFOLLOW) = 0
            fchownat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3", -1, 10060, 0

            and hangs
            regards
            Simon

            monash-hpc Monash HPC added a comment - Dear John, I ran the command and attached the file. The client executed the command strace chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 which goes up to.. read(4, "\1\0\0\0\0\0\0\0L'\0\0\20\0\0\0pMOSP\0*\0smichnow"..., 136) = 136 newfstatat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3", {st_mode=S_IFREG|0600, st_size=2, ...} , AT_SYMLINK_NOFOLLOW) = 0 fchownat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3", -1, 10060, 0 and hangs regards Simon
            jhammond John Hammond added a comment -

            Could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?

            jhammond John Hammond added a comment - Could you run your reproducer and wait until the client blocks in fchown(), then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?
            monash-hpc Monash HPC added a comment -

            John
            the dmesg errors from the node that hangs upon a chgrp are
            [Mon Jul 16 17:14:54 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages
            [Mon Jul 16 17:14:55 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
            [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages

            regards
            Simon

            monash-hpc Monash HPC added a comment - John the dmesg errors from the node that hangs upon a chgrp are [Mon Jul 16 17:14:54 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages [Mon Jul 16 17:14:55 2018] Lustre: lustre-MDT0000-mdc-ffff88351a566800: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib) [Mon Jul 16 17:14:55 2018] Lustre: Skipped 1175070 previous similar messages regards Simon
            monash-hpc Monash HPC added a comment -

            Dear John
            I enabled the logging, and reran the chgrp command that hung.
            I could not see any useful comments in the log file but I include it here. Logging started today so that part worked but nothing relating to the issue
            regards
            Simon

            monash-hpc Monash HPC added a comment - Dear John I enabled the logging, and reran the chgrp command that hung. I could not see any useful comments in the log file but I include it here. Logging started today so that part worked but nothing relating to the issue regards Simon
            jhammond John Hammond added a comment -

            Hi Simon,

            > Could you also tell me how to modify syslogger to collect l-getidentify errors

            Sure. Do you use rsyslog? If so then you can add a line of the following form to /etc/rsyslog.conf on the MDS:

            authpriv.warning /var/log/l_getidentity
            

            Then restart rsyslog, run your reproducer, and attach the contents of /var/log/l_getidentity. You should check that no site sensitive content ended up in that file before you attach it.

            jhammond John Hammond added a comment - Hi Simon, > Could you also tell me how to modify syslogger to collect l-getidentify errors Sure. Do you use rsyslog? If so then you can add a line of the following form to /etc/rsyslog.conf on the MDS: authpriv.warning /var/log/l_getidentity Then restart rsyslog, run your reproducer, and attach the contents of /var/log/l_getidentity . You should check that no site sensitive content ended up in that file before you attach it.
            monash-hpc Monash HPC added a comment -

            John
            I'm away for a few days so will run script next week. Could you also tell me how to modify syslogger to collect l-getidentify errors

            Thanks
            Simon

            monash-hpc Monash HPC added a comment - John I'm away for a few days so will run script next week. Could you also tell me how to modify syslogger to collect l-getidentify errors Thanks Simon
            jhammond John Hammond added a comment -

            Hi Simon,

            l_getidentity does generate some logging on error but it uses the authpriv facility with an error loevel of warning. If needed then could you reconfigure syslog on the MDT to temporarily capture any logging generated by l_getidentity when you run your reproducer and attach it here?

            Second, could you run your reproducer and wait until the client blocks in fchown(),then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?

            jhammond John Hammond added a comment - Hi Simon, l_getidentity does generate some logging on error but it uses the authpriv facility with an error loevel of warning. If needed then could you reconfigure syslog on the MDT to temporarily capture any logging generated by l_getidentity when you run your reproducer and attach it here? Second, could you run your reproducer and wait until the client blocks in fchown() ,then run the attached script stack1 on the MDT (to collect and consolidate the kernel stack traces) and attach the output here?
            monash-hpc Monash HPC added a comment -

            Dear John,
            our identity on all nodes is managed by a LDAP server. On the client (that hangs) I see
            *[smichnow@monarch-login2 tmp]$ id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)*

            and on the metadata server
            *[root@rclmddc1r14-02-e1 lustre-MDT0000]# id smichnow
            uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10610(icm),10167(M3EarlyAdopters),10151(gaussianmonash),10295(comsol-civil),10054(pMona0002),10057(pMona0005),10077(pMona0006),10052(monarch),10150(systems),10130(gurobi),10290(avizo),10399(cplex),10060(pMOSP),10086(pMeRC),10184(mc01),10206(wb82),10567(gauss),10218(nq46)*

            which shows the same number of groups but in a different order.

            *[root@rclmddc1r14-02-e1 lustre-MDT0000]# lctl get_param mdt.lustre-MDT0000.identity_upcall
            mdt.lustre-MDT0000.identity_upcall=/usr/sbin/l_getidentity*

            and

            *[root@rclmddc1r14-02-e1 lustre-MDT0000]# l_getidentity -d 10001
            uid=10001 gid=10025,10052,10054,10057,10060,10077,10086,10130,10150,10151,10167,10184,10206,10218,10290,10295,10399,10567,10610
            permissions:
            nid perm*

            regards
            Simon

            monash-hpc Monash HPC added a comment - Dear John, our identity on all nodes is managed by a LDAP server. On the client (that hangs) I see * [smichnow@monarch-login2 tmp] $ id smichnow uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10150(systems),10052(monarch),10054(pMona0002),10057(pMona0005),10060(pMOSP),10077(pMona0006),10086(pMeRC),10130(gurobi),10151(gaussianmonash),10167(M3EarlyAdopters),10184(mc01),10206(wb82),10218(nq46),10290(avizo),10295(comsol-civil),10399(cplex),10567(gauss),10610(icm)* and on the metadata server * [root@rclmddc1r14-02-e1 lustre-MDT0000] # id smichnow uid=10001(smichnow) gid=10025(monashuniversity) groups=10025(monashuniversity),10610(icm),10167(M3EarlyAdopters),10151(gaussianmonash),10295(comsol-civil),10054(pMona0002),10057(pMona0005),10077(pMona0006),10052(monarch),10150(systems),10130(gurobi),10290(avizo),10399(cplex),10060(pMOSP),10086(pMeRC),10184(mc01),10206(wb82),10567(gauss),10218(nq46)* which shows the same number of groups but in a different order. * [root@rclmddc1r14-02-e1 lustre-MDT0000] # lctl get_param mdt.lustre-MDT0000.identity_upcall mdt.lustre-MDT0000.identity_upcall=/usr/sbin/l_getidentity* and * [root@rclmddc1r14-02-e1 lustre-MDT0000] # l_getidentity -d 10001 uid=10001 gid=10025,10052,10054,10057,10060,10077,10086,10130,10150,10151,10167,10184,10206,10218,10290,10295,10399,10567,10610 permissions: nid perm* regards Simon

            People

              jhammond John Hammond
              monash-hpc Monash HPC
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: