  Lustre / LU-11119

A 'mv' of a file from a local file system to a lustre file system hangs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.10.3
    • Severity: 3

    Description

      I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file system. The problem only affects mv; cp works fine. The mv hangs forever and the process cannot be killed. When I ran strace on the mv, it hangs in fchown.

      strace mv /tmp/simon.small.txt  /mnt/lustre/projects/pMOSP/simon
      <stuff>
      write(4, "1\n", 2)                      = 2
      read(3, "", 4194304)                    = 0
      utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
      fchown(4, 10001, 10025 
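
      That fchown() is mv preserving the source file's owner and group after copying the data to the new file system; a plain cp does not chown, which presumably explains why cp works. A hypothetical minimal reproducer along those lines (repro.txt is a made-up name; pMOSP is the group used elsewhere in this ticket):

      # copy the file over first, then try to change its group on Lustre
      cp /tmp/simon.small.txt /mnt/lustre/projects/pMOSP/simon/repro.txt
      chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/repro.txt   # expected to hang like the mv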
      
      If you look at dmesg, multiple errors start appearing at the same time.
      The errors don't stop, since we can't kill the 'mv' process:
      
      [Thu Jul  5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
      

      We have the following OFED drivers, which I believe have a known problem connecting to Lustre servers:

      ofed_info | head -1
      MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
      

      Attachments

        1. chgrp-dk-wed18july.out
          3.44 MB
        2. chgrp-stack1-wed18July.out
          15 kB
        3. client-chgrp-dk.4aug.out
          7.37 MB
        4. client-chgrp-dk-2Aug.out
          15.78 MB
        5. client-chgrp-stack1.4aug.out
          15 kB
        6. dmesg.MDS.4.47.6july.txt
          1.10 MB
        7. dmesg.txt
          6 kB
        8. l_getidentity
          234 kB
        9. mdt-chgrp-dk.4Aug.out
          22.50 MB
        10. mdt-chgrp-dk-2Aug.out
          20.26 MB
        11. mdt-chgrp-stack1.4Aug.out
          24 kB
        12. output.Tue.17.july.18.txt
          24 kB
        13. stack1
          1 kB
        14. strace.output.txt
          14 kB

        Issue Links

          Activity

            [LU-11119] A 'mv' of a file from a local file system to a lustre file system hangs
            jhammond John Hammond added a comment - - edited

            Hi Simon,

            Is OST000c offline or just deactivated to prevent new files from being created on it?

            If it's only to prevent new files from being created then you should use the max_create_count parameter. See http://doc.lustre.org/lustre_manual.xhtml#section_remove_ost.
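
            A minimal sketch of that approach, assuming the default fsname "lustre" (the exact OSP device name on your MDS may differ; see the manual section above):

            # On the MDS: stop new object creation on OST000c without deactivating it
            lctl set_param osp.lustre-OST000c-osc-MDT0000.max_create_count=0
            # Later, re-enable creations by restoring the default (typically 20000)
            lctl set_param osp.lustre-OST000c-osc-MDT0000.max_create_count=20000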

            monash-hpc Monash HPC added a comment -

            Dear John,
            We have been updating our MOFED drivers and Lustre client versions to try to resolve both bugs. So some clients are running:
            lctl lustre_build_version
            Lustre version: 2.10.3

            and some
            lctl lustre_build_version
            Lustre version: 2.10.4

            Both versions show this problem.

            As for OST000c, this is an inactive OST:
            lfs osts
            OBDS:
            0: lustre-OST0000_UUID ACTIVE
            1: lustre-OST0001_UUID ACTIVE
            2: lustre-OST0002_UUID ACTIVE
            3: lustre-OST0003_UUID ACTIVE
            4: lustre-OST0004_UUID ACTIVE
            5: lustre-OST0005_UUID ACTIVE
            6: lustre-OST0006_UUID ACTIVE
            7: lustre-OST0007_UUID ACTIVE
            8: lustre-OST0008_UUID ACTIVE
            9: lustre-OST0009_UUID ACTIVE
            10: lustre-OST000a_UUID ACTIVE
            11: lustre-OST000b_UUID ACTIVE
            12: lustre-OST000c_UUID INACTIVE
            13: lustre-OST000d_UUID ACTIVE
            14: lustre-OST000e_UUID ACTIVE

            But the file's objects are on a different OST (I ran some code I found on the internet to check this):
            /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3: ['lustre-OST0009']
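
            (For reference, the standard way to check this without external code is lfs getstripe; the obdidx column shows which OST index holds each of the file's objects:)

            lfs getstripe /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3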

            The MDT Lustre version is
            [root@rclmddc1r14-02-e1 ~]# lctl lustre_build_version
            Lustre version: 2.10.58_46_ge528677

            regards
            Simon

            jhammond John Hammond added a comment -

            OK this is better. The chgrp is failing because the MDT is not connected to OST000c. What is the status of that OST? It appears that the client and server are not handling this condition correctly.

            The MDT logs you provided are not from Lustre 2.10.3. What version of Lustre is the MDT running?

            The failed assertions are due to LU-8573 and possibly OFED issues.

            monash-hpc Monash HPC added a comment -

            John,
            Firstly, I'd like to note that when I tried running the chgrp command, the OS became unstable and crashed. This has happened twice now:

            Message from syslogd@monarch-login2 at Aug 3 17:06:27 ...
            kernel:LustreError: 12556:0:(niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed:

            Message from syslogd@monarch-login2 at Aug 3 17:06:27 ...
            kernel:LustreError: 12556:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG

            Would you be interested in any of the contents of the /var/crash directories, i.e. either the large vmcore or the vmcore-dmesg files? This may or may not be related to another bug we have in Lustre which crashes our machines regularly; I am led to believe that bug is caused by an issue with the current version of the MOFED drivers.

            I ran the jobs and have uploaded the files for you, with the Aug4 postfix on them.
            regards Simon
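
            (As an aside, if it helps to triage locally before shipping multi-gigabyte dumps, the crash utility can pull the relevant information straight out of a vmcore, assuming the matching kernel-debuginfo vmlinux is installed; the paths below are the usual RHEL locations, not taken from this system:)

            crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/<timestamp>/vmcore
            crash> log      # kernel ring buffer, including the LustreError/LBUG lines
            crash> bt -a    # backtraces for all CPUs at the time of the crash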

            jhammond John Hammond added a comment -

            A copy of stack1 is attached to this ticket. You used it previously.

            monash-hpc Monash HPC added a comment -

            Dear John
            I uploaded 2 of the files. I am afraid I could not find the program strack1. Is this part of the lustre distribution or a tool I need to install?
            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            The attachments from the 17th show that the client is waiting for the MDT to respond. But the previously attached stack1 output from the MDT shows that the server is idle. Can you do the following:

            # On the client and MDT:
            lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace'
            lctl set_param debug_mb=64
            lctl clear
            # On the client:
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 &
            # Wait for chgrp to hang in fchown()
            lctl dk > /tmp/client-chgrp-dk.out
            strack1 > /tmp/client-chgrp-stack1.out
            # On the MDT:
            lctl dk > /tmp/mdt-chgrp-dk.out
            strack1 > /tmp/mdt-chgrp-stack1.out
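
            (For context, stack1 is the small script attached to this ticket; its contents aren't reproduced here, but an all-task kernel stack dumper of this kind, purely as an assumption about what it does, might look like:)

            #!/bin/bash
            # Assumed sketch: print the kernel stack of every task on the node so a
            # thread stuck in the kernel (e.g. chgrp blocked in fchownat()) stands out.
            for t in /proc/[0-9]*/task/[0-9]*; do
                echo "== pid ${t##*/} ($(cat "$t"/comm 2>/dev/null)) =="
                cat "$t"/stack 2>/dev/null
            done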
            
            monash-hpc Monash HPC added a comment -

            Dear John,
            has there been any update on the problem?
            regards
            Simon

            monash-hpc Monash HPC added a comment -

            Dear John
            I have uploaded the files to you. These were run on the client monarch-dtn
            regards
            Simon

            jhammond John Hammond added a comment -

            Hi Simon,

            OK, this is interesting. The file you attached shows that the MDT was completely idle when stack1 was run, so I'm sorry to say we may have been going in the wrong direction. Could you run it on the client after chown/chgrp hangs? Could you also enable some more debugging and run the following on the client:

            lctl set_param debug='+dlmtrace rpctrace vfstrace inode trace'
            lctl set_param debug_mb=64
            lctl clear
            chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3 &
            ### Wait for chgrp to hang in fchown()
            lctl dk > /tmp/chgrp-dk.out
            strack1 > /tmp/chgrp-stack1.out
            

            And then attach /tmp/chgrp-dk.out and /tmp/chgrp-stack1.out?

            monash-hpc Monash HPC added a comment -

            Dear John,
            I ran the command and attached the file.
            The client executed the command
            strace chgrp pMOSP /mnt/lustre/projects/pMOSP/simon/simon.small.txt.3

            which gets as far as:

            read(4, "\1\0\0\0\0\0\0\0L'\0\0\20\0\0\0pMOSP\0*\0smichnow"..., 136) = 136
            newfstatat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3", {st_mode=S_IFREG|0600, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
            fchownat(AT_FDCWD, "/mnt/lustre/projects/pMOSP/simon/simon.small.txt.3", -1, 10060, 0

            and hangs
            regards
            Simon


            People

              jhammond John Hammond
              monash-hpc Monash HPC
              Votes: 0
              Watchers: 9
