  1. Lustre
  2. LU-11119

A 'mv' of a file from a local file system to a lustre file system hangs


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.3
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      I have found a weird problem on our Lustre system when we try to move a file from a different file system (here /tmp) onto the Lustre file system. The problem only affects mv; a cp works fine. The 'mv' hangs forever and the process cannot be killed. When I ran strace on the mv, the program hung on fchown.

      strace mv /tmp/simon.small.txt  /mnt/lustre/projects/pMOSP/simon
      <stuff>
      write(4, "1\n", 2)                      = 2
      read(3, "", 4194304)                    = 0
      utimensat(4, NULL, [{1530777797, 478293939}, {1530777797, 478293939}], 0) = 0
      fchown(4, 10001, 10025 
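
The trace above shows the sequence a cross-file-system mv performs: copy the data, restore the timestamps, restore the ownership, then remove the source. As a minimal sketch for narrowing down which call blocks, the same sequence can be replayed one step at a time in Python. The helper name `move_in_steps` is hypothetical, and the sketch is simplified (a real mv also preserves mode bits and other attributes). Each step is announced on stderr before it runs, so the last message printed before a hang names the blocking call.

```python
import os
import sys


def move_in_steps(src, dst):
    """Replay the copy/utimensat/fchown/unlink sequence seen in the strace.

    Hypothetical diagnostic helper, not the coreutils implementation.
    Returns the names of the steps that completed.
    """
    done = []

    def step(name, action):
        # Announce the step first so a hang is attributable to it.
        print(f"{name} ...", file=sys.stderr, flush=True)
        action()
        done.append(name)

    st = os.stat(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fd = fout.fileno()
        data = fin.read()
        step("write", lambda: (fout.write(data), fout.flush()))
        # os.utime on a file descriptor sets atime/mtime with nanosecond precision.
        step("utimensat", lambda: os.utime(fd, ns=(st.st_atime_ns, st.st_mtime_ns)))
        # This is the call the reported mv never returned from.
        step("fchown", lambda: os.fchown(fd, st.st_uid, st.st_gid))
    step("unlink", lambda: os.unlink(src))
    return done
```

Running it with a source on the local file system and a destination on the Lustre mount should, if the behaviour matches the report, print `fchown ...` and then stall. Note that a process stuck this way may be just as unkillable as the original mv.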
      
      If you look at dmesg, you see the following errors start appearing at the same time.
      The errors don't stop, as we can't kill the 'mv' process:
      
      [Thu Jul  5 18:08:43 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:08:43 2018] Lustre: Skipped 140105 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection to lustre-MDT0000 (at 172.16.231.50@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285517 previous similar messages
      [Thu Jul  5 18:09:47 2018] Lustre: lustre-MDT0000-mdc-ffff88351771f000: Connection restored to 172.16.231.50@o2ib (at 172.16.231.50@o2ib)
      [Thu Jul  5 18:09:47 2018] Lustre: Skipped 285516 previous similar messages
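
To get a feel for how fast the client is cycling between "lost" and "restored", the log lines above can be tallied with a short script. This is a rough sketch: the function name `summarize_reconnects` is hypothetical, and it assumes that each "Skipped N previous similar messages" line refers to the message printed immediately before it, which is how the lines appear in this excerpt.

```python
import re

# Matches "[timestamp] Lustre: message"; the leading "[" is optional
# because it can be lost when the log is pasted.
LINE = re.compile(r"^\[?[^\]]+\] Lustre: (?P<msg>.*)$")
SKIPPED = re.compile(r"Skipped (\d+) previous similar messages")


def summarize_reconnects(lines):
    """Count connection lost/restored events, including skipped repeats."""
    counts = {"lost": 0, "restored": 0}
    last = None  # kind of the most recent lost/restored message
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        if "was lost" in msg:
            last = "lost"
            counts[last] += 1
        elif "Connection restored" in msg:
            last = "restored"
            counts[last] += 1
        else:
            skipped = SKIPPED.search(msg)
            # Assumption: the skipped count belongs to the preceding message.
            if skipped and last:
                counts[last] += int(skipped.group(1))
    return counts
```

For example, `summarize_reconnects(open("dmesg.txt"))` would tally a saved copy of the dmesg output; the file name here is only illustrative.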
      

      We have the following OFED drivers, which I believe have a known problem connecting to Lustre servers:

      ofed_info | head -1
      MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
      

      Attachments

        1. chgrp-dk-wed18july.out
          3.44 MB
        2. chgrp-stack1-wed18July.out
          15 kB
        3. client-chgrp-dk.4aug.out
          7.37 MB
        4. client-chgrp-dk-2Aug.out
          15.78 MB
        5. client-chgrp-stack1.4aug.out
          15 kB
        6. dmesg.MDS.4.47.6july.txt
          1.10 MB
        7. dmesg.txt
          6 kB
        8. l_getidentity
          234 kB
        9. mdt-chgrp-dk.4Aug.out
          22.50 MB
        10. mdt-chgrp-dk-2Aug.out
          20.26 MB
        11. mdt-chgrp-stack1.4Aug.out
          24 kB
        12. output.Tue.17.july.18.txt
          24 kB
        13. stack1
          1 kB
        14. strace.output.txt
          14 kB

            People

              jhammond John Hammond
              monash-hpc Monash HPC