Lustre / LU-9735

SLES12 SP2 and Lustre 2.9: getcwd() sometimes fails


    Description

      This is a duplicate of LU-9208; this case is opened for tracking at NASA. We started seeing the failure after updating the clients to SLES12 SP2 and Lustre 2.9.

      Using the test code provided in LU-9208 (miranda), I was able to reproduce the bug on a single node.

       

      Iteration =    868, Run Time =     0.9614 sec., Transfer Rate =   120.7790 10e+06 Bytes/sec/proc
      Iteration =    869, Run Time =     1.5308 sec., Transfer Rate =    75.8561 10e+06 Bytes/sec/proc
      forrtl: severe (121): Cannot access current working directory for unit 10012, file "Unknown"
      Image              PC                Routine            Line        Source             
      miranda            0000000000409F29  Unknown               Unknown  Unknown
      miranda            00000000004169D2  Unknown               Unknown  Unknown
      miranda            0000000000404045  Unknown               Unknown  Unknown
      miranda            0000000000402FDE  Unknown               Unknown  Unknown
      libc.so.6          00002AAAAB5B96E5  Unknown               Unknown  Unknown
      miranda            0000000000402EE9  Unknown               Unknown  Unknown
      MPT ERROR: MPI_COMM_WORLD rank 12 has terminated without calling MPI_Finalize()
      	aborting job
      
      

      I captured some debug logs and attached them to the case. I was unable to reproduce the failure with "+trace" debugging enabled, but will continue to try.
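
      For readers without the miranda test code, the failure mode can be probed with a much smaller loop. The following is a minimal sketch only: it is not the LU-9208 reproducer and not the attached getcwdHack.c, and the helper name count_getcwd_failures is hypothetical. It assumes the problem surfaces to user space as getcwd() returning NULL with errno set, which is how the Fortran runtime error above would most plausibly arise.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper (not from the ticket): call getcwd() `iterations`
 * times and return how many of those calls failed. Each failure is
 * reported on stderr together with the errno text.
 */
int count_getcwd_failures(int iterations)
{
    char buf[PATH_MAX];
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        if (getcwd(buf, sizeof(buf)) == NULL) {
            failures++;
            fprintf(stderr, "getcwd failed at iteration %d: %s\n",
                    i, strerror(errno));
        }
    }
    return failures;
}
```

      Calling count_getcwd_failures() from a process whose working directory is on the affected Lustre mount, while an I/O workload runs on the same node, would show whether getcwd() ever fails; on a healthy filesystem it should return 0. Without concurrent load the loop may well never trigger the bug.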

      Attachments

        1. getcwdHack.c
          6 kB
        2. miranda.debug.1499341246.gz
          84.13 MB
        3. miranda.dis
          9.19 MB
        4. r481i7n17.dump1.log.gz
          13.86 MB
        5. unoptimize-atomic_open-of-negative-dentry.patch
          2 kB

            People

              simmonsja James A Simmons
              mhanafi Mahmoud Hanafi
              Votes: 1
              Watchers: 24
