Lustre / LU-1397

ENOENT on open()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • None
    • None
    • http://github.com/chaos/lustre, version 2.1.1-11chaos
    • 3
    • 6399

    Description

      I've been getting widespread reports that users on 2.1 clients are seeing random ENOENT errors on open() (and possibly on stat() as well).

      Sometimes the file is written, closed, and reopened on the same client node, yet the open reports that the file does not exist. A few minutes later the file is definitely there, so the problem is transient.
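      The same-node case can be probed with a small loop like the one below. This is only a minimal sketch, not the simul test or any of the attached scripts: the function name reopen_check, the probe file names, and the iteration count are all hypothetical, and the directory passed in is assumed to live on a Lustre mount served to a 2.1 client. On a healthy filesystem the count should be zero; given how intermittent the failure is, a clean run does not prove the problem is absent.

```python
import errno
import os
import tempfile


def reopen_check(directory, iterations=100):
    """Create, write, close, then immediately reopen a file.

    Returns the number of iterations where the reopen failed with ENOENT.
    Hypothetical helper for illustration only.
    """
    failures = 0
    for i in range(iterations):
        path = os.path.join(directory, f"enoent_probe_{os.getpid()}_{i}")
        # Create, write, and close the file on this node.
        with open(path, "w") as f:
            f.write("probe\n")
        try:
            # Reopen right away on the same node; this should always succeed.
            fd = os.open(path, os.O_RDONLY)
            os.close(fd)
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise  # Any other error is not the symptom described here.
            failures += 1
        finally:
            # Clean up; tolerate ENOENT here too so the loop keeps going.
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
    return failures


if __name__ == "__main__":
    # A local temporary directory is used here only so the sketch runs
    # anywhere; point it at a directory on the Lustre mount to probe for
    # the failure.
    with tempfile.TemporaryDirectory() as d:
        print(reopen_check(d, iterations=10))
```

      Because it runs on a single node, this covers only the write, close, and reopen sequence on one client; a report that involves a second node needs a multi-node test such as simul.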

      We have also seen instances where the ENOENT occurs on a node other than the one where the file was created: one node creates, writes, and closes the file, and then another node attempts to open it and gets ENOENT.

      Here is an example failure from a simul test:

      09:04:12: Set iteration 4
      09:04:12: Running test #0(iter 0): open, shared mode.
      09:04:12:       Beginning setup
      09:04:12:       Finished setup          (0.001 sec)
      09:04:12:       Beginning test
      09:04:12: Process 177(hype338): FAILED in simul_open, open failed: No such file or directory
      

      There tend not to be any obvious messages in the console logs associated with these events.

      Attachments

        1. hype336-lu1397-1337981358181.llog.gz
          5.60 MB
        2. ior-lustre_debug.diff
          1 kB
        3. open.stp
          0.9 kB
        4. open-v2.stp
          2 kB

            People

              Assignee: laisiyao Lai Siyao
              Reporter: morrone Christopher Morrone (Inactive)