
intermittent I/O errors for some directories

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.5.2
    • None
    • Lustre 2.5.2 on RHEL6 servers and clients, NFS exported, ACLs
    • 3
    • 16090

    Description

      Our users have reported an issue where they suddenly have problems editing a file in a directory. They also get I/O errors, for example when trying to read the ACLs for that directory. In at least one instance the problem resolved itself overnight, after we had decided to investigate in more detail later. In another case today, the problem went away when we renamed the problematic directory.

      In the most recent instance today we had some time to try to understand the issue, and this is what we have found so far. While the problem persists, some clients see an I/O error when calling getfacl, while other clients run the same commands without problems and get the expected results. Some machines access this directory over NFS, exported from one of our Lustre clients that was showing the problem in this instance; they had the same issues. Attempting to edit a file in the problematic directory with vim produced the message that the .swp file already exists, even for new files. Creating new files in the directory, for example with touch, worked without problems.

      There are no error messages recorded by syslog on any of the machines involved.

      We've mostly run out of ideas about what to look for next to resolve this if it happens again.
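
Since the symptom differs from client to client and nothing reaches syslog, one option is to run the same small probe on every client and on the NFS clients while the problem persists, then compare the output. The following is only a sketch, assuming Python 3 and a `getfacl` binary on the PATH; `probe_directory` is a hypothetical helper name, not part of Lustre or of any tool mentioned in this ticket.

```python
import errno
import socket
import subprocess
import sys
import os


def probe_directory(path):
    """Return a list of (operation, outcome) pairs for one directory."""
    results = []

    # stat() and listdir() exercise the basic metadata path.
    for name, op in (("stat", os.stat), ("listdir", os.listdir)):
        try:
            op(path)
            results.append((name, "ok"))
        except OSError as exc:
            # Record the symbolic errno (for example EIO or ESTALE).
            results.append((name, errno.errorcode.get(exc.errno, str(exc.errno))))

    # getfacl is the command that returned I/O errors in the report.
    try:
        proc = subprocess.run(
            ["getfacl", path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if proc.returncode == 0:
            results.append(("getfacl", "ok"))
        else:
            message = proc.stderr.strip() or "exit %d" % proc.returncode
            results.append(("getfacl", message))
    except FileNotFoundError:
        results.append(("getfacl", "getfacl not installed"))

    return results


if __name__ == "__main__":
    host = socket.gethostname()
    for directory in sys.argv[1:]:
        for operation, outcome in probe_directory(directory):
            print("%s %s %s: %s" % (host, directory, operation, outcome))
```

Run with the affected directory as an argument on each machine. Lines that differ between hosts show which operation fails where, and with which error.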

      Attachments

        1. mds_server_debug.xz
          0.2 kB
          Frederik Ferner
        2. nfs_server_debug.xz
          12 kB
          Frederik Ferner

          Activity

            [LU-5730] intermittent I/O errors for some directories
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Great - thanks for confirming

            ferner Frederik Ferner (Inactive) added a comment -

            A quick update. We have not had any reports from our users that they are still seeing this, neither with the patched version nor with 2.7, which we are running now. So I guess this can be closed.

            pjones Peter Jones made changes -
            Link New: This issue is related to LU-6528 [ LU-6528 ]

            ferner Frederik Ferner (Inactive) added a comment -

            Peter,

            thanks for checking. Judging by the time it previously took our users to experience the issue, I might prefer to leave the ticket open a while longer. On the other hand, we can always re-open it if we see the same problem again.

            However, in the meantime we are seeing a (potentially) different problem with NFS. This time we don't get stale NFS file handle errors; we get permission denied instead when trying to create a file in a newly created directory. This is intermittent as well, but reproducible. I'm currently unsure whether I should continue using this ticket or open a new one. I'm leaning towards opening a new ticket to avoid confusion. (I'm also still gathering information...)

            Cheers,
            Frederik

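
The create-in-a-new-directory failure described above could be exercised in a loop to measure how often it occurs and with which error. The following is only a sketch, assuming Python 3 and a writable parent directory on the NFS mount; `create_in_fresh_dirs` and the `probe-` directory names are hypothetical and not taken from this ticket.

```python
import errno
import os


def create_in_fresh_dirs(parent, attempts=100):
    """Create a new directory, then a file inside it, `attempts` times.

    Returns a dict mapping "ok" or a symbolic errno name to a count.
    The directories are left in place so failures can be inspected.
    """
    counts = {}
    for i in range(attempts):
        new_dir = os.path.join(parent, "probe-%d-%d" % (os.getpid(), i))
        target = os.path.join(new_dir, "testfile")
        try:
            os.mkdir(new_dir)
            # O_EXCL ensures the file is really created by this call.
            fd = os.open(target, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            outcome = "ok"
        except OSError as exc:
            # For example EACCES (permission denied) or ESTALE.
            outcome = errno.errorcode.get(exc.errno, str(exc.errno))
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts
```

Calling `create_in_fresh_dirs("/path/on/nfs/mount")` (a placeholder path) and printing the result gives a quick failure rate that can be compared before and after a change.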
            pjones Peter Jones added a comment -

            Hi Frederik

            I'm just checking in to see whether you are now comfortable to consider this issue resolved by the patch or whether you want to see a longer stretch without a reoccurrence.

            Peter

            pjones Peter Jones made changes -
            Link New: This issue is related to LU-3727 [ LU-3727 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to SGI-110 [ SGI-110 ]
            pjones Peter Jones added a comment -

            Thanks for the update Frederik. Keep us posted.

            ferner Frederik Ferner (Inactive) added a comment -

            Peter,

            thanks for the confirmation. Today we upgraded our servers to 2.5.3 plus the suggested patch. Time will tell: so far we have only been able to reproduce this after the file system has been up for a few weeks.

            Frederik

            People

              laisiyao Lai Siyao
              ferner Frederik Ferner (Inactive)
              Votes: 0
              Watchers: 6
