Details

    • 3
    • 15252

    Description

      A directory tree was copied into a subdirectory of Lustre. At least one of the subdirectories in the newly created tree in Lustre does not appear in its parent's directory listing. The directory does exist, and it is possible to cd into that unlisted directory.
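      One way to check a copied tree for this symptom is to compare each destination directory's readdir output against direct name lookups, using the original source tree as the list of names that should be present. This is a minimal sketch of a hypothetical diagnostic, not a Lustre tool. `find_unlisted_entries` is an illustrative name, and the sketch assumes the source tree is still available. Any entry that a direct lookup finds but `os.listdir` does not return matches the behavior described above.

```python
import os


def find_unlisted_entries(src_root, dst_root):
    """Hypothetical diagnostic (not part of Lustre).

    Walk the source tree and, for each corresponding destination
    directory, report entries that exist when looked up by name but
    are missing from that directory's readdir output.
    """
    unlisted = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        try:
            listed = set(os.listdir(dst_dir))  # names returned by readdir
        except OSError:
            continue  # the destination directory itself is unreachable
        for name in dirnames + filenames:
            if name in listed:
                continue
            candidate = os.path.join(dst_dir, name)
            # lexists() does a direct lstat() lookup, bypassing readdir,
            # which is how `cd` can still reach an unlisted directory.
            if os.path.lexists(candidate):
                unlisted.append(candidate)
    return unlisted
```

      Running it from several clients, including one that has never accessed the tree, helps separate a client-side cache problem from a server-side one.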

      The same behavior is exhibited from multiple clients, so it is not a problem of just one client's cache being corrupt.

      We are seeing this with the LLNL branch of Lustre 2.4.2 (github.com/chaos/lustre).

      We could not identify any console messages associated with the problem.

      The problem was seen on the secure network, so we cannot directly provide any logs.

      This problem has symptoms suspiciously similar to those of LU-5254.

          Activity

            [LU-5475] readdir missing a directory

            utopiabound Nathaniel Clark added a comment -

            This was fixed by LU-3573
            pjones Peter Jones added a comment -

            Just to capture that the latest version of LU-3573 has been landed to master for over a month and backported to b2_5 as well. I think this should be safe to try out.
            pjones Peter Jones added a comment -

            Heads up: LU-5924 seems to be related to the LU-3573 fix.
            pjones Peter Jones added a comment -

            Chris

            The patch for LU-3573 has landed to master. Do you have any known cases of affected files to verify the fix?

            Peter
            pjones Peter Jones added a comment -

            Just to be clear: updates are on LU-3573.

            utopiabound Nathaniel Clark added a comment -

            I have reproduced a very similar issue where readdir is missing a file. I can reproduce it quite reliably on a ZFS-backed MDT.

            morrone Christopher Morrone (Inactive) added a comment -

            Do you have any idea whether the clients that don't see the directory had all accessed the parent dir before the problem ensued (meaning they might all have a stale cache problem), or whether a client not previously exposed to this directory also fails to see it (some server-side problem)?

            It is not a stale cache problem. Nodes that have never seen the directory before (or that have had their caches cleared) do not see the directory.

            Is there anything else known?

            Not really.

            If you create the same parent dir with the same names inside it in a different Lustre location, does the problem reappear by any chance?

            No, it is not that easily reproduced.
            green Oleg Drokin added a comment -

            Hm, not a lot of data in here, unfortunately.
            Do you have any idea whether the clients that don't see the directory had all accessed the parent dir before the problem ensued (meaning they might all have a stale cache problem), or whether a client not previously exposed to this directory also fails to see it (some server-side problem)?
            I think in LU-5254 new clients did see the directory, so there it looked to me like a client-side cache problem.

            It is odd that ZFS is used on the server side in both cases. Lustre itself does not really cache anything server-side, so if some odd interaction within ZFS (or between Lustre and ZFS) hid a directory from readdir output, this is exactly what would be seen.

            Is there anything else known? For example, was there a lot of stuff in the parent dir (possibly an issue of skipping an entry between pages or something)? If you create the same parent dir with the same names inside it in a different Lustre location, does the problem reappear by any chance?
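            To probe the idea that entries are lost when a large directory's listing spans several readdir pages, a reproducer could fill a fresh directory with many subdirectories and check that a single listing returns them all. This is a minimal sketch under stated assumptions: `check_readdir_complete` is a hypothetical name, and the default count of 10000 is an arbitrary guess at "a lot of stuff", not a known page threshold.

```python
import os


def check_readdir_complete(parent, count=10000, prefix="d"):
    """Hypothetical reproducer (not part of Lustre).

    Create `count` subdirectories under `parent`, then list `parent`
    once and return the sorted names that were created but not listed.
    The default count is an arbitrary choice, not a known readdir page size.
    """
    expected = set()
    for i in range(count):
        name = f"{prefix}{i:06d}"
        os.mkdir(os.path.join(parent, name))
        expected.add(name)
    # os.scandir reads the directory via readdir on POSIX systems.
    with os.scandir(parent) as it:
        listed = {entry.name for entry in it}
    return sorted(expected - listed)
```

            A non-empty result only shows the symptom; repeating the listing from a second client would help tell a client-side loss from a server-side one.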

            People

              green Oleg Drokin
              morrone Christopher Morrone (Inactive)
              Votes: 0
              Watchers: 4