Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • Lustre 2.12.2
    • None
    • RHEL7.7 as well as other minor versions of RHEL7 on x86_64.
    • 3

    Description

      We have been seeing getcwd() return ENOENT on directories that are, in
      fact, always there. We can reliably reproduce this problem with the
      attached test-getcwd.c code on Lustre Server 2.12.2 and Lustre Client
      2.12.3 on RHEL7.7 as well as many other Lustre version and RHEL7
      version combinations.
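The attached test-getcwd.c is not reproduced on this page. As a rough idea of what such a reproducer might look like, here is a minimal sketch. The function name `count_getcwd_failures`, the buffer size, and the single-process loop are all assumptions for illustration. The real attachment may differ, for example by running concurrent processes to provoke the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CWD_BUF_SIZE 4096 /* assumed to be large enough for the test path */

/* Hypothetical helper, NOT the attached test-getcwd.c: change into `dir`
 * and call getcwd() `iterations` times. Returns the number of calls that
 * failed, or -1 if chdir() itself failed. */
long count_getcwd_failures(const char *dir, long iterations)
{
    char buf[CWD_BUF_SIZE];
    long failures = 0;

    assert(dir != NULL);
    if (chdir(dir) != 0) {
        perror("chdir");
        return -1;
    }
    for (long i = 0; i < iterations; i++) {
        if (getcwd(buf, sizeof(buf)) == NULL) {
            /* On an affected client errno would be ENOENT here. */
            fprintf(stderr, "getcwd failed on iteration %ld: %s\n",
                    i, strerror(errno));
            failures++;
        }
    }
    return failures;
}
```

Calling this from a small main() with a directory on the Lustre mount and a large iteration count would give a pass/fail signal similar to the one the attached program reports.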

      We see reports in LU-9735 about RHEL7 clients getting an ENOENT return
      from getcwd(), but I don't understand if a solution is in the works or
      not. We are also not sure if this is a Lustre problem, an RHEL kernel
      problem, or both.

      The LD_PRELOAD workaround from LU-9735 is working for us, but I am
      wondering if there is a proper solution pending. Is there anything we
      can do to help?
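The LD_PRELOAD workaround itself is described in LU-9735 and is not shown here. The sketch below only illustrates one plausible idea behind such a workaround, namely retrying getcwd() when it fails with ENOENT. The helper name `getcwd_retry` and the retry limit are assumptions, and the actual LU-9735 code may work differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: call getcwd() and retry up to `max_retries` extra
 * times when it fails with ENOENT. Success and any other error are
 * returned to the caller immediately, with errno left as getcwd() set it. */
char *getcwd_retry(char *buf, size_t size, int max_retries)
{
    char *cwd;
    int attempt = 0;

    assert(buf != NULL && size > 0);
    for (;;) {
        cwd = getcwd(buf, size);
        if (cwd != NULL || errno != ENOENT || attempt >= max_retries)
            return cwd;
        attempt++;
    }
}
```

An application could call `getcwd_retry(buf, sizeof(buf), 3)` in place of getcwd(). A preload library would instead export its own getcwd() that applies the same loop around the libc implementation.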

          Activity

            [LU-12997] getcwd() returns ENOENT on RHEL7

            I have since tested a later kernel, 3.10.0-1127.13.1.el7.x86_64, and it also works. So I think the solution is to upgrade to at least kernel 3.10.0-1127.8.2.el7.x86_64, the earliest release I have seen working.

             

            This ticket can be closed.  Thanks for your help.

             

            krowe K. Scott Rowe added a comment
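For anyone who wants to check a client against the kernel releases mentioned in this ticket, the comparison can be sketched as below. The function names are hypothetical, and the parsing assumes RHEL-style release strings of the form major.minor.patch-build[.x.y].el7.arch, with missing fields treated as 0.

```c
#include <stdio.h>
#include <string.h>

/* Parse up to six leading numeric fields of a RHEL-style kernel release,
 * e.g. "3.10.0-1127.13.1.el7.x86_64" -> {3, 10, 0, 1127, 13, 1}.
 * Fields that are absent (as in "3.10.0-1127.el7.x86_64") stay 0. */
static void parse_kernel_release(const char *release, int v[6])
{
    memset(v, 0, 6 * sizeof(int));
    sscanf(release, "%d.%d.%d-%d.%d.%d",
           &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]);
}

/* Returns a negative value if a is older than b, 0 if the parsed fields
 * are equal, and a positive value if a is newer than b. */
int compare_kernel_release(const char *a, const char *b)
{
    int va[6], vb[6];

    parse_kernel_release(a, va);
    parse_kernel_release(b, vb);
    for (int i = 0; i < 6; i++) {
        if (va[i] != vb[i])
            return va[i] < vb[i] ? -1 : 1;
    }
    return 0;
}
```

The running release can be obtained from `uname -r` (or the `release` field filled in by uname(2)) and passed as the first argument.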

            The kernel was just upgraded on my test RHEL-7.8 machine. It is now running kernel 3.10.0-1127.8.2.el7.x86_64, and I no longer get getcwd() failures:

            $ ./test-getcwd /lustre/aoc/sciops/krowe/tmp
            getcwd succeeded

            I don't understand why this failed with kernel 3.10.0-1127.el7.x86_64 and works now, but assuming it continues to work after more kernel updates, I would say this problem may be fixed. Again, if you have the ability to check this yourself, please do. My environment may be customized in strange ways.

            krowe K. Scott Rowe added a comment

            Do you have the ability to test this on an RHEL7.8 host?  It would be good to have a second data point.  I suppose it is possible I am seeing this issue with our RHEL7.8 host for some other reason that I can't think of.

            krowe K. Scott Rowe added a comment

            Peter, can you take over this issue, since you seem to have better relations with Red Hat to resolve this?

            simmonsja James A Simmons added a comment

            Sigh. Red Hat claimed this was fixed. It's going to take some push to get them to resolve this. I don't have the power to resolve this. Someone with greater influence with Red Hat will have to discuss a fix.

            simmonsja James A Simmons added a comment - edited

            Perhaps I wasn't clear enough.

            This is still a problem.

            I can still reproduce this error with RHEL7.8 using the test-getcwd.c program above.

             

            krowe K. Scott Rowe added a comment

            RHEL7.8 contains a fix, so this can be closed. If people encounter this issue, please move to RHEL7.8.

            simmonsja James A Simmons added a comment

            We installed a machine with RHEL-7.8 using kernel 3.10.0-1127.el7.x86_64, and I do see a bug fix for a getcwd problem in the kernel changelog:

            $ rpm -qi --changelog kernel|grep getcwd
            - [fs] vfs: close race between getcwd() and d_move() (Miklos Szeredi) [1631631]

            However, 1631631 is a different bug ID than 1811124, which James A Simmons reported, and I can still reproduce the problem on our Lustre-2.10.8 filesystem using the 2.12.4 client on RHEL-7.8:

            $ ./test-getcwd /lustre/aoc/sciops/krowe/tmp
            test-getcwd: test-getcwd.c:44: main: Assertion `rc == 0' failed.
            Aborted

            krowe K. Scott Rowe added a comment - edited

            People

              wc-triage WC Triage
              krowe K. Scott Rowe