Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • Lustre 2.12.2
    • None
    • RHEL7.7 as well as other minor versions of RHEL7 on x86_64.
    • 3
    • 9223372036854775807

    Description

      We have been seeing getcwd() return ENOENT on directories that are, in
      fact, always there. We can reliably reproduce this problem with the
      attached test-getcwd.c code on Lustre Server 2.12.2 and Lustre Client
      2.12.3 on RHEL7.7 as well as many other Lustre version and RHEL7
      version combinations.

      We see reports in LU-9735 about RHEL7 clients getting an ENOENT return
      from getcwd(), but I can't tell whether a solution is in the works. We
      are also not sure whether this is a Lustre problem, an RHEL kernel
      problem, or both.

      The LD_PRELOAD workaround from LU-9735 is working for us, but I am
      wondering if there is a proper solution pending. Is there anything we
      can do to help?
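
      The LU-9735 workaround itself is not reproduced in this ticket. As an
      illustration of the general LD_PRELOAD technique only, the sketch below
      interposes getcwd() and retries when the underlying call fails with
      ENOENT. The retry count, the 1 ms delay, and the idea that a simple
      retry is sufficient are all assumptions made for the example, not
      details taken from LU-9735.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define GETCWD_RETRIES 10      /* hypothetical retry limit */

typedef char *(*getcwd_fn)(char *, size_t);

/*
 * Interposed getcwd(): forwards to the next getcwd() in link order
 * (normally libc's) and retries a few times if it reports ENOENT.
 */
char *getcwd(char *buf, size_t size)
{
    static getcwd_fn real_getcwd;
    const struct timespec delay = { 0, 1000000 };   /* 1 ms, assumed */
    char *result = NULL;

    if (real_getcwd == NULL) {
        real_getcwd = (getcwd_fn)dlsym(RTLD_NEXT, "getcwd");
        if (real_getcwd == NULL) {
            errno = ENOSYS;
            return NULL;
        }
    }

    for (int attempt = 0; attempt < GETCWD_RETRIES; attempt++) {
        if (attempt > 0)
            nanosleep(&delay, NULL);    /* brief pause before retrying */
        result = real_getcwd(buf, size);
        if (result != NULL || errno != ENOENT)
            break;                      /* success, or a different error */
    }
    return result;   /* errno is left as set by the last real call */
}
```

      Built as a shared object (for example with gcc -shared -fPIC, adding
      -ldl on older glibc) and named in LD_PRELOAD, a shim like this would
      sit in front of libc's getcwd() for dynamically linked programs. It
      only masks the symptom; it does not address the underlying cause.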

          Activity

            [LU-12997] getcwd() returns ENOENT on RHEL7
            pjones Peter Jones made changes -
            Resolution New: Not a Bug [ 6 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones made changes -
            Fix Version/s Original: Lustre 2.14.0 [ 14490 ]

            krowe K. Scott Rowe added a comment -

            I have since tested a later kernel, 3.10.0-1127.13.1.el7.x86_64, and it also works. So I think the solution is to upgrade to at least kernel 3.10.0-1127.el7.x86_64.

            This ticket can be closed. Thanks for your help.

            krowe K. Scott Rowe added a comment -

            The kernel was just upgraded on my test RHEL-7.8 machine. It is now running 3.10.0-1127.8.2.el7.x86_64 and I no longer get getcwd() failures:

            $ ./test-getcwd /lustre/aoc/sciops/krowe/tmp
            getcwd succeeded

            I don't understand why this failed with kernel 3.10.0-1127.el7.x86_64 and works now, but assuming it continues to work after more kernel updates, I would say this problem may be fixed. Again, if you have the ability to check this yourself, please do. My environment may be customized in strange ways.
            pjones Peter Jones made changes -
            Assignee Original: Peter Jones [ pjones ] New: WC Triage [ wc-triage ]

            krowe K. Scott Rowe added a comment -

            Do you have the ability to test this on an RHEL7.8 host? It would be good to have a second data point. I suppose it is possible I am seeing this issue with our RHEL7.8 host for some other reason that I can't think of.

            simmonsja James A Simmons added a comment -

            Peter, can you take over this issue, since you seem to have better relations with Red Hat to resolve this?
            simmonsja James A Simmons made changes -
            Assignee Original: James A Simmons [ simmonsja ] New: Peter Jones [ pjones ]
            simmonsja James A Simmons made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
            simmonsja James A Simmons added a comment - edited

            Sigh. Red Hat claimed this was fixed. It's going to take some push to get them to resolve this. I don't have the power to resolve this. Someone with greater influence with Red Hat will have to discuss a fix.

            People

              wc-triage WC Triage
              krowe K. Scott Rowe
              Votes: 0
              Watchers: 9
