Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • Lustre 2.9.0
    • Lustre 2.8.0
    • Lustre 2.8.0 (Intel) client and server, EL6.7 clients and servers, kernel 2.6.32-573.12.1.el6.x86_64 client and server, Red Hat OFED client and server, Mellanox FDR HCA client and server. 7 combined OSS/OST
    • 2

    Description

      The combined MDS/MGS server reports the following in /var/log/messages:

      Apr 11 09:33:39 mds1 kernel: LustreError: 43754:0:(mdt_handler.c:893:mdt_getattr_internal()) blizzard-MDT0000: getattr error for [0x2000403f7:0x1e0f4:0x0]: rc = -2

      The return code -2 (ENOENT) implies that the file does not exist, and lfs fid2path reports the same:

      sudo lfs fid2path /scratch 0x2000403f7:0x1e0f4:0x0
      fid2path: error on FID 0x2000403f7:0x1e0f4:0x0: No such file or directory
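      For triage, the FID and return code can be pulled out of such a console line and the return code mapped to its errno name. The following is a minimal sketch, assuming the message format is exactly the one quoted above; the helper name decode_getattr_error and the regular expression are illustrative and not part of Lustre.

```python
import errno
import os
import re

# Matches the tail of the console message quoted in the description.
# The pattern is an assumption based on that single sample line.
LINE_RE = re.compile(
    r"getattr error for \[(?P<fid>0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\]: "
    r"rc = (?P<rc>-?\d+)"
)


def decode_getattr_error(line):
    """Return (fid, errno name, description), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    code = -int(match.group("rc"))  # kernel return codes are negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("fid"), name, os.strerror(code)


sample = (
    "Apr 11 09:33:39 mds1 kernel: LustreError: 43754:0:"
    "(mdt_handler.c:893:mdt_getattr_internal()) blizzard-MDT0000: "
    "getattr error for [0x2000403f7:0x1e0f4:0x0]: rc = -2"
)
print(decode_getattr_error(sample))
```

      For the sample line, the second element of the result is ENOENT, which matches the "No such file or directory" reported by lfs fid2path.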

      Thanks,
      Chris

          Activity

            [LU-8012] lustre 2.8.0 getattr error rc = -2

            adilger Andreas Dilger added a comment - Patch landed as v2_8_55_0-133-gc3e03f3 so it is included in 2.9.0.

            adilger Andreas Dilger added a comment - This is a normal situation if there are multiple threads racing to unlink a single file. The patch http://review.whamcloud.com/18145 "LU-7712 mdd: migration is too noisy" turned off this error for the common -ENOENT case.
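            The race described above can be illustrated on any POSIX filesystem. The following is a minimal sketch using a local temporary file, not Lustre or MDT code: two threads call unlink on the same path at the same moment, one succeeds, and the other gets ENOENT. The function name race_unlink is hypothetical.

```python
import errno
import os
import tempfile
import threading


def race_unlink(path, n_threads=2):
    """Let n_threads call os.unlink(path) together; return each thread's errno (0 = success)."""
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads

    def worker(index):
        barrier.wait()  # release all threads at the same moment
        try:
            os.unlink(path)
            results[index] = 0
        except FileNotFoundError as exc:
            results[index] = exc.errno  # the loser of the race sees ENOENT

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


fd, path = tempfile.mkstemp()
os.close(fd)
results = race_unlink(path)
print(sorted(results))
```

            Whichever thread wins, exactly one unlink succeeds and the other fails with ENOENT, so the failure is an expected outcome of the race and not a sign of corruption.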

            morrone Christopher Morrone (Inactive) added a comment - Still seeing this in testing. We need to get this message resolved. Having a flood of messages on the console that sysadmins need to ignore leads to sysadmins that stop looking at the logs and we miss important things.
            ofaaland Olaf Faaland added a comment -

            Hi Alex,

            We are seeing this as well. I haven't yet identified the sequence of events. We see it with a set of test scripts that race to mkdir/rmdir/create/unlink/read/write within a common directory, in a randomized manner. I'll try to narrow the set of operations down.

            Am I correct that the object existed at the very beginning of mdt_getattr (I see an assert), but then after mdt_getattr->mdt_getattr_internal->mdt_attr_get_complex it does not?

            Is this supposed to be protected entirely by an LDLM lock held by the client?

            thanks,
            Olaf
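            A workload of the kind described in this comment could look roughly like the sketch below. It is not the actual test scripts, which are not shown in this ticket: the operation set (mkdir, rmdir, create, unlink, stat), the name pool, and the function names are assumptions chosen for illustration. It runs against a local temporary directory; pointing it at a directory on a Lustre client mount would be needed to exercise the MDT.

```python
import os
import random
import tempfile
import threading
from collections import Counter

NAMES = ["a", "b", "c"]  # small name pool so that threads collide often


def op_create(path):
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o644))


# Subset of the operations mentioned in the comment, plus stat (getattr).
OPS = [os.mkdir, os.rmdir, op_create, os.unlink, os.stat]


def worker(root, iterations, seed, errors):
    """Apply random operations to random names under root, counting errno values."""
    rng = random.Random(seed)
    for _ in range(iterations):
        op = rng.choice(OPS)
        path = os.path.join(root, rng.choice(NAMES))
        try:
            op(path)
        except OSError as exc:
            errors[exc.errno] += 1  # e.g. ENOENT or EEXIST when another thread got there first


def run(n_threads=4, iterations=500):
    """Run the racing workers and return a Counter of errno -> occurrences."""
    totals = Counter()
    with tempfile.TemporaryDirectory() as root:
        counters = [Counter() for _ in range(n_threads)]  # one per thread, merged afterwards
        threads = [
            threading.Thread(target=worker, args=(root, iterations, i, counters[i]))
            for i in range(n_threads)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        for counter in counters:
            totals.update(counter)
    return totals


totals = run()
print(totals)
```

            The errno counts vary from run to run because they depend on how the threads interleave.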


            cbc christopher coffey (Inactive) added a comment - Hi Alex, I haven't isolated which app/workload is creating these messages. The workloads are very diverse, ranging from MPI to simple single-threaded apps.
            bzzz Alex Zhuravlev added a comment - edited

            This is actually a valid situation under load where a few threads are working on the same set of files. What kind of load were you running?


            People

              emoly.liu Emoly Liu
              cbc christopher coffey (Inactive)