[LU-8012] lustre 2.8.0 getattr error rc = -2 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Minor
Fix Version/s: Lustre 2.9.0
Affects Version/s: Lustre 2.8.0
Labels:
- easy
- llnl
Environment:
Lustre 2.8.0 (Intel) on clients and servers; EL6.7; kernel 2.6.32-573.12.1.el6.x86_64; Red Hat OFED; Mellanox FDR HCAs (all identical on clients and servers). 7 combined OSS/OST

Severity: 2
Rank (Obsolete): 9223372036854775807

Description

The combined MDS/MGS server reports the following in /var/log/messages:

Apr 11 09:33:39 mds1 kernel: LustreError: 43754:0:(mdt_handler.c:893:mdt_getattr_internal()) blizzard-MDT0000: getattr error for [0x2000403f7:0x1e0f4:0x0]: rc = -2

The return code rc = -2 is -ENOENT, which implies that the file does not exist. lfs fid2path on the same FID confirms this:

sudo lfs fid2path /scratch 0x2000403f7:0x1e0f4:0x0
fid2path: error on FID 0x2000403f7:0x1e0f4:0x0: No such file or directory
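When triaging many of these messages, it can help to pull the target, FID, and return code out of each line and translate the code to its errno name. The following is a minimal sketch, not a Lustre tool: the regular expression is inferred from the single sample line above, and the helper name `parse_getattr_error` is hypothetical.

```python
import errno
import os
import re

# Field layout inferred from the one sample message in this report
# (an assumption; other Lustre versions may format it differently).
GETATTR_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<src>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<target>[\w-]+): getattr error for "
    r"\[(?P<fid>[0-9a-fx:]+)\]: rc = (?P<rc>-?\d+)"
)


def parse_getattr_error(log_line):
    """Return target, FID, rc and errno name, or None if the line does not match."""
    match = GETATTR_RE.search(log_line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    code = abs(rc)  # kernel return codes are negated errno values
    return {
        "target": match.group("target"),
        "fid": match.group("fid"),
        "rc": rc,
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
        "message": os.strerror(code),
    }


sample = (
    "Apr 11 09:33:39 mds1 kernel: LustreError: "
    "43754:0:(mdt_handler.c:893:mdt_getattr_internal()) "
    "blizzard-MDT0000: getattr error for "
    "[0x2000403f7:0x1e0f4:0x0]: rc = -2"
)
info = parse_getattr_error(sample)
print(info["target"], info["fid"], info["errno_name"])
```

The extracted FID can then be passed to lfs fid2path, as in the command above, to check whether the object still exists.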

Thanks,
Chris

Issue Links

duplicates

LU-7122 Document -n switch for lctl changelog_register

Resolved

Activity

Andreas Dilger added a comment - 19/Apr/17 4:17 PM

Patch landed as v2_8_55_0-133-gc3e03f3 so it is included in 2.9.0.

Andreas Dilger added a comment - 19/Apr/17 4:16 PM

This is a normal situation if there are multiple threads racing to unlink a single file. The patch http://review.whamcloud.com/18145 "LU-7712 mdd: migration is too noisy" turned off this error for the common -ENOENT case.
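The race described here can be illustrated with a local-filesystem analogy. This is only a sketch under stated assumptions: it uses plain POSIX unlink from Python threads, not the Lustre MDT code path, and the helper name `racing_unlink` is hypothetical. One thread removes the file and every other thread sees ENOENT, which is expected behavior rather than a fault.

```python
import os
import tempfile
import threading


def racing_unlink(path, n_threads=4):
    """Have n_threads try to unlink the same path at once; return each outcome."""
    outcomes = []
    outcomes_lock = threading.Lock()
    start = threading.Barrier(n_threads)  # release all threads together

    def worker():
        start.wait()
        try:
            os.unlink(path)
            outcome = "removed"
        except FileNotFoundError:
            # Another thread won the race; this is errno 2 (ENOENT).
            outcome = "ENOENT"
        with outcomes_lock:
            outcomes.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcomes


fd, path = tempfile.mkstemp()
os.close(fd)
outcomes = racing_unlink(path)
print(sorted(outcomes))
```

Exactly one thread reports "removed"; the rest report "ENOENT", mirroring the rc = -2 seen on the MDS when a getattr loses the race against an unlink.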

Christopher Morrone (Inactive) added a comment - 30/Aug/16 9:04 PM

Still seeing this in testing. We need to get this message resolved. Having a flood of messages on the console that sysadmins need to ignore leads to sysadmins that stop looking at the logs and we miss important things.

Olaf Faaland added a comment - 28/Apr/16 1:07 AM

Hi Alex,

We are seeing this as well. I haven't yet identified the sequence of events. We see it with a set of test scripts that race to mkdir/rmdir/create/unlink/read/write within a common directory, in a randomized manner. I'll try to narrow the set of operations down.

Am I correct that the object existed at the very beginning of mdt_getattr (I see an assert), but no longer exists after mdt_getattr->mdt_getattr_internal->mdt_attr_get_complex?

Is this supposed to be protected entirely by an LDLM lock held by the client?

thanks,
Olaf

christopher coffey (Inactive) added a comment - 12/Apr/16 4:30 PM

Hi Alex,

I haven't isolated which app/workload is creating these messages. The workloads are very diverse, ranging from MPI to simple single-threaded apps.

Alex Zhuravlev added a comment - 12/Apr/16 4:01 PM - edited

This is actually a valid situation under load, where a few threads are working on the same set of files. What kind of load were you running?

People

Assignee: Emoly Liu

Reporter: christopher coffey (Inactive)

Votes: 0

Watchers: 8

Dates

Created: 12/Apr/16 3:56 PM

Updated: 19/Apr/17 4:17 PM

Resolved: 19/Apr/17 4:17 PM