[LU-8012] lustre 2.8.0 getattr error rc = -2 Created: 12/Apr/16 Updated: 19/Apr/17 Resolved: 19/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | christopher coffey | Assignee: | Emoly Liu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | easy, llnl | ||
| Environment: |
Lustre 2.8.0 (Intel) on clients and servers; EL6.7 on clients and servers; kernel 2.6.32-573.12.1.el6.x86_64 on clients and servers; Red Hat OFED on clients and servers; Mellanox FDR HCAs on clients and servers. 7 combined OSS/OST nodes. |
||
| Issue Links: |
|
||||||||
| Severity: | 2 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
The combined MDS/MGS server reports the following in /var/log/messages:

Apr 11 09:33:39 mds1 kernel: LustreError: 43754:0:(mdt_handler.c:893:mdt_getattr_internal()) blizzard-MDT0000: getattr error for [0x2000403f7:0x1e0f4:0x0]: rc = -2

The error (rc = -2, i.e. -ENOENT) implies that the file does not exist, and running lfs fid2path on the FID indicates the same:

sudo lfs fid2path /scratch 0x2000403f7:0x1e0f4:0x0

Thanks, |
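Lustre kernel code reports failures as negated errno values, so rc = -2 in the message above is -ENOENT ("No such file or directory"). A minimal sketch for decoding such lines follows. The regular expression is an assumption based only on the single message quoted in this ticket, and decode_getattr_error is a hypothetical helper, not part of any Lustre tool.

```python
import errno
import os
import re

# Assumed message layout, inferred from the one LustreError line in this ticket:
#   ...(mdt_handler.c:893:mdt_getattr_internal()) <target>: getattr error for [<fid>]: rc = <rc>
GETATTR_RE = re.compile(
    r"mdt_getattr_internal\(\)\) (?P<target>\S+): "
    r"getattr error for \[(?P<fid>[^\]]+)\]: rc = (?P<rc>-?\d+)"
)


def decode_getattr_error(line):
    """Return (target, fid, errno_name, message), or None if the line does not match."""
    m = GETATTR_RE.search(line)
    if m is None:
        return None
    # Lustre returns negated errno values, so flip the sign before looking it up.
    err = -int(m.group("rc"))
    name = errno.errorcode.get(err, str(err))
    return m.group("target"), m.group("fid"), name, os.strerror(err)
```

The FID is returned without brackets, in the same form passed to lfs fid2path in the description above.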
| Comments |
| Comment by Alex Zhuravlev [ 12/Apr/16 ] |
|
This is actually a valid situation under load, when a few threads are working on the same set of files. What kind of load were you running? |
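To illustrate the kind of race described here, below is a minimal local sketch, assuming an ordinary POSIX filesystem rather than Lustre. Several threads create, stat and unlink the same few names in one shared directory; a stat() or unlink() that fails with ENOENT simply means another thread removed the file first. It does not exercise the MDT code path, and race_unlink_and_stat is a hypothetical name.

```python
import os
import tempfile
import threading


def race_unlink_and_stat(iterations=200, workers=4):
    """Hypothetical reproducer: count per-iteration outcomes of a create/stat/unlink race."""
    counts = {"ok": 0, "enoent": 0, "other": 0}
    lock = threading.Lock()
    names = [f"f{i}" for i in range(4)]  # small shared name set to provoke collisions

    def worker(seed, directory):
        for n in range(iterations):
            path = os.path.join(directory, names[(seed + n) % len(names)])
            try:
                # O_CREAT without O_EXCL succeeds whether or not the file already exists.
                fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
                os.close(fd)
                os.stat(path)    # may fail if another thread unlinked the file meanwhile
                os.unlink(path)  # may also find the file already gone
                outcome = "ok"
            except FileNotFoundError:
                outcome = "enoent"  # expected under concurrency, not a real error
            except OSError:
                outcome = "other"
            with lock:
                counts[outcome] += 1

    with tempfile.TemporaryDirectory() as d:
        threads = [threading.Thread(target=worker, args=(i, d)) for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return counts
```

The split between "ok" and "enoent" varies from run to run (a lightly loaded machine may see no ENOENT at all); the point is that ENOENT here is a normal outcome of concurrent unlinks rather than a sign of corruption.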
| Comment by christopher coffey [ 12/Apr/16 ] |
|
Hi Alex, I haven't isolated which app/workload is generating these messages. The workloads are very diverse, ranging from MPI jobs to simple single-threaded apps. |
| Comment by Olaf Faaland [ 28/Apr/16 ] |
|
Hi Alex, we are seeing this as well. I haven't yet identified the sequence of events. We see it with a set of test scripts that race to mkdir/rmdir/create/unlink/read/write within a common directory, in a randomized manner. I'll try to narrow the set of operations down. Am I correct that the object existed at the very beginning of mdt_getattr() (I see an assert there), but no longer exists by the time mdt_getattr() -> mdt_getattr_internal() -> mdt_attr_get_complex() runs? Is this supposed to be protected entirely by an LDLM lock held by the client? Thanks, |
| Comment by Christopher Morrone [ 30/Aug/16 ] |
|
Still seeing this in testing. We need to get this message resolved. A flood of console messages that sysadmins have to ignore trains them to stop reading the logs, and then we miss important things. |
| Comment by Andreas Dilger [ 19/Apr/17 ] |
|
This is a normal situation if there are multiple threads racing to unlink a single file. The patch http://review.whamcloud.com/18145 " |
| Comment by Andreas Dilger [ 19/Apr/17 ] |
|
Patch landed as v2_8_55_0-133-gc3e03f3 so it is included in 2.9.0. |