[LU-2387] Error messages printed in mdt_reint_open, possibly causing evictions Created: 26/Nov/12 Updated: 27/Nov/12 Resolved: 27/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | shh, topsequoia |
| Environment: |
Tag: 2.3.56-2chaos-3surya1 |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 5661 |
| Description |
|
I just rebooted our MDS and see the following messages on the console:

Lustre: lstest-MDT0000: Will be in recovery for at least 5:00, or until 265 clients reconnect.
LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f2d1:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay. req@ffff881fd78c5050 x1419179046425207/t0(201876066869) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975967 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:06 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_close.0->[0x2000182dc:0x1f2d2:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay. req@ffff881fd7fe1c50 x1419179046425214/t0(201876066900) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975967 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f380:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay. req@ffff881fd80c9450 x1419179046426510/t0(201876068937) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975968 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) Skipped 43 previous similar messages
2012-11-26 16:25:08 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_lseek.0->[0x2000182dc:0x1f45a:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay. req@ffff881fd675f850 x1419179046428033/t0(201876070976) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975969 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:08 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) Skipped 56 previous similar messages
2012-11-26 16:25:09 Lustre: lstest-MDT0000: Recovery over after 1:15, of 265 clients 256 recovered and 9 were evicted.

I haven't looked at the code to determine what the LustreError messages mean, but I wanted to open an issue in the meantime. First off, these messages should really be cleaned up and reworked to print something sane that an admin can understand. Second, 9 clients were evicted here, so I'm curious whether these evictions are a result of the errors printed just before recovery completed. |
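Each of the LustreError lines above packs several fields into one line. As a rough reading aid, here is a minimal Python sketch that pulls the parent FID, name, and child FID out of a "not found in open replay" line, checks the printed mode with the standard `stat` module, and tallies printed versus rate-limited ("Skipped N previous similar messages") occurrences. It assumes the messages follow exactly the layout shown in this report and that `msg_flag=0x4` is Lustre's MSG_REPLAY request flag; `decode`, `count_failed_replays`, and the regexes are hypothetical helpers, not part of Lustre, and `cr_flags` is deliberately left undecoded.

```python
import re
import stat

# Assumption: 0x4 is Lustre's MSG_REPLAY request flag; verify against the
# Lustre headers for the version you run.
MSG_REPLAY = 0x4

# A Lustre FID as printed in these messages: [seq:oid:ver], all hex.
FID = r"0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+"

# Matches the "[parent]/name->[child] cr_flags=.. mode=.. msg_flag=..
# not found in open replay" part of the mdt_reint_open() message.
OPEN_RE = re.compile(
    r"\[(?P<parent>" + FID + r")\]/(?P<name>\S+?)->\[(?P<child>" + FID + r")\]"
    r"\s+cr_flags=(?P<cr_flags>[0-7]+)\s+mode=(?P<mode>[0-7]+)"
    r"\s+msg_flag=(?P<msg_flag>0x[0-9a-f]+)\s+not found in open replay"
)

# Matches the console rate limiter's summary line for the same function.
SKIP_RE = re.compile(r"mdt_reint_open\(\)\) Skipped (\d+) previous similar messages")


def decode(line):
    """Return the fields of one 'not found in open replay' message, or None."""
    m = OPEN_RE.search(line)
    if m is None:
        return None
    mode = int(m.group("mode"), 8)           # printed in octal, e.g. 0100000
    msg_flag = int(m.group("msg_flag"), 16)  # printed in hex, e.g. 0x4
    return {
        "parent_fid": m.group("parent"),
        "name": m.group("name"),
        "child_fid": m.group("child"),
        "cr_flags": int(m.group("cr_flags"), 8),  # raw value, meaning not assumed
        "regular_file": stat.S_ISREG(mode),
        "replay": bool(msg_flag & MSG_REPLAY),
    }


def count_failed_replays(lines):
    """Return (printed, suppressed) counts, expecting one message per line."""
    printed = 0
    suppressed = 0
    for line in lines:
        if OPEN_RE.search(line):
            printed += 1
        skip = SKIP_RE.search(line)
        if skip:
            suppressed += int(skip.group(1))
    return printed, suppressed


if __name__ == "__main__":
    sample = [
        "LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ "
        "[0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f2d1:0x0] "
        "cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.",
        "LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) "
        "Skipped 43 previous similar messages",
    ]
    print(decode(sample[0]))
    print(count_failed_replays(sample))
```

If the full console excerpt from this report is fed in one message per line, the counter should see 4 printed messages and 43 + 56 = 99 suppressed ones, i.e. at least 103 hits at mdt_open.c:1328, assuming the suppressed messages come from the same call site.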
| Comments |
| Comment by Peter Jones [ 26/Nov/12 ] |
|
Alex,
What do you think?
Peter |
| Comment by Alex Zhuravlev [ 27/Nov/12 ] |
|
Hi, which specific message are you referring to? There is no DEBUG_REQ() at line 1328, AFAICS, and I'm not able to find "not found in open replay" in the sources. |
| Comment by Prakash Surya (Inactive) [ 27/Nov/12 ] |
|
Sorry for the noise. Looking at git blame, we're carrying 2679, which changes the message. I'll go ahead and mark this as a duplicate of LU-1353. |