[LU-2387] Error messages printed in mdt_reint_open, possibly causing evictions Created: 26/Nov/12  Updated: 27/Nov/12  Resolved: 27/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Prakash Surya (Inactive) Assignee: Alex Zhuravlev
Resolution: Duplicate Votes: 0
Labels: shh, topsequoia
Environment:

Tag: 2.3.56-2chaos-3surya1


Issue Links:
Duplicate
duplicates LU-1353 mdt_reint_open() @@@ OPEN & CREAT not... Resolved
Severity: 3
Rank (Obsolete): 5661

 Description   

I just rebooted our MDS and see the following messages on the console:

Lustre: lstest-MDT0000: Will be in recovery for at least 5:00, or until 265 clients reconnect.
LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f2d1:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  req@ffff881fd78c5050 x1419179046425207/t0(201876066869) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975967 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:06 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_close.0->[0x2000182dc:0x1f2d2:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  req@ffff881fd7fe1c50 x1419179046425214/t0(201876066900) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975967 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f380:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  req@ffff881fd80c9450 x1419179046426510/t0(201876068937) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975968 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) Skipped 43 previous similar messages
2012-11-26 16:25:08 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) @@@ [0x2000182dc:0x156a9:0x0]/simul_lseek.0->[0x2000182dc:0x1f45a:0x0] cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  req@ffff881fd675f850 x1419179046428033/t0(201876070976) o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 lens 568/1136 e 0 to 0 dl 1353975969 ref 1 fl Complete:/4/0 rc 0/0
2012-11-26 16:25:08 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) Skipped 56 previous similar messages
2012-11-26 16:25:09 Lustre: lstest-MDT0000: Recovery over after 1:15, of 265 clients 256 recovered and 9 were evicted.

I haven't looked at the code to determine what the LustreError messages mean, but I wanted to open an issue in the meantime.

First off, these should really be cleaned up and reworked to print something sane that an admin can understand.

Secondly, 9 clients were evicted here, so I'm curious whether these evictions are a result of the errors printed just prior to recovery completion.
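
One way to start on the eviction question is to tally which clients the open-replay errors came from and compare that against the recovery summary. The following is a minimal sketch, not a Lustre tool: the regular expressions are assumptions fitted to the console lines quoted in this ticket (the message text comes from a locally carried patch, so other builds may word it differently), and `summarize` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines in this ticket:
#   ... mdt_reint_open()) @@@ <parent fid>/<name>-><child fid> ... not found in open replay.
#   ... o<opcode>-><client uuid>@<client nid>:<n>/<n> lens ...
OPEN_REPLAY_RE = re.compile(
    r"mdt_reint_open\(\)\) @@@ \S+?/(?P<name>\S+?)->\S+ "
    r".*?not found in open replay\."
    r".*? o\d+->(?P<uuid>[0-9a-f-]+)@(?P<nid>\S+?):\d+/\d+ "
)
SKIPPED_RE = re.compile(
    r"mdt_reint_open\(\)\) Skipped (\d+) previous similar messages"
)
RECOVERY_RE = re.compile(
    r"Recovery over after \S+, of (\d+) clients "
    r"(\d+) recovered and (\d+) were evicted"
)


def summarize(lines):
    """Count open-replay errors per client NID and pick up the recovery totals."""
    errors_by_nid = Counter()
    skipped = 0      # messages the console rate limiter suppressed
    recovery = None  # (total clients, recovered, evicted)
    for line in lines:
        m = OPEN_REPLAY_RE.search(line)
        if m:
            errors_by_nid[m.group("nid")] += 1
            continue
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m.group(1))
            continue
        m = RECOVERY_RE.search(line)
        if m:
            recovery = tuple(int(g) for g in m.groups())
    return {
        "errors_by_nid": dict(errors_by_nid),
        "skipped": skipped,
        "recovery": recovery,
    }


# Four lines copied from the console output above.
SAMPLE = [
    "2012-11-26 16:25:06 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) "
    "@@@ [0x2000182dc:0x156a9:0x0]/simul_close.0->[0x2000182dc:0x1f2d2:0x0] "
    "cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  "
    "req@ffff881fd7fe1c50 x1419179046425214/t0(201876066900) "
    "o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 "
    "lens 568/1136 e 0 to 0 dl 1353975967 ref 1 fl Complete:/4/0 rc 0/0",
    "2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) "
    "@@@ [0x2000182dc:0x156a9:0x0]/simul_open.0->[0x2000182dc:0x1f380:0x0] "
    "cr_flags=03 mode=0100000 msg_flag=0x4 not found in open replay.  "
    "req@ffff881fd80c9450 x1419179046426510/t0(201876068937) "
    "o101->bfcbc1b8-9a26-425f-5198-348ff50beb40@172.20.3.120@o2ib500:0/0 "
    "lens 568/1136 e 0 to 0 dl 1353975968 ref 1 fl Complete:/4/0 rc 0/0",
    "2012-11-26 16:25:07 LustreError: 33073:0:(mdt_open.c:1328:mdt_reint_open()) "
    "Skipped 43 previous similar messages",
    "2012-11-26 16:25:09 Lustre: lstest-MDT0000: Recovery over after 1:15, "
    "of 265 clients 256 recovered and 9 were evicted.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In the excerpt above, every error that was actually printed came from the same client (172.20.3.120@o2ib500), so the visible lines alone cannot account for all 9 evictions. The skipped messages may well involve other clients, so this tally only narrows the question; it does not answer it.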



 Comments   
Comment by Peter Jones [ 26/Nov/12 ]

Alex

What do you think?

Peter

Comment by Alex Zhuravlev [ 27/Nov/12 ]

Hi,

Which specific message are you referring to? There is no DEBUG_REQ() at line 1328, AFAICS. I'm not able to find "not found in open replay" in the sources,
though there is a similar message.

Comment by Prakash Surya (Inactive) [ 27/Nov/12 ]

Sorry for the noise. Looking at git blame, we're carrying 2679, which changes the message. I'll go ahead and mark this as a duplicate of LU-1353.

Generated at Sat Feb 10 01:24:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.