[LU-1252] Imperative recovery bugs go here Created: 22/Mar/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sub-Tasks:
Key | Summary | Type | Status | Assignee
LU-1701 | CLONE - mgc_apply_recover_logs() ASSE... | Technical task | Resolved | Keith Mannthey
Severity: 3
Rank (Obsolete): 4601

 Description   

Container for imperative recovery bugs.



 Comments   
Comment by James A Simmons [ 23/Mar/12 ]

First patch is here http://review.whamcloud.com/#change,2371

Comment by James A Simmons [ 26/Mar/12 ]

While testing with the patch I saw two bugs. The first was a bogus recovery timeout, shown below. I don't think the clients are really in recovery for 18446744073709551615s.

LustreError: 4486:0:(ldlm_lib.c:941:target_handle_connect()) lustre-OST000c: denying connection for new client 12@gni (fde45892-b2d3-a6d0-0ff6-b0e9b7d0740b): 18 clients in recovery for 18446744073709551615s
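The value 18446744073709551615 is exactly 2^64 - 1, which is what -1 becomes when it is stored in an unsigned 64-bit integer. One plausible cause, not confirmed in this ticket, is that the remaining recovery time is computed as a deadline minus the current time and kept in an unsigned type, so it wraps as soon as the deadline has passed. This C sketch illustrates the wraparound and a clamped alternative; `remaining_unchecked` and `remaining_clamped` are hypothetical helpers, not Lustre functions.

```c
#include <stdint.h>

/* Hypothetical: remaining recovery time as deadline - now, converted to an
 * unsigned 64-bit value. When now is past the deadline the signed result is
 * negative, and converting -1 to uint64_t yields 2^64 - 1. */
uint64_t remaining_unchecked(int64_t deadline, int64_t now)
{
    return (uint64_t)(deadline - now);
}

/* Hypothetical: same computation, but clamped to zero once the deadline
 * has passed, so the value can never wrap around. */
uint64_t remaining_clamped(int64_t deadline, int64_t now)
{
    return deadline > now ? (uint64_t)(deadline - now) : 0;
}
```

For example, `remaining_unchecked(1000, 1001)` returns 18446744073709551615, while `remaining_clamped(1000, 1001)` returns 0.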

The second is this report:

[ 945.256721] Lustre: lustre-OST0004: Recovery over after 0:01, of 24 clients 23 recovered and 1 was evicted.

Judging by the timestamps, recovery took longer than 1 second.
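The duration in "Recovery over after 0:01" reads as minutes:seconds, i.e. one second. A minimal sketch, assuming the message is built from the difference between a recorded start time and the end of recovery, shows that the reported value depends entirely on which start timestamp is used: if that timestamp were reset partway through recovery, the message would understate the real duration. The helpers `elapsed_seconds` and `format_elapsed` are hypothetical illustrations, not the Lustre implementation, and this ticket does not confirm that a reset start time is the cause.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical: whole seconds between two wall-clock timestamps,
 * clamped at zero so a reversed pair never goes negative. */
long elapsed_seconds(time_t start, time_t end)
{
    return end > start ? (long)(end - start) : 0;
}

/* Hypothetical: write the duration as minutes:seconds, the same shape as
 * "after 0:01". Returns the snprintf() result (characters that would be
 * written, excluding the terminating NUL). */
int format_elapsed(time_t start, time_t end, char *buf, size_t len)
{
    long s = elapsed_seconds(start, end);

    return snprintf(buf, len, "%ld:%02ld", s / 60, s % 60);
}
```

With a start of 944 and an end of 945, `format_elapsed` writes "0:01"; with a start of 100 and the same end it writes "14:05" (845 seconds).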

Comment by Jinshan Xiong (Inactive) [ 02/Apr/12 ]

Hi James, can you please apply this patch: http://review.whamcloud.com/#change,1797 the next time you run the IR tests? I found that it greatly reduces reconnection time.

Comment by James A Simmons [ 02/Apr/12 ]

Merged it to our build system. Will test tomorrow.

Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » i686,client,el6,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » x86_64,server,el5,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » i686,server,el5,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » x86_64,server,el6,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » x86_64,client,el5,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » i686,client,el5,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 07/Apr/12 ]

Integrated in lustre-dev » x86_64,client,el6,inkernel #323
LU-1252 recovery: don't always swap nidtbl entries (Revision 6026e9c13b2ce73a3e55eeced7407329ac00bce0)

Result = SUCCESS
tappro : 6026e9c13b2ce73a3e55eeced7407329ac00bce0
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Jinshan Xiong (Inactive) [ 10/Apr/12 ]

Another patch is at: http://review.whamcloud.com/#change,2410

Comment by James A Simmons [ 11/Apr/12 ]

Is this needed for my testing?

Comment by Jinshan Xiong (Inactive) [ 11/Apr/12 ]

It's only needed if your cluster is composed of heterogeneous nodes, so I don't think you need to apply it.

Comment by James A Simmons [ 17/Apr/12 ]

I just got over 100GB of logs to look at, all from a single test run. In the test I attempted to take down one OSS node with powerman. Well, thanks to memory bugs in the debug daemon, all the OSS servers crashed. The logs cover the entire length of recovery. On the server side I have system logs from before and after the crash.

Comment by Jinshan Xiong (Inactive) [ 25/Apr/12 ]

It'll be really fun to look at 100GB of logs.

Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el5,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el6,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,server,el5,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el6,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el5,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el5,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgc/mgc_request.c
  • lustre/mgs/mgs_nids.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el6,inkernel #340
LU-1252 recovery: don't always swap nidtbl entries (Revision 853076deee223f9bd3c65a85b36fa766b4993666)

Result = SUCCESS
Mikhail Pershin : 853076deee223f9bd3c65a85b36fa766b4993666
Files :

  • lustre/mgs/mgs_nids.c
  • lustre/mgc/mgc_request.c
Comment by Cory Spitz [ 14/Aug/12 ]

The b2_2 version of change #2410 is at http://review.whamcloud.com/#change,3008.

Generated at Sat Feb 10 01:14:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.