[LU-178] server can do replays after recovery Created: 29/Mar/11  Updated: 23/Jun/16  Resolved: 23/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Mikhail Pershin Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5046

 Description   

Niu found in LU-128 that a race is possible in target_handle_connect(). The server sets the RECOVERING flag in the reply, but recovery ends right after that, so the client is evicted and a new connection is established. The client then starts replaying over the newly established connection, and the server accepts the replays in full.

We have two problems here:
1) The race itself must be fixed, so that a client does not get the RECOVERING flag for a new connection.
2) The server must deny replay requests during normal processing; otherwise a malformed client that sends replays outside of recovery can break the transaction flow.

I am going to add patches to Gerrit for both problems.



 Comments   
Comment by Build Master (Inactive) [ 01/Apr/11 ]

Integrated in reviews-centos5 #634
LU-178 prevent replays after recovery on server

Mikhail Pershin : c5608451d7afcf50bfd535b2e3583667dc9b50ae
Files :

  • lustre/ptlrpc/service.c
  • lustre/include/obd_support.h
  • lustre/mdt/mdt_handler.c
  • lustre/ptlrpc/client.c
  • lustre/tests/replay-single.sh
  • lustre/ldlm/ldlm_lib.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/tests/replay-single.sh
  • lustre/ldlm/ldlm_lib.c
  • lustre/mdt/mdt_handler.c
  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/service.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/tests/replay-single.sh
  • lustre/ldlm/ldlm_lib.c
  • lustre/ptlrpc/service.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/service.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/service.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/mdt/mdt_handler.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/include/obd_support.h
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/service.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/ptlrpc/service.c
  • lustre/include/obd_support.h
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/client.c
  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/service.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/ptlrpc/service.c
  • lustre/tests/replay-single.sh
  • lustre/include/obd_support.h
  • lustre/mdt/mdt_handler.c
  • lustre/ptlrpc/client.c
  • lustre/ldlm/ldlm_lib.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/service.c
  • lustre/mdt/mdt_handler.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ldlm/ldlm_lib.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/include/obd_support.h
  • lustre/ldlm/ldlm_lib.c
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/service.c
  • lustre/mdt/mdt_handler.c
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,client,el5,ofa #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/ldlm/ldlm_lib.c
  • lustre/mdt/mdt_handler.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/service.c
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/include/obd_support.h
  • lustre/ptlrpc/service.c
  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/client.c
  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/service.c
Comment by Build Master (Inactive) [ 08/May/11 ]

Integrated in lustre-master » i686,server,el5,ofa #110
LU-178 prevent replays after recovery on server

Oleg Drokin : 5d4ae6c905b8e8a38a6e4d3195f550a1e2b37ba8
Files :

  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/mdt/mdt_handler.c
  • lustre/ldlm/ldlm_lib.c
  • lustre/tests/replay-single.sh
  • lustre/ptlrpc/service.c
Comment by Peter Jones [ 13/Jun/11 ]

Is any further work required or can this ticket be marked as resolved?

Comment by Niu Yawei (Inactive) [ 13/Jun/11 ]

It looks like b1_8 has the same problem; we should port the fix to b1_8 as well.

Comment by Peter Jones [ 14/Jun/11 ]

Niu,

Do I understand correctly that this is a long-standing issue with recovery on 1.8.x?

Peter

Comment by Niu Yawei (Inactive) [ 14/Jun/11 ]

Yes, I think so; at least the current b1_8 has the same problem.

Comment by Niu Yawei (Inactive) [ 23/Jun/16 ]

There is no intention to backport the fix to b1_8. This can be closed.
