[LU-6840] update memory reply data in DNE update replay Created: 13/Jul/15  Updated: 02/Sep/15  Resolved: 28/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-6773 DNE2 Failover and recovery soak testing Closed
Related
is related to LU-5319 Support multiple slots per client in ... Resolved
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
is related to LU-6844 replay-single test 70b failure: 'rund... Resolved
is related to LU-7077 Pointer 'hash' returned from call to ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

DNE update replay may modify the last_rcvd file. Because it operates on OSD/OSP directly, the update replay handler does not update the in-memory structures (ted/lrd, etc.). The handler therefore has to update these in-memory structures itself after each update replay (see replay_request_or_update()).

In the current implementation this is done in target_update_lcd(), but the multiple-slot patch changed this process, so target_update_lcd() needs to be fixed as well.
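A minimal sketch of the problem described above, assuming a heavily simplified model. The structs and functions below (disk_client_data, mem_export_data, trans_stop_cb, update_replay_and_sync, etc.) are hypothetical stand-ins, not the real Lustre structures or helpers. On the normal path, a transaction stop callback refreshes the in-memory copy. A replay that writes through the OSD/OSP layer directly skips that callback, so the in-memory copy goes stale unless the replay path refreshes it explicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; the real Lustre structures
 * (e.g. the last_rcvd client data and per-export target data) have
 * many more fields. */
struct disk_client_data {          /* record persisted in last_rcvd */
	uint64_t last_transno;
	uint64_t last_xid;
};

struct mem_export_data {           /* in-memory copy kept per export */
	struct disk_client_data lcd;
	int lcd_valid;
};

/* Normal path: the transaction stop callback refreshes memory. */
static void trans_stop_cb(struct mem_export_data *ted,
			  const struct disk_client_data *written)
{
	ted->lcd = *written;
	ted->lcd_valid = 1;
}

static void normal_commit(struct disk_client_data *disk,
			  struct mem_export_data *ted,
			  uint64_t transno, uint64_t xid)
{
	disk->last_transno = transno;
	disk->last_xid = xid;
	trans_stop_cb(ted, disk);      /* callback fires on this path */
}

/* Update replay writes through OSD/OSP directly: no callback runs. */
static void update_replay_write(struct disk_client_data *disk,
				uint64_t transno, uint64_t xid)
{
	disk->last_transno = transno;
	disk->last_xid = xid;
}

/* Hypothetical analogue of the fix: after each update replay, the
 * replay path copies the on-disk record into memory itself. */
static void update_replay_and_sync(struct disk_client_data *disk,
				   struct mem_export_data *ted,
				   uint64_t transno, uint64_t xid)
{
	update_replay_write(disk, transno, xid);
	assert(disk->last_transno == transno);
	ted->lcd = *disk;
	ted->lcd_valid = 1;
}
```

Calling update_replay_write() alone leaves the in-memory last_transno at its previous value. update_replay_and_sync() brings it back in line with the on-disk record.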



 Comments   
Comment by Gerrit Updater [ 13/Jul/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15576
Subject: LU-6840 target: update reply data after update replay
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49d65b184daf7c6d580f918b3befd41a16c1f278

Comment by Gregoire Pichon [ 20/Jul/15 ]

I am not sure I understand the context, but metadata operations performed on OSD/OSP should not make use of the multiple slot patch.

At the moment (Lustre 2.8), the multiple slot feature is only supported for MDT exports that have the OBD_CONNECT_MULTIMODRPCS flag, that is, exports for MDC. This flag should not be set on exports for OSP.
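The rule in the comment above can be sketched as a simple connect-flag test. The flag name OBD_CONNECT_MULTIMODRPCS comes from the comment. The bit value, the SKETCH_ macro, struct sketch_export and export_uses_multislot() are hypothetical illustrations and are not taken from the Lustre headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value standing in for OBD_CONNECT_MULTIMODRPCS;
 * the real constant is defined in the Lustre headers. */
#define SKETCH_CONNECT_MULTIMODRPCS (1ULL << 5)

enum export_kind { EXPORT_MDC, EXPORT_OSP };

/* Hypothetical minimal export descriptor. */
struct sketch_export {
	enum export_kind kind;
	uint64_t connect_flags;   /* flags negotiated at connect time */
};

/* Multiple modify-RPC slots apply only if the flag was negotiated,
 * which in Lustre 2.8 is expected for MDC exports but not for OSP. */
static bool export_uses_multislot(const struct sketch_export *exp)
{
	assert(exp != NULL);
	return (exp->connect_flags & SKETCH_CONNECT_MULTIMODRPCS) != 0;
}
```

With this model, an MDC export that negotiated the flag reports true, and an OSP export connected without it reports false.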

Comment by Di Wang [ 20/Jul/15 ]

For cross-MDT operations, the updates (including the updates to last_rcvd and reply_data) are recorded on all of the MDTs. During recovery, if the updates on the master MDT are missing, they need to be redone on the master MDT. The last_rcvd and reply_data files must therefore be updated as part of this process, together with the in-memory structures (lcd, etc.), which is normally done through the transaction stop callback.

In the current implementation, the stop callback cannot be called correctly because of the special nature of DNE recovery and the recent multiple-slot changes. This patch tries to resolve that.
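The recovery flow described in the two paragraphs above can be sketched as follows, assuming a toy model where each MDT records only the transnos of the cross-MDT updates it holds. struct sketch_mdt, mdt_has() and redo_missing() are hypothetical names, not Lustre code. The master redoes every update that a remote MDT recorded but the master lacks. Because no transaction stop callback runs on this path, it refreshes its in-memory state explicitly after each redo:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TX 16

/* Hypothetical model of one MDT's view of cross-MDT updates. */
struct sketch_mdt {
	uint64_t transnos[MAX_TX];   /* updates recorded on this MDT */
	size_t count;
	uint64_t mem_last_transno;   /* in-memory copy of last_rcvd state */
};

static bool mdt_has(const struct sketch_mdt *mdt, uint64_t transno)
{
	for (size_t i = 0; i < mdt->count; i++)
		if (mdt->transnos[i] == transno)
			return true;
	return false;
}

/* Redo on the master every update recorded on the remote MDT but
 * missing locally, then sync memory explicitly. Returns the number
 * of updates redone. */
static size_t redo_missing(struct sketch_mdt *master,
			   const struct sketch_mdt *remote)
{
	size_t redone = 0;

	for (size_t i = 0; i < remote->count; i++) {
		uint64_t t = remote->transnos[i];

		if (mdt_has(master, t))
			continue;
		assert(master->count < MAX_TX);
		master->transnos[master->count++] = t;  /* "redo" update */
		if (t > master->mem_last_transno)
			master->mem_last_transno = t;   /* explicit sync */
		redone++;
	}
	return redone;
}
```

Running redo_missing() a second time redoes nothing, because the master now holds every update the remote MDT recorded.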

Comment by Gerrit Updater [ 28/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15576/
Subject: LU-6840 target: update reply data after update replay
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9543df37cdfd35980c888440265d161e350d166d

Comment by Joseph Gmitter (Inactive) [ 28/Aug/15 ]

Landed for 2.8.

Generated at Sat Feb 10 02:03:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.