[LU-266] Need a better, automated way to recover from failures that require LAST_ID recovery Created: 03/May/11  Updated: 25/Apr/13  Resolved: 25/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0, Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: John Salinas (Inactive) Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux and Lustre


Issue Links:
Related
is related to LU-14 live replacement of OST Resolved
Bugzilla ID: 22734
Rank (Obsolete): 7875

 Description   

We continue to see hardware failures that create problems on the filesystem requiring LAST_ID recovery. A manual procedure exists for this, but it is unnecessarily arduous, especially given how often we continue to have to run it.

Bug 22734 details one possible automated solution. There may be other ways to handle this, and we would be open to exploring them, as long as we arrive at something that is as simple (or even simpler) to run so that we can recover quickly from this situation.



 Comments   
Comment by John Salinas (Inactive) [ 03/May/11 ]

The proposed fix from Oracle Bug 22734 has been used in the field at least twice.

Comment by Peter Jones [ 15/Jun/11 ]

John

Can you upload this patch into gerrit?

Thanks

Peter

Comment by Peter Jones [ 13/Oct/11 ]

John

If you are unsure how to do this, Ihara should be able to help.

Peter

Comment by Kit Westneat (Inactive) [ 14/Aug/12 ]

Patch uploaded to gerrit:
http://review.whamcloud.com/3640

Comment by Andreas Dilger [ 25/Apr/13 ]

The submitted patch was landed. Further work is described in LU-14.

Generated at Sat Feb 10 01:05:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.