[LU-14] live replacement of OST Created: 16/Nov/10 Updated: 24/Sep/15 Resolved: 23/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Lai Siyao | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||
| Bugzilla ID: | 24,128 | ||||||||||||||||||||||||||||||||||||||||
| Rank (Obsolete): | 7701 | ||||||||||||||||||||||||||||||||||||||||
| Description |
|
Hot replace: It probably wouldn't be impossible to have a new OST gracefully replace an old one, if that is what the administrator wanted. Some "special" action would need to be taken on the OST and/or MDT to ensure that this is what the admin wanted, instead of e.g. accidentally inserting some other OST with the same index and corrupting the filesystem because of duplicate object IDs, or not being able to access existing objects on the "real" OST at that index.
|
| Comments |
| Comment by Lai Siyao [ 19/Nov/10 ] |
|
Did some tests, finished 30% code. |
| Comment by bschubert [ 19/Nov/10 ] |
|
I just noticed this here, while it is still easy to browse through all the open issues. Just for your information, here is the offline approach: https://bugzilla.lustre.org/show_bug.cgi?id=22734 |
| Comment by Lai Siyao [ 19/Nov/10 ] |
|
Thanks for pointing this out, which explains a lot of details on LAST_ID recovery! |
| Comment by Lai Siyao [ 05/Dec/10 ] |
|
Code is ready, and in inspection. |
| Comment by Andreas Dilger [ 09/Oct/12 ] |
|
It probably makes sense for Fan Yong to implement this as part of the LFSCK project, so that an OST can recover from some common forms of corruption. The existing patch is at http://review.whamcloud.com/141, but needs to be refreshed. |
| Comment by nasf (Inactive) [ 29/Dec/12 ] |
|
It will be considered in LFSCK phase II. |
| Comment by Andreas Dilger [ 07/May/13 ] |
|
In discussions during struct last_id_ondisk {
__u64 lio_next_oid;
__u32 lio_magic;
__u32 lio_cksum;
};
and ofd_seq_load() (maybe rename this to ofd_seq_last_oid_read()?), ofd_seq_last_oid_write() and ll_recover_lost_found_objs.c should be updated to handle both an old 8-byte LAST_ID file and this new 16-byte format. If the on-disk LAST_ID file is corrupted (bad lio_magic, bad lio_cksum, lio_next_oid > OBIF_MAX_OID for seq != 0, lio_next_oid > IDIF_MAX_OID for fid_seq == 0), it would be treated the same as if it were missing, and the LAST_ID recovery code should traverse the object directories for that group and rebuild the LAST_ID file. This would avoid the case where the LAST_ID file has some random garbage in it and causes an inconsistency between the MDT's and OST's understanding of what the next valid OID is. |
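The validation rules in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any patch: the magic value, the checksum function, the `_SKETCH` limit values, and the helper names `last_id_parse()` and `oid_in_range()` are all hypothetical, and byte-order conversion of the on-disk fields is ignored. The 8-byte value is returned unchanged, so any difference in meaning between the old and new formats is outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder values assumed for this sketch; the real ones live in Lustre headers. */
#define LIO_MAGIC_SKETCH    0x4c494431u          /* hypothetical magic */
#define OBIF_MAX_OID_SKETCH ((1ULL << 32) - 1)   /* assumed limit for seq != 0 */
#define IDIF_MAX_OID_SKETCH ((1ULL << 48) - 1)   /* assumed limit for seq == 0 */

struct last_id_ondisk {
	uint64_t lio_next_oid;
	uint32_t lio_magic;
	uint32_t lio_cksum;
};

enum last_id_status { LAST_ID_OK = 0, LAST_ID_REBUILD = 1 };

/* Hypothetical checksum: fold both halves of the OID with the magic. */
static uint32_t lio_cksum_sketch(uint64_t oid, uint32_t magic)
{
	return (uint32_t)(oid >> 32) ^ (uint32_t)oid ^ magic;
}

/* Range check from the comment: a different OID limit for seq == 0. */
static int oid_in_range(uint64_t seq, uint64_t oid)
{
	return oid <= (seq == 0 ? IDIF_MAX_OID_SKETCH : OBIF_MAX_OID_SKETCH);
}

/*
 * Accept either the old 8-byte file or the new 16-byte format.
 * Anything else (missing, wrong size, bad magic, bad checksum,
 * out-of-range OID) is reported as "rebuild needed".
 */
static enum last_id_status last_id_parse(const void *buf, size_t len,
					 uint64_t seq, uint64_t *next_oid)
{
	struct last_id_ondisk lio;
	uint64_t oid;

	assert(next_oid != NULL);
	if (buf == NULL)
		return LAST_ID_REBUILD;

	if (len == sizeof(oid)) {               /* old 8-byte LAST_ID */
		memcpy(&oid, buf, sizeof(oid));
		if (!oid_in_range(seq, oid))
			return LAST_ID_REBUILD;
		*next_oid = oid;
		return LAST_ID_OK;
	}

	if (len != sizeof(lio))                 /* neither format */
		return LAST_ID_REBUILD;

	memcpy(&lio, buf, sizeof(lio));
	if (lio.lio_magic != LIO_MAGIC_SKETCH ||
	    lio.lio_cksum != lio_cksum_sketch(lio.lio_next_oid, lio.lio_magic) ||
	    !oid_in_range(seq, lio.lio_next_oid))
		return LAST_ID_REBUILD;

	*next_oid = lio.lio_next_oid;
	return LAST_ID_OK;
}
```

A caller would read the whole LAST_ID file into a buffer, pass its length, and fall back to scanning the object directories whenever `LAST_ID_REBUILD` comes back.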
| Comment by Andreas Dilger [ 07/Jun/13 ] |
|
The one missing part of this process is being able to use a newly formatted OST in place of an old one with the same index when the last_rcvd and mountdata files are not accessible. The last_rcvd file will be recreated at mount time with default parameters (this should normally be OK), but mkfs.lustre will always create the mountdata file with the LDD_F_VIRGIN flag set. It should be possible to add a --replace option to mkfs.lustre so that the MGS doesn't refuse the OST's connection because the index is in use. |
| Comment by Andreas Dilger [ 23/Aug/13 ] |
|
I've pushed http://review.whamcloud.com/7443 for "mkfs.lustre --replace", and the OST precreating only recent objects if the MDT lov_objid is much larger than the OST LAST_ID. This replaces the old patch in http://review.whamcloud.com/141. |
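The "precreate only recent objects" behaviour might look something like the sketch below. It is an assumption-laden illustration, not the logic from the patch: the threshold `MAX_PRECREATE_GAP_SKETCH`, the struct `precreate_range`, and the function `precreate_plan()` are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap on how many objects a replaced OST will precreate. */
#define MAX_PRECREATE_GAP_SKETCH 20000ULL

struct precreate_range {
	uint64_t first;   /* first OID to create */
	uint64_t count;   /* number of objects to create, 0 means nothing to do */
};

/*
 * Decide which objects to precreate when the MDT's record (lov_objid)
 * is ahead of the OST's LAST_ID. With a small gap the whole gap is
 * filled; with a very large gap (e.g. a freshly formatted replacement
 * OST) only the most recent objects are created.
 */
static struct precreate_range precreate_plan(uint64_t ost_last_id,
					     uint64_t mdt_last_id)
{
	struct precreate_range r = { 0, 0 };
	uint64_t gap;

	if (mdt_last_id <= ost_last_id)
		return r;                       /* OST is not behind */

	gap = mdt_last_id - ost_last_id;
	if (gap > MAX_PRECREATE_GAP_SKETCH) {
		r.first = mdt_last_id - MAX_PRECREATE_GAP_SKETCH + 1;
		r.count = MAX_PRECREATE_GAP_SKETCH;
	} else {
		r.first = ost_last_id + 1;
		r.count = gap;
	}

	/* Either way, the range ends at the MDT's last known object. */
	assert(r.first + r.count - 1 == mdt_last_id);
	return r;
}
```

Capping the range keeps a replacement OST from creating millions of empty objects just to catch up with the MDT, at the cost of leaving older OIDs absent.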
| Comment by Peter Jones [ 26/Sep/13 ] |
|
So is there still further work to complete for this ticket or does the recent landing mean that this ticket can be closed? |
| Comment by nasf (Inactive) [ 07/Oct/13 ] |
|
We still need the patch for rebuilding LAST_ID file: |
| Comment by Bob Glossman (Inactive) [ 04/Nov/13 ] |
|
backport to b2_4: http://review.whamcloud.com/8159 |
| Comment by Jian Yu [ 22/Nov/13 ] |
|
Patch landed on Lustre b2_4 branch. |
| Comment by Jian Yu [ 26/Nov/13 ] |
The newly added conf-sanity test 69 introduced a regression failure in interop testing: The patch also introduced conf-sanity test 72 and 73 regression failures: Before Lustre b2_4 build #57 (which contains the patch), conf-sanity tests 72 and 73 always passed on the Lustre b2_4 branch. |
| Comment by Jian Yu [ 26/Nov/13 ] |
|
The newly added conf-sanity test 69 also introduced a regression failure in ZFS testing: |
| Comment by Jian Yu [ 27/Nov/13 ] |
|
Patches for adding a Lustre version check to conf-sanity test 69: |
| Comment by Andreas Dilger [ 13/Dec/13 ] |
|
Patch http://review.whamcloud.com/6997 is implementing LAST_ID rebuild after corruption, and also handles the case where the MDT and OST are out of sync about the LAST_ID value. |
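The core of a LAST_ID rebuild, as discussed earlier in this ticket, is finding the highest object ID that actually exists in the object directories. A minimal sketch of that step, assuming the directory traversal has already produced a list of entry names, is shown below. The helper names `parse_oid()` and `last_id_from_names()` are hypothetical, and the numeric base of object names is passed in as a parameter because it is an assumption of the sketch rather than something stated in this ticket.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a directory entry name as an object ID; return -1 if it is not one. */
static int parse_oid(const char *name, int base, uint64_t *oid)
{
	char *end;
	unsigned long long v;

	/* Reject empty names and anything strtoull would silently skip (sign, spaces). */
	if (name == NULL || !isalnum((unsigned char)name[0]))
		return -1;

	errno = 0;
	v = strtoull(name, &end, base);
	if (errno != 0 || end == name || *end != '\0')
		return -1;

	*oid = (uint64_t)v;
	return 0;
}

/*
 * Return the largest object ID among the given entry names, or 0 if
 * none of them parses. Non-object entries such as "LAST_ID" are skipped.
 */
static uint64_t last_id_from_names(const char *const *names, size_t n, int base)
{
	uint64_t max = 0;
	uint64_t oid;
	size_t i;

	assert(names != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (parse_oid(names[i], base, &oid) == 0 && oid > max)
			max = oid;
	}
	return max;
}
```

The result would then be compared against the MDT's view before being written back, since this ticket also covers the case where the two sides disagree.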
| Comment by Peter Jones [ 23/Dec/13 ] |
|
Closing as remaining work is tracked under |