[LU-4966] handle server registration errors gracefully Created: 28/Apr/14  Updated: 05/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Niu Yawei (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: llnl

Issue Links:
Related
is related to LU-1257 OST registration snafu Resolved
is related to LU-9838 target registration mount fails with ... Resolved
is related to LU-15112 attempt to register an OST with dupli... Resolved
is related to LU-14928 Allow MD target re-registered after w... Resolved
is related to LU-17240 change test-framework to format and m... Open
Rank (Obsolete): 13736

 Description   

If a server registers successfully on the MGS but receives an error in the registration reply (for example, because the MGS timed out revoking config locks, or because of some other networking problem), then the server will always get -EADDRINUSE when it tries to register again, because its server index was already occupied on the MGS by the first registration.

The current workaround for this situation is to use the writeconf option to force registration.

We need to improve this so that the MGS can handle the situation gracefully.



 Comments   
Comment by Niu Yawei (Inactive) [ 28/Apr/14 ]

I think if the MGS saves the server UUID along with the server index, then it can tell whether a registration request for an already-occupied index comes from the same server.
Also, it looks like the MGS currently keeps the server index bitmap in memory only; it needs to be saved on disk as well.
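
A minimal sketch of this idea, written in Python rather than Lustre's C purely for illustration. The class, method names, and JSON file format are hypothetical and do not correspond to any existing MGS code; the sketch only shows an index-to-UUID table that survives restarts and treats a repeated registration from the same UUID as success.

```python
import errno
import json
import os


class IndexRegistry:
    """Hypothetical stand-in for an MGS index table; not actual Lustre code."""

    def __init__(self, path):
        self.path = path
        self.owners = {}  # target index -> UUID of the server that owns it
        if os.path.exists(path):
            with open(path) as f:
                # JSON object keys are strings, so convert them back to ints.
                self.owners = {int(k): v for k, v in json.load(f).items()}

    def _save(self):
        # Write a temporary file and rename it over the old table, so a
        # crash during the write cannot leave a partially written table.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.owners, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def register(self, index, uuid):
        """Return 0 on success, or -EADDRINUSE if another server owns index."""
        owner = self.owners.get(index)
        if owner is None:
            # First registration: record the owner before replying.
            self.owners[index] = uuid
            self._save()
            return 0
        if owner == uuid:
            # Same server retrying after a lost or failed reply.
            return 0
        return -errno.EADDRINUSE
```

With a table like this, a server whose first reply was lost can simply retry and get 0 back, while a different server asking for the same index still gets -EADDRINUSE.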

Comment by Niu Yawei (Inactive) [ 28/Apr/14 ]

Andreas/Alex, any suggestions? Thanks.

Comment by Alex Zhuravlev [ 28/Apr/14 ]

This issue was mentioned by Chris in LU-1257, and at the moment it's not clear whether this specific issue is a major one for LLNL. It'll be hard to get this fixed in 2.6 due to the amount of changes?

Comment by Andreas Dilger [ 28/Apr/14 ]

Alex, I think Niu was asking for ideas on how this might best be fixed.

Comment by Li Wei (Inactive) [ 25/Nov/14 ]

Niu's idea makes sense to me. I spent some time experimenting with it.

A real UUID (e.g., "a53bc5ba-687b-4091-fb0b-61489785f247") could easily be generated by back-end-independent mkfs.lustre code and stored in a back-end-specific way (e.g., in ldiskfs "mountdata" or as a ZFS dataset property). (ZFS has pool IDs, but those are pool properties and are only 64-bit.)

Current master code always passes empty strings in mti_uuid. It would be nice if real UUIDs could be packed into that field. However, experiments showed:

  • MDT OSPs for OSTs would send MDS_CONNECTs to OSTs, because client_obd_setup() depends on "OST" in UUIDs to determine whether an OSP is for an OST or an MDT.
  • Clients would not be able to connect to MDTs, because mgs would put fake UUIDs (e.g., "lustre-MDT0000_UUID") into MDT logs but real UUIDs into the client log.

A possible solution is:

  • Start generating real UUIDs for any newly formatted targets.
  • Send real UUIDs via mti_uuid.
  • MGT checks and stores real UUIDs somewhere, but keeps putting fake UUIDs into logs.
  • Newly formatted targets must talk with new mgs code.

Comment by Andreas Dilger [ 25/Nov/14 ]

There is space in the last_rcvd file to store the UUID, but that has the potential problem that this file may be deleted if there are problems with recovery.

As for the OST detection in the connection code, it would be possible to store the target type in the last byte of the UUID or similar (e.g. the ASCII "O" or "M") and still make the rest of the UUID random.
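
A rough sketch of that encoding, in Python for illustration only. The function names and the exact byte layout are assumptions, not an agreed format: a random UUID is generated and its 16th byte is overwritten with ASCII "O" or "M", leaving the other 15 bytes random.

```python
import uuid

# Hypothetical tags: one ASCII byte identifying the target type.
TYPE_TAGS = {"OST": b"O", "MDT": b"M"}


def make_target_uuid(target_type):
    """Return a random UUID whose last byte encodes the target type."""
    raw = uuid.uuid4().bytes  # 16 random bytes with the version 4 bits set
    return uuid.UUID(bytes=raw[:15] + TYPE_TAGS[target_type])


def target_type_of(target_uuid):
    """Return "OST" or "MDT" from the last byte, or None if it is neither."""
    tag = target_uuid.bytes[15:16]
    for name, value in TYPE_TAGS.items():
        if value == tag:
            return name
    return None
```

In the usual string form, such a UUID ends in "4f" for an OST and "4d" for an MDT, so connection code could test the final byte instead of searching the UUID for the substring "OST".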

Comment by Sebastien Piechurski [ 03/Nov/16 ]

Has this issue been addressed in the latest versions?
We are still seeing this problem when installing configurations with a large number of OSTs (or when modifying them after a writeconf), and we have to work around it manually by starting the OSTs one after another.

Generated at Sat Feb 10 01:47:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.