Details
Bug
Resolution: Fixed
Major
Lustre 2.16.0
None
3
Description
Various conf-sanity tests fail when large NIDs are in use. Each affected test mounts a single OST while the MGS is stopped. As a result, the OST cannot determine whether the MGS understands large NIDs, and the mount fails with:
[Tue Jul 9 11:11:20 2024] LustreError: 60249:0:(tgt_mount.c:1281:server_lsi2mti()) Failed to get NID for server lustre-OST0000, please check whether the target is specifed with improper --servicenode or --network options.
static struct mgs_target_info *server_lsi2mti(struct lustre_sb_info *lsi)
{
	...
	int nid_count = 0;
	...
	if (exp_connect_flags2(lsi->lsi_mgc->u.cli.cl_mgc_mgsexp) &
	    OBD_CONNECT2_LARGE_NID)
		large_nid = true;
	...
	while (LNetGetId(i++, &id, large_nid) != -ENOENT) {
		...
		tmp = genradix_ptr_alloc(&plist, nid_count++, GFP_KERNEL);
		...
	}
	if (nid_count == 0) {
		CERROR("Failed to get NID for server %s, please check whether the target is specifed with improper --servicenode or --network options.\n",
		       lsi->lsi_svname);
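large_nid is false because the MGS is stopped, so OBD_CONNECT2_LARGE_NID is not set in the connect flags of the MGC export. On a node configured with large NIDs, the LNetGetId() loop then adds no NIDs, nid_count stays 0, and the error above is logged.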
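The failure path can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the Lustre implementation: struct mock_nid, mock_get_id() and count_usable_nids() are hypothetical names, the node is assumed to have a single large NID, and the mock lookup is assumed to hide large NIDs from callers that pass large_nid == false.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local network ID; not a Lustre type. */
struct mock_nid {
	const char *name;
	bool is_large;	/* true for a large NID */
};

/* Assumption: the node is configured with exactly one large NID. */
static const struct mock_nid local_nids[] = {
	{ "large-nid-0", true },
};
#define LOCAL_NID_COUNT (sizeof(local_nids) / sizeof(local_nids[0]))

/*
 * Mock of the lookup loop's callee. Assumption: when large_nid is
 * false, large NIDs are skipped, so the caller never sees them.
 * Returns 0 and sets *out for the idx-th visible NID, else -ENOENT.
 */
static int mock_get_id(unsigned int idx, const struct mock_nid **out,
		       bool large_nid)
{
	unsigned int seen = 0;

	for (size_t i = 0; i < LOCAL_NID_COUNT; i++) {
		if (local_nids[i].is_large && !large_nid)
			continue;
		if (seen++ == idx) {
			*out = &local_nids[i];
			return 0;
		}
	}
	return -ENOENT;
}

/*
 * Mirrors the shape of the loop in the ticket: large_nid is only set
 * when a connected MGS advertised large NID support.
 */
static int count_usable_nids(bool mgs_connected, bool mgs_supports_large)
{
	bool large_nid = mgs_connected && mgs_supports_large;
	const struct mock_nid *id;
	unsigned int i = 0;
	int nid_count = 0;

	while (mock_get_id(i++, &id, large_nid) != -ENOENT)
		nid_count++;

	return nid_count;
}
```

Under these assumptions, count_usable_nids(false, false) returns 0, which corresponds to the nid_count == 0 branch that logs the error, while count_usable_nids(true, true) returns 1.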
The following test cases are known to be affected by this issue:
conf-sanity test_5b
conf-sanity test_9
conf-sanity test_17
conf-sanity test_19b
conf-sanity test_21b
conf-sanity test_21c
conf-sanity test_27a
conf-sanity test_40
conf-sanity test_107