[LU-14928] Allow MD target re-registered after writeconf Created: 11/Aug/21  Updated: 07/Dec/23  Resolved: 25/Aug/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4966 handle server registration errors gra... Open
is related to LU-17240 change test-framework to format and m... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In a DNE system, it is not safe to run writeconf on an MD target and then mount (and re-register) it again, because doing so creates bogus MDT-to-MDT OSP devices such as "fsname-MDT0001-osp-MDT0001". It would nevertheless be useful to allow this, in order to recover from a half-failed target registration where the MGS completes the registration but the target times out without learning that the registration succeeded.
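
For reference, a minimal sketch of the reproduction sequence described above, assuming the device paths and mount points shown in the test output below (this is illustrative, not a prescribed procedure):

# Stop the already-registered MDT, flag it for writeconf, and remount it.
umount /mnt/lustre-mds2                            # stop lustre-MDT0001
tunefs.lustre --writeconf /dev/mapper/mds2_flakey  # sets the writeconf flag (0x101)
mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
# Without the fix, re-registration leaves an illegal self-referencing OSP
# device, lustre-MDT0001-osp-MDT0001, on the re-registered MDT.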



 Comments   
Comment by Gerrit Updater [ 11/Aug/21 ]

"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44594
Subject: LU-14928 mgs: allow md target re-register
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7223b7e6fb6925af05cade3699f73874fa5f4751

Comment by Alexander Zarochentsev [ 11/Aug/21 ]

Here is the conf-sanity test 130 output without the fix:

== conf-sanity test 130: re-register an MDT after writeconf ========================================== 16:02:19 (1628686939)
start mds service on devvm1
Starting mds1: -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
Started lustre-MDT0000
start mds service on devvm1
Starting mds2: -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
Started lustre-MDT0001
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
stop mds service on devvm1
Stopping /mnt/lustre-mds2 (opts:-f) on devvm1
checking for existing Lustre data: found

   Read previous values:
Target:     lustre-MDT0001
Index:      1
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x1
              (MDT )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity


   Permanent disk data:
Target:     lustre-MDT0001
Index:      1
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x101
              (MDT writeconf )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity

Writing CONFIGS/mountdata
start mds service on devvm1
Starting mds2: -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
Started lustre-MDT0001
 16 UP osp lustre-MDT0001-osp-MDT0001 lustre-MDT0001-mdtlov_UUID 4
 conf-sanity test_130: @@@@@@ FAIL: Illegal OSP device created 
  Trace dump:
  = ./../tests/test-framework.sh:6221:error()
  = conf-sanity.sh:9259:test_130()
  = ./../tests/test-framework.sh:6524:run_one()
  = ./../tests/test-framework.sh:6571:run_one_logged()
  = ./../tests/test-framework.sh:6398:run_test()
  = conf-sanity.sh:9262:main()
Dumping lctl log to /tmp/test_logs/1628686868/conf-sanity.test_130.*.1628686959.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
FAIL 130 (30s)
[root@devvm1 tests]# 
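
For illustration, a hedged sketch of the kind of check test_130 performs after the remount; this is not the actual conf-sanity.sh code, only the detection idea based on the lctl dl line shown in the log above:

# List local Lustre devices and flag any self-referencing MDT-to-MDT OSP entry.
if lctl dl | grep -q "osp .*-MDT0001-osp-MDT0001"; then
    echo "Illegal OSP device created"
fi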
Comment by Gerrit Updater [ 25/Aug/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44594/
Subject: LU-14928 mgs: allow md target re-register
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e4f3f47f04c762770bc36c1e3fa7e92e94a36704

Comment by Peter Jones [ 25/Aug/21 ]

Landed for 2.15
