[LU-14928] Allow MD target to be re-registered after writeconf Created: 11/Aug/21 Updated: 07/Dec/23 Resolved: 25/Aug/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexander Zarochentsev | Assignee: | Alexander Zarochentsev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In a DNE system, it is not safe to run writeconf on an MD target and then mount (re-register) it again, as doing so creates bogus MDT-MDT osp devices such as "fsname-MDT0001-osp-MDT0001". However, it would be useful to allow this in order to recover from a half-failed target registration, where the MGS completes the registration but the target times out without learning that the registration succeeded. |
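For reference, a minimal reproduction sketch, assuming a DNE filesystem named "lustre" with MDT0001 on /dev/mapper/mds2_flakey mounted at /mnt/lustre-mds2 (device and mount point are taken from the test log below; adjust for your setup):

# stop the MDT, mark it for re-registration, then mount it again
umount /mnt/lustre-mds2
tunefs.lustre --writeconf /dev/mapper/mds2_flakey
mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
# without the fix, the re-registration leaves a self-referencing OSP device behind:
lctl dl | grep 'MDT0001-osp-MDT0001'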
| Comments |
| Comment by Gerrit Updater [ 11/Aug/21 ] |
|
"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44594 |
| Comment by Alexander Zarochentsev [ 11/Aug/21 ] |
|
Here is the output of conf-sanity test 130 without the fix:
== conf-sanity test 130: re-register an MDT after writeconf ========================================== 16:02:19 (1628686939)
start mds service on devvm1
Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
Started lustre-MDT0000
start mds service on devvm1
Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
Started lustre-MDT0001
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
stop mds service on devvm1
Stopping /mnt/lustre-mds2 (opts:-f) on devvm1
checking for existing Lustre data: found
Read previous values:
Target: lustre-MDT0001
Index: 1
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity
Permanent disk data:
Target: lustre=MDT0001
Index: 1
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x101
(MDT writeconf )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity
Writing CONFIGS/mountdata
start mds service on devvm1
Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
Started lustre-MDT0001
16 UP osp lustre-MDT0001-osp-MDT0001 lustre-MDT0001-mdtlov_UUID 4
conf-sanity test_130: @@@@@@ FAIL: Illegal OSP device created
Trace dump:
= ./../tests/test-framework.sh:6221:error()
= conf-sanity.sh:9259:test_130()
= ./../tests/test-framework.sh:6524:run_one()
= ./../tests/test-framework.sh:6571:run_one_logged()
= ./../tests/test-framework.sh:6398:run_test()
= conf-sanity.sh:9262:main()
Dumping lctl log to /tmp/test_logs/1628686868/conf-sanity.test_130.*.1628686959.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
FAIL 130 (30s)
[root@devvm1 tests]#
|
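The failing check boils down to spotting an OSP device whose two MDT indices are the same. A minimal sketch of such a check, assuming the lctl dl output format shown above (illustrative only, not the exact conf-sanity test_130 code):

# field 4 of "lctl dl" is the device name, e.g. lustre-MDT0001-osp-MDT0001;
# the back-reference catches any OSP device that points back at its own MDT
if lctl dl | awk '$3 == "osp" { print $4 }' | grep -q 'MDT\([0-9a-f]\{4\}\)-osp-MDT\1'; then
        echo "Illegal OSP device created"
        exit 1
fi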
| Comment by Gerrit Updater [ 25/Aug/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44594/ |
| Comment by Peter Jones [ 25/Aug/21 ] |
|
Landed for 2.15 |