
LU-14928: Allow MD target to be re-registered after writeconf

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Severity: 3

    Description

      In a DNE system, it is not safe to do writeconf of a MD target and attempt to mount (and re-register) it again, as it creates weird MDT-MDT osp devices as "fsname-MDT0001-osp-MDT0001". But it would be nice to have such a possibility to fix a half-failed target registration, when MGS completes the registration process but the target fails with a timeout not knowing about registration success.
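
      As a rough illustration only (a sketch using the device and mount-point names from the conf-sanity test_130 log in the comments below, not commands prescribed by this ticket), the sequence that triggers the problem looks like:

      # Stop the MDT, mark it for re-registration with writeconf, then mount it again.
      umount /mnt/lustre-mds2
      tunefs.lustre --writeconf /dev/mapper/mds2_flakey
      mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2

      # Without the fix, re-registration leaves a self-referencing OSP device
      # (e.g. lustre-MDT0001-osp-MDT0001) behind in the device list:
      lctl dl | grep "MDT0001-osp-MDT0001"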

          Activity

             [LU-14928] Allow MD target to be re-registered after writeconf
            pjones Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44594/
            Subject: LU-14928 mgs: allow md target re-register
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e4f3f47f04c762770bc36c1e3fa7e92e94a36704

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44594/ Subject: LU-14928 mgs: allow md target re-register Project: fs/lustre-release Branch: master Current Patch Set: Commit: e4f3f47f04c762770bc36c1e3fa7e92e94a36704

             zam Alexander Zarochentsev added a comment -

             Here is the conf-sanity test 130 output without the fix:

            == conf-sanity test 130: re-register an MDT after writeconf ========================================== 16:02:19 (1628686939)
            start mds service on devvm1
            Starting mds1: -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
            Started lustre-MDT0000
            start mds service on devvm1
            Starting mds2: -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
            Started lustre-MDT0001
            devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
            devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
            stop mds service on devvm1
            Stopping /mnt/lustre-mds2 (opts:-f) on devvm1
            checking for existing Lustre data: found
            
               Read previous values:
            Target:     lustre-MDT0001
            Index:      1
            Lustre FS:  lustre
            Mount type: ldiskfs
            Flags:      0x1
                          (MDT )
            Persistent mount opts: user_xattr,errors=remount-ro
            Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity
            
            
               Permanent disk data:
            Target:     lustre=MDT0001
            Index:      1
            Lustre FS:  lustre
            Mount type: ldiskfs
            Flags:      0x101
                          (MDT writeconf )
            Persistent mount opts: user_xattr,errors=remount-ro
            Parameters: mgsnode=192.168.56.101@tcp sys.timeout=20 mdt.identity_upcall=/work/git/lustre-wc-rel/lustre/tests/../utils/l_getidentity
            
            Writing CONFIGS/mountdata
            start mds service on devvm1
            Starting mds2: -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
            Started lustre-MDT0001
             16 UP osp lustre-MDT0001-osp-MDT0001 lustre-MDT0001-mdtlov_UUID 4
             conf-sanity test_130: @@@@@@ FAIL: Illegal OSP device created 
              Trace dump:
              = ./../tests/test-framework.sh:6221:error()
              = conf-sanity.sh:9259:test_130()
              = ./../tests/test-framework.sh:6524:run_one()
              = ./../tests/test-framework.sh:6571:run_one_logged()
              = ./../tests/test-framework.sh:6398:run_test()
              = conf-sanity.sh:9262:main()
            Dumping lctl log to /tmp/test_logs/1628686868/conf-sanity.test_130.*.1628686959.log
            Dumping logs only on local client.
            Resetting fail_loc on all nodes...done.
            FAIL 130 (30s)
            [root@devvm1 tests]# 
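
             The FAIL above comes from the test spotting the self-referencing OSP device ("lustre-MDT0001-osp-MDT0001") in the lctl device list. A check of roughly this shape is enough to catch it (a sketch, not the actual test_130 code; error() is the test-framework.sh helper visible in the trace above):

             # Fail if MDT0001 ended up with an OSP device pointing back at itself.
             if lctl dl | grep -q "lustre-MDT0001-osp-MDT0001"; then
                     error "Illegal OSP device created"
             fi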
            

            "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44594
            Subject: LU-14928 mgs: allow md target re-register
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7223b7e6fb6925af05cade3699f73874fa5f4751

            gerrit Gerrit Updater added a comment - "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44594 Subject: LU-14928 mgs: allow md target re-register Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 7223b7e6fb6925af05cade3699f73874fa5f4751

            People

              Assignee: zam Alexander Zarochentsev
              Reporter: zam Alexander Zarochentsev
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: