
LU-8728: Fix conf-sanity:88 for the multiple MDS case

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major

    Description

      The start_mds call starts all MDSs. lctl clear_conf then fails, because it expects only the single MDS that is combined with the MGS to be started, with the nosvc option. The start_mdt call should be used instead, so that only the required MDT is started.
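
      A minimal sketch of the intended test-script change, assuming the standard test-framework.sh helpers (start_mds, start_mdt, error) behave as described above; this is not the landed diff, which is in the Gerrit patch referenced in the activity below:

      # Hypothetical before/after sketch for conf-sanity.sh test_88.
      #
      # Before: start_mds loops over every configured MDT, so with MDSCOUNT > 1
      # more than one MDS comes up and "lctl clear_conf" fails.
      #start_mds "-o nosvc" || error "unable to start MDS with nosvc"
      #
      # After: start only mds1 (MDT0000, the MDS combined with the MGS) in
      # nosvc mode, which is the single target lctl clear_conf expects.
      start_mdt 1 "-o nosvc" || error "unable to start MDT0000 with nosvc"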


          Activity

            [LU-8728] Fix conf-sanity:88 for the multiple MDS case

            adilger Andreas Dilger added a comment:

            Closing this as a duplicate of LU-8727, since that patch introduces the problem; it should be fixed in that patch before it lands, rather than tracked in a separate ticket.

            arshad512 Arshad Hussain added a comment:

            Test result on local: 88a

            == conf-sanity test 88a: test lctl clear_conf fsname == 22:03:06 (1475944386)
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            ../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            start mds service on node1.domain
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
            Commit the device label on /tmp/lustre-mdt1
            Started lustre-MDT0000
            start ost1 service on node1.domain
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
            Commit the device label on /tmp/lustre-ost1
            Started lustre-OST0000
            mount lustre on /mnt/lustre.....
            Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
            Setting lustre-MDT0000.mdd.atime_diff from 60 to 62
            Waiting 90 secs for update
            Updated after 2s: wanted '62' got '62'
            Setting lustre-MDT0000.mdd.atime_diff from 62 to 63
            Waiting 90 secs for update
            Updated after 5s: wanted '63' got '63'
            Setting lustre.llite.max_read_ahead_mb from 27.13 to 32
            Waiting 90 secs for update
            Updated after 8s: wanted '32' got '32'
            Setting lustre.llite.max_read_ahead_mb from 32 to 64
            Waiting 90 secs for update
            Updated after 6s: wanted '64' got '64'
            Pool lustre.pool1 created
            OST lustre-OST0000_UUID added to pool lustre.pool1
            OST lustre-OST0000_UUID removed from pool lustre.pool1
            OST lustre-OST0000_UUID added to pool lustre.pool1
            umount lustre on /mnt/lustre.....
            Stopping client node1.domain /mnt/lustre (opts:)
            stop ost1 service on node1.domain
            Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            start mds service on node1.domain
            Starting mds1: -o nosvc,loop  /tmp/lustre-mdt1 /mnt/lustre-mds1
            Start /tmp/lustre-mdt1 without service
            Started lustre-MDT0000
            debugfs 1.42.13.wc3 (28-Aug-2015)
            /tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            debugfs 1.42.13.wc3 (28-Aug-2015)
            /tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
            start mds service on node1.domain
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
            Started lustre-MDT0000
            start ost1 service on node1.domain
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
            Started lustre-OST0000
            mount lustre on /mnt/lustre.....
            Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
            umount lustre on /mnt/lustre.....
            Stopping client node1.domain /mnt/lustre (opts:)
            stop ost1 service on node1.domain
            Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            modules unloaded.
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            ../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            Resetting fail_loc on all nodes...done.
            22:05:36 (1475944536) waiting for node1.domain network 5 secs ...
            22:05:36 (1475944536) network interface is UP
            PASS 88a (150s)
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            gss/krb5 is not supported
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            == conf-sanity test complete, duration 198 sec == 22:05:39 (1475944539)
            

            Test result on local: 88b

            == conf-sanity test 88b: test lctl clear_conf one config == 22:07:00 (1475944620)
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            ../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            start mds service on node1.domain
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
            Commit the device label on /tmp/lustre-mdt1
            Started lustre-MDT0000
            start ost1 service on node1.domain
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
            Commit the device label on /tmp/lustre-ost1
            Started lustre-OST0000
            mount lustre on /mnt/lustre.....
            Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
            Setting lustre-MDT0000.mdd.atime_diff from 60 to 62
            Waiting 90 secs for update
            Updated after 6s: wanted '62' got '62'
            Setting lustre-MDT0000.mdd.atime_diff from 62 to 63
            Waiting 90 secs for update
            Updated after 7s: wanted '63' got '63'
            Setting lustre.llite.max_read_ahead_mb from 27.13 to 32
            Waiting 90 secs for update
            Updated after 7s: wanted '32' got '32'
            Setting lustre.llite.max_read_ahead_mb from 32 to 64
            Waiting 90 secs for update
            Updated after 6s: wanted '64' got '64'
            Pool lustre.pool1 created
            OST lustre-OST0000_UUID added to pool lustre.pool1
            OST lustre-OST0000_UUID removed from pool lustre.pool1
            OST lustre-OST0000_UUID added to pool lustre.pool1
            umount lustre on /mnt/lustre.....
            Stopping client node1.domain /mnt/lustre (opts:)
            stop ost1 service on node1.domain
            Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            start mds service on node1.domain
            Starting mds1: -o nosvc,loop  /tmp/lustre-mdt1 /mnt/lustre-mds1
            Start /tmp/lustre-mdt1 without service
            Started lustre-MDT0000
            debugfs 1.42.13.wc3 (28-Aug-2015)
            /tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            debugfs 1.42.13.wc3 (28-Aug-2015)
            /tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
            start mds service on node1.domain
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
            Started lustre-MDT0000
            start ost1 service on node1.domain
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
            Started lustre-OST0000
            mount lustre on /mnt/lustre.....
            Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
            umount lustre on /mnt/lustre.....
            Stopping client node1.domain /mnt/lustre (opts:)
            stop ost1 service on node1.domain
            Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
            stop mds service on node1.domain
            Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
            modules unloaded.
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            ../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            Resetting fail_loc on all nodes...done.
            22:09:30 (1475944770) waiting for node1.domain network 5 secs ...
            22:09:30 (1475944770) network interface is UP
            PASS 88b (151s)
            Stopping clients: node1.domain /mnt/lustre (opts:)
            Stopping clients: node1.domain /mnt/lustre2 (opts:)
            Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all
            gss/krb5 is not supported
            Formatting mgs, mds, osts
            Format mds1: /tmp/lustre-mdt1
            Format ost1: /tmp/lustre-ost1
            Format ost2: /tmp/lustre-ost2
            == conf-sanity test complete, duration 190 sec == 22:09:33 (1475944773)
            

            gerrit Gerrit Updater added a comment:

            Arshad Hussain (arshad.hussain@seagate.com) uploaded a new patch: http://review.whamcloud.com/23246
            Subject: LU-8728 tests: fix conf-sanity:88 for the multiple MDS case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1f9c29af71015534d3c5c608de7b50b81fdad634

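            To try the change locally before it lands, the patch can be fetched straight from Gerrit; the commands below are a generic sketch assuming anonymous HTTP access to the fs/lustre-release repository and the standard refs/changes/<last-two-digits>/<change>/<patch-set> layout (adjust the final number for later patch sets):

            # Fetch and check out patch set 1 of change 23246
            git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/46/23246/1
            git checkout FETCH_HEAD
            # Re-run the affected subtests, e.g.:
            # ONLY="88a 88b" bash lustre/tests/conf-sanity.sh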

            People

              Assignee: wc-triage WC Triage
              Reporter: arshad512 Arshad Hussain
              Votes: 0
              Watchers: 4
