[LU-8728] Fix conf-sanity:88 for the multiple MDS case Created: 19/Oct/16  Updated: 08/Dec/17  Resolved: 08/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Arshad Hussain Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Gantt End to Start:
has to be done after LU-8727 "Remove skip records from config file" (Resolved)
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The start_mds call starts all MDSs, but lctl clear_conf expects only the single MDS that is combined with the MGS to be running, started with the nosvc option. The test should therefore use start_mdt so that only the needed MDS is started.
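
As an illustration, a minimal sketch of the intended change, assuming the standard test-framework.sh helpers (start_mdt, stop_mdt, do_facet) and variables ($LCTL, $FSNAME); the exact invocation in the patch may differ:

    # Hypothetical sketch, not the actual patch.
    # Before (breaks with multiple MDSs): start_mds starts every MDT.
    #   start_mds "-o nosvc" || error "start MDS with nosvc failed"
    # After: start only mdt1, the MDT combined with the MGS, without services.
    start_mdt 1 "-o nosvc" || error "start MDT1 with nosvc failed"
    do_facet mgs "$LCTL clear_conf $FSNAME" || error "clear_conf failed"
    # Stop that single MDT again before restarting the whole file system.
    stop_mdt 1 || error "stop MDT1 failed"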



 Comments   
Comment by Gerrit Updater [ 19/Oct/16 ]

Arshad Hussain (arshad.hussain@seagate.com) uploaded a new patch: http://review.whamcloud.com/23246
Subject: LU-8728 tests: fix conf-sanity:88 for the multiple MDS case
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1f9c29af71015534d3c5c608de7b50b81fdad634

Comment by Arshad Hussain [ 19/Oct/16 ]

Test result on local: 88a

== conf-sanity test 88a: test lctl clear_conf fsname == 22:03:06 (1475944386)
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
start mds service on node1.domain
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
Commit the device label on /tmp/lustre-mdt1
Started lustre-MDT0000
start ost1 service on node1.domain
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Commit the device label on /tmp/lustre-ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
Setting lustre-MDT0000.mdd.atime_diff from 60 to 62
Waiting 90 secs for update
Updated after 2s: wanted '62' got '62'
Setting lustre-MDT0000.mdd.atime_diff from 62 to 63
Waiting 90 secs for update
Updated after 5s: wanted '63' got '63'
Setting lustre.llite.max_read_ahead_mb from 27.13 to 32
Waiting 90 secs for update
Updated after 8s: wanted '32' got '32'
Setting lustre.llite.max_read_ahead_mb from 32 to 64
Waiting 90 secs for update
Updated after 6s: wanted '64' got '64'
Pool lustre.pool1 created
OST lustre-OST0000_UUID added to pool lustre.pool1
OST lustre-OST0000_UUID removed from pool lustre.pool1
OST lustre-OST0000_UUID added to pool lustre.pool1
umount lustre on /mnt/lustre.....
Stopping client node1.domain /mnt/lustre (opts:)
stop ost1 service on node1.domain
Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
start mds service on node1.domain
Starting mds1: -o nosvc,loop  /tmp/lustre-mdt1 /mnt/lustre-mds1
Start /tmp/lustre-mdt1 without service
Started lustre-MDT0000
debugfs 1.42.13.wc3 (28-Aug-2015)
/tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
debugfs 1.42.13.wc3 (28-Aug-2015)
/tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
start mds service on node1.domain
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
Started lustre-MDT0000
start ost1 service on node1.domain
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
umount lustre on /mnt/lustre.....
Stopping client node1.domain /mnt/lustre (opts:)
stop ost1 service on node1.domain
Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
modules unloaded.
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Resetting fail_loc on all nodes...done.
22:05:36 (1475944536) waiting for node1.domain network 5 secs ...
22:05:36 (1475944536) network interface is UP
PASS 88a (150s)
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
gss/krb5 is not supported
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
== conf-sanity test complete, duration 198 sec == 22:05:39 (1475944539)

Test result on local: 88b

== conf-sanity test 88b: test lctl clear_conf one config == 22:07:00 (1475944620)
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
start mds service on node1.domain
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
Commit the device label on /tmp/lustre-mdt1
Started lustre-MDT0000
start ost1 service on node1.domain
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Commit the device label on /tmp/lustre-ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
Setting lustre-MDT0000.mdd.atime_diff from 60 to 62
Waiting 90 secs for update
Updated after 6s: wanted '62' got '62'
Setting lustre-MDT0000.mdd.atime_diff from 62 to 63
Waiting 90 secs for update
Updated after 7s: wanted '63' got '63'
Setting lustre.llite.max_read_ahead_mb from 27.13 to 32
Waiting 90 secs for update
Updated after 7s: wanted '32' got '32'
Setting lustre.llite.max_read_ahead_mb from 32 to 64
Waiting 90 secs for update
Updated after 6s: wanted '64' got '64'
Pool lustre.pool1 created
OST lustre-OST0000_UUID added to pool lustre.pool1
OST lustre-OST0000_UUID removed from pool lustre.pool1
OST lustre-OST0000_UUID added to pool lustre.pool1
umount lustre on /mnt/lustre.....
Stopping client node1.domain /mnt/lustre (opts:)
stop ost1 service on node1.domain
Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
start mds service on node1.domain
Starting mds1: -o nosvc,loop  /tmp/lustre-mdt1 /mnt/lustre-mds1
Start /tmp/lustre-mdt1 without service
Started lustre-MDT0000
debugfs 1.42.13.wc3 (28-Aug-2015)
/tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
debugfs 1.42.13.wc3 (28-Aug-2015)
/tmp/lustre-mdt1: catastrophic mode - not reading inode or group bitmaps
start mds service on node1.domain
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
Started lustre-MDT0000
start ost1 service on node1.domain
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: node1.domain:  -o user_xattr,flock node1.domain@tcp:/lustre /mnt/lustre
umount lustre on /mnt/lustre.....
Stopping client node1.domain /mnt/lustre (opts:)
stop ost1 service on node1.domain
Stopping /mnt/lustre-ost1 (opts:-f) on node1.domain
stop mds service on node1.domain
Stopping /mnt/lustre-mds1 (opts:-f) on node1.domain
modules unloaded.
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
../lnet/lnet/lnet options: 'networks=tcp0(eth1) accept=all'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Resetting fail_loc on all nodes...done.
22:09:30 (1475944770) waiting for node1.domain network 5 secs ...
22:09:30 (1475944770) network interface is UP
PASS 88b (151s)
Stopping clients: node1.domain /mnt/lustre (opts:)
Stopping clients: node1.domain /mnt/lustre2 (opts:)
Loading modules from /root/hpdd/lustre-wc/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all
gss/krb5 is not supported
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
== conf-sanity test complete, duration 190 sec == 22:09:33 (1475944773)
Comment by Andreas Dilger [ 08/Dec/17 ]

Closing this as a duplicate of LU-8727, since that patch introduces this problem; it should be fixed in that patch before it lands rather than tracked in a separate ticket.
