[LU-12777] conf-sanity test 103 fails with ‘set mdt quota type failed’ Created: 17/Sep/19  Updated: 08/Oct/19  Resolved: 04/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: Wang Shilong (Inactive)
Resolution: Fixed
Votes: 0
Labels: ZFS, rhel8
Environment: ZFS with RHEL 8 clients
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_103 fails with ‘set mdt quota type failed’. The failure is seen only for ZFS with RHEL 8.0 clients and first appeared on 14-JULY-2019.

Looking at the client test_log, we see that the client mounts fail first, and the quota setup then fails because no MGS device is found:

mount mylustre on /mnt/lustre.....
Starting client: trevis-17vm1.trevis.whamcloud.com:  -o user_xattr,flock trevis-17vm4@tcp:/mylustre /mnt/lustre
CMD: trevis-17vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-17vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-17vm4@tcp:/mylustre /mnt/lustre
mount.lustre: mount trevis-17vm4@tcp:/mylustre at /mnt/lustre failed: Input/output error
Is the MGS running?
Starting client trevis-17vm1.trevis.whamcloud.com,trevis-17vm2:  -o user_xattr,flock trevis-17vm4@tcp:/mylustre /mnt/lustre
CMD: trevis-17vm1.trevis.whamcloud.com,trevis-17vm2 mkdir -p /mnt/lustre
CMD: trevis-17vm1.trevis.whamcloud.com,trevis-17vm2 
running=\$(mount | grep -c /mnt/lustre' ');
rc=0;
if [ \$running -eq 0 ] ; then
	mkdir -p /mnt/lustre;
	mount -t lustre  -o user_xattr,flock trevis-17vm4@tcp:/mylustre /mnt/lustre;
	rc=\$?;
fi;
exit \$rc
trevis-17vm2: mount.lustre: mount trevis-17vm4@tcp:/mylustre at /mnt/lustre failed: Input/output error
trevis-17vm2: Is the MGS running?
pdsh@trevis-17vm1: trevis-17vm2: ssh exited with exit code 5
trevis-17vm1: mount.lustre: mount trevis-17vm4@tcp:/mylustre at /mnt/lustre failed: Input/output error
trevis-17vm1: Is the MGS running?
pdsh@trevis-17vm1: trevis-17vm1: ssh exited with exit code 5
CMD: trevis-17vm4 lctl get_param -n timeout
Using TIMEOUT=20
CMD: trevis-17vm4 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: trevis-17vm1.trevis.whamcloud.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
error: set_param: param_path 'osc/*/idle_timeout': No such file or directory
error: get_param: param_path 'mdc/*/connect_flags': No such file or directory
jobstats not supported by server
enable quota as required
CMD: trevis-17vm4 /usr/sbin/lctl get_param -n osd-zfs.lustre-MDT0000.quota_slave.enabled
trevis-17vm4: error: get_param: param_path 'osd-zfs/lustre-MDT0000/quota_slave/enabled': No such file or directory
pdsh@trevis-17vm1: trevis-17vm4: ssh exited with exit code 2
CMD: trevis-17vm3 /usr/sbin/lctl get_param -n osd-zfs.lustre-OST0000.quota_slave.enabled
trevis-17vm3: error: get_param: param_path 'osd-zfs/lustre-OST0000/quota_slave/enabled': No such file or directory
pdsh@trevis-17vm1: trevis-17vm3: ssh exited with exit code 2
[HOST:trevis-17vm1.trevis.whamcloud.com] [old_mdt_qtype:] [old_ost_qtype:] [new_qtype:ug3]
CMD: trevis-17vm4 /usr/sbin/lctl conf_param mylustre.quota.mdt=ug3
trevis-17vm4: No device found for name MGS: Invalid argument
trevis-17vm4: This command must be run on the MGS.
trevis-17vm4: error: conf_param: No such device
pdsh@trevis-17vm1: trevis-17vm4: ssh exited with exit code 19
 conf-sanity test_103: @@@@@@ FAIL: set mdt quota type failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6115:error()
  = /usr/lib64/lustre/tests/test-framework.sh:2209:setup_quota()
  = /usr/lib64/lustre/tests/test-framework.sh:5188:init_param_vars()
  = /usr/lib64/lustre/tests/test-framework.sh:4920:setupall()
  = /usr/lib64/lustre/tests/conf-sanity.sh:7512:test_103()
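
The exit codes that pdsh reports in the log above (5, 2 and 19) appear to line up with the standard Linux errno values for the error strings printed next to them. A minimal sketch of that mapping follows. The helper name errno_name is hypothetical and is not part of test-framework.sh, and only the three values seen in this log are covered.

```shell
#!/bin/bash
# Hypothetical helper (not part of test-framework.sh): map an exit
# status seen in the log to the Linux errno name it appears to match.
errno_name() {
	case "$1" in
	2)  echo "ENOENT (No such file or directory)" ;;
	5)  echo "EIO (Input/output error)" ;;
	19) echo "ENODEV (No such device)" ;;
	*)  echo "unknown ($1)" ;;
	esac
}

# Exit codes in the order they occur in the log:
# mount.lustre, lctl get_param, lctl conf_param.
for rc in 5 2 19; do
	printf 'exit code %s -> %s\n' "$rc" "$(errno_name "$rc")"
done
```

Read this way, the sequence is: the client mount fails with EIO, the quota_slave parameters are missing (ENOENT), and conf_param finds no MGS device (ENODEV), which is the step that raises ‘set mdt quota type failed’.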

In the console log for client 1 (vm1), we see a couple of errors:

[63663.392258] LustreError: 29000:0:(mgc_request.c:250:do_config_log_add()) MGC10.9.4.201@tcp: failed processing log, type 1: rc = -5
[63670.752274] LustreError: 29006:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
[63674.656290] LustreError: 15c-8: MGC10.9.4.201@tcp: Confguration from log mylustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[63674.659682] Lustre: Unmounted mylustre-client
[63674.660992] LustreError: 29000:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5)
[63674.942608] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[63675.766879] Lustre: DEBUG MARKER: 
[63675.766879] running=$(mount | grep -c /mnt/lustre' ');
[63675.766879] rc=0;
[63675.766879] if [ $running -eq 0 ] ; then
[63675.766879] 	mkdir -p /mnt/lustre;
[63675.766879] 	mount -t lustre  -o user_xattr,flock trevis-17vm4@tcp:/mylustre /mnt/lustre;
[63675.766879] 	rc=$?;
[63675.766879] fi;
[63675.766879] exit $rc
[63682.336278] LustreError: 29245:0:(mgc_request.c:250:do_config_log_add()) MGC10.9.4.201@tcp: failed processing log, type 1: rc = -5
[63692.256268] LustreError: 29250:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
[63693.600289] LustreError: 15c-8: MGC10.9.4.201@tcp: Confguration from log mylustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[63693.603727] Lustre: Unmounted mylustre-client
[63693.604849] LustreError: 29245:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5)
[63696.068194] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
[63696.538630] Lustre: DEBUG MARKER: Using TIMEOUT=20
[63697.213487] Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
[63699.496142] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_103: @@@@@@ FAIL: set mdt quota type failed 

Logs for conf-sanity test 103 failures are at:
https://testing.whamcloud.com/test_sets/513fcf0c-d708-11e9-98c8-52540065bddc
https://testing.whamcloud.com/test_sets/b4dd0e36-a705-11e9-861b-52540065bddc
https://testing.whamcloud.com/test_sets/7683ebf6-d2b5-11e9-9fc9-52540065bddc
https://testing.whamcloud.com/test_sets/61f82b1a-d59f-11e9-90ad-52540065bddc



 Comments   
Comment by Peter Jones [ 18/Sep/19 ]

Shilong

Could you please investigate?

Peter

Comment by Gerrit Updater [ 26/Sep/19 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/36298
Subject: LU-12777 test: fix to pass facet to facet_fstype
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b057a79f41ed14a701da09706a7fbc065bb1c7dc
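
The patch itself is not quoted in this ticket, so the following is only a sketch of the general class of problem the subject line suggests: a per-facet lookup called without its facet argument. Everything here is an assumption for illustration. toy_facet_fstype is a toy stand-in and not the real facet_fstype, and the mds1_FSTYPE and ost1_FSTYPE values are made up.

```shell
#!/bin/bash
# Made-up per-facet backend settings; not taken from the ticket.
mds1_FSTYPE=zfs
ost1_FSTYPE=zfs

# Toy stand-in for a facet-to-fstype lookup (NOT the real facet_fstype).
toy_facet_fstype() {
	local facet=$1
	local var=${facet}_FSTYPE
	# With an empty facet, var is "_FSTYPE", which is unset, so the
	# lookup prints nothing and returns failure.
	if [ -n "${!var}" ]; then
		echo "${!var}"
		return 0
	fi
	return 1
}

# Compare a call without a facet against a call with one.
for facet in "" mds1; do
	if [ "$(toy_facet_fstype "$facet")" = "zfs" ]; then
		echo "facet='$facet': zfs branch taken"
	else
		echo "facet='$facet': zfs branch skipped"
	fi
done
```

In this toy version the call without a facet prints nothing, so a check against zfs quietly takes the wrong branch rather than reporting an error. Whether the real test failed in the same way is not stated in the ticket.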

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36298/
Subject: LU-12777 test: fix to pass facet to facet_fstype
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 38c8fdfde3953f239bd3d86a91a3213737231ce5

Comment by Peter Jones [ 04/Oct/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 04/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36379
Subject: LU-12777 test: fix to pass facet to facet_fstype
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 713fa93b816efba5b3bbfb357013cb079d97ad13

Comment by Gerrit Updater [ 08/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36379/
Subject: LU-12777 test: fix to pass facet to facet_fstype
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 26fc6fa27755943e9dbcc20260ca4d3006f05143
