[LU-14404] lustre-initialization fails with “auster : @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 /mnt/lustre FAILED!” Created: 08/Feb/21  Updated: 01/Apr/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Sergey Cheremencev
Resolution: Unresolved Votes: 0
Labels: ppc
Environment:

PPC64 clients


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

lustre-initialization fails with the error message "lustre-initialization timed out". So far, I've seen this issue only twice, and only in PPC64 client testing.

Looking at the autotest log for a recent failure, https://testing.whamcloud.com/test_sets/4579843f-9edb-47cc-9a0c-5f326eeee193, we see that setting quotas fails:

2021-02-08T16:17:57 enable quota as required
2021-02-08T16:17:57 CMD: trevis-4vm2 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
2021-02-08T16:17:57 CMD: trevis-4vm1 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
2021-02-08T16:17:57 [HOST:trevis-77vm9.trevis.whamcloud.com] [old_mdt_qtype:none] [old_ost_qtype:none] [new_qtype:ug3]
2021-02-08T16:17:57 CMD: trevis-4vm2 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
2021-02-08T16:17:57 CMD: trevis-4vm2 /usr/sbin/lctl conf_param lustre.quota.ost=ug3
2021-02-08T16:17:57 Total disk size: 13362580  block-softlimit: 13363604 block-hardlimit: 14031784 inode-softlimit: 838864 inode-hardlimit: 880807
2021-02-08T16:17:57 Setting up quota on trevis-77vm9.trevis.whamcloud.com:/mnt/lustre for quota_usr...
2021-02-08T16:17:57 + /usr/bin/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 /mnt/lustre
2021-02-08T16:17:57  auster : @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 /mnt/lustre FAILED! 
2021-02-08T16:17:57   Trace dump:
2021-02-08T16:17:57   = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
2021-02-08T16:17:57   = /usr/lib64/lustre/tests/test-framework.sh:2302:setup_quota()
2021-02-08T16:17:57   = /usr/lib64/lustre/tests/test-framework.sh:5329:init_param_vars()
2021-02-08T16:17:57   = /usr/lib64/lustre/tests/test-framework.sh:5061:setupall()
2021-02-08T16:17:57   = auster:146:setup_if_needed()
2021-02-08T16:17:57   = auster:331:main()
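
The limits passed to lfs setquota in the log above look internally consistent, which suggests the arguments themselves are not the problem. A minimal sketch of that check follows. The rule it uses (soft limit = disk size + 1024, hard limit = soft limit plus 5% with integer division) is an assumption inferred from the logged numbers, not quoted from test-framework.sh, and hard_limit is a hypothetical helper name.

```python
# Values copied from the autotest log above.
total_disk_size = 13362580
block_soft = 13363604
block_hard = 14031784
inode_soft = 838864
inode_hard = 880807


def hard_limit(soft):
    """Assumed rule: hard limit is the soft limit plus 5% (integer division)."""
    return soft + soft // 20


# Assumed rule: block soft limit is the total disk size plus 1024.
assert block_soft == total_disk_size + 1024
assert block_hard == hard_limit(block_soft)
assert inode_hard == hard_limit(inode_soft)
print("limits in the log are self-consistent")
```

The inode soft limit cannot be checked the same way because the log does not show the inode total it was derived from.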

Looking at the MDS (vm2) console log, we see the error:

[  212.037292] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
[  212.889389] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
[  213.315973] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug3
[  213.527799] LustreError: 11349:0:(mdt_handler.c:2964:mdt_quotactl()) lustre-MDT0000: unsupported quotactl command 134250496: rc = -14
[  213.791766] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  auster : @@@@@@ FAIL: \/usr\/bin\/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 \/mnt\/lustre FAILED! 
[  214.037691] Lustre: DEBUG MARKER: auster : @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 /mnt/lustre FAILED!

In the client (vm9) console log, we see:

[  314.690859] LustreError: 9025:0:(mdc_request.c:2039:mdc_quotactl()) lustre-MDT0000-mdc-c0000000b3f26800: ptlrpc_queue_wait failed: rc = -14
[  314.849768] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  auster : @@@@@@ FAIL: \/usr\/bin\/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 \/mnt\/lustre FAILED! 
[  315.077149] Lustre: DEBUG MARKER: auster : @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13363604 -B 14031784 -i 838864 -I 880807 /mnt/lustre FAILED!


 Comments   
Comment by James Nunez (Inactive) [ 10/Feb/21 ]

Sergey -
Would you please take a look at this failure and see if this could be related to any changes made for OST pools or if you understand what the issue is?

Thank you

Comment by Sergey Cheremencev [ 01/Apr/21 ]

Hello,

[  213.527799] LustreError: 11349:0:(mdt_handler.c:2964:mdt_quotactl()) lustre-MDT0000: unsupported quotactl command 134250496: rc = -14 

134250496 in hex is 0x8008000, while quota commands lie in the interval [0x800100-0x800014].
The wrong command ID comes from the client, so the problem is somehow related to the PPC64 client code.
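
The hex conversion above can be reproduced with a short sketch. It also shows one observation that is not confirmed anywhere in this ticket: reversing the byte order of the received 32-bit value gives 0x800008, which is the value of Q_SETQUOTA in the Linux quota header. That would be consistent with a missing byte swap on a big-endian client, but treat it as a hypothesis only. The helper name byteswap32 is made up for illustration.

```python
import struct

# Q_SETQUOTA as defined in the Linux quota header. Assumption: this is the
# command that "lfs setquota" is expected to send.
Q_SETQUOTA = 0x800008

# Command number reported by mdt_quotactl() in the MDS console log.
received = 134250496


def byteswap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


print(hex(received))                       # 0x8008000
print(hex(byteswap32(received)))           # 0x800008
print(byteswap32(received) == Q_SETQUOTA)  # True
```

If the hypothesis holds, the place to look would be wherever the client packs the quotactl command into the request, but the ticket does not identify that code.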
