[LU-8340] sanity-sec test_25: /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED!
Created: 28/Jun/16  Updated: 05/Oct/16  Resolved: 05/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Kit Westneat
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Full - EL7.2 Server/EL7.2 Client
master, build# 3399


Issue Links:
Related
is related to LU-3291 IU UID/GID Mapping Feature Resolved
Severity: 3

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/12ba69e2-3a6a-11e6-a0ce-5254006e85c2.

The sub-test test_25 failed with the following error:

/usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED!

test log:

Started clients trevis-48vm1.trevis.hpdd.intel.com,trevis-48vm2: 
CMD: trevis-48vm1.trevis.hpdd.intel.com,trevis-48vm2 mount | grep /mnt/lustre' '
10.9.6.52@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr)
10.9.6.52@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr)
CMD: trevis-48vm2 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4 
CMD: trevis-48vm7 lctl get_param -n timeout
Using TIMEOUT=20
CMD: trevis-48vm7 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: trevis-48vm1.trevis.hpdd.intel.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
enable quota as required
CMD: trevis-48vm7 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
CMD: trevis-48vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
[HOST:trevis-48vm1.trevis.hpdd.intel.com] [old_mdt_qtype:ug] [old_ost_qtype:ug] [new_qtype:ug3]
CMD: trevis-48vm7 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
CMD: trevis-48vm7 /usr/sbin/lctl conf_param lustre.quota.ost=ug3
Total disk size: 13760516  block-softlimit: 13761540 block-hardlimit: 14449617 inode-softlimit: 916168 inode-hardlimit: 961976
Setting up quota on trevis-48vm1.trevis.hpdd.intel.com:/mnt/lustre for quota_usr...
+ /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre
setquota failed: Operation not permitted
 sanity-sec test_25: @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED! 
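
The limits in the log are internally consistent. The block soft limit is the total disk size plus 1024, and each hard limit is its soft limit plus one twentieth (integer division). The sketch below reproduces the logged values. The formulas are inferred from the numbers above, not copied from the Lustre test framework, and `derive_quota_limits` is a hypothetical helper.

```python
def derive_quota_limits(total_disk_size: int, inode_soft: int) -> dict:
    """Reconstruct the relationship between the limits printed in the log.

    Hypothetical helper: the formulas are inferred from the logged numbers,
    not taken from the Lustre test framework.
    """
    block_soft = total_disk_size + 1024          # soft limit just above the disk size
    block_hard = block_soft + block_soft // 20   # hard limit = soft + ~5%
    inode_hard = inode_soft + inode_soft // 20   # same ~5% margin for inodes
    return {
        "block-softlimit": block_soft,
        "block-hardlimit": block_hard,
        "inode-softlimit": inode_soft,
        "inode-hardlimit": inode_hard,
    }


# Values from the log line "Total disk size: 13760516 ... inode-softlimit: 916168"
print(derive_quota_limits(13760516, 916168))
```

Because the values follow a simple, consistent pattern, the "Operation not permitted" error is unlikely to be about the limit values themselves. That fits the later diagnosis of a nodemap permission problem.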

This issue was first seen on 06/19/2016 and has occurred about 22 times in the 10 days since.



 Comments   
Comment by Niu Yawei (Inactive) [ 08/Jul/16 ]

The setquota failed with -EPERM. I think it's caused by the nodemap permission check (nodemap_can_setquota()), but I don't see why "nmf_allow_root_access" wasn't set properly in this case. Perhaps a nodemap expert needs to take a look.
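
A minimal Python model of the check described above follows. It assumes that setquota is allowed when the nodemap feature is inactive, or when the client's nodemap has the admin (root access) flag set. It mirrors the description of nodemap_can_setquota() and nmf_allow_root_access in this comment. It is not the Lustre C implementation, and the `Nodemap` class, function names, and nodemap name "c0" are hypothetical.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nodemap:
    """Hypothetical stand-in for the nodemap a client is matched to."""
    name: str
    allow_root_access: bool  # models nmf_allow_root_access (the "admin" property)


def can_setquota(nodemap_active: bool, nodemap: Optional[Nodemap]) -> bool:
    # Assumption: when the nodemap feature is inactive, no restriction applies.
    if not nodemap_active:
        return True
    # Otherwise the client's nodemap must grant root (admin) access.
    return nodemap is not None and nodemap.allow_root_access


def handle_setquota(nodemap_active: bool, nodemap: Optional[Nodemap]) -> int:
    """Return 0 on success or -EPERM ("Operation not permitted")."""
    return 0 if can_setquota(nodemap_active, nodemap) else -errno.EPERM


client = Nodemap(name="c0", allow_root_access=False)
print(handle_setquota(True, client))  # rejected: admin is not set
```

With nodemaps active and admin cleared, the model returns -EPERM, which matches the "setquota failed: Operation not permitted" line in the test log.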

Comment by Joseph Gmitter (Inactive) [ 12/Jul/16 ]

Hi Kit,

Would you be able to look at this issue? Our engineer believes it is a nodemap permission check issue. Can you please advise?

Thanks!
Joe

Comment by Jian Yu [ 27/Jul/16 ]

More failure instances on master branch:
https://testing.hpdd.intel.com/test_sets/49a487de-541f-11e6-88a7-5254006e85c2
https://testing.hpdd.intel.com/test_sets/808f2faa-524f-11e6-bf87-5254006e85c2
https://testing.hpdd.intel.com/test_sets/939e46d6-5037-11e6-bf87-5254006e85c2

Comment by Peter Jones [ 12/Aug/16 ]

Any update, Kit? Do you think this is something we ought to fix for 2.9, or could it wait for a future release?

Comment by Kit Westneat [ 12/Aug/16 ]

Hi Peter,

I've looked into it, but I'm not sure why it fails only some of the time. My gut feeling is that it's a test issue, but I will spend some time trying to reproduce it on my system to get a clearer answer. I don't think it's a blocker, but it would be nice to have it sorted out before the next release.

Comment by Peter Jones [ 12/Aug/16 ]

Kit

Thanks for the update. Those kinds of issues can be infuriating. Do you know how to use test parameters to run the test many times in a row? Adding extra debugging to the tests and then running them repeatedly is usually how engineers get to the bottom of such issues.

Peter

Comment by Kit Westneat [ 12/Aug/16 ]

That's a good idea. I'll upload a debug patch, along with test parameters, to see if I can reliably trigger it on Maloo.

Comment by Gerrit Updater [ 12/Aug/16 ]

Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/21907
Subject: LU-8340 nodemap: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 07e836a33a3cd5613fcda80944cb41bad20cecec

Comment by Saurabh Tandan (Inactive) [ 25/Aug/16 ]

Another instance: Interop EL7.2 Server/2.8.0 EL7.2 Client
Server: master, build# 3418
Client: b2_8, build# 12
https://testing.hpdd.intel.com/test_sets/77f2fecc-5f00-11e6-906c-5254006e85c2

Comment by Peter Jones [ 01/Sep/16 ]

Kit

Has the debug patch provided the information you hoped for?

Peter

Comment by Kit Westneat [ 06/Sep/16 ]

Hi Peter,

I haven't been able to reproduce the bug yet; I'll try again this week.

Thanks,
Kit

Comment by Gerrit Updater [ 13/Sep/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/21907/
Subject: LU-8340 nodemap: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 193987fbbb7595d29cec47f03f009f56caa90924

Comment by Peter Jones [ 23/Sep/16 ]

Kit

The debug patch landed over a week ago. Have you been able to gather any useful information from it?

Peter

Comment by Gerrit Updater [ 27/Sep/16 ]

Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/22750
Subject: LU-8340 tests: in sanity-sec 25 client nodemap must be admin
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 466a4add4b2bf92012920b0acff46745899547fd

Comment by Kit Westneat [ 27/Sep/16 ]

Hi Peter,

I think it's an issue with the ENABLE_QUOTA flag being set. When it is set, restarting Lustre causes the client to issue quota commands, but at that point in the test the nodemap has disabled quota permissions for the client. This patch re-enables admin permissions on the client's nodemap, which should fix it.
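
The sequence described above can be sketched as a small simulation. test_25 leaves the client's nodemap without admin rights, and when ENABLE_QUOTA is set, the restart issues setquota while that restriction is still in place. The fix re-enables admin first. The function names and structure are hypothetical; only the ordering is taken from this comment and the patch subject ("client nodemap must be admin").

```python
import errno


def setup_quota_on_restart(enable_quota: bool, client_admin: bool) -> int:
    """Hypothetical model of the quota setup that runs when Lustre restarts.

    Returns 0 on success, or -EPERM if 'lfs setquota' is issued while the
    client's nodemap does not grant admin rights.
    """
    if not enable_quota:
        return 0  # no quota commands are issued on restart
    return 0 if client_admin else -errno.EPERM  # setquota is issued


def run_test_25(enable_quota: bool, apply_fix: bool) -> int:
    # At this point in test_25 the client's nodemap denies quota operations.
    client_admin = False
    if apply_fix:
        # LU-8340 fix: re-enable admin on the client nodemap before restarting.
        client_admin = True
    return setup_quota_on_restart(enable_quota, client_admin)


for enable_quota in (False, True):
    for apply_fix in (False, True):
        print(enable_quota, apply_fix, run_test_25(enable_quota, apply_fix))
```

In this model only the combination "ENABLE_QUOTA set, fix not applied" fails. That would explain why the failure appears in some test groups and not others.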

Is it possible to confirm that ENABLE_QUOTA is disabled on the normal tests, but enabled on the full tests?

Thanks,
Kit

Comment by Jian Yu [ 27/Sep/16 ]

> Is it possible to confirm that ENABLE_QUOTA is disabled on the normal tests, but enabled on the full tests?

Yes, Kit.

For the full test group, ENABLE_QUOTA=yes is specified; for the review-* test groups, ENABLE_QUOTA is disabled.

Comment by Gerrit Updater [ 05/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22750/
Subject: LU-8340 tests: in sanity-sec 25 client nodemap must be admin
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1a92e40bd05c05820c902ce4ee738212a308d9e2

Comment by Peter Jones [ 05/Oct/16 ]

Landed for 2.9
