[LU-13079] ost-pools test_23a: Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate. Created: 15/Dec/19  Updated: 08/Jan/20  Resolved: 08/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12378 sanity-quota test 1 fails with 'proje... Resolved
is related to LU-12951 LL_IOC_GETOBDCOUNT return wrong MDT c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Arshad <arshad.super@gmail.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f3b1229e-1ea8-11ea-b1e8-52540065bddc

test_23a failed with the following error:

test_23a failed with 22

Total allocated inode limit: 0, total allocated block limit: 0
uid 500 is using default file quota setting
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
 ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 22 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6108:error()
  = /usr/lib64/lustre/tests/test-framework.sh:6410:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6449:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6295:run_test()
  = /usr/lib64/lustre/tests/ost-pools.sh:1285:main()
Dumping lctl log to /autotest/at-candidate/2019-12-14/lustre-reviews-el7_6-x86_64--review-dne-part-2--1_5__70456___a2827a62-8be9-4c1a-ac98-c35b83c5f16e/ost-pools.test_23a.*.1576332059.log
CMD: trevis-68vm1.trevis.whamcloud.com,trevis-68vm2,trevis-68vm3,trevis-68vm4,trevis-68vm5 /usr/sbin/lctl dk > /autotest/at-candidate/2019-12-14/lustre-reviews-el7_6-x86_64--review-dne-part-2--1_5__70456___a2827a62-8be9-4c1a-ac98-c35b83c5f16e/ost-pools.test_23a.debug_log.\$(hostname -s).1576332059.log;
         dmesg > /autotest/at-candidate/2019-12-14/lustre-reviews-el7_6-x86_64--review-dne-part-2--1_5__70456___a2827a62-8be9-4c1a-ac98-c35b83c5f16e/ost-pools.test_23a.dmesg.\$(hostname -s).1576332059.log
Resetting fail_loc on all nodes...CMD: trevis-68vm1.trevis.whamcloud.com,trevis-68vm2,trevis-68vm3,trevis-68vm4,trevis-68vm5 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
done.
Destroy the created pools: testpool
CMD: trevis-68vm4 /usr/sbin/lctl pool_list lustre
lustre.testpool
CMD: trevis-68vm4 /usr/sbin/lctl pool_list lustre.testpool
CMD: trevis-68vm4 lctl pool_remove lustre.testpool lustre-OST0000_UUID
trevis-68vm4: OST lustre-OST0000_UUID removed from pool lustre.testpool
CMD: trevis-68vm4 lctl pool_remove lustre.testpool lustre-OST0003_UUID
trevis-68vm4: OST lustre-OST0003_UUID removed from pool lustre.testpool
CMD: trevis-68vm4 lctl pool_remove lustre.testpool lustre-OST0006_UUID
trevis-68vm4: OST lustre-OST0006_UUID removed from pool lustre.testpool
CMD: trevis-68vm4 lctl pool_list lustre.testpool | wc -l
CMD: trevis-68vm4 lctl pool_list lustre.testpool | wc -l
CMD: trevis-68vm4 lctl pool_destroy lustre.testpool
trevis-68vm4: Pool lustre.testpool destroyed
CMD: trevis-68vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm5 lctl get_param -n lod.lustre-MDT0001-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm5 lctl get_param -n lod.lustre-MDT0001-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm4 lctl get_param -n lod.lustre-MDT0002-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm4 lctl get_param -n lod.lustre-MDT0002-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm5 lctl get_param -n lod.lustre-MDT0003-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm5 lctl get_param -n lod.lustre-MDT0003-mdtlov.pools.testpool 				2>/dev/null || echo foo
CMD: trevis-68vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool 		2>/dev/null || echo foo
CMD: trevis-68vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool 		2>/dev/null || echo foo
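
The FAIL line in the log above reports a bare return code of 22. As a rough triage aid, the sketch below pulls the suite, test name, and code out of such a line and looks the code up in the errno table. It assumes the code is a plain Linux errno value (22 is EINVAL there), which the log itself does not confirm. `parse_fail_line` and `FAIL_RE` are hypothetical names, not part of the Lustre test framework.

```python
import errno
import re

# Matches test-framework lines such as:
#   " ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 22 "
FAIL_RE = re.compile(
    r"^\s*(?P<suite>\S+) (?P<test>test_\w+): @+ FAIL: .* failed with (?P<rc>\d+)\s*$"
)


def parse_fail_line(line):
    """Return (suite, test, rc, errno_name), or None if this is not a FAIL line."""
    m = FAIL_RE.match(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    # Assumption: rc is an errno value; codes with no errno name map to None.
    return m.group("suite"), m.group("test"), rc, errno.errorcode.get(rc)


if __name__ == "__main__":
    print(parse_fail_line(" ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 22 "))
```

If the code really is an errno, EINVAL would point at an invalid argument somewhere in the quota query path rather than at a missing device.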

 



 Comments   
Comment by Arshad Hussain [ 15/Dec/19 ]

This looks similar to LU-11396

This is failing LU-3606 (https://testing.whamcloud.com/test_sets/1b844b90-1eac-11ea-971c-52540065bddc)

Comment by James Nunez (Inactive) [ 16/Dec/19 ]

We're seeing this failure for sanity-quota tests 7b and 27b starting on 14 DEC 2019. Logs for one of these failures are at https://testing.whamcloud.com/test_sets/8b797c1c-1f02-11ea-bb75-52540065bddc .

Comment by Andreas Dilger [ 16/Dec/19 ]

I also noticed some spurious error messages being printed:

quotactl mdt4 failed.
quotactl mdt5 failed.
:
:
quotactl mdt62 failed.
quotactl mdt63 failed.

This should not be printed to the console at all. I'd assume that this returns a useful error code like -ENODEV for the case where the device is non-existent.
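
One way to read the suggestion above: when a tool probes a fixed range of target indices, a "no such device" result for an index that was never configured is expected and can be skipped silently, while any other error is still surfaced. The sketch below illustrates that filtering in Python. `collect_quota` and `query_target` are hypothetical names standing in for the per-target quota query; the real `lfs` code is written in C and may be structured quite differently.

```python
import errno


def collect_quota(query_target, max_targets):
    """Query each target index; skip absent targets quietly, keep real errors.

    query_target(idx) is a hypothetical callable that returns usage data
    or raises OSError with errno set.
    """
    results, errors = {}, {}
    for idx in range(max_targets):
        try:
            results[idx] = query_target(idx)
        except OSError as exc:
            if exc.errno == errno.ENODEV:
                # Target does not exist: expected, so nothing is reported.
                continue
            # Any other failure is a real problem worth reporting.
            errors[idx] = exc.errno
    return results, errors


if __name__ == "__main__":
    def fake_query(idx):
        # Pretend only four targets exist and one of them is unhealthy.
        if idx >= 4:
            raise OSError(errno.ENODEV, "no such device")
        if idx == 2:
            raise OSError(errno.EIO, "I/O error")
        return idx * 10

    print(collect_quota(fake_query, 64))
```

With this shape, only the failing index 2 would produce a message, and the sixty nonexistent indices would produce none.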

Comment by Andreas Dilger [ 16/Dec/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37041
Subject: Revert "LU-12378 ptlrpc: always reset generation for idle reconnect"

This patch is causing an increase in sanity-quota and ost-pools test failures tracked under LU-13079.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 29939c06e7ad23eb17d5490defba3488e968e22a

Comment by Wang Shilong (Inactive) [ 17/Dec/19 ]

I think the following patch fixes the problem:

https://review.whamcloud.com/#/c/36713/

Comment by Wang Shilong (Inactive) [ 08/Jan/20 ]

LU-12951 has been merged, so this issue can be closed.
