[LU-4422] Failure on test suite sanity-quota test_6: file not growed in 30 seconds 2088960/2088960 Created: 30/Dec/13  Updated: 01/Oct/14  Resolved: 17/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: dne

Issue Links:
Related
is related to LU-2836 Test failure on test suite sanity-quo... Resolved
Severity: 3
Rank (Obsolete): 12140

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7b0d8a32-707e-11e3-923a-52540035b04c.

The sub-test test_6 failed with the following error:

file not growed in 30 seconds 2088960/2088960
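For context: test_6 arms a fail_loc on the MDS to drop incoming quota-acquire RPCs, runs a dd in the background, and then verifies the file resumes growing once the fail_loc is cleared. The error above means the file size stayed at 2088960 bytes for the whole 30-second window. A minimal bash sketch of the shape of that check (not the verbatim sanity-quota.sh code; wait_file_growth is a stand-in name, and quota_error is the test-framework helper visible in the trace below):

# Polls the file size and fails if it stops increasing within the window.
wait_file_growth() {
	local file=$1 timeout=${2:-30}
	local old_size new_size
	old_size=$(stat -c %s "$file")
	new_size=$old_size
	local end=$((SECONDS + timeout))
	while [ $SECONDS -lt $end ]; do
		new_size=$(stat -c %s "$file")
		# any growth means the quota slave re-acquired space
		[ "$new_size" -gt "$old_size" ] && return 0
		sleep 1
	done
	quota_error u quota_usr \
		"file not growed in $timeout seconds $old_size/$new_size"
}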

Hit this issue in patch: http://review.whamcloud.com/#/c/7665/
MDS 1 dmesg shows:

Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 \(1388289068\)
Lustre: DEBUG MARKER: == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 (1388289068)
Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
Lustre: DEBUG MARKER: lctl set_param fail_val=0
Lustre: DEBUG MARKER: lctl set_param fail_loc=0
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
Lustre: DEBUG MARKER: lctl set_param -n osd*.*MDT*.force_sync 1
Lustre: DEBUG MARKER: lctl set_param fail_val=601
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x513
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 2 previous similar messages
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 10 previous similar messages
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 2 previous similar messages
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 22 previous similar messages
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 34 previous similar messages
LustreError: 25638:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828388904
Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting
LustreError: 25389:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828389172
Lustre: *** cfs_fail_loc=513, val=0***
Lustre: Skipped 75 previous similar messages
Lustre: DEBUG MARKER: lctl set_param fail_val=0
Lustre: DEBUG MARKER: lctl set_param fail_loc=0
Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Files for user \(quota_usr\):
Lustre: DEBUG MARKER: Files for user (quota_usr):
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 
Lustre: DEBUG MARKER: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 
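The fail_val/fail_loc pair in the log above is the fault injection itself: in the Lustre source, 0x513 is OBD_FAIL_PTLRPC_DROP_REQ_OPC and opcode 601 is QUOTA_DQACQ, so while the fail_loc is armed the MDS silently drops incoming quota-acquire RPCs (the "drop incoming rpc opc 601" lines from ptlrpc_server_handle_req_in()). A hedged sketch of the injection sequence as the test drives it (do_facet is the test-framework wrapper for running a command on a server node; the exact test code may differ):

# Arm: drop every incoming RPC whose opcode matches fail_val.
do_facet mds1 lctl set_param fail_val=601    # QUOTA_DQACQ
do_facet mds1 lctl set_param fail_loc=0x513  # OBD_FAIL_PTLRPC_DROP_REQ_OPC
# ... the background dd exhausts its granted quota and tries to acquire
# more; its DQACQ requests are dropped until the fail_loc is cleared ...
do_facet mds1 lctl set_param fail_val=0
do_facet mds1 lctl set_param fail_loc=0      # disarm; slave reconnects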


 Comments   
Comment by nasf (Inactive) [ 10/Feb/14 ]

another failure instance:

https://maloo.whamcloud.com/test_sets/a67ad95a-91c5-11e3-bdf8-52540035b04c

Comment by Peng Tao [ 12/Feb/14 ]

another one:
https://maloo.whamcloud.com/test_sets/dd0c7eae-90dc-11e3-a252-52540035b04c

Comment by Jian Yu [ 13/Feb/14 ]

Another instance on master branch:
https://maloo.whamcloud.com/test_sets/ffae120c-9455-11e3-854a-52540035b04c

Comment by nasf (Inactive) [ 13/Feb/14 ]

There is a temporary patch to disable sanity-quota test_6:

http://review.whamcloud.com/#/c/9203/

Comment by Niu Yawei (Inactive) [ 21/Feb/14 ]
11:32:03:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
11:32:03: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr] [count=3] [seek=1] [oflag=sync] [conv=notrunc]
11:32:45:CMD: wtm-23vm3 lctl set_param fail_val=0
11:32:45:fail_val=0
11:32:45:CMD: wtm-23vm3 lctl set_param fail_loc=0
11:32:45:fail_loc=0
11:32:45:CMD: wtm-23vm4 dmesg
11:33:16:Disk quotas for user quota_usr (uid 60000):
11:33:16:     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
11:33:16:    /mnt/lustre    2040       0    3072       -       1       0       0       -
11:33:16:lustre-MDT0000_UUID
11:33:16:                      0       -       0       -       1       -       0       -
11:33:16:lustre-MDT0001_UUID
11:33:16:                      0       -       0       -       0       -       0       -
11:33:16:lustre-MDT0002_UUID
11:33:16:                      0       -       0       -       0       -       0       -
11:33:16:lustre-MDT0003_UUID
11:33:17:                      0       -       0       -       0       -       0       -
11:33:17:lustre-OST0000_UUID
11:33:18:                   2040       -    2048       -       -       -       -       -
11:33:18:lustre-OST0001_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0002_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0003_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0004_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0005_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0006_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:lustre-OST0007_UUID
11:33:18:                      0       -       0       -       -       -       -       -
11:33:18:Total allocated inode limit: 0, total allocated block limit: 0
11:33:18:Files for user (quota_usr):
11:33:18:  File: `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr'
11:33:18:  Size: 2084864   	Blocks: 4080       IO Block: 4194304 regular file
11:33:18:Device: 2c54f966h/743766374d	Inode: 288230930219272356  Links: 1
11:33:18:Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
11:33:18:Access: 2014-02-12 11:31:58.000000000 -0800
11:33:18:Modify: 2014-02-12 11:31:58.000000000 -0800
11:33:18:Change: 2014-02-12 11:31:58.000000000 -0800
11:33:18: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2084864/2084864 
11:33:18:  Trace dump:
11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4455:error_noexit()
11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4486:error()
11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:157:quota_error()
11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:949:test_6()
11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4747:run_one()
11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4782:run_one_logged()
11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4635:run_test()
11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:962:main()
11:33:18:Dumping lctl log to /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.*.1392233592.log
11:33:18:CMD: wtm-23vm3,wtm-23vm4,wtm-23vm5,wtm-23vm6.rosso.whamcloud.com,wtm-23vm7 /usr/sbin/lctl dk > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.debug_log.\$(hostname -s).1392233592.log;
11:33:18:         dmesg > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.dmesg.\$(hostname -s).1392233592.log
11:33:18:Delete files...
11:33:30:Wait for unlink objects finished...
11:33:30:CMD: wtm-23vm3 lctl set_param -n osd*.*MD*.force_sync 1
11:33:30:CMD: wtm-23vm3 lctl get_param -n osc.*MDT*.sync_*
11:33:30:Waiting for local destroys to complete
11:33:30:CMD: wtm-23vm3,wtm-23vm7 lctl set_param -n osd*.*MDT*.force_sync 1
11:33:30:CMD: wtm-23vm4 lctl set_param -n osd*.*OS*.force_sync 1
11:33:30:dd: writing `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr': Disk quota exceeded
11:33:30:2+0 records in
11:33:30:1+0 records out

Actually, the dd finished in 45 seconds (far less than the 240-second limit), so I think we just need to change the script to give the quota slave more time to reconnect to the master and finish the dd.
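A hedged sketch of that direction (the actual change is the patch linked in the next comment and may differ): wait on the background dd itself with a generous overall deadline, instead of failing as soon as the file stops growing for 30 seconds. $RUNAS, $TESTFILE and quota_error are the test-framework names seen in the logs above; the 240-second budget is the one mentioned above.

# Background write, as in the log excerpt above.
$RUNAS dd if=/dev/zero of=$TESTFILE bs=1M count=3 seek=1 \
	oflag=sync conv=notrunc &
DD_PID=$!
END=$((SECONDS + 240))
# Only error out if dd itself is still running past the deadline.
while kill -0 $DD_PID 2>/dev/null; do
	if [ $SECONDS -ge $END ]; then
		quota_error u quota_usr "dd not finished in 240 seconds"
		break
	fi
	sleep 5
done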

Comment by Niu Yawei (Inactive) [ 21/Feb/14 ]

Fix to the script so it does not fail when the file is not growing: http://review.whamcloud.com/#/c/9339/1

Comment by Bob Glossman (Inactive) [ 03/Apr/14 ]

failure seen in b2_5:
https://maloo.whamcloud.com/test_sets/41b91d04-bada-11e3-9a01-52540035b04c

Does the fix need a backport to b2_5?

Comment by Jian Yu [ 14/Apr/14 ]

Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/9946.

Comment by Bob Glossman (Inactive) [ 17/Apr/14 ]

Still seeing this problem in the latest b2_5:
https://maloo.whamcloud.com/test_sets/014b38e6-c62a-11e3-b07b-52540035b04c

Comment by Peter Jones [ 17/Apr/14 ]

Fix landed to master
