[LU-4422] Failure on test suite sanity-quota test_6: file not growed in 30 seconds 2088960\/2088960 Created: 30/Dec/13 Updated: 01/Oct/14 Resolved: 17/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.1 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 12140 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7b0d8a32-707e-11e3-923a-52540035b04c. The sub-test test_6 failed with the following error:
Hit this issue in patch: http://review.whamcloud.com/#/c/7665/ Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 \(1388289068\) Lustre: DEBUG MARKER: == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 (1388289068) Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_* Lustre: DEBUG MARKER: lctl set_param fail_val=0 Lustre: DEBUG MARKER: lctl set_param fail_loc=0 Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug Lustre: DEBUG MARKER: lctl set_param -n osd*.*MDT*.force_sync 1 Lustre: DEBUG MARKER: lctl set_param fail_val=601 Lustre: DEBUG MARKER: lctl set_param fail_loc=0x513 Lustre: *** cfs_fail_loc=513, val=0*** Lustre: *** cfs_fail_loc=513, val=0*** Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 2 previous similar messages Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 10 previous similar messages Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 2 previous similar messages Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 22 previous similar messages Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 34 previous similar messages LustreError: 25638:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828388904 Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting LustreError: 25389:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828389172 Lustre: *** cfs_fail_loc=513, val=0*** Lustre: Skipped 75 previous similar messages Lustre: DEBUG MARKER: lctl set_param fail_val=0 Lustre: DEBUG MARKER: lctl set_param fail_loc=0 Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting Lustre: DEBUG MARKER: /usr/sbin/lctl mark Files for user \(quota_usr\): Lustre: DEBUG MARKER: Files for user (quota_usr): Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 Lustre: DEBUG MARKER: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 |
| Comments |
| Comment by nasf (Inactive) [ 10/Feb/14 ] |
|
another failure instance: https://maloo.whamcloud.com/test_sets/a67ad95a-91c5-11e3-bdf8-52540035b04c |
| Comment by Peng Tao [ 12/Feb/14 ] |
|
another one: |
| Comment by Jian Yu [ 13/Feb/14 ] |
|
More instance on master branch: |
| Comment by nasf (Inactive) [ 13/Feb/14 ] |
|
There is temporary patch to disable sanity-quota test_6: |
| Comment by Niu Yawei (Inactive) [ 21/Feb/14 ] |
11:32:03:running as uid/gid/euid/egid 60000/60000/60000/60000, groups: 11:32:03: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr] [count=3] [seek=1] [oflag=sync] [conv=notrunc] 11:32:45:CMD: wtm-23vm3 lctl set_param fail_val=0 11:32:45:fail_val=0 11:32:45:CMD: wtm-23vm3 lctl set_param fail_loc=0 11:32:45:fail_loc=0 11:32:45:CMD: wtm-23vm4 dmesg 11:33:16:Disk quotas for user quota_usr (uid 60000): 11:33:16: Filesystem kbytes quota limit grace files quota limit grace 11:33:16: /mnt/lustre 2040 0 3072 - 1 0 0 - 11:33:16:lustre-MDT0000_UUID 11:33:16: 0 - 0 - 1 - 0 - 11:33:16:lustre-MDT0001_UUID 11:33:16: 0 - 0 - 0 - 0 - 11:33:16:lustre-MDT0002_UUID 11:33:16: 0 - 0 - 0 - 0 - 11:33:16:lustre-MDT0003_UUID 11:33:17: 0 - 0 - 0 - 0 - 11:33:17:lustre-OST0000_UUID 11:33:18: 2040 - 2048 - - - - - 11:33:18:lustre-OST0001_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0002_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0003_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0004_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0005_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0006_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:lustre-OST0007_UUID 11:33:18: 0 - 0 - - - - - 11:33:18:Total allocated inode limit: 0, total allocated block limit: 0 11:33:18:Files for user (quota_usr): 11:33:18: File: `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr' 11:33:18: Size: 2084864 Blocks: 4080 IO Block: 4194304 regular file 11:33:18:Device: 2c54f966h/743766374d Inode: 288230930219272356 Links: 1 11:33:18:Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr) 11:33:18:Access: 2014-02-12 11:31:58.000000000 -0800 11:33:18:Modify: 2014-02-12 11:31:58.000000000 -0800 11:33:18:Change: 2014-02-12 11:31:58.000000000 -0800 11:33:18: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2084864/2084864 11:33:18: Trace dump: 11:33:18: = /usr/lib64/lustre/tests/test-framework.sh:4455:error_noexit() 11:33:18: = /usr/lib64/lustre/tests/test-framework.sh:4486:error() 11:33:18: = /usr/lib64/lustre/tests/sanity-quota.sh:157:quota_error() 11:33:18: = /usr/lib64/lustre/tests/sanity-quota.sh:949:test_6() 11:33:18: = /usr/lib64/lustre/tests/test-framework.sh:4747:run_one() 11:33:18: = /usr/lib64/lustre/tests/test-framework.sh:4782:run_one_logged() 11:33:18: = /usr/lib64/lustre/tests/test-framework.sh:4635:run_test() 11:33:18: = /usr/lib64/lustre/tests/sanity-quota.sh:962:main() 11:33:18:Dumping lctl log to /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.*.1392233592.log 11:33:18:CMD: wtm-23vm3,wtm-23vm4,wtm-23vm5,wtm-23vm6.rosso.whamcloud.com,wtm-23vm7 /usr/sbin/lctl dk > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.debug_log.\$(hostname -s).1392233592.log; 11:33:18: dmesg > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.dmesg.\$(hostname -s).1392233592.log 11:33:18:Delete files... 11:33:30:Wait for unlink objects finished... 11:33:30:CMD: wtm-23vm3 lctl set_param -n osd*.*MD*.force_sync 1 11:33:30:CMD: wtm-23vm3 lctl get_param -n osc.*MDT*.sync_* 11:33:30:Waiting for local destroys to complete 11:33:30:CMD: wtm-23vm3,wtm-23vm7 lctl set_param -n osd*.*MDT*.force_sync 1 11:33:30:CMD: wtm-23vm4 lctl set_param -n osd*.*OS*.force_sync 1 11:33:30:dd: writing `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr': Disk quota exceeded 11:33:30:2+0 records in 11:33:30:1+0 records out Actually, the dd was finished in 45 seconds (far less then the 240 seconds), I think we just need to change the script to give quota slave more time to reconnect to master and finish dd. |
| Comment by Niu Yawei (Inactive) [ 21/Feb/14 ] |
|
Fix the script to not fail for file not growing: http://review.whamcloud.com/#/c/9339/1 |
| Comment by Bob Glossman (Inactive) [ 03/Apr/14 ] |
|
failure seen in b2_5: Does the fix need a back port to b2_5? |
| Comment by Jian Yu [ 14/Apr/14 ] |
|
Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/9946. |
| Comment by Bob Glossman (Inactive) [ 17/Apr/14 ] |
|
still seeing this problem in latest b2_5 |
| Comment by Peter Jones [ 17/Apr/14 ] |
|
Fix landed to master |