[LU-4422] Failure on test suite sanity-quota test_6: file not growed in 30 seconds 2088960/2088960


    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7b0d8a32-707e-11e3-923a-52540035b04c.

      The sub-test test_6 failed with the following error:

      file not growed in 30 seconds 2088960/2088960

      Hit this issue while testing patch: http://review.whamcloud.com/#/c/7665/
      MDS 1 dmesg shows:

      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 \(1388289068\)
      Lustre: DEBUG MARKER: == sanity-quota test 6: Test dropping acquire request on master == 19:51:08 (1388289068)
      Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
      Lustre: DEBUG MARKER: lctl set_param fail_val=0
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0
      Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
      Lustre: DEBUG MARKER: lctl set_param -n osd*.*MDT*.force_sync 1
      Lustre: DEBUG MARKER: lctl set_param fail_val=601
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x513
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 2 previous similar messages
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 10 previous similar messages
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 2 previous similar messages
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 22 previous similar messages
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 34 previous similar messages
      LustreError: 25638:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828388904
      Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting
      LustreError: 25389:0:(service.c:1911:ptlrpc_server_handle_req_in()) drop incoming rpc opc 601, x1455716828389172
      Lustre: *** cfs_fail_loc=513, val=0***
      Lustre: Skipped 75 previous similar messages
      Lustre: DEBUG MARKER: lctl set_param fail_val=0
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0
      Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0000_UUID (at 10.10.16.207@tcp) reconnecting
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Files for user \(quota_usr\):
      Lustre: DEBUG MARKER: Files for user (quota_usr):
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 
      Lustre: DEBUG MARKER: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2088960\/2088960 
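
      To summarize the markers above: the test arms fail_val=601 and fail_loc=0x513 on the MDS so that the quota master drops every incoming RPC with opcode 601 (the quota acquire requests seen in the "drop incoming rpc opc 601" lines), forcing the OST quota slave to keep retrying and reconnecting while a sync write waits for quota. The shell sketch below is illustrative only, not the actual sanity-quota code; the node and file names are placeholders, and 0x513 is assumed to mean "drop incoming RPCs whose opcode equals fail_val".

      # Hedged sketch of the injection driven by sanity-quota test_6.
      # MDS_NODE and TESTFILE are hypothetical placeholders.
      MDS_NODE=mds1.example
      TESTFILE=/mnt/lustre/d6/f6

      # Arm the injection on the quota master: drop incoming RPCs whose
      # opcode matches fail_val (601, the quota acquire request).
      ssh "$MDS_NODE" lctl set_param fail_val=601
      ssh "$MDS_NODE" lctl set_param fail_loc=0x513

      # A sync write by the quota-limited user forces the OST quota slave to
      # acquire quota from the master; with the acquire RPCs dropped, the dd
      # stalls and the slave keeps reconnecting (as in the dmesg above).
      su quota_usr -c "dd if=/dev/zero of=$TESTFILE bs=1M count=3 seek=1 \
          oflag=sync conv=notrunc" &
      DD_PID=$!

      sleep 30

      # Disarm the injection; the next acquire succeeds and the stalled dd
      # should finish shortly afterwards.  The failure reported in this
      # ticket is that the script then waits only 30 seconds for the file
      # to grow again.
      ssh "$MDS_NODE" lctl set_param fail_val=0
      ssh "$MDS_NODE" lctl set_param fail_loc=0
      wait "$DD_PID"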
      

          Activity

            pjones Peter Jones added a comment -

            Fix landed to master

            bogl Bob Glossman (Inactive) added a comment (edited) - still seeing this problem in latest b2_5: https://maloo.whamcloud.com/test_sets/014b38e6-c62a-11e3-b07b-52540035b04c
            yujian Jian Yu added a comment -

            Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/9946.


            bogl Bob Glossman (Inactive) added a comment -

            failure seen in b2_5:
            https://maloo.whamcloud.com/test_sets/41b91d04-bada-11e3-9a01-52540035b04c

            Does the fix need a back port to b2_5?

            niu Niu Yawei (Inactive) added a comment -

            Fix the script to not fail for file not growing: http://review.whamcloud.com/#/c/9339/1
            niu Niu Yawei (Inactive) added a comment -

            11:32:03:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
            11:32:03: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr] [count=3] [seek=1] [oflag=sync] [conv=notrunc]
            11:32:45:CMD: wtm-23vm3 lctl set_param fail_val=0
            11:32:45:fail_val=0
            11:32:45:CMD: wtm-23vm3 lctl set_param fail_loc=0
            11:32:45:fail_loc=0
            11:32:45:CMD: wtm-23vm4 dmesg
            11:33:16:Disk quotas for user quota_usr (uid 60000):
            11:33:16:     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
            11:33:16:    /mnt/lustre    2040       0    3072       -       1       0       0       -
            11:33:16:lustre-MDT0000_UUID
            11:33:16:                      0       -       0       -       1       -       0       -
            11:33:16:lustre-MDT0001_UUID
            11:33:16:                      0       -       0       -       0       -       0       -
            11:33:16:lustre-MDT0002_UUID
            11:33:16:                      0       -       0       -       0       -       0       -
            11:33:16:lustre-MDT0003_UUID
            11:33:17:                      0       -       0       -       0       -       0       -
            11:33:17:lustre-OST0000_UUID
            11:33:18:                   2040       -    2048       -       -       -       -       -
            11:33:18:lustre-OST0001_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0002_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0003_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0004_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0005_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0006_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:lustre-OST0007_UUID
            11:33:18:                      0       -       0       -       -       -       -       -
            11:33:18:Total allocated inode limit: 0, total allocated block limit: 0
            11:33:18:Files for user (quota_usr):
            11:33:18:  File: `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr'
            11:33:18:  Size: 2084864   	Blocks: 4080       IO Block: 4194304 regular file
            11:33:18:Device: 2c54f966h/743766374d	Inode: 288230930219272356  Links: 1
            11:33:18:Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
            11:33:18:Access: 2014-02-12 11:31:58.000000000 -0800
            11:33:18:Modify: 2014-02-12 11:31:58.000000000 -0800
            11:33:18:Change: 2014-02-12 11:31:58.000000000 -0800
            11:33:18: sanity-quota test_6: @@@@@@ FAIL: file not growed in 30 seconds 2084864/2084864 
            11:33:18:  Trace dump:
            11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4455:error_noexit()
            11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4486:error()
            11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:157:quota_error()
            11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:949:test_6()
            11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4747:run_one()
            11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4782:run_one_logged()
            11:33:18:  = /usr/lib64/lustre/tests/test-framework.sh:4635:run_test()
            11:33:18:  = /usr/lib64/lustre/tests/sanity-quota.sh:962:main()
            11:33:18:Dumping lctl log to /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.*.1392233592.log
            11:33:18:CMD: wtm-23vm3,wtm-23vm4,wtm-23vm5,wtm-23vm6.rosso.whamcloud.com,wtm-23vm7 /usr/sbin/lctl dk > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.debug_log.\$(hostname -s).1392233592.log;
            11:33:18:         dmesg > /logdir/test_logs/2014-02-12/lustre-reviews-el6-x86_64--review-dne-part-2--1_3_1__21745__-69932649229140-090503/sanity-quota.test_6.dmesg.\$(hostname -s).1392233592.log
            11:33:18:Delete files...
            11:33:30:Wait for unlink objects finished...
            11:33:30:CMD: wtm-23vm3 lctl set_param -n osd*.*MD*.force_sync 1
            11:33:30:CMD: wtm-23vm3 lctl get_param -n osc.*MDT*.sync_*
            11:33:30:Waiting for local destroys to complete
            11:33:30:CMD: wtm-23vm3,wtm-23vm7 lctl set_param -n osd*.*MDT*.force_sync 1
            11:33:30:CMD: wtm-23vm4 lctl set_param -n osd*.*OS*.force_sync 1
            11:33:30:dd: writing `/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr': Disk quota exceeded
            11:33:30:2+0 records in
            11:33:30:1+0 records out
            

            Actually, the dd finished in 45 seconds (far less than the 240 seconds), so I think we just need to change the script to give the quota slave more time to reconnect to the master and finish the dd.
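
            As a concrete illustration of that suggestion (hypothetical shell, not the actual change in the reviews linked above; the deadline value is a placeholder), the check could poll the file size up to a deadline comparable to the dd timeout instead of failing after a fixed 30 seconds:

            # Hypothetical sketch: wait for the file to grow with a deadline
            # in the same range as the 240-second dd timeout mentioned above,
            # rather than the fixed 30 seconds that produced this failure.
            TESTFILE=/mnt/lustre/d6.sanity-quota/f6.sanity-quota-quota_usr
            DEADLINE=240                        # seconds, placeholder value
            old_size=$(stat -c %s "$TESTFILE")
            new_size=$old_size
            waited=0
            while [ "$waited" -lt "$DEADLINE" ]; do
                sleep 5
                waited=$((waited + 5))
                new_size=$(stat -c %s "$TESTFILE")
                [ "$new_size" -gt "$old_size" ] && break
            done
            if [ "$new_size" -le "$old_size" ]; then
                echo "FAIL: file not grown in $DEADLINE seconds $new_size/$old_size"
                exit 1
            fi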


            yong.fan nasf (Inactive) added a comment -

            There is a temporary patch to disable sanity-quota test_6:

            http://review.whamcloud.com/#/c/9203/

            yujian Jian Yu added a comment - Another instance on the master branch: https://maloo.whamcloud.com/test_sets/ffae120c-9455-11e3-854a-52540035b04c
            bergwolf Peng Tao added a comment - another one: https://maloo.whamcloud.com/test_sets/dd0c7eae-90dc-11e3-a252-52540035b04c
            yong.fan nasf (Inactive) added a comment - another failure instance: https://maloo.whamcloud.com/test_sets/a67ad95a-91c5-11e3-bdf8-52540035b04c

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo