Lustre / LU-5245

sanity-quota test_1: user write success, but expect EDQUOT

Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
    • Labels: None
    • Severity: 3
    • Rank: 14627

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/540fb674-fad7-11e3-b499-52540035b04c.

      The sub-test test_1 failed with the following error:

      Total allocated inode limit: 0, total allocated block limit: 0
      Files for user (quota_usr):
      File: `/mnt/lustre/d1.sanity-quota/f1.sanity-quota-0'
      Size: 11534336 Blocks: 22528 IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d Inode: 288230930219270255 Links: 1
      Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
      Access: 2014-06-23 00:18:35.000000000 -0700
      Modify: 2014-06-23 00:18:36.000000000 -0700
      Change: 2014-06-23 00:18:36.000000000 -0700
      sanity-quota test_1: @@@@@@ FAIL: user write success, but expect EDQUOT
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:4528:error_noexit()
      = /usr/lib64/lustre/tests/test-framework.sh:4559:error()
      = /usr/lib64/lustre/tests/sanity-quota.sh:154:quota_error()
      = /usr/lib64/lustre/tests/sanity-quota.sh:440:test_1()
      = /usr/lib64/lustre/tests/test-framework.sh:4820:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:4855:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:4708:run_test()
      = /usr/lib64/lustre/tests/sanity-quota.sh:483:main()
      user write success, but expect EDQUOT

      Info required for matching: sanity-quota 1
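
      For reference, the failing check in test_1 boils down to setting a block hard limit for quota_usr and then expecting a write past that limit to fail with EDQUOT. A rough standalone sketch of that sequence follows; the mount point, limit, and file names are illustrative assumptions, not the exact values used by sanity-quota.sh:

      # Sketch only: approximates the sanity-quota test_1 check (paths and limits assumed)
      MNT=/mnt/lustre
      DIR=$MNT/d1.sanity-quota
      FILE=$DIR/f1.sanity-quota-0
      LIMIT_MB=10

      mkdir -p $DIR && chmod 0777 $DIR

      # Set a block hard limit (no soft or inode limits) for quota_usr
      lfs setquota -u quota_usr -b 0 -B ${LIMIT_MB}M -i 0 -I 0 $MNT

      # Write past the limit as quota_usr; the test expects this dd to fail with EDQUOT
      su quota_usr -c "dd if=/dev/zero of=$FILE bs=1M count=$((LIMIT_MB * 2)) oflag=sync" &&
          echo "FAIL: user write success, but expect EDQUOT"

      # Show the usage and limits the client sees
      lfs quota -v -u quota_usr $MNT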

      Activity

            adilger Andreas Dilger added a comment - Closing this old issue; use LU-11678 for a new issue with the same symptom.
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/772bbf04-6dec-11e8-a522-52540065bddc
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/a4bae786-61c9-11e7-9230-5254006e85c2
            yong.fan nasf (Inactive) added a comment - +1 on master: https://testing.hpdd.intel.com/test_sets/fedc1a0c-4219-11e7-b558-5254006e85c2
            jamesanunez James Nunez (Inactive) added a comment (edited) - Another instance on master: 2015-11-15 14:01:26 - https://testing.hpdd.intel.com/test_sets/9e1632c2-8bbb-11e5-9933-5254006e85c2 2015-11-18 16:20:31 - https://testing.hpdd.intel.com/test_sets/cf90e830-8e25-11e5-8da8-5254006e85c2 2015-11-18 21:25:31 - https://testing.hpdd.intel.com/test_sets/1ce03240-8e52-11e5-8da8-5254006e85c2

            jamesanunez James Nunez (Inactive) added a comment - I think we hit this issue again on PPC builds. Logs at: 2015-09-01 03:24:28 - https://testing.hpdd.intel.com/test_sets/b1d82b30-508d-11e5-95a9-5254006e85c2

            niu Niu Yawei (Inactive) added a comment - All the failures were because the quota slave didn't connect to the master, so quota wasn't enforced on the slave.

            00000100:00100000:0.0:1436439856.246154:0:23692:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-10.1.4.48@tcp, seq: 852
            00000100:00100000:0.0:1436439856.246163:0:23692:0:(service.c:2076:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_004:0+-99:3137:x1505240840316480:12345-10.1.4.48@tcp:38
            00010000:02000400:0.0:1436439856.246191:0:23692:0:(ldlm_lib.c:1026:target_handle_connect()) lustre-MDT0000: Received LWP connection from 10.1.4.48@tcp, removing former export from 10.1.4.52@tcp
            00000020:00080000:0.0:1436439856.246195:0:23692:0:(genops.c:1382:class_fail_export()) disconnecting export ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
            00000020:00000080:0.0:1436439856.246204:0:23692:0:(genops.c:1215:class_disconnect()) disconnect: cookie 0x515596b0134eb7f7
            00000100:00080000:0.0:1436439856.246211:0:23692:0:(import.c:1601:ptlrpc_cleanup_imp()) ffff880063495000 PA#}: changing import state from FULL to CLOSED
            00000020:00080000:0.0:1436439856.246261:0:23692:0:(genops.c:1399:class_fail_export()) disconnected export ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
            00000020:00000080:0.0:1436439856.246263:0:23692:0:(genops.c:814:class_export_put()) final put ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
            00010000:00080000:0.0:1436439856.246290:0:23692:0:(ldlm_lib.c:1092:target_handle_connect()) lustre-MDT0000: connection from lustre-MDT0000-lwp-OST0003_UUID@10.1.4.48@tcp t0 exp (null) cur 1436439856 last 0
            

            I see from the log that the OST connected to the master with a different IP address during testing; do we fail over OSTs while testing?

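            As a quick way to confirm the condition described above (a quota slave that never connected to the quota master), the slave state can be queried on the servers. A sketch, assuming the standard quota_slave/qmt proc entries; the exact output format differs between Lustre versions:

            # On the OSS: a healthy slave reports a line like "conn to master: setup";
            # "not setup" means quota is not being enforced on that target.
            lctl get_param osd-*.*.quota_slave.info

            # On the MDS: check the quota master's per-user global index
            lctl get_param qmt.*.*.glb-usr
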
            emoly.liu Emoly Liu added a comment - I hit a similar issue: https://testing.hpdd.intel.com/test_sets/f878df9a-2661-11e5-8b33-5254006e85c2
            kit.westneat Kit Westneat (Inactive) added a comment - This test result looks somewhat similar: https://testing.hpdd.intel.com/test_sets/08918008-2437-11e5-b6b4-5254006e85c2

            niu Niu Yawei (Inactive) added a comment - It looks like the used space is far below the quota limit, so we should first figure out whether the write really reached the OST:

            • If the write on the OST also failed with EDQUOT, this could be LU-4505; collecting a log on the OST (for the file write, with D_QUOTA enabled) could be helpful (see the sketch below).
            • If the write on the OST succeeded, we should collect a log to see why this client can't read it (as I mentioned in the previous comment), and try the same test on another clean client to see if it can be reproduced.

            Anyway, I don't think this is related to the current ticket; could you create a new ticket for this problem? Thanks.
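
            For completeness, the OST-side log collection suggested in the first bullet would look roughly like this; the file name and buffer size are only placeholders:

            # On the OSS, before re-running the failing write:
            lctl set_param debug=+quota     # enable D_QUOTA debug messages
            lctl set_param debug_mb=512     # optionally enlarge the debug buffer
            lctl clear                      # start from an empty debug log

            # ... re-run the write that is expected to hit EDQUOT ...

            # Dump the kernel debug log so it can be attached to the ticket
            lctl dk > /tmp/ost-d_quota.log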


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 13
