[LU-15129] sanity-quota test_75: 'write failed, expect succeed (2)' Created: 19/Oct/21  Updated: 25/Jan/24  Resolved: 25/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: failing_tests

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/310e8154-d9b2-4028-8e0a-fe683a01e469

test_75 failed with the following error:

write failed, expect succeed (2)

Test log shows:

Disk quotas for usr 60000 (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    9052*  10240   20480    none       1       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       1       -       0       -
lustre-MDT0001_UUID
                      0       -       0       -       0       -       0       -
lustre-MDT0002_UUID
                      0       -       0       -       0       -       0       -
lustre-MDT0003_UUID
                      0       -    2052       -       0       -       0       -
lustre-OST0000_UUID
                      0       -      48       -       -       -       -       -
lustre-OST0001_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0002_UUID
                   9052       -    9060       -       -       -       -       -
lustre-OST0003_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0004_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0005_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0006_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0007_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 9108
Files for user (60000):
  File: /mnt/lustre
  Size: 12288     	Blocks: 24         IO Block: 4096   directory
Device: 2c54f966h/743766374d	Inode: 144115188193296385  Links: 5
Access: (0755/drwxr-xr-x)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 1970-01-01 00:00:00.000000000 +0000
Modify: 2021-10-18 21:24:13.000000000 +0000
Change: 2021-10-18 21:24:13.000000000 +0000
 Birth: -
  File: /mnt/lustre/ffsx.sanity-dom.fsxgood
  Size: 0         	Blocks: 0          IO Block: 4194304 regular empty file
Device: 2c54f966h/743766374d	Inode: 144115675051327527  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2021-10-18 16:52:16.000000000 +0000
Modify: 2021-10-18 16:52:16.000000000 +0000
Change: 2021-10-18 16:52:16.000000000 +0000
 Birth: -
  File: /mnt/lustre/d75.sanity-quota
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d	Inode: 144116245459894494  Links: 2
Access: (0777/drwxrwxrwx)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2021-10-18 21:26:24.000000000 +0000
Modify: 2021-10-18 21:26:00.000000000 +0000
Change: 2021-10-18 21:26:00.000000000 +0000
 Birth: -
  File: /mnt/lustre/d75.sanity-quota/file
  Size: 9269248   	Blocks: 18104      IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144116245459894498  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2021-10-18 21:26:00.000000000 +0000
Modify: 2021-10-18 21:26:23.000000000 +0000
Change: 2021-10-18 21:26:23.000000000 +0000
 Birth: -
CMD: trevis-66vm7 /usr/sbin/lctl get_param -n version 2>/dev/null
 sanity-quota test_75: @@@@@@ FAIL: write failed, expect succeed (2)

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-quota test_75 - write failed, expect succeed (2)



 Comments   
Comment by Andreas Dilger [ 20/Nov/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/d800bf5f-7265-496a-a281-a5826bd4849c

Comment by Chris Horn [ 02/Dec/21 ]

+1 on master https://testing.whamcloud.com/test_sets/536ccaca-85a4-4c9a-a502-40933f735bcb

Comment by Artem Blagodarenko (Inactive) [ 22/Jan/22 ]

 +1 https://testing.whamcloud.com/test_sets/f4ca7c8b-f924-4de6-8de6-82ac0e1ab94d

Comment by Nikitas Angelinas [ 19/May/22 ]

+1 on master: https://testing.whamcloud.com/test_sets/7c686c53-2ed7-4645-b6e9-9ea6392d59d4

Comment by Arshad Hussain [ 21/Dec/22 ]

+1 on Master

https://testing.whamcloud.com/test_sets/ec4e6eb3-fc08-4789-99f1-89853a3421c6

Comment by Etienne Aujames [ 17/Feb/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/585a6fbc-4dc5-4ba1-a591-47abfd633df9

Comment by Andreas Dilger [ 21/Feb/23 ]

It looks like this subtest failed 40/474 runs in the past week:
https://testing.whamcloud.com/search?test_set_script_id=61149410-4a46-11e0-a7f6-52540025f9af&sub_test_script_id=217bd2d8-23d7-4016-a521-48d60f5a070f&start_date=2023-02-14&end_date=2023-02-21&source=sub_tests#redirect

Sergey, could you please investigate.

Comment by Gerrit Updater [ 24/Feb/23 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50129
Subject: LU-15129 tests: sanity-quota/test_75
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 70e5c1a89a941e2166b88146f13b929f7161532d

Comment by Sergey Cheremencev [ 28/Feb/23 ]

It fails because dd uses "oflag=sync". As a result, the client writes page by page instead of writing a large chunk of pages at once, which takes time.

 00040000:04000000:0.0:1676593722.914518:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8644 pending:0 waiting:0 req:0 usage: 8636 qunit:1024 qtune:512 edquot:0 default:no
00040000:04000000:0.0:1676593723.069069:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8648 pending:0 waiting:0 req:0 usage: 8640 qunit:1024 qtune:512 edquot:0 default:no
00040000:04000000:0.0:1676593723.365384:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8652 pending:0 waiting:0 req:0 usage: 8644 qunit:1024 qtune:512 edquot:0 default:no
...
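The pattern in these lines can be checked mechanically. The following is a minimal sketch, not part of the Lustre tooling: it assumes the field layout seen in the excerpt (the fourth colon-separated field is a seconds.microseconds timestamp) and uses hypothetical helper names (parse_line, deltas). It reports how much time passed between consecutive op_begin calls and how much "granted" and "usage" moved.

```python
import re

# The three qsd_op_begin0 lines quoted above, leading whitespace trimmed.
LOG_LINES = [
    "00040000:04000000:0.0:1676593722.914518:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8644 pending:0 waiting:0 req:0 usage: 8636 qunit:1024 qtune:512 edquot:0 default:no",
    "00040000:04000000:0.0:1676593723.069069:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8648 pending:0 waiting:0 req:0 usage: 8640 qunit:1024 qtune:512 edquot:0 default:no",
    "00040000:04000000:0.0:1676593723.365384:0:713014:0:(qsd_handler.c:748:qsd_op_begin0()) $$$ op_begin space:12  qsd:lustre-OST0005 qtype:usr id:60000 enforced:1 granted: 8652 pending:0 waiting:0 req:0 usage: 8644 qunit:1024 qtune:512 edquot:0 default:no",
]

# Assumed layout: <mask>:<subsystem>:<cpu>:<sec>.<usec>:... granted: N ... usage: N
LINE_RE = re.compile(
    r"^\w+:\w+:[\d.]+:(?P<sec>\d+)\.(?P<usec>\d{6}):"
    r".*?granted:\s*(?P<granted>\d+)"
    r".*?usage:\s*(?P<usage>\d+)"
)


def parse_line(line):
    """Return (timestamp_usec, granted, usage), or None if the line does not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    # Keep the timestamp as integer microseconds to avoid float rounding.
    ts_usec = int(m["sec"]) * 1_000_000 + int(m["usec"])
    return ts_usec, int(m["granted"]), int(m["usage"])


def deltas(lines):
    """Return (elapsed_usec, granted_delta, usage_delta) for consecutive matching lines."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return [
        (cur[0] - prev[0], cur[1] - prev[1], cur[2] - prev[2])
        for prev, cur in zip(records, records[1:])
    ]


for elapsed_usec, d_granted, d_usage in deltas(LOG_LINES):
    print(f"elapsed={elapsed_usec / 1e6:.6f}s granted+={d_granted} usage+={d_usage}")
```

For the excerpt, each step advances both "granted" and "usage" by 4, with roughly 0.15 s and 0.30 s between calls. Three lines are a very small sample, so these intervals should be read as illustrative only.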

Furthermore, when granted space gets close to soft_limit (i.e. over 9MB if soft_limit is 10MB), the OST cannot preacquire space anymore. The OST can also acquire only the requested amount of space (see qmt_alloc_expand). Thus the OST has to send a quota acquire request to the MDT for each BRW request from the client. Sometimes 20 seconds is not enough to write 10MB.
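A rough back-of-the-envelope estimate shows why the 20 seconds can run out. This is only a sketch under stated assumptions: the slow stretch is taken to be the last 1MB below a 10MB soft_limit, each synchronous write is assumed to advance usage by 4 (read here as KB, matching the step seen in the log lines above), and the per-write cost is assumed to lie between the two intervals visible in that excerpt (about 0.15 s and 0.30 s). The function names are hypothetical.

```python
def sync_writes_needed(span_kb, write_kb):
    """Number of synchronous writes needed to cover span_kb (rounded up)."""
    return -(-span_kb // write_kb)


def estimated_seconds(span_kb, write_kb, seconds_per_write):
    """Total time if every write costs one acquire round trip of seconds_per_write."""
    return sync_writes_needed(span_kb, write_kb) * seconds_per_write


SPAN_KB = 1024        # assumption: the stretch from 9MB up to a 10MB soft_limit
WRITE_KB = 4          # assumption: one 4KB step per write, as in the log excerpt
WINDOW_SECONDS = 20   # the 20 seconds mentioned in the comment

for seconds_per_write in (0.15, 0.30):  # assumption: intervals taken from the excerpt
    total = estimated_seconds(SPAN_KB, WRITE_KB, seconds_per_write)
    verdict = "exceeds" if total > WINDOW_SECONDS else "fits within"
    print(f"{seconds_per_write:.2f}s per write: {total:.1f}s total, {verdict} {WINDOW_SECONDS}s")
```

Under these assumptions the last 1MB alone needs 256 writes, so even the lower per-write cost puts the total above 20 seconds. The real figure depends on the actual request size and the MDT round-trip time at the moment of the run.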

I'll send a patch to change test_dom_75.

Comment by Gerrit Updater [ 01/Mar/23 ]

"Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50164
Subject: LU-15129 tests: sanity-quota_75_dom fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b41175be794beac336a815fbc3a98c402f537cc9

Comment by Gerrit Updater [ 08/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50164/
Subject: LU-15129 tests: sanity-quota_75_dom fix
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7d05a687ee5d4f4b95585244a7f60394475fe0ba

Comment by Peter Jones [ 25/Mar/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 30/May/23 ]

"Minh Diep <mdiep@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51168
Subject: LU-15129 tests: sanity-quota_75_dom fix
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: d73758030096243d4915b60d5988046ba44d3269
