Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Environment: local testing
- Severity: 3
Description
Without the patch, only 3 of 36 racer runs timed out; with the patch, 35 of 36 did.

Attachments
- bt-all.txt (558 kB)
- serial.txt (66 kB)
Issue Links
Activity
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40991/
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f0f92773ee18a9da71cd27e3b5c32e5d318ed0d2
Sure, would you help attach the debug logs for me? I could try to reproduce locally, but that would not be faster than you doing it.
wshilong, I can't reproduce that with clean master (local setup), but it could be that some unlucky conditions/timings get exposed better with the patch.
Alex, the first guess is that this might be a problem similar to LU-12702. Anyway, would you get debug logs for it?
With the latest version I hit sanity-quota/3b very often:
== sanity-quota test 3b: Quota pools: Block soft limit (start timer, expires, stop timer) ============ 13:06:59 (1608710819)
limit 4 glbl_limit 8
grace 20 glbl_grace 40
User quota in qpool1(soft limit:4 MB grace:20 seconds)
Creating new pool
Pool lustre.qpool1 created
Adding targets to pool
OST lustre-OST0000_UUID added to pool lustre.qpool1
OST lustre-OST0001_UUID added to pool lustre.qpool1
Trying to set grace for pool qpool1
Trying to set quota for pool qpool1
Waiting for local destroys to complete
Creating test directory
fail_val=0
fail_loc=0
Write up to soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [count=4]
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.205771 s, 20.4 MB/s
Write to exceed soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=4096]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00531433 s, 1.9 MB/s
mmap write when over soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [multiop] [/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap] [OT40960SMW]
Disk quotas for usr quota_usr (uid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4148   8192      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4108      -   4144      -      -      -      -      -
lustre-OST0001_UUID     40*      -     40      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4148      0      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4108      -      0      -      -      -      -      -
lustre-OST0001_UUID      40      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre               0      0      0      -      0      0      0      -
lustre-MDT0000_UUID       0      -      0      -      0      -      0      -
lustre-OST0000_UUID       0      -      0      -      -      -      -      -
lustre-OST0001_UUID       0      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write before timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=5120]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.011448 s, 894 kB/s
Quota info for qpool1:
Disk quotas for usr quota_usr (uid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre           4160*   4096      0    20s      2      0      0      -
Sleep through grace ...
...sleep 25 seconds
Disk quotas for usr quota_usr (uid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4160   8192      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4120      -   4144      -      -      -      -      -
lustre-OST0001_UUID     40*      -     40      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4160      0      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4120      -      0      -      -      -      -      -
lustre-OST0001_UUID      40      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre               0      0      0      -      0      0      0      -
lustre-MDT0000_UUID       0      -      0      -      0      -      0      -
lustre-OST0000_UUID       0      -      0      -      -      -      -      -
lustre-OST0001_UUID       0      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write after timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=6144]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00177872 s, 5.8 MB/s
Write after cancel lru locks
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=7168]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00480989 s, 2.1 MB/s
Disk quotas for usr quota_usr (uid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4172   8192      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4132      -   4144      -      -      -      -      -
lustre-OST0001_UUID     40*      -     40      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 4184
Files for user (quota_usr):
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
Size: 40960 Blocks: 80 IO Block: 4194304 regular file
Device: 2c54f966h/743766374d Inode: 144115238826934298 Links: 1
Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
Birth: 2020-12-23 13:07:15.000000000 +0500
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
Size: 7350272 Blocks: 8264 IO Block: 4194304 regular file
Device: 2c54f966h/743766374d Inode: 144115238826934296 Links: 1
Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for grp quota_usr (gid 60000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre            4172      0      0      -      2      0      0      -
lustre-MDT0000_UUID       0      -      0      -      2      -      0      -
lustre-OST0000_UUID    4132      -      0      -      -      -      -      -
lustre-OST0001_UUID      40      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Files for group (quota_usr):
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
Size: 40960 Blocks: 80 IO Block: 4194304 regular file
Device: 2c54f966h/743766374d Inode: 144115238826934298 Links: 1
Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
Birth: 2020-12-23 13:07:15.000000000 +0500
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
Size: 7350272 Blocks: 8264 IO Block: 4194304 regular file
Device: 2c54f966h/743766374d Inode: 144115238826934296 Links: 1
Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for prj 1000 (pid 1000):
Filesystem           kbytes  quota  limit  grace  files  quota  limit  grace
/mnt/lustre               0      0      0      -      0      0      0      -
lustre-MDT0000_UUID       0      -      0      -      0      -      0      -
lustre-OST0000_UUID       0      -      0      -      -      -      -      -
lustre-OST0001_UUID       0      -      0      -      -      -      -      -
Total allocated inode limit: 0, total allocated block limit: 0
Files for project (1000):
sanity-quota test_3b: @@@@@@ FAIL: write success, but expect EDQUOT
Trace dump:
= ./../tests/test-framework.sh:6273:error()
= sanity-quota.sh:159:quota_error()
= sanity-quota.sh:1297:test_block_soft()
= sanity-quota.sh:1435:test_3b()
= ./../tests/test-framework.sh:6576:run_one()
= ./../tests/test-framework.sh:6623:run_one_logged()
= ./../tests/test-framework.sh:6450:run_test()
= sanity-quota.sh:1494:main()
Dumping lctl log to /tmp/ltest-logs/sanity-quota.test_3b.*.1608710861.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
Delete files...
Wait for unlink objects finished...
Waiting for local destroys to complete
Destroy the created pools: qpool1
lustre.qpool1
OST lustre-OST0000_UUID removed from pool lustre.qpool1
OST lustre-OST0001_UUID removed from pool lustre.qpool1
Pool lustre.qpool1 destroyed
FAIL 3b (85s)
Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40991
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be8337f36c01227e8a959041146577134a402977
bzzz yup, I saw your pushed patch; I'll try to fix it by moving the restart to the ofd layer.
It looks like, currently, ofd_write_lock is sometimes held outside the transaction and sometimes inside, so maybe that is the reason why it could time out even without the patch.
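For illustration only, here is a minimal generic userspace sketch of the hazard being described; this is not Lustre code, and object_lock/trans_slot are hypothetical stand-ins for the ofd_write_lock and transaction-start pairing. One path takes the object lock before "starting the transaction", the other takes it inside, and with both orders live the two threads deadlock, which callers then observe as timeouts.

/*
 * Generic AB-BA inversion demo (hypothetical, not Lustre code).
 * Build with: cc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for ofd_write_lock */
static pthread_mutex_t trans_slot  = PTHREAD_MUTEX_INITIALIZER; /* stand-in for transaction start */

/* Path A: lock held outside (before) the transaction. */
static void *path_lock_first(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&object_lock);
        sleep(1);                           /* widen the race window */
        pthread_mutex_lock(&trans_slot);
        puts("path A: write commit");
        pthread_mutex_unlock(&trans_slot);
        pthread_mutex_unlock(&object_lock);
        return NULL;
}

/* Path B: lock taken inside (after starting) the transaction. */
static void *path_trans_first(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&trans_slot);
        sleep(1);
        pthread_mutex_lock(&object_lock);   /* AB-BA inversion against path A */
        puts("path B: write commit");
        pthread_mutex_unlock(&object_lock);
        pthread_mutex_unlock(&trans_slot);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, path_lock_first, NULL);
        pthread_create(&b, NULL, path_trans_first, NULL);
        pthread_join(a, NULL);              /* never returns: both threads are deadlocked */
        pthread_join(b, NULL);
        return 0;
}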
I found this only in the fallocate path, which landed just recently, and this is clearly a bug.
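Continuing the hypothetical sketch above, one generic way out (an assumed pattern for illustration, not necessarily what the actual patch does) is to make every path acquire the pair in one agreed order, and to drop the object lock before any transaction restart so the ordering can never invert:

/*
 * Fixed variant of the sketch above (still generic, not the actual
 * LU-14187 patch): every path takes object_lock before trans_slot,
 * and a path that cannot get the transaction drops the object lock
 * and restarts. Reuses object_lock/trans_slot from the previous
 * sketch; also needs #include <sched.h>.
 */
static void *path_fixed(void *arg)
{
        (void)arg;
        for (;;) {
                pthread_mutex_lock(&object_lock);
                if (pthread_mutex_trylock(&trans_slot) == 0)
                        break;                      /* got both, in the agreed order */
                pthread_mutex_unlock(&object_lock); /* "restart": back off and retry */
                sched_yield();
        }
        puts("fixed path: write commit under a single lock order");
        pthread_mutex_unlock(&trans_slot);
        pthread_mutex_unlock(&object_lock);
        return NULL;
}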
Landed for 2.14