Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Components: None
- Labels: None
- Environment: local testing
- Severity: 3
Description
Without the patch, only 3 of 36 racer runs timed out; with the patch, 35 of 36 did.

Attachments
- bt-all.txt (558 kB)
- serial.txt (66 kB)
Issue Links
Activity
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40991/
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f0f92773ee18a9da71cd27e3b5c32e5d318ed0d2
Sure, would you help attach debug logs for me? I could try to reproduce it locally, but that would not be faster than yours.
wshilong, I can't reproduce that with clean master (local setup), but it could be that some unlucky conditions/timings get exposed better with the patch.
Alex, the first guess is this might be a problem similar to LU-12702; anyway, would you get debug logs for it?
with the latest version I hit sanity-quota/3b very often:
== sanity-quota test 3b: Quota pools: Block soft limit (start timer, expires, stop timer) ============ 13:06:59 (1608710819)
limit 4 glbl_limit 8
grace 20 glbl_grace 40
User quota in qpool1(soft limit:4 MB grace:20 seconds)
Creating new pool
Pool lustre.qpool1 created
Adding targets to pool
OST lustre-OST0000_UUID added to pool lustre.qpool1
OST lustre-OST0001_UUID added to pool lustre.qpool1
Trying to set grace for pool qpool1
Trying to set quota for pool qpool1
Waiting for local destroys to complete
Creating test directory
fail_val=0
fail_loc=0
Write up to soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [count=4]
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.205771 s, 20.4 MB/s
Write to exceed soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=4096]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00531433 s, 1.9 MB/s
mmap write when over soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [multiop] [/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap] [OT40960SMW]
Disk quotas for usr quota_usr (uid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4148  8192     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4108     -  4144     -     -     -     -     -
lustre-OST0001_UUID    40*     -    40     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4148     0     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4108     -     0     -     -     -     -     -
lustre-OST0001_UUID     40     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre              0     0     0     -     0     0     0     -
lustre-MDT0000_UUID      0     -     0     -     0     -     0     -
lustre-OST0000_UUID      0     -     0     -     -     -     -     -
lustre-OST0001_UUID      0     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write before timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=5120]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.011448 s, 894 kB/s
Quota info for qpool1:
Disk quotas for usr quota_usr (uid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre          4160*  4096     0   20s     2     0     0     -
Sleep through grace ...
...sleep 25 seconds
Disk quotas for usr quota_usr (uid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4160  8192     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4120     -  4144     -     -     -     -     -
lustre-OST0001_UUID    40*     -    40     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4160     0     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4120     -     0     -     -     -     -     -
lustre-OST0001_UUID     40     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre              0     0     0     -     0     0     0     -
lustre-MDT0000_UUID      0     -     0     -     0     -     0     -
lustre-OST0000_UUID      0     -     0     -     -     -     -     -
lustre-OST0001_UUID      0     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write after timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=6144]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00177872 s, 5.8 MB/s
Write after cancel lru locks
running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=7168]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00480989 s, 2.1 MB/s
Disk quotas for usr quota_usr (uid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4172  8192     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4132     -  4144     -     -     -     -     -
lustre-OST0001_UUID    40*     -    40     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 4184
Files for user (quota_usr):
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
Size: 40960  Blocks: 80  IO Block: 4194304  regular file
Device: 2c54f966h/743766374d  Inode: 144115238826934298  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)  Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
Birth: 2020-12-23 13:07:15.000000000 +0500
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
Size: 7350272  Blocks: 8264  IO Block: 4194304  regular file
Device: 2c54f966h/743766374d  Inode: 144115238826934296  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)  Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for grp quota_usr (gid 60000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre           4172     0     0     -     2     0     0     -
lustre-MDT0000_UUID      0     -     0     -     2     -     0     -
lustre-OST0000_UUID   4132     -     0     -     -     -     -     -
lustre-OST0001_UUID     40     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Files for group (quota_usr):
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
Size: 40960  Blocks: 80  IO Block: 4194304  regular file
Device: 2c54f966h/743766374d  Inode: 144115238826934298  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)  Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
Birth: 2020-12-23 13:07:15.000000000 +0500
File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
Size: 7350272  Blocks: 8264  IO Block: 4194304  regular file
Device: 2c54f966h/743766374d  Inode: 144115238826934296  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)  Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for prj 1000 (pid 1000):
Filesystem          kbytes quota limit grace files quota limit grace
/mnt/lustre              0     0     0     -     0     0     0     -
lustre-MDT0000_UUID      0     -     0     -     0     -     0     -
lustre-OST0000_UUID      0     -     0     -     -     -     -     -
lustre-OST0001_UUID      0     -     0     -     -     -     -     -
Total allocated inode limit: 0, total allocated block limit: 0
Files for project (1000):
sanity-quota test_3b: @@@@@@ FAIL: write success, but expect EDQUOT
Trace dump:
= ./../tests/test-framework.sh:6273:error()
= sanity-quota.sh:159:quota_error()
= sanity-quota.sh:1297:test_block_soft()
= sanity-quota.sh:1435:test_3b()
= ./../tests/test-framework.sh:6576:run_one()
= ./../tests/test-framework.sh:6623:run_one_logged()
= ./../tests/test-framework.sh:6450:run_test()
= sanity-quota.sh:1494:main()
Dumping lctl log to /tmp/ltest-logs/sanity-quota.test_3b.*.1608710861.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
Delete files...
Wait for unlink objects finished...
Waiting for local destroys to complete
Destroy the created pools: qpool1
lustre.qpool1
OST lustre-OST0000_UUID removed from pool lustre.qpool1
OST lustre-OST0001_UUID removed from pool lustre.qpool1
Pool lustre.qpool1 destroyed
FAIL 3b (85s)
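For reference, the assertion behind "write success, but expect EDQUOT" reduces to the following check, shown here as a plain-C illustration (this is not the test framework's code, which drives dd from shell): once the soft limit's grace time has expired, a further write by the over-quota user must be rejected with EDQUOT.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Illustration only: after the grace period has expired, a write by the
 * user over the soft limit should fail with errno == EDQUOT.  test_3b
 * failed because the write still succeeded. */
static int check_write_rejected(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n >= 0) {
		fprintf(stderr, "write success, but expect EDQUOT\n");
		return -1;
	}
	if (errno != EDQUOT) {
		fprintf(stderr, "unexpected errno %d\n", errno);
		return -1;
	}
	return 0;	/* write correctly rejected with EDQUOT */
}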
Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40991
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be8337f36c01227e8a959041146577134a402977
bzzz yup, I saw your pushed patch; I'll try to fix it by moving the restart to the OFD layer.
It looks like, currently, some ofd_write_lock is held outside the transaction and some inside, so maybe that is the reason why it could time out even without the patch.
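To see why such mixed ordering can hang, here is a generic, self-contained analogy (an assumption for illustration, not the actual Lustre code): one mutex stands in for the object lock (ofd_write_lock) and another for the journal/transaction credits. Two paths that take the same pair of resources in opposite orders can each hold one and wait forever for the other:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t object = PTHREAD_MUTEX_INITIALIZER;  /* ~ofd_write_lock */
static pthread_mutex_t journal = PTHREAD_MUTEX_INITIALIZER; /* ~transaction credits */

/* Path A: object lock taken before the transaction is opened. */
static void *lock_outside_tx(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&object);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&journal);	/* blocks while path B holds it */
	pthread_mutex_unlock(&journal);
	pthread_mutex_unlock(&object);
	return NULL;
}

/* Path B: transaction opened first, object lock taken inside it. */
static void *lock_inside_tx(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&journal);
	usleep(1000);
	pthread_mutex_lock(&object);	/* blocks while path A holds it */
	pthread_mutex_unlock(&object);
	pthread_mutex_unlock(&journal);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, lock_outside_tx, NULL);
	pthread_create(&b, NULL, lock_inside_tx, NULL);
	pthread_join(a, NULL);	/* with this timing, usually never returns */
	pthread_join(b, NULL);
	puts("no deadlock this run");
	return 0;
}

In the real code the journal side is not a plain mutex, which is why this tends to surface as stuck threads and test timeouts rather than an instant deadlock.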
I found this only in the fallocate path, which landed just recently, and this is clearly a bug.
What do you think: should we move the IO restart to an upper layer (the OFD layer), or should we implement it in a similar way to truncate?
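A minimal sketch of the restart-in-OFD idea (all names here are invented stand-ins, not the real functions): the lower layer returns -EAGAIN instead of blocking with the object lock held, and the upper layer drops the lock and retries from a clean state.

#include <errno.h>

/* Invented stand-ins for illustration; in real code these would be
 * ofd_write_lock()/ofd_write_unlock() and the osd write-commit path. */
struct obj { int locked; };

static void obj_write_lock(struct obj *o)   { o->locked = 1; }
static void obj_write_unlock(struct obj *o) { o->locked = 0; }

/* Pretend the lower layer asks for one restart, then succeeds. */
static int osd_try_write_commit(struct obj *o)
{
	static int calls;

	(void)o;
	return calls++ == 0 ? -EAGAIN : 0;
}

/* The restart loop lives in the upper (OFD-like) layer: on -EAGAIN the
 * object lock is dropped before retrying, so no other thread can block
 * on us while we wait to open a fresh transaction. */
static int write_commit_with_restart(struct obj *o)
{
	int rc;

	do {
		obj_write_lock(o);
		rc = osd_try_write_commit(o);
		obj_write_unlock(o);
	} while (rc == -EAGAIN);

	return rc;
}

int main(void)
{
	struct obj o = { 0 };

	return write_commit_with_restart(&o);
}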
Landed for 2.14