[LU-14187] LU-14134 causes many racer timeouts Created: 07/Dec/20  Updated: 01/Feb/21  Resolved: 05/Jan/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

local testing


Attachments: Text File bt-all.txt     Text File sanity-quota.test_3b.debug_log.tmp.1608711135.log     Text File serial.txt    
Issue Links:
Related
is related to LU-14134 reduce credits for new writing potent... Resolved
is related to LU-14304 parallel-scale-nfsv3 test racer_on_nf... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Without the patch, only 3 of 36 racer runs timed out. With the patch, 35 of 36 did.



 Comments   
Comment by Alex Zhuravlev [ 07/Dec/20 ]
PID: 1677   TASK: ffff88012f27e000  CPU: 1   COMMAND: "ll_ost_io00_030"
 #0 [ffff880105c5ba80] __schedule at ffffffff816af9ad
    /home/lustre/linux-4.18.0-32.el8/kernel/sched/core.c: 2853
 #1 [ffff880105c5bb18] schedule at ffffffff816b0234
    /home/lustre/linux-4.18.0-32.el8/./arch/x86/include/asm/preempt.h: 81
 #2 [ffff880105c5bb28] rwsem_down_write_failed at ffffffff816b479b
    /home/lustre/linux-4.18.0-32.el8/./arch/x86/include/asm/current.h: 15
 #3 [ffff880105c5bbb8] call_rwsem_down_write_failed at ffffffff816abb13
    /home/lustre/linux-4.18.0-32.el8/arch/x86/lib/rwsem.S: 118
 #4 [ffff880105c5bbf8] down_write at ffffffff816b3a34
    /home/lustre/linux-4.18.0-32.el8/./arch/x86/include/asm/rwsem.h: 142
 #5 [ffff880105c5bc18] osd_trunc_lock at ffffffffa0b2da21 [osd_ldiskfs]
    /home/lustre/master-mine/lustre/osd-ldiskfs/osd_io.c: 2530
 #6 [ffff880105c5bc48] osd_declare_punch at ffffffffa0b2e0fe [osd_ldiskfs]
    /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155
 #7 [ffff880105c5bc70] ofd_object_punch at ffffffffa0e824f9 [ofd]
    /home/lustre/master-mine/lustre/include/dt_object.h: 2570
 #8 [ffff880105c5bd48] ofd_punch_hdl at ffffffffa0e6c54c [ofd]
    /home/lustre/master-mine/lustre/ofd/ofd_dev.c: 2125

PID: 6515   TASK: ffff88012c996000  CPU: 1   COMMAND: "ll_ost_io00_006"
 #0 [ffff88010d40b9a8] __schedule at ffffffff816af9ad
    /home/lustre/linux-4.18.0-32.el8/kernel/sched/core.c: 2853
 #1 [ffff88010d40ba40] schedule at ffffffff816b0234
    /home/lustre/linux-4.18.0-32.el8/./arch/x86/include/asm/preempt.h: 81
 #2 [ffff88010d40ba50] wait_transaction_locked at ffffffff812ef568
    /home/lustre/linux-4.18.0-32.el8/fs/jbd2/transaction.c: 160
 #3 [ffff88010d40baa8] add_transaction_credits at ffffffff812ef700
    /home/lustre/linux-4.18.0-32.el8/fs/jbd2/transaction.c: 188
 #4 [ffff88010d40bb00] start_this_handle at ffffffff812efa1c
    /home/lustre/linux-4.18.0-32.el8/fs/jbd2/transaction.c: 357
 #5 [ffff88010d40bb80] osd_write_commit at ffffffffa0b2ccc0 [osd_ldiskfs]
    /home/lustre/master-mine/lustre/osd-ldiskfs/osd_io.c: 1476
 #6 [ffff88010d40bbf8] ofd_commitrw at ffffffffa0e892df [ofd]
    /home/lustre/master-mine/lustre/include/dt_object.h: 2530
 #7 [ffff88010d40bcb0] tgt_brw_write at ffffffffa05c756c [ptlrpc]
    /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155

Basically, LU-14134 breaks the transaction-vs-lock ordering: the restart (which is essentially just a start) is done with the lock held.
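
To illustrate the inversion, here is a minimal userspace sketch (plain C with pthreads; the names writer_with_restart, handle_holder, trunc_lock and journal are hypothetical stand-ins, not Lustre code): one thread takes a lock and then waits to (re)start a journal handle, while another thread already holds an open handle in the same transaction and waits for that lock. trylock is used only so the demo reports the conflict instead of actually deadlocking.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* stand-ins: trunc_lock for an osd trunc/dt lock, journal for the right to
 * hold a handle in the running jbd2 transaction */
static pthread_mutex_t trunc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t journal = PTHREAD_MUTEX_INITIALIZER;

/* the pattern introduced by the restart: the handle is stopped and started
 * again while the lock is still held, so the order is lock -> journal */
static void *writer_with_restart(void *arg)
{
        pthread_mutex_lock(&trunc_lock);
        sleep(1);
        if (pthread_mutex_trylock(&journal)) {
                printf("writer: would block on handle (re)start while holding the lock\n");
                pthread_mutex_unlock(&trunc_lock);
                return NULL;
        }
        pthread_mutex_unlock(&journal);
        pthread_mutex_unlock(&trunc_lock);
        return NULL;
}

/* another server thread: a handle is already open, then the same lock is
 * needed, so the order is journal -> lock, the opposite of the above */
static void *handle_holder(void *arg)
{
        pthread_mutex_lock(&journal);
        sleep(1);
        if (pthread_mutex_trylock(&trunc_lock)) {
                printf("holder: would block on the lock while keeping the transaction open\n");
                pthread_mutex_unlock(&journal);
                return NULL;
        }
        pthread_mutex_unlock(&trunc_lock);
        pthread_mutex_unlock(&journal);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, writer_with_restart, NULL);
        pthread_create(&b, NULL, handle_holder, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}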

Comment by Wang Shilong (Inactive) [ 08/Dec/20 ]

bzzz Thanks for helping track it down.
Could you be a bit more specific? It looks like we now hold the trunc lock before starting the transaction, and in the restart case the trunc lock is still held?

Comment by Alex Zhuravlev [ 14/Dec/20 ]

While the trunc lock was involved in this specific example, I think the issue is not directly related to it - regular dt locks (e.g. ofd_write_lock() and ofd_read_lock()) are taken within a running transaction, and a transaction restart breaks this ordering. That is actually the reason why truncate was implemented via osd_trans_stop().
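
As a small illustration of that last point, a sketch of the "do it at trans stop" pattern (plain C; fake_handle, trans_start, trans_stop and truncate_after_stop are hypothetical stand-ins, not the osd-ldiskfs code): the lock-sensitive work is only recorded while the handle is open and is executed once the handle has been closed, so no journal handle is ever held across it.

#include <stdio.h>

/* fake_handle stands in for a journal handle; deferred is work that must not
 * run while the handle is open because it takes a lock other handle holders
 * may also need */
struct fake_handle {
        int open;
        void (*deferred)(void);
};

static void trans_start(struct fake_handle *h)
{
        h->open = 1;
}

static void trans_stop(struct fake_handle *h)
{
        h->open = 0;            /* close the handle first ...             */
        if (h->deferred)
                h->deferred();  /* ... then run the lock-sensitive work   */
}

static void truncate_after_stop(void)
{
        /* the blocking, lock-taking part runs here, with no handle open */
        printf("truncate executed with no journal handle held\n");
}

int main(void)
{
        struct fake_handle h = { 0, NULL };

        trans_start(&h);
        h.deferred = truncate_after_stop; /* inside the transaction: record only */
        trans_stop(&h);                   /* after stop: actually execute        */
        return 0;
}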

Comment by Wang Shilong (Inactive) [ 15/Dec/20 ]

What do you think: should we move the IO restart to an upper layer (the OFD layer), or should we implement it in a similar way to truncate?

Comment by Wang Shilong (Inactive) [ 15/Dec/20 ]

It looks like currently some ofd_write_lock calls are taken outside the transaction and some inside, so maybe that is the reason why it could time out even without the patch.

Comment by Alex Zhuravlev [ 15/Dec/20 ]

It looks like currently some ofd_write_lock calls are taken outside the transaction and some inside, so maybe that is the reason why it could time out even without the patch.

I found this only in the fallocate path, which landed just recently, and that is clearly a bug.

Comment by Wang Shilong (Inactive) [ 15/Dec/20 ]

bzzz yup, I saw the patch you pushed; I'll try to fix this by moving the restart to the OFD layer.
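
Conceptually, moving the restart up would look something like the loop below (a rough sketch with hypothetical names such as commit_step and upper_layer_write, not the actual OFD/osd code): when the commit path runs out of credits it bails out instead of restarting the handle in place, and the caller retries with the lock dropped and a fresh transaction, so a handle is never (re)started while the lock is held.

#include <stdbool.h>
#include <stdio.h>

/* pretend commit step: succeeds only once enough credits were declared */
static bool commit_step(int declared_credits, int needed_credits)
{
        return declared_credits >= needed_credits;
}

/* retry loop at the caller: lock -> start -> try -> stop -> unlock, and only
 * then, with nothing held, re-declare and start over */
static void upper_layer_write(int needed_credits)
{
        int declared = 1;

        for (;;) {
                /* lock(object); trans_start(declared); */
                bool done = commit_step(declared, needed_credits);
                /* trans_stop(); unlock(object); */
                if (done)
                        break;
                declared = needed_credits; /* re-declare with more credits */
                printf("retrying with a fresh transaction, no lock held\n");
        }
        printf("write committed with %d credits declared\n", declared);
}

int main(void)
{
        upper_layer_write(4);
        return 0;
}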

Comment by Gerrit Updater [ 16/Dec/20 ]

Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40991
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be8337f36c01227e8a959041146577134a402977

Comment by Alex Zhuravlev [ 23/Dec/20 ]

with the latest version I hit sanity-quota/3b very often:

== sanity-quota test 3b: Quota pools: Block soft limit (start timer, expires, stop timer) ============ 13:06:59 (1608710819)
limit 4 glbl_limit 8
grace 20 glbl_grace 40
User quota in qpool1(soft limit:4 MB  grace:20 seconds)
Creating new pool
Pool lustre.qpool1 created
Adding targets to pool
OST lustre-OST0000_UUID added to pool lustre.qpool1
OST lustre-OST0001_UUID added to pool lustre.qpool1
Trying to set grace for pool qpool1
Trying to set quota for pool qpool1
Waiting for local destroys to complete
Creating test directory
fail_val=0
fail_loc=0
Write up to soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [count=4]
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.205771 s, 20.4 MB/s
Write to exceed soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=4096]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00531433 s, 1.9 MB/s
mmap write when over soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [multiop] [/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap] [OT40960SMW]
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4148    8192       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4108       -    4144       -       -       -       -       -
lustre-OST0001_UUID
                     40*      -      40       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4148       0       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4108       -       0       -       -       -       -       -
lustre-OST0001_UUID
                     40       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0       0       -       0       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       0       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0001_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write before timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=5120]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.011448 s, 894 kB/s
Quota info for qpool1:
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4160*   4096       0     20s       2       0       0       -
Sleep through grace ...
...sleep 25 seconds
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4160    8192       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4120       -    4144       -       -       -       -       -
lustre-OST0001_UUID
                     40*      -      40       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 4184
Disk quotas for grp quota_usr (gid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4160       0       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4120       -       0       -       -       -       -       -
lustre-OST0001_UUID
                     40       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Disk quotas for prj 1000 (pid 1000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0       0       -       0       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       0       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0001_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Block grace time: 40s; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Block grace time: 1w; Inode grace time: 1w
Write after timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=6144]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00177872 s, 5.8 MB/s
Write after cancel lru locks
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0] [bs=1K] [count=10] [seek=7168]
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00480989 s, 2.1 MB/s
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4172    8192       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4132       -    4144       -       -       -       -       -
lustre-OST0001_UUID
                     40*      -      40       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 4184
Files for user (quota_usr):
  File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
  Size: 40960     	Blocks: 80         IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115238826934298  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
 Birth: 2020-12-23 13:07:15.000000000 +0500
  File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
  Size: 7350272   	Blocks: 8264       IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115238826934296  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
 Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for grp quota_usr (gid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    4172       0       0       -       2       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       2       -       0       -
lustre-OST0000_UUID
                   4132       -       0       -       -       -       -       -
lustre-OST0001_UUID
                     40       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Files for group (quota_usr):
  File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0.mmap
  Size: 40960     	Blocks: 80         IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115238826934298  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:15.000000000 +0500
Modify: 2020-12-23 13:07:15.000000000 +0500
Change: 2020-12-23 13:07:15.000000000 +0500
 Birth: 2020-12-23 13:07:15.000000000 +0500
  File: /mnt/lustre/d3b.sanity-quota/f3b.sanity-quota-0
  Size: 7350272   	Blocks: 8264       IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115238826934296  Links: 1
Access: (0644/-rw-r--r--)  Uid: (60000/quota_usr)   Gid: (60000/quota_usr)
Access: 2020-12-23 13:07:14.000000000 +0500
Modify: 2020-12-23 13:07:41.000000000 +0500
Change: 2020-12-23 13:07:41.000000000 +0500
 Birth: 2020-12-23 13:07:14.000000000 +0500
Disk quotas for prj 1000 (pid 1000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0       0       -       0       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       0       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0001_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
Files for project (1000):
 sanity-quota test_3b: @@@@@@ FAIL: write success, but expect EDQUOT 
  Trace dump:
  = ./../tests/test-framework.sh:6273:error()
  = sanity-quota.sh:159:quota_error()
  = sanity-quota.sh:1297:test_block_soft()
  = sanity-quota.sh:1435:test_3b()
  = ./../tests/test-framework.sh:6576:run_one()
  = ./../tests/test-framework.sh:6623:run_one_logged()
  = ./../tests/test-framework.sh:6450:run_test()
  = sanity-quota.sh:1494:main()
Dumping lctl log to /tmp/ltest-logs/sanity-quota.test_3b.*.1608710861.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
Delete files...
Wait for unlink objects finished...
Waiting for local destroys to complete
Destroy the created pools: qpool1
lustre.qpool1
OST lustre-OST0000_UUID removed from pool lustre.qpool1
OST lustre-OST0001_UUID removed from pool lustre.qpool1
Pool lustre.qpool1 destroyed
FAIL 3b (85s)
Comment by Wang Shilong (Inactive) [ 23/Dec/20 ]

Alex, my first guess is that this might be a problem similar to LU-12702. Anyway, could you get debug logs for it?

Comment by Alex Zhuravlev [ 23/Dec/20 ]

wshilong I can't reproduce that with a clean master (local setup), but it could be that some unlucky conditions/timings get exposed better with the patch.

Comment by Wang Shilong (Inactive) [ 23/Dec/20 ]

Sure, would you help attach the debug log for me? I could try to reproduce it locally, but that would not be as fast as on your setup.

Comment by Gerrit Updater [ 05/Jan/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40991/
Subject: LU-14187 osd-ldiskfs: fix locking in write commit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f0f92773ee18a9da71cd27e3b5c32e5d318ed0d2

Comment by Peter Jones [ 05/Jan/21 ]

Landed for 2.14
