[LU-10048] osd-ldiskfs to truncate outside of main transaction Created: 29/Sep/17 Updated: 07/Jul/22 Resolved: 27/Aug/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5 |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This is needed to implement the "transaction first, locking next" order, which unifies locking among MDT/OST/OUT. |
| Comments |
| Comment by Andreas Dilger [ 27/Oct/17 ] |
|
The https://review.whamcloud.com/27488 patch is for ldiskfs, while the |
| Comment by Gerrit Updater [ 13/Feb/18 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/31293 |
| Comment by Gerrit Updater [ 14/Jun/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27488/ |
| Comment by Andreas Dilger [ 08/Oct/18 ] |
|
Still one more patch to land. |
| Comment by Lukasz Flis [ 10/Oct/18 ] |
|
Is there a backport for b2_10 available or planned? Alex pointed this issue as duplicate of We are experiencing MDT/OST lock-ups on 2_10_5 a few times a day in the worst case
|
| Comment by Gerrit Updater [ 06/Nov/18 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33586 |
| Comment by Gerrit Updater [ 06/Nov/18 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33587 |
| Comment by Mahmoud Hanafi [ 16/Nov/18 ] |
|
Are the 2.10 patches OK to use, or do they still need additional work?
|
| Comment by Peter Jones [ 17/Nov/18 ] |
|
Mahmoud, I would recommend holding off for now. Peter |
| Comment by Andreas Dilger [ 20/Nov/18 ] |
|
I think the https://review.whamcloud.com/33586 patch " |
| Comment by Gerrit Updater [ 27/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31293/ |
| Comment by Peter Jones [ 27/Aug/19 ] |
|
So...is this ok to mark as resolved now? |
| Comment by Peter Jones [ 27/Aug/19 ] |
|
Andreas thinks yes |
| Comment by Gerrit Updater [ 04/Mar/20 ] |
|
Mark Roper (markroper@gmail.com) uploaded a new patch: https://review.whamcloud.com/37797 |
| Comment by Aurelien Degremont (Inactive) [ 05/Mar/20 ] |
|
We submitted a backport of this patch to b2_10 if anybody is looking for it. @Lukasz, if you are still looking for it
|
| Comment by Gerrit Updater [ 12/Apr/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43277 |
| Comment by Etienne Aujames [ 12/Apr/21 ] |
|
Hello, I have backported the https://review.whamcloud.com/43277 ("
It seems we trigger that bug in 2.12.6 with an external journal (on a flash device) for a rotational disk while several migrate threads are running on a robinhood client (512 threads).
The jbd2 journal seems to be locked when the transaction is in T_LOCKED:
transaction_t.t_state = T_LOCKED
transaction_t.t_handle_count = 40
transaction_t.t_updates = 1
number of tasks in j_wait_transaction_locked = 324
number of tasks in j_wait_update = 0 |
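One way to read those counters can be sketched in Python. This is a minimal illustration, not code from Lustre or the kernel. It assumes the usual jbd2 behaviour: a commit moves the running transaction to T_LOCKED and then waits for t_updates (open handles) to drop to zero, while threads that want to start a new handle sleep on j_wait_transaction_locked. The `Jbd2Snapshot` type and the `diagnose` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Jbd2Snapshot:
    """Values copied by hand from a crash dump (hypothetical container)."""
    t_state: str
    t_handle_count: int
    t_updates: int
    waiters_transaction_locked: int
    waiters_update: int


def diagnose(s: Jbd2Snapshot) -> str:
    """Give a rough reading of the snapshot; a heuristic, not a kernel rule."""
    if s.t_state != "T_LOCKED":
        return "transaction is not locked; look elsewhere"
    if s.t_updates == 0:
        return "no open handles; commit should be able to proceed"
    if s.waiters_transaction_locked > 0:
        return (f"{s.t_updates} open handle(s) block the commit while "
                f"{s.waiters_transaction_locked} task(s) wait to start a handle")
    return f"{s.t_updates} open handle(s) block the commit"


# Values reported in the comment above.
snapshot = Jbd2Snapshot("T_LOCKED", 40, 1, 324, 0)
print(diagnose(snapshot))
```

If the single open handle belongs to a thread that is itself waiting on a lock held by one of the 324 waiters, neither side can make progress. That kind of inversion is what the "transaction first, locking next" order in this ticket is meant to rule out; whether it is what happened here can only be confirmed from the backtraces.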
| Comment by Alex Zhuravlev [ 12/Apr/21 ] |
|
there is yet another patch under |
| Comment by Alex Zhuravlev [ 12/Apr/21 ] |
|
As for the fast journal: we've got a number of nodes running all the tests on RAM-backed devices 24h a day. |
| Comment by Etienne Aujames [ 12/Apr/21 ] |
|
The " |
| Comment by Alex Zhuravlev [ 12/Apr/21 ] |
|
Please generate a full backtrace ( echo t >/proc/sysrq-trigger ) and attach it to the ticket. |
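For anyone scripting the collection, the same request can be sketched in Python. This is a minimal sketch that only mirrors the `echo t` command above; the `request_task_dump` name and its `trigger` parameter are hypothetical, and on a real server it needs root.

```python
from pathlib import Path


def request_task_dump(trigger: str = "/proc/sysrq-trigger") -> None:
    """Write 't' to the sysrq trigger, like `echo t >/proc/sysrq-trigger`."""
    # The kernel then prints the stack of every task to its log;
    # nothing is returned to the caller.
    Path(trigger).write_text("t\n")


# Usage on the affected server (as root):
#   request_task_dump()
```

The task stacks end up in the kernel log, so they still have to be saved from there (for example with `dmesg`) before being attached.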
| Comment by Etienne Aujames [ 12/Apr/21 ] |
|
The issue seems to occur relatively often with a lot of migrate threads (today 4 times on different OSTs). We will test the backports quickly on a production environment (maybe this week). I will try to get a backtrace from the crashdump (manually triggered) tomorrow. |
| Comment by Etienne Aujames [ 14/Apr/21 ] |
|
I have added our backtrace to this ticket: crash_bt_jbd2_locked_20210412.log |
| Comment by Etienne Aujames [ 20/Apr/21 ] |
|
Hello Alex, Did you have time to look at our backtrace? If you need more data from the crashdump, I can get you some. |
| Comment by Etienne Aujames [ 11/May/21 ] |
|
Hello, We have applied the https://review.whamcloud.com/43277 (" We were able to reproduce this issue in 5 to 10 minutes with many creations of small files and several threads doing file migrations ("lfs migrate" between OSTs). |