[LU-6527] Journal commit callback opitmization Created: 27/Apr/15 Updated: 31/Mar/16 Resolved: 31/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Sergey Cheremencev | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Faced with following soft lockup's on mds(mds had internal journal with commit interval 5): Feb 18 14:03:28 snx11127n003 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [jbd2/md66-8:132029] ... Feb 18 14:03:28 snx11127n003 kernel: Pid: 132029, comm: jbd2/md66-8 Not tainted 2.6.32-431.17.1.x2.0.47.x86_64 #1 Intel Corporation S2600JF/S2600JF Feb 18 14:03:28 snx11127n003 kernel: RIP: 0010:[<ffffffffa08c9e99>] [<ffffffffa08c9e99>] ptlrpc_commit_replies+0xb9/0x320 [ptlrpc] Feb 18 14:03:28 snx11127n003 kernel: RSP: 0018:ffff880791763c20 EFLAGS: 00000206 Feb 18 14:03:28 snx11127n003 kernel: RAX: ffff880d260c6030 RBX: ffff880791763c80 RCX: 0000000000000000 Feb 18 14:03:28 snx11127n003 kernel: RDX: ffff880bc7b01030 RSI: ffff880d1974f880 RDI: ffff881014c32928 Feb 18 14:03:28 snx11127n003 kernel: RBP: ffffffff8100bb8e R08: 0000000000000002 R09: 5a5a5a5a5a5a5a5a Feb 18 14:03:28 snx11127n003 kernel: R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff88083fcb02c0 Feb 18 14:03:28 snx11127n003 kernel: R13: ffff88083febc140 R14: ffff88083febccc0 R15: 000001b100000000 Feb 18 14:03:28 snx11127n003 kernel: FS: 0000000000000000(0000) GS:ffff880044640000(0000) knlGS:0000000000000000 Feb 18 14:03:28 snx11127n003 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Feb 18 14:03:28 snx11127n003 kernel: CR2: 00007fb86046c518 CR3: 0000000001a85000 CR4: 00000000000407e0 Feb 18 14:03:28 snx11127n003 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Feb 18 14:03:28 snx11127n003 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Feb 18 14:03:28 snx11127n003 kernel: Process jbd2/md66-8 (pid: 132029, threadinfo ffff880791762000, task ffff8807ef3e4ae0) Feb 18 14:03:28 snx11127n003 kernel: Stack: Feb 18 14:03:28 snx11127n003 kernel: 0000000000000018 ffff881014c32928 ffff880791763c30 ffff880791763c30 Feb 18 14:03:28 snx11127n003 kernel: <d> 0000000000000000 0000000000000000 ffff880791763c80 ffff880bcc9fe240 Feb 18 14:03:28 snx11127n003 kernel: <d> 0000000000000000 ffff8807d7ac0000 ffff880d1974f900 ffff880d1974f900 Feb 18 14:03:28 snx11127n003 kernel: Call Trace: Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa0914258>] ? tgt_cb_last_committed+0x298/0x410 [ptlrpc] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa0f6dba4>] ? osd_trans_commit_cb+0xb4/0x2b0 [osd_ldiskfs] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa0f1c9ba>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa03df8ef>] ? jbd2_journal_commit_transaction+0x116f/0x15a0 [jbd2] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff81084a8b>] ? try_to_del_timer_sync+0x7b/0xe0 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa03e4c08>] ? kjournald2+0xb8/0x220 [jbd2] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff8109b010>] ? autoremove_wake_function+0x0/0x40 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffffa03e4b50>] ? kjournald2+0x0/0x220 [jbd2] Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff8109ac66>] ? kthread+0x96/0xa0 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff8109abd0>] ? kthread+0x0/0xa0 Feb 18 14:03:28 snx11127n003 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 Analysis shows that there where about 28 millions jce: crash> transaction_t 0xffff880c0331ccc0 | grep handle_c t_handle_count = 28881407, When kjournald2 is busy, it can't mark running transaction as T_LOCKED, so ldiskfs writers may open transaction handles , add blocks to the transaction and add transaction commit hooks, making the commit / checkpont of the transaction even more complex and time consuming. |
| Comments |
| Comment by Gerrit Updater [ 27/Apr/15 ] | |||||||||
|
Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/14610 | |||||||||
| Comment by Gerrit Updater [ 27/Apr/15 ] | |||||||||
|
Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/14611 | |||||||||
| Comment by Sergey Cheremencev [ 27/Apr/15 ] | |||||||||
|
Test results we saw on real hardware. pdsh -w green[08-11] /mnt/devel/lustre/lustre/tests/createmany -o /mnt/greenfs/%h/foo -600& pdsh -w green[08-11] /mnt/devel/lustre/lustre/tests/createmany -o /mnt/greenfs/%h2/foo -600& And than delete: pdsh -w green[08-11] rm -rf /mnt/greenfs/%h/& pdsh -w green[08-11] rm -rf /mnt/greenfs/%h2/& Results (2nd and 3d columns - average number of callbacks handled per millisecond):
According to results 2 patches together give significant advantages on unlink. | |||||||||
| Comment by Alex Zhuravlev [ 27/Apr/15 ] | |||||||||
|
well, normally an advantage is expressed in something visible to the users/application. like create/unlink rate. | |||||||||
| Comment by Sergey Cheremencev [ 27/Apr/15 ] | |||||||||
|
I used such long commit timeout only for tests to estimate how patches impact to performance. | |||||||||
| Comment by Alex Zhuravlev [ 27/Apr/15 ] | |||||||||
|
you can use the following to run many threads through a single mount point: | |||||||||
| Comment by Gerrit Updater [ 07/May/15 ] | |||||||||
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14610/ | |||||||||
| Comment by Bob Glossman (Inactive) [ 08/May/15 ] | |||||||||
|
I see http://review.whamcloud.com/14610 only changes the el6 version of the journal-callback patch. Does the sles11 version need a similar change? | |||||||||
| Comment by Sergey Cheremencev [ 18/May/15 ] | |||||||||
I think yes. But I didn't make tests on sles. | |||||||||
| Comment by Andreas Dilger [ 06/Aug/15 ] | |||||||||
|
Please make versions of these two patches for SLES11 and RHEL7. I suspect the same SLES11 patch will also apply to RHEL7 without problems. Please also submit the http://review.whamcloud.com/14610 patch upstream to linux-ext4, so that we don't have to carry it in the ldiskfs patch list forever. I'm not sure whether the http://review.whamcloud.com/14611 patch would be accepted upstream or not, but you could try to submit that separately (so that it doesn't prevent 14610 from landing). | |||||||||
| Comment by Sergey Cheremencev [ 18/Sep/15 ] | |||||||||
|
Hi Andreas, Do you want me to add fixes for SLES11 and RHEL7 to existed patches ? Or should I prepare separate patches ? | |||||||||
| Comment by Andreas Dilger [ 19/Sep/15 ] | |||||||||
|
It looks like they will need to be separate patches for those kernels, since they no longer have the journal_conmit_callback patch. Also, if would be good to submit this upstream at the same time so that we don't need to carry these patches forever. | |||||||||
| Comment by Gerrit Updater [ 07/Oct/15 ] | |||||||||
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14611/ | |||||||||
| Comment by Peter Jones [ 07/Oct/15 ] | |||||||||
|
Yang Sheng Could you please look into what is required here for SLES/RHEL7 and upstream? Thanks Peter | |||||||||
| Comment by Gerrit Updater [ 09/Mar/16 ] | |||||||||
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/18835 | |||||||||
| Comment by Gerrit Updater [ 28/Mar/16 ] | |||||||||
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18835/ | |||||||||
| Comment by Yang Sheng [ 31/Mar/16 ] | |||||||||
|
Patch landed. Close ticket. |