[LU-6527] Journal commit callback opitmization Created: 27/Apr/15  Updated: 31/Mar/16  Resolved: 31/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Major
Reporter: Sergey Cheremencev Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: patch

Rank (Obsolete): 9223372036854775807

 Description   

Faced with following soft lockup's on mds(mds had internal journal with commit interval 5):

Feb 18 14:03:28 snx11127n003 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [jbd2/md66-8:132029]
...
Feb 18 14:03:28 snx11127n003 kernel: Pid: 132029, comm: jbd2/md66-8 Not tainted 2.6.32-431.17.1.x2.0.47.x86_64 #1 Intel Corporation S2600JF/S2600JF
Feb 18 14:03:28 snx11127n003 kernel: RIP: 0010:[<ffffffffa08c9e99>]  [<ffffffffa08c9e99>] ptlrpc_commit_replies+0xb9/0x320 [ptlrpc]
Feb 18 14:03:28 snx11127n003 kernel: RSP: 0018:ffff880791763c20  EFLAGS: 00000206
Feb 18 14:03:28 snx11127n003 kernel: RAX: ffff880d260c6030 RBX: ffff880791763c80 RCX: 0000000000000000
Feb 18 14:03:28 snx11127n003 kernel: RDX: ffff880bc7b01030 RSI: ffff880d1974f880 RDI: ffff881014c32928
Feb 18 14:03:28 snx11127n003 kernel: RBP: ffffffff8100bb8e R08: 0000000000000002 R09: 5a5a5a5a5a5a5a5a
Feb 18 14:03:28 snx11127n003 kernel: R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff88083fcb02c0
Feb 18 14:03:28 snx11127n003 kernel: R13: ffff88083febc140 R14: ffff88083febccc0 R15: 000001b100000000
Feb 18 14:03:28 snx11127n003 kernel: FS:  0000000000000000(0000) GS:ffff880044640000(0000) knlGS:0000000000000000
Feb 18 14:03:28 snx11127n003 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 18 14:03:28 snx11127n003 kernel: CR2: 00007fb86046c518 CR3: 0000000001a85000 CR4: 00000000000407e0
Feb 18 14:03:28 snx11127n003 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 18 14:03:28 snx11127n003 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 18 14:03:28 snx11127n003 kernel: Process jbd2/md66-8 (pid: 132029, threadinfo ffff880791762000, task ffff8807ef3e4ae0)
Feb 18 14:03:28 snx11127n003 kernel: Stack:
Feb 18 14:03:28 snx11127n003 kernel:  0000000000000018 ffff881014c32928 ffff880791763c30 ffff880791763c30
Feb 18 14:03:28 snx11127n003 kernel: <d> 0000000000000000 0000000000000000 ffff880791763c80 ffff880bcc9fe240
Feb 18 14:03:28 snx11127n003 kernel: <d> 0000000000000000 ffff8807d7ac0000 ffff880d1974f900 ffff880d1974f900
Feb 18 14:03:28 snx11127n003 kernel: Call Trace:
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa0914258>] ? tgt_cb_last_committed+0x298/0x410 [ptlrpc]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa0f6dba4>] ? osd_trans_commit_cb+0xb4/0x2b0 [osd_ldiskfs]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa0f1c9ba>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa03df8ef>] ? jbd2_journal_commit_transaction+0x116f/0x15a0 [jbd2]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff81084a8b>] ? try_to_del_timer_sync+0x7b/0xe0
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa03e4c08>] ? kjournald2+0xb8/0x220 [jbd2]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff8109b010>] ? autoremove_wake_function+0x0/0x40
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffffa03e4b50>] ? kjournald2+0x0/0x220 [jbd2]
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff8109ac66>] ? kthread+0x96/0xa0
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff8109abd0>] ? kthread+0x0/0xa0
Feb 18 14:03:28 snx11127n003 kernel:  [<ffffffff8100c200>] ? child_rip+0x0/0x20

Analysis shows that there where about 28 millions jce:

crash> transaction_t 0xffff880c0331ccc0 | grep handle_c
  t_handle_count = 28881407,

When kjournald2 is busy, it can't mark running transaction as T_LOCKED, so ldiskfs writers may open transaction handles , add blocks to the transaction and add transaction commit hooks, making the commit / checkpont of the transaction even more complex and time consuming.



 Comments   
Comment by Gerrit Updater [ 27/Apr/15 ]

Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/14610
Subject: LU-6527 ext4: journal_commit_callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6819a947557dee49ada7364fd8e0f3cdf940891c

Comment by Gerrit Updater [ 27/Apr/15 ]

Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/14611
Subject: LU-6527 ext4: journal_commit_callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bbb2da173ac3d00cb9dd5ecc88ecca06f202ea4b

Comment by Sergey Cheremencev [ 27/Apr/15 ]

Test results we saw on real hardware.
Test environment: green cluster, 4 clients, 4 OST, Internal journal mds with commit interval 60 secnds.
Were tested create and delete operations. Each client had 2 threads that did create in 2 different dirs:

pdsh -w green[08-11] /mnt/devel/lustre/lustre/tests/createmany -o /mnt/greenfs/%h/foo -600&
pdsh -w green[08-11] /mnt/devel/lustre/lustre/tests/createmany -o /mnt/greenfs/%h2/foo -600&

And than delete:

pdsh -w green[08-11]  rm -rf /mnt/greenfs/%h/&
pdsh -w green[08-11]  rm -rf /mnt/greenfs/%h2/&

Results (2nd and 3d columns - average number of callbacks handled per millisecond):

fix name crete delete
no patch 144.5 49.2
2 patches together 200.7 166.8

According to results 2 patches together give significant advantages on unlink.
Want to note that according to MRP-2466 investigation of soft lockups on customer side there were only unlink operations in list of callbacks.

Comment by Alex Zhuravlev [ 27/Apr/15 ]

well, normally an advantage is expressed in something visible to the users/application. like create/unlink rate.
btw, what's the point to use such a long commit timeout?

Comment by Sergey Cheremencev [ 27/Apr/15 ]

I used such long commit timeout only for tests to estimate how patches impact to performance.
It was needed because I haven't enough hardware resources - only 4 clients.
But originally our customer faced above soft lockup with default journal settings: commit=5, internal journal 4Gb.
mdt params used in one case that caused problems were "-i 5 -v -n 16384 -u -C -T -r -F", this was from 32 client nodes, 1 task per node.

Comment by Alex Zhuravlev [ 27/Apr/15 ]

you can use the following to run many threads through a single mount point:
#define OBD_FAIL_MDC_RPCS_SEM 0x804

Comment by Gerrit Updater [ 07/May/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14610/
Subject: LU-6527 ext4: journal_commit_callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 512f2ff1b17db4d03979a1caa86f94defc6653b4

Comment by Bob Glossman (Inactive) [ 08/May/15 ]

I see http://review.whamcloud.com/14610 only changes the el6 version of the journal-callback patch. Does the sles11 version need a similar change?

Comment by Sergey Cheremencev [ 18/May/15 ]

I see http://review.whamcloud.com/14610 only changes the el6 version of the journal-callback patch. Does the sles11 version need a similar change?

I think yes. But I didn't make tests on sles.
If it is needed I can prepare patch for sles11 also.

Comment by Andreas Dilger [ 06/Aug/15 ]

Please make versions of these two patches for SLES11 and RHEL7. I suspect the same SLES11 patch will also apply to RHEL7 without problems.

Please also submit the http://review.whamcloud.com/14610 patch upstream to linux-ext4, so that we don't have to carry it in the ldiskfs patch list forever. I'm not sure whether the http://review.whamcloud.com/14611 patch would be accepted upstream or not, but you could try to submit that separately (so that it doesn't prevent 14610 from landing).

Comment by Sergey Cheremencev [ 18/Sep/15 ]

Hi Andreas,

Do you want me to add fixes for SLES11 and RHEL7 to existed patches ? Or should I prepare separate patches ?

Comment by Andreas Dilger [ 19/Sep/15 ]

It looks like they will need to be separate patches for those kernels, since they no longer have the journal_conmit_callback patch. Also, if would be good to submit this upstream at the same time so that we don't need to carry these patches forever.

Comment by Gerrit Updater [ 07/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14611/
Subject: LU-6527 ext4: journal_commit_callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 55afbf5934e145e48a31193de601abd58e605a42

Comment by Peter Jones [ 07/Oct/15 ]

Yang Sheng

Could you please look into what is required here for SLES/RHEL7 and upstream?

Thanks

Peter

Comment by Gerrit Updater [ 09/Mar/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/18835
Subject: LU-6527 ldiskfs: Journal commit callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5d92db8dd8e147cc1360e409b15cfa0eb83e5864

Comment by Gerrit Updater [ 28/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18835/
Subject: LU-6527 ldiskfs: Journal commit callback optimization
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 16ce6a7ecb1558c5a18e8ba4751fef71e6bb97bc

Comment by Yang Sheng [ 31/Mar/16 ]

Patch landed. Close ticket.

Generated at Sat Feb 10 02:00:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.