[LU-11089] Performance improvements for lu_object locking Created: 13/Jun/18  Updated: 11/Aug/20  Resolved: 18/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.5

Type: Improvement Priority: Major
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Related
is related to LU-6800 Significant performance regression wi... Resolved
is related to LU-9679 Prepare lustre for adoption into the ... Resolved
is related to LU-8346 conf-sanity test_93: test failed to r... Resolved
is related to LU-12565 Use bit locking in obd_echo Resolved
Rank (Obsolete): 9223372036854775807

 Description   

While porting the LU-6800 work upstream, the approach was not well received since it wasn't a real improvement. Neil has created a patch series that breaks up the global lock to improve performance.



 Comments   
Comment by Gerrit Updater [ 13/Jun/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32711
Subject: LU-11089 obdclass: make key_set_version an atomic_t
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2da1e82923e31c017592353a79e52c8ddad9348f

Comment by Gerrit Updater [ 13/Jun/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32712
Subject: LU-11089 obdclass: use an rwsem instead of lu_key_initing_cnt.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8d698149c0170224e4fdb0cc4b13a5e7190f742a

Comment by Gerrit Updater [ 13/Jun/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32713
Subject: LU-11089 obdclass: remove locking from lu_context_exit()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 872ea28d4961209f74b827805a749a33407bb9b7

Comment by James A Simmons [ 14/Aug/18 ]

Here are some performance numbers with the 3 patches posted so far for this work. Two more patches are needed to complete this work.

Without patches:

mdtest-1.9.4-rc was launched with 5 total task(s) on 5 node(s)

Command line used: /lustre/crius/jsimmons/x86_64/mdtest -n 1000 -i 5 -z 2 -d /lustre/crius/jsimmons/performance_md_test

Path: /lustre/crius/jsimmons

FS: 100.2 TiB   Used FS: 0.0%   Inodes: 8.0 Mi   Used Inodes: 0.0%

 

5 tasks, 4995 files/directories

 

SUMMARY: (of 5 iterations)

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      15427.405      12699.715      14055.058        990.423
   Directory stat    :      25565.025      21514.805      24128.395       1750.604
   Directory removal :      16948.667      13410.061      15774.485       1439.361
   File creation     :       5982.378       4995.855       5361.390        375.919
   File stat         :      10494.583       9369.004      10041.334        408.314
   File read         :       7705.990       6695.037       7290.709        434.290
   File removal      :       8118.476       7387.675       7833.091        262.636
   Tree creation     :       2031.828       1576.483       1891.981        163.704
   Tree removal      :       1221.602        952.344       1119.629         94.340

 

-- finished at 08/13/2018 19:14:55 --

 

*******************************************************************************************

With the 3 patches:

mdtest-1.9.4-rc was launched with 5 total task(s) on 5 node(s)

Command line used: /lustre/crius/x86_64/mdtest -n 1000 -i 5 -z 2 -d /lustre/crius/performance_md_test

Path: /lustre/crius

FS: 100.2 TiB   Used FS: 0.0%   Inodes: 8.0 Mi   Used Inodes: 0.0%

 

5 tasks, 4995 files/directories

 

SUMMARY: (of 5 iterations)

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      17621.888      13372.852      15968.954       1441.216
   Directory stat    :      29448.160      24189.123      27233.240       1811.668
   Directory removal :      20349.080      16581.538      18883.208       1315.397
   File creation     :       6240.638       5677.361       5930.478        201.596
   File stat         :      10785.819      10541.888      10701.592         85.094
   File read         :       7550.051       6797.333       7383.661        293.526
   File removal      :       9781.788       8219.130       8877.036        522.711
   Tree creation     :       2150.501       1322.451       1820.234        279.086
   Tree removal      :       1308.718        939.562       1115.832        151.934

 

-- finished at 08/13/2018 20:10:34 --
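
To make the two runs easier to compare, the relative change in the mean rate of each operation can be worked out with a short script. This is only a sketch: the mean values are copied from the two SUMMARY tables in this comment, the rates are assumed to be operations per second (as mdtest normally reports them), and the names `baseline_mean`, `patched_mean` and `percent_change` are illustrative and not part of mdtest or Lustre.

```python
# Mean rates copied from the two mdtest SUMMARY tables in this comment.
# Units are assumed to be operations per second.
baseline_mean = {
    "Directory creation": 14055.058,
    "Directory stat": 24128.395,
    "Directory removal": 15774.485,
    "File creation": 5361.390,
    "File stat": 10041.334,
    "File read": 7290.709,
    "File removal": 7833.091,
    "Tree creation": 1891.981,
    "Tree removal": 1119.629,
}

patched_mean = {
    "Directory creation": 15968.954,
    "Directory stat": 27233.240,
    "Directory removal": 18883.208,
    "File creation": 5930.478,
    "File stat": 10701.592,
    "File read": 7383.661,
    "File removal": 8877.036,
    "Tree creation": 1820.234,
    "Tree removal": 1115.832,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Per-operation change of the mean rate, patched run versus baseline run.
changes = {
    op: percent_change(baseline_mean[op], patched_mean[op])
    for op in baseline_mean
}

for op, pct in changes.items():
    print(f"{op:<20s} {pct:+6.1f}%")
```

With these inputs, every directory and file operation shows a higher mean with the 3 patches, while the tree creation and tree removal means are slightly lower. The script does not weigh these differences against the reported standard deviations, so it says nothing about which changes are significant.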

Comment by Shuichi Ihara [ 08/Oct/18 ]

I'm getting the following crash on servers (both OSS and MDS). I'm still not sure whether this crash comes from the LU-11089 patches, but I started getting it after applying them, so it might be related.

[74844.146432] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[74844.157577] IP: [<ffffffffc0b75aed>] nid_hash+0x2d/0x50 [obdclass]
[74844.167059] PGD 0 
[74844.172253] Oops: 0000 [#1] SMP 
[74844.178646] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) ksocklnd(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dell_rbu ib_srp(OE) scsi_transport_srp(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass dm_round_robin crc32_pclmul ghash_clmulni_intel dm_service_time aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev iTCO_wdt iTCO_vendor_support pcspkr ipmi_si mei_me mei nfit lpc_ich ipmi_devintf acpi_power_meter
[74844.270643]  i2c_i801 acpi_cpufreq acpi_pad libnvdimm shpchp ipmi_msghandler wmi nfsd auth_rpcgss nfs_acl lockd knem(OE) dm_multipath grace sunrpc ip_tables ext4 mbcache jbd2 mlx4_ib(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper qla2xxx syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel ahci i40e drm libahci libata mlx4_core(OE) devlink scsi_transport_fc ptp mlx_compat(OE) pps_core i2c_core scsi_tgt dm_mirror dm_region_hash dm_log dm_mod sg [last unloaded: libcfs]
[74844.334968] CPU: 34 PID: 321939 Comm: mdt05_002 Tainted: G           OE  ------------   3.10.0-693.21.1.el7_lustre.ddn1.x86_64 #1
[74844.352809] Hardware name: Supermicro SYS-5019P-WT/X11SPW-TF, BIOS 1.0 06/06/2017
[74844.363379] task: ffff88173113af70 ti: ffff88176cef0000 task.ti: ffff88176cef0000
[74844.373918] RIP: 0010:[<ffffffffc0b75aed>]  [<ffffffffc0b75aed>] nid_hash+0x2d/0x50 [obdclass]
[74844.385637] RSP: 0018:ffff88176cef3b40  EFLAGS: 00010206
[74844.393941] RAX: 000000000002b5a5 RBX: ffff8801d5ae1080 RCX: 0000000000000001
[74844.404043] RDX: 000000000000007f RSI: 0000000000000010 RDI: 000000000002a0a0
[74844.414099] RBP: ffff88176cef3b68 R08: 0000000000000000 R09: ffffffffc0dc02d1
[74844.424113] R10: ffff8817da29b960 R11: ffff8817277c9400 R12: 0000000000000007
[74844.434089] R13: ffff88176cef3b88 R14: ffff8817d138fa40 R15: ffff8801deae1038
[74844.444021] FS:  0000000000000000(0000) GS:ffff8817da280000(0000) knlGS:0000000000000000
[74844.454889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74844.463390] CR2: 0000000000000010 CR3: 0000000001a02000 CR4: 00000000003607e0
[74844.473242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[74844.483049] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[74844.492804] Call Trace:
[74844.497826]  [<ffffffffc0913388>] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs]
[74844.507416]  [<ffffffffc0913425>] cfs_hash_bd_get+0x25/0x70 [libcfs]
[74844.516384]  [<ffffffffc09166d2>] cfs_hash_add+0x52/0x1a0 [libcfs]
[74844.525211]  [<ffffffffc0d8a765>] target_handle_connect+0x1fe5/0x29b0 [ptlrpc]
[74844.535080]  [<ffffffffc0e2e93a>] tgt_request_handle+0x50a/0x1580 [ptlrpc]
[74844.544540]  [<ffffffffc0e0aa41>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[74844.554587]  [<ffffffff810ee42f>] ? __getnstimeofday64+0x3f/0xd0
[74844.563088]  [<ffffffffc0dd5b6b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[74844.573214]  [<ffffffffc0dd29f5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[74844.582360]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
[74844.590949]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
[74844.599113]  [<ffffffffc0dd9384>] ptlrpc_main+0xaf4/0x1fa0 [ptlrpc]
[74844.607705]  [<ffffffffc0dd8890>] ? ptlrpc_register_service+0xe90/0xe90 [ptlrpc]
[74844.617338]  [<ffffffff810b4031>] kthread+0xd1/0xe0
[74844.624400]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[74844.632668]  [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
[74844.640212]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[74844.648439] Code: 44 00 00 48 85 f6 74 37 b9 01 00 00 00 45 31 c0 b8 05 15 00 00 eb 0d 0f 1f 80 00 00 00 00 49 89 c8 48 89 f9 89 c7 c1 e7 05 01 f8 <42> 0f be 3c 06 01 f8 48 8d 79 01 48 83 ff 09 75 e2 21 d0 c3 55 
[74844.673703] RIP  [<ffffffffc0b75aed>] nid_hash+0x2d/0x50 [obdclass]
[74844.682236]  RSP <ffff88176cef3b40>
[74844.687869] CR2: 0000000000000010

 

Comment by James A Simmons [ 08/Oct/18 ]

What triggers this crash? Is it one of the Maloo tests, or does running a particular application cause it? Also, are you using all the posted LU-11089 patches?

Comment by Ruth Klundt (Inactive) [ 15/Oct/18 ]

hi james,

I've seen that too, just once on an OSS. I had done an abort_recovery on the MDT and was mounting ~130 2.8 clients. Got impatient, I guess.

Servers x86 built at commit 4e42995.

Comment by James A Simmons [ 15/Oct/18 ]

Thanks Ruth for the info. Is this with these patches applied or did it happen independently? Hmmm. I suspect a bug is buried in the NID hash code. Anyway, I was planning to port it to rhashtable handling since that scales better and rhashtable is a standard in the Linux kernel. Ruth, can you reproduce it every time or was this a one-off?

Comment by Ruth Klundt (Inactive) [ 15/Oct/18 ]

One off with no patches applied.

After I brought the node back all the clients mounted and tests ran.

Comment by Shuichi Ihara [ 15/Oct/18 ]

This happens quite often. I saw the crash even at initial mount, e.g. create a filesystem and mount Lustre on 32 clients, then one of the OSSes crashed.

Comment by James A Simmons [ 23/Oct/18 ]

Thanks for the info. Ruth has pointed out that this is a general bug. I have started the port of the nid hash to rhashtable and I'm seeing hidden issues with the original code.

Comment by James A Simmons [ 30/Oct/18 ]

Since the NID hash seems to be broken in general, I did a port to rhashtable. I still need to work on the /proc entries that display hash stats. Please try it out to see if it no longer crashes your nodes. The patch is at:

https://review.whamcloud.com/#/c/33518

The build breakage is only on SLES12SP3.

Comment by Patrick Farrell (Inactive) [ 31/Oct/18 ]

Seen here during recovery as well.  Interesting.  I imagine even if the bug was already there, the changes made it easier to hit.  (Doesn't mean the changes are wrong, just that there's probably a reason we're suddenly seeing it.)

Comment by Patrick Farrell (Inactive) [ 01/Nov/18 ]

Ah, I see now that none of the patches have landed.  So it is definitely a pre-existing bug.  Interesting.

Comment by Peter Jones [ 02/Nov/18 ]

Could we please have a separate ticket for any instances seen on 2.12 or earlier releases without James's unlanded patches being applied? Is there any suggestion that this is happening more frequently on 2.12 compared to 2.11 and earlier releases?

Comment by Shuichi Ihara [ 06/Nov/18 ]

James, the crashing servers were not related to your patches (LU-11089); it looks like a more general problem in master. I'll open a new Jira ticket for this.

Comment by James A Simmons [ 06/Nov/18 ]

Thanks. I have a patch based on LU-8130 work that should fix this.

Comment by Patrick Farrell (Inactive) [ 06/Nov/18 ]

Ihara,

Could you link that ticket here?  I'm interested in tracking it.  Our MDSes running 2.12 are crashing pretty reliably when we fail them over under load.

Comment by Shuichi Ihara [ 06/Nov/18 ]

Patrick, sorry! I forgot to update this ticket. Here is the new ticket: LU-11624.

Comment by Gerrit Updater [ 15/Nov/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33667
Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37bb534e3779c4cfcf46a0206583ce3a88be69d1

Comment by Gerrit Updater [ 15/Nov/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33668
Subject: LU-11089 obd: remove lock from key register/degister
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 837c5d44d10daf04353e900939d9977feed064c5

Comment by Gerrit Updater [ 16/Nov/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33673
Subject: LU-11089 obd: rename lu_keys_guard to lu_context_remembered_guard
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0c790df73f57138e554326d95566de04618c1e93

Comment by Gerrit Updater [ 16/Nov/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33674
Subject: LU-11089 lu_object: fix possible hang waiting for LCS_LEAVING
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 28bd00dbcada616bcb5cf14899c3feac784bc6c1

Comment by James A Simmons [ 16/Nov/18 ]

Sigh, RHEL 7.6 got a port from upstream wrong. I need to download the RHEL kernel source and see how they botched the port.

Comment by Gerrit Updater [ 27/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32711/
Subject: LU-11089 obdclass: make key_set_version an atomic_t
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e9213217691ae78d15237b0c5ecd3ba0b0416652

Comment by Gerrit Updater [ 27/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32712/
Subject: LU-11089 obdclass: use an rwsem instead of lu_key_initing_cnt.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 99bb9f91f5c5ca6a380b22efa04a3c00c8f520ca

Comment by Gerrit Updater [ 21/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32713/
Subject: LU-11089 obdclass: remove locking from lu_context_exit()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 62f6496f81ff5896ecc778c9e57b6f84d0f83da9

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33667/
Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 372ef85512dd2a722415fba9a3df66f81029508b

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33668/
Subject: LU-11089 obd: remove lock from key register/degister
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f0b78533f07ca6d766f1ea97a623cdd6ff063e0f

Comment by Gerrit Updater [ 20/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33673/
Subject: LU-11089 obd: rename lu_keys_guard to lu_context_remembered_guard
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bf86b80e4eacd0734665aa818d9cdebf0c157ee1

Comment by James A Simmons [ 20/Jun/19 ]

Last patch landed.

Comment by Gerrit Updater [ 12/May/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38570
Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 5df1c23bd60c193ae8e396840d58c7d7e532568e

Comment by Gerrit Updater [ 12/May/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38573
Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a69b6a8f4f1de86ce247620315877a8050e102f8

Comment by Gerrit Updater [ 21/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38573/
Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: ceb45b5fbe35a65539b76678d8187a902504b138
