[LU-7898] remove unnecessary declarations from osd-zfs Created: 22/Mar/16  Updated: 02/Feb/17  Resolved: 10/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-7895 zfs metadata performance improvements Resolved
Related

 Description   

In contrast with ldiskfs, declarations in ZFS can be very expensive (even more expensive than the execution itself). There are a number of declarations in osd-zfs which can be removed safely.



 Comments   
Comment by Gerrit Updater [ 23/Mar/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/19101
Subject: LU-7898 osd: remove unnecessary declarations
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a25217845bc24d21648e8dffc6863a9ccb1813f8

Comment by Alex Zhuravlev [ 24/Mar/16 ]

I've taken the to-write and to-overwrite estimates for createmany -m and createmany -o, in KB. The sum of these estimates is then used to reserve space in the TXG and ARC.

txh = transaction holds, 2w = to-write, 2ov = to-overwrite (KB)

                            createmany -m               createmany -o
clean master                40 txh  2w=16160 2ov=3072   73 txh  2w=27536 2ov=6048
LU-7898                     12 txh  2w=4592  2ov=1616   25 txh  2w=8800  2ov=3472
LU-7898 -dnode accounting    8 txh  2w=2800  2ov=1168   17 txh  2w=5216  2ov=2576

Comment by Alex Zhuravlev [ 27/Mar/16 ]

This is how credits are distributed for a single-stripe file creation with this patch, in KB:

8 – 2w=3568 + 2ov=1040 = 4608
0: 2w=112 2ov=0 = 112 // dmu_object_alloc()
6: 2w=896 2ov=112 = 1008 // dmu_tx_hold_zap(tx, sa->sa_layout_attr_obj
8: 2w=896 2ov=112 = 1008 // OI
136: 2w=384 2ov=112 = 496 // iusr zap
137: 2w=384 2ov=112 = 496 // igrp zap
200: 2w=0 2ov=240 = 240 // lovobjids
169: 2w=896 2ov=112 = 1008 // dir insert
152: 2w=0 2ov=240 = 240 // last_rcvd

It's clear that ZAP-related credits consume the majority of the reservation (4016K of 4608K).
First of all, that estimate is currently not optimal because it is calculated using SPA_OLD_MAXBLOCKSIZE, which is 128K:

*towrite += (3 + (add ? 4 : 0)) * SPA_OLD_MAXBLOCKSIZE;

while the default ZAP block size is 16K: int fzap_default_block_shift = 14; /* 16k blocksize */
So if we change the code to use the actual block size, we can shrink that 896K to 112K.
With dnode allocation moved into ZFS, that would lead to ~1M of credits for a single-stripe file creation.
Current master reserves ~33.5MB.

Unlink is still rather heavy due to llog:
19 – 2w=5616 + 2ov=2656 = 8272
169: 2w=384 2ov=112 = 496 // dir delete
207: 2w=0 2ov=112 = 112 // first ref_del on child?
8: 2w=384 2ov=112 = 496 // OI
136: 2w=384 2ov=112 = 496 // quota usr zap
137: 2w=384 2ov=112 = 496 // quota grp zap
207: 2w=0 2ov=224 = 224 // dmu_tx_hold_free(child)
204: 2w=0 2ov=240 = 240 // write to llog header
204: 2w=256 2ov=112 = 368 // append to llog
142: 2w=0 2ov=240 = 240 // add new llog to catalog
0: 2w=112 2ov=0 = 112 // new llog object create
6: 2w=896 2ov=112 = 1008 // dmu_tx_hold_zap(tx, sa->sa_layout_attr_obj
8: 2w=896 2ov=112 = 1008 // OI for new llog object
136: 2w=384 2ov=112 = 496 // quota usr zap
137: 2w=384 2ov=112 = 496 // quota grp zap
203: 2w=0 2ov=240 = 240 // init new llog
203: 2w=256 2ov=112 = 368 // write to new llog
173: 2w=896 2ov=112 = 1008 // PENDING
207: 2w=0 2ov=128 = 128 // LinkEA
152: 2w=0 2ov=240 = 240 // last_rcvd

With dnode accounting moved to ZFS and 16K ZAP blocks, that would be 2774K.

Comment by Alex Zhuravlev [ 27/Mar/16 ]

llog is still very expensive. With the patch, a single llog record needs 4576K reserved:
204: 2w=0 2ov=240 = 240 // write to llog header
204: 2w=256 2ov=112 = 368 // append to llog
142: 2w=0 2ov=240 = 240 // add new llog to catalog
0: 2w=112 2ov=0 = 112 // new llog object create
6: 2w=896 2ov=112 = 1008 // dmu_tx_hold_zap(tx, sa->sa_layout_attr_obj
8: 2w=896 2ov=112 = 1008 // OI for new llog object
136: 2w=384 2ov=112 = 496 // quota usr zap
137: 2w=384 2ov=112 = 496 // quota grp zap
203: 2w=0 2ov=240 = 240 // init new llog
203: 2w=256 2ov=112 = 368 // write to new llog

With dnode accounting moved into ZFS and 16K ZAP blocks, we can shrink it to 2016K,
but that's still a lot, especially when we need to modify a few dozen llogs at unlink. Say, 128 stripes
would need 128*2MB=256MB reserved on disk and in memory.

Comment by Gerrit Updater [ 02/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19101/
Subject: LU-7898 osd: remove unnecessary declarations
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ead6df2feee9c143b617cb60e50e403c955bd401

Comment by Peter Jones [ 02/Sep/16 ]

Landed for 2.9

Comment by Gerrit Updater [ 02/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/22293
Subject: Revert "LU-7898 osd: remove unnecessary declarations"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cbcd75dc816af495b232577b82bbd37da1c829ca

Comment by Gerrit Updater [ 02/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22293/
Subject: Revert "LU-7898 osd: remove unnecessary declarations"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d16c76ea58f70cdac6c0de0e4fdbe5e329951c33

Comment by Gerrit Updater [ 02/Sep/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/22296
Subject: LU-7898 osd: remove unnecessary declarations
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f4123879778e5c94915851240cb2e6d43e1cca2

Comment by Andreas Dilger [ 02/Sep/16 ]

Patch was reverted due to build issues.

Comment by Gerrit Updater [ 10/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22296/
Subject: LU-7898 osd: remove unnecessary declarations
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f2d4419227806850cee36c54c0028dc43ee02b1f

Comment by Peter Jones [ 10/Sep/16 ]

Landed for 2.9

Comment by nasf (Inactive) [ 02/Feb/17 ]

Jenkins build failure:

13:39:05 /tmp/rpmbuild-lustre-jenkins-hHtDucWf/BUILD/lustre-2.7.19.8/lustre/osd-zfs/osd_oi.c: In function 'osd_oi_create':
13:39:05 /tmp/rpmbuild-lustre-jenkins-hHtDucWf/BUILD/lustre-2.7.19.8/lustre/osd-zfs/osd_oi.c:179: error: 'DN_MAX_BONUSLEN' undeclared (first use in this function)
13:39:05 /tmp/rpmbuild-lustre-jenkins-hHtDucWf/BUILD/lustre-2.7.19.8/lustre/osd-zfs/osd_oi.c:179: error: (Each undeclared identifier is reported only once
13:39:05 /tmp/rpmbuild-lustre-jenkins-hHtDucWf/BUILD/lustre-2.7.19.8/lustre/osd-zfs/osd_oi.c:179: error: for each function it appears in.)
13:39:05 make[7]: *** [/tmp/rpmbuild-lustre-jenkins-hHtDucWf/BUILD/lustre-2.7.19.8/lustre/osd-zfs/osd_oi.o] Error 1

https://build.hpdd.intel.com/job/lustre-b_ieel-reviews/4526/
https://build.hpdd.intel.com/job/lustre-b_ieel-reviews/arch=x86_64,build_type=server,distro=el6.8,ib_stack=inkernel/4520/consoleFull

Generated at Sat Feb 10 02:12:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.