[LU-6226] LU-822 has a bad hash function for selecting OI files Created: 09/Feb/15  Updated: 28/Aug/15

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Critical
Reporter: Alexey Lyashkov Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-822 allow multiple Object Index files to ... Resolved
Severity: 3
Rank (Obsolete): 17426

 Description   

After running sanity.sh I see a very bad distribution of FIDs across the OI files.

[root@rhel6-64 tmp]# debugfs -R 'ls -l' lustre-mdt1 | grep oi.16 | grep -v 8192
debugfs 1.42.7.wc1 (12-Apr-2013)
22 100644 (1) 0 0 876544 9-Feb-2015 21:11 oi.16.0
39 100644 (1) 0 0 491520 9-Feb-2015 21:11 oi.16.17
70 100644 (1) 0 0 28672 9-Feb-2015 21:11 oi.16.48
71 100644 (1) 0 0 2424832 9-Feb-2015 21:11 oi.16.49
72 100644 (1) 0 0 483328 9-Feb-2015 21:11 oi.16.50
73 100644 (1) 0 0 389120 9-Feb-2015 21:11 oi.16.51

All other files are 8 KB (8192 bytes) in size, so they are likely unused.

I don't see any reason to have 64 OI files if only 6 are used.

Please fix the hash function to give a better distribution.



 Comments   
Comment by Alex Zhuravlev [ 09/Feb/15 ]

Probably you should try using many more clients?

Comment by Alexey Lyashkov [ 09/Feb/15 ]

Do you really think this function depends on the number of clients?

+static inline struct osd_oi *
+osd_fid2oi(struct osd_device *osd, const struct lu_fid *fid)
+{
+        if (!fid_is_norm(fid))
+                return NULL;
+
+        LASSERT(osd->od_oi_table != NULL && osd->od_oi_count >= 1);
+        /* It can work even od_oi_count equals to 1 although it's unexpected,
+         * the only reason we set it to 1 is for performance measurement */
+        return &osd->od_oi_table[fid->f_seq & (osd->od_oi_count - 1)];
+}

f_seq & (osd->od_oi_count - 1) does not depend on the number of clients, as it uses only the lower bits.

Why not use fid_hash() instead?
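
To make the difference concrete, here is a minimal sketch, not Lustre code. `oi_index_mask()` mirrors the `f_seq & (od_oi_count - 1)` expression from the patch above. `oi_index_mixed()` is a hypothetical stand-in for a hash that mixes seq and oid; it is not the real fid_hash(), and the constant and the 17-bit shift are arbitrary choices for illustration. `count_used()` reports how many of the 64 OI slots a given set of FIDs lands in.

```c
#include <assert.h>
#include <stdint.h>

#define OI_COUNT 64u /* assumed power-of-two table size, as in the report */

typedef unsigned (*oi_index_fn)(uint64_t seq, uint32_t oid);

/* Mirrors the patch: only the low bits of seq select the OI file. */
static unsigned oi_index_mask(uint64_t seq, uint32_t oid)
{
    (void)oid; /* oid is ignored by this scheme */
    return (unsigned)(seq & (OI_COUNT - 1));
}

/* Hypothetical mixing hash (NOT Lustre's fid_hash()): pack seq and oid
 * into one key, multiply by an odd 64-bit constant, keep the top 6 bits. */
static unsigned oi_index_mixed(uint64_t seq, uint32_t oid)
{
    uint64_t key = (seq << 17) + oid;

    return (unsigned)((key * 0x9E3779B97F4A7C15ull) >> 58);
}

/* Count the distinct OI slots hit by nseq consecutive sequences,
 * each contributing oids 1..noid. */
static unsigned count_used(oi_index_fn fn, uint64_t first_seq,
                           unsigned nseq, uint32_t noid)
{
    unsigned char used[OI_COUNT] = {0};
    unsigned n = 0;

    for (unsigned s = 0; s < nseq; s++) {
        for (uint32_t oid = 1; oid <= noid; oid++) {
            unsigned idx = fn(first_seq + s, oid);

            assert(idx < OI_COUNT);
            used[idx] = 1;
        }
    }
    for (unsigned i = 0; i < OI_COUNT; i++)
        n += used[i];
    return n;
}
```

Calling `count_used(oi_index_mask, first_seq, 4, 1000)` can never report more than 4 slots, because each sequence maps to exactly one slot. The same call with `oi_index_mixed` spreads the FIDs of each sequence over many slots. The starting sequence value is whatever the caller passes in; none is taken from the test run above.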

Comment by Alex Zhuravlev [ 09/Feb/15 ]

f_seq depends on the number of clients indirectly.

Comment by Alexey Lyashkov [ 09/Feb/15 ]

And we use only the lower bits of it? After masking, seq can only be in the range 0..63, no more.

You didn't answer: why not use fid_hash(), which has a better distribution? With a one-line change, replacing fid->f_seq with fid_hash(fid), the distribution would change dramatically and all files would be used.

Comment by Alex Zhuravlev [ 09/Feb/15 ]

seq isn't limited at all. seq is per client; every client uses up to 128K oids and then gets a new sequence. Imagine 10K clients mounting and starting to generate roughly similar workloads: they will all be sending roughly the same f_oid.
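
The argument above can be sketched as a toy model. Everything about the allocator here is an assumption for illustration: sequences are handed out round-robin starting at a caller-supplied `first_seq`, clients advance in lockstep, and each sequence holds 128K oids as stated in the comment. The real sequence allocation in Lustre may differ. `oi_slots_used()` applies the `seq & (od_oi_count - 1)` selection and counts how many of the 64 OI files end up in use.

```c
#include <assert.h>
#include <stdint.h>

#define OI_COUNT  64u      /* number of OI files in the report */
#define SEQ_WIDTH 0x20000u /* 128K oids per sequence, per the comment */

/* Hypothetical allocator: the sequence a client holds when it creates
 * its nth FID, with sequences handed out round-robin across clients. */
static uint64_t client_seq(uint64_t first_seq, unsigned nclients,
                           unsigned client, uint64_t nth_fid)
{
    uint64_t round = nth_fid / SEQ_WIDTH;

    return first_seq + round * nclients + client;
}

/* Number of OI slots touched when nclients clients each create
 * fids_per_client FIDs and the slot is chosen by the low bits of seq. */
static unsigned oi_slots_used(uint64_t first_seq, unsigned nclients,
                              uint64_t fids_per_client)
{
    uint64_t seen = 0; /* bit i is set once OI slot i has been used */
    unsigned used = 0;

    assert(nclients >= 1);
    for (unsigned c = 0; c < nclients; c++) {
        /* One step per sequence: the slot only changes on rollover. */
        for (uint64_t n = 0; n < fids_per_client; n += SEQ_WIDTH) {
            uint64_t seq = client_seq(first_seq, nclients, c, n);

            seen |= 1ull << (seq & (OI_COUNT - 1));
        }
    }
    for (unsigned i = 0; i < OI_COUNT; i++)
        used += (unsigned)((seen >> i) & 1);
    return used;
}
```

Under these assumptions a single client that stays within one sequence uses one OI file, ten such clients use ten, and 64 or more clients cover all 64. A single client only reaches further files after it exhausts a sequence and is given a new one.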

Comment by Alexey Lyashkov [ 09/Feb/15 ]

Alex,

Please look carefully at the function: seq & (fid_files_num - 1) uses only the lower bits, so the result is limited to 64 values (0-63).

Comment by Alex Zhuravlev [ 09/Feb/15 ]

Probably Liang can comment better. I'd suggest using more clients to see how the function really behaves. A single client is a very special case, and even in this case a few OI files were used, which is good.

Comment by Liang Zhen (Inactive) [ 10/Feb/15 ]

Yes, my intention is to hash different clients to different OI files, so that we don't have all clients contending on all OIs all the time (the drawback of fid_hash). Also, because we have 64 OI files, seq & (osd->od_oi_count - 1) should be between 0 and 63. Is there any issue here?
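
The contention trade-off described here can be illustrated with a small sketch. It is a model under stated assumptions, not Lustre code: each client holds one sequence, sequences are consecutive from a caller-supplied `first_seq`, and `by_fid_mix()` is a hypothetical per-FID hash standing in for a fid_hash-style scheme (it is not the real fid_hash()). `max_clients_per_oi()` reports the largest number of distinct clients that touch any single OI slot.

```c
#include <assert.h>
#include <stdint.h>

#define OI_COUNT 64u /* number of OI files, assumed to be a power of two */

typedef unsigned (*oi_index_fn)(uint64_t seq, uint32_t oid);

/* Per-sequence selection, as in the patch under discussion. */
static unsigned by_seq_mask(uint64_t seq, uint32_t oid)
{
    (void)oid;
    return (unsigned)(seq & (OI_COUNT - 1));
}

/* Hypothetical per-FID mixing hash (NOT Lustre's fid_hash()). */
static unsigned by_fid_mix(uint64_t seq, uint32_t oid)
{
    uint64_t key = (seq << 17) + oid;

    return (unsigned)((key * 0x9E3779B97F4A7C15ull) >> 58);
}

/* Largest number of distinct clients sharing any one OI slot, when
 * client c owns sequence first_seq + c and creates oids 1..fids. */
static unsigned max_clients_per_oi(oi_index_fn fn, uint64_t first_seq,
                                   unsigned nclients, uint32_t fids)
{
    unsigned per_oi[OI_COUNT] = {0};
    unsigned worst = 0;

    for (unsigned c = 0; c < nclients; c++) {
        uint64_t touched = 0; /* slots this client has written to */

        for (uint32_t oid = 1; oid <= fids; oid++) {
            unsigned idx = fn(first_seq + c, oid);

            assert(idx < OI_COUNT);
            touched |= 1ull << idx;
        }
        for (unsigned i = 0; i < OI_COUNT; i++)
            per_oi[i] += (unsigned)((touched >> i) & 1);
    }
    for (unsigned i = 0; i < OI_COUNT; i++)
        if (per_oi[i] > worst)
            worst = per_oi[i];
    return worst;
}
```

With 128 clients under `by_seq_mask`, each OI slot is shared by exactly two clients in this model. Under the per-FID hash a client's FIDs are scattered over many slots, so far more clients meet on each slot, which is the contention the comment refers to.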

Comment by Alexey Lyashkov [ 24/Feb/15 ]

Liang,

The distribution is too bad in the case of few clients, as the lower bits are mostly the same for these clients (seq is always allocated as a 128 FID range), so only a few files are used after the test. From the disk's perspective, any parallel access adds additional seeks, so it makes no difference whether different clients work on the same file or not.

Comment by Liang Zhen (Inactive) [ 25/Feb/15 ]

I don't think OI operations are going to be the performance bottleneck when there are very few clients. Also, why would we bother to use all OI files in parallel for a handful of clients? There was a single OI for the whole filesystem in 2.0/2.1; multiple OIs were added to improve aggregate performance for many clients, not for a single client.
