Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: Lustre 2.14.0
- Labels: None
- Environment: master (commit: 56526a90ae)
- Severity: 3
Description
commit 76626d6c52 "LU-13344 all: Separate debugfs and procfs handling" caused a write performance regression. Below are a reproducer and the tested workload.
Single client (Ubuntu 18.04, 5.4.0-47-generic), 16MB O_DIRECT, FPP (128 processes)
# mpirun --allow-run-as-root -np 128 --oversubscribe --mca btl_openib_warn_default_gid_prefix 0 --bind-to none ior -u -w -r -k -e -F -t 16384k -b 16384k -s 1000 -u -o /mnt/ai400x/ior.out/file --posix.odirect
"git bisect" identified the commit where the regression started.
Here are the test results.
76626d6c52 LU-13344 all: Separate debugfs and procfs handling
access  bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----     ----------  ----------  ---------  --------  --------  --------  --------  ----
write   21861      1366.33  60.78       16384       16384      0.091573  93.68     40.38     93.68     0
read    38547      2409.18  46.14       16384       16384      0.005706  53.13     8.26      53.13     0
5bc1fe092c LU-13196 llite: Remove mutex on dio read
access  bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----     ----------  ----------  ---------  --------  --------  --------  --------  ----
write   32678      2042.40  58.96       16384       16384      0.105843  62.67     4.98      62.67     0
read    38588      2411.78  45.89       16384       16384      0.004074  53.07     8.11      53.07     0
master (commit 56526a90ae)
access  bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----     ----------  ----------  ---------  --------  --------  --------  --------  ----
write   17046      1065.37  119.02      16384       16384      0.084449  120.15    67.76     120.15    0
read    38512      2407.00  45.04       16384       16384      0.006462  53.18     9.07      53.18     0
master still has this regression; when commit 76626d6c52 is reverted from master, the performance comes back.
master (commit 56526a90ae) + revert of commit 76626d6c52
access  bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----     ----------  ----------  ---------  --------  --------  --------  --------  ----
write   32425      2026.59  59.88       16384       16384      0.095842  63.16     4.79      63.16     0
read    39601      2475.09  47.22       16384       16384      0.003637  51.72     5.73      51.72     0
Shuichi, if shrinking struct obd_device does not solve the problem, then it seems the problem is caused by a misalignment of some data structure that follows the added obd_debugfs_vars field.
Can you please try another set of tests that move the "obd_debugfs_vars" line until we isolate the problematic field. The first test would be to move obd_debugfs_vars to the end of the struct:
to see if this solves the problem (without my other patches). I've pushed a patch to do this. If it fixes the problem, then this confirms that the problem is caused by the alignment of, or cacheline contention on, one of the fields between obd_evict_inprogress and obd_kobj_unregister. That would be enough to land for 2.14.0 to solve the problem, but I don't want to leave the reason for the problem unexplained, since the regression is likely to be accidentally reintroduced in the future (e.g. by landing my patches to shrink lu_tgt_desc, or anything else).
To isolate the reason for the problem you would need to "bisect" the 11 fields/366 bytes to see which one is causing the slowdown.
First, try moving obd_debugfs_vars after obd_kset to see if this brings the slowdown back. If not, then the problem is obd_kset or earlier, so try moving it immediately before obd_kset (this is the largest field, so it makes exact "bisection" difficult). If the problem is still not seen, move it after obd_evict_list, and so on. Essentially, when obd_debugfs_vars is immediately before the offending field the performance will be bad, and when it is immediately after that field the performance problem should go away. Once you find out which field it is, try moving that field to the start of struct obd_device, so there is no chance of it being misaligned, then after obd_lu_dev, and then after obd_recovery_expired. If these also show good performance, then this can be a permanent solution (I would prefer after obd_recovery_expired, since those bitfields are very commonly used).
Please run the "pahole" command on the obdclass.ko module to show the "good" and "bad" structures to see what the problem is, and attach the results here.
Neil, James, since the obd_kobj and obd_ktype fields are recent additions and the largest fields in this area, it seems likely that they are the culprit here. Is there anything "special" about them that would require their alignment, or to avoid cacheline contention? Are they "hot" and referenced/refcounted continuously during object access?