[LU-13601] page allocation failure during mount Created: 25/May/20  Updated: 03/Dec/21  Resolved: 03/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream, Lustre 2.12.4
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4533 rpc_stats histogram does not support ... Open
is related to LU-14055 Write performance regression caused b... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A customer mounts several Lustre filesystems; the 10th filesystem fails to mount due to a memory allocation failure.

[7335638.554981] mount.lustre: page allocation failure: order:4, mode:0xc050
[7335638.554990] CPU: 0 PID: 303506 Comm: mount.lustre Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-957.41.1.el7.x86_64 #1
[7335638.554992] Hardware name: Penguin Computing XE2142e-OEM/S2600BPS, BIOS SE5C620.86B.00.01.0016.020120190930 02/01/2019
[7335638.554994] Call Trace:
[7335638.555012]  [<ffffffffae565ac0>] dump_stack+0x19/0x1b
[7335638.555022]  [<ffffffffadfbe200>] warn_alloc_failed+0x110/0x180
[7335638.555025]  [<ffffffffadfc2cbf>] __alloc_pages_nodemask+0x9df/0xbe0
[7335638.555029]  [<ffffffffae00fce8>] alloc_pages_current+0x98/0x110
[7335638.555035]  [<ffffffffadfddb68>] kmalloc_order+0x18/0x40
[7335638.555042]  [<ffffffffae01b066>] kmalloc_order_trace+0x26/0xa0
[7335638.555095]  [<ffffffffc0f94eec>] ll_init_sbi+0x4c/0x660 [lustre]
[7335638.555154]  [<ffffffffc0ae7be2>] ? lustre_start_mgc+0x4d2/0x2b00 [obdclass]
[7335638.555157]  [<ffffffffae01df36>] ? kmem_cache_alloc_trace+0x1d6/0x200
[7335638.555169]  [<ffffffffc0fa145a>] ? ll_fill_super+0x7a/0x14e0 [lustre]
[7335638.555179]  [<ffffffffc0fa14b5>] ll_fill_super+0xd5/0x14e0 [lustre]
[7335638.555196]  [<ffffffffc0aeac04>] lustre_fill_super+0x264/0xb70 [obdclass]
[7335638.555210]  [<ffffffffc0aea9a0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
[7335638.555216]  [<ffffffffae046e3f>] mount_nodev+0x4f/0xb0
[7335638.555230]  [<ffffffffc0ae07c8>] lustre_mount+0x38/0x60 [obdclass]
[7335638.555232]  [<ffffffffae0479be>] mount_fs+0x3e/0x1b0
[7335638.555237]  [<ffffffffae065607>] vfs_kern_mount+0x67/0x110
[7335638.555238]  [<ffffffffae067c2f>] do_mount+0x1ef/0xce0
[7335638.555242]  [<ffffffffae03fe1a>] ? __check_object_size+0x1ca/0x250
[7335638.555244]  [<ffffffffae01dd9c>] ? kmem_cache_alloc_trace+0x3c/0x200
[7335638.555246]  [<ffffffffae068a63>] SyS_mount+0x83/0xd0
[7335638.555249]  [<ffffffffae578ddb>] system_call_fastpath+0x22/0x27
[7335638.555251] Mem-Info:
[7335638.555264] active_anon:85113660 inactive_anon:4989547 isolated_anon:0
 active_file:1793179 inactive_file:2080266 isolated_file:0
 unevictable:0 dirty:26440 writeback:34 unstable:0
 slab_reclaimable:836114 slab_unreclaimable:1111382
 mapped:810319 shmem:25487119 pagetables:149686 bounce:0
 free:243878 free_pcp:287 free_cma:0


 Comments   
Comment by Alexander Zarochentsev [ 25/May/20 ]

Allocation of struct ll_sb_info in ll_init_sbi() fails because memory is fragmented and ll_sb_info is more than 50kB in size, which requires an order-4 allocation (16 physically contiguous pages).
A quick trace of page allocations during mount/unmount and a simple file copy test shows no other allocation of order > 3 that is not protected by OBD_ALLOC_LARGE, except this one in ll_init_sbi().

Comment by Gerrit Updater [ 25/May/20 ]

Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/38713
Subject: LU-13601 llite: OBD_ALLOC_LARGE for ll_sb_info
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 541cc5cf186d5de2a4fb624219acad7f35a19f1b

Comment by Andreas Dilger [ 25/May/20 ]

It looks like the root cause of this very large allocation is the ll_rw_extents_info and ll_rw_process_info stats structures, which are almost never used:

$ pahole lustre/llite/lustre.ko | grep -A 80 '^struct ll_sb_info'
struct ll_sb_info {
        spinlock_t                 ll_lock;              /*     0     4 */
        spinlock_t                 ll_pp_extent_lock;    /*     4     4 */
        spinlock_t                 ll_process_lock;      /*     8     4 */
        struct obd_uuid            ll_sb_uuid;           /*    12    40 */

        /* XXX 4 bytes hole, try to pack */

        struct obd_export *        ll_md_exp;            /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct obd_export *        ll_dt_exp;            /*    64     8 */
        struct obd_device *        ll_md_obd;            /*    72     8 */
        struct obd_device *        ll_dt_obd;            /*    80     8 */
        struct dentry *            ll_debugfs_entry;     /*    88     8 */
        struct lu_fid              ll_root_fid;          /*    96    16 */
        int                        ll_flags;             /*   112     4 */
        unsigned int               ll_umounting:1;       /*   116:31  4 */
        unsigned int               ll_xattr_cache_enabled:1; /*   116:30  4 */
        unsigned int               ll_xattr_cache_set:1; /*   116:29  4 */
        unsigned int               ll_client_common_fill_super_succeeded:1; /*   116:28  4 */
        unsigned int               ll_checksum_set:1;    /*   116:27  4 */

        /* XXX 27 bits hole, try to pack */

        struct lustre_client_ocd   ll_lco;               /*   120    56 */
        /* --- cacheline 2 boundary (128 bytes) was 48 bytes ago --- */
        struct lprocfs_stats *     ll_stats;             /*   176     8 */
        struct cl_client_cache *   ll_cache;             /*   184     8 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        struct lprocfs_stats *     ll_ra_stats;          /*   192     8 */
        struct ll_ra_info          ll_ra_info;           /*   200    32 */
        unsigned int               ll_namelen;           /*   232     4 */

        /* XXX 4 bytes hole, try to pack */

        struct file_operations *   ll_fop;               /*   240     8 */
        struct lu_site *           ll_site;              /*   248     8 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        struct cl_device *         ll_cl;                /*   256     8 */
        struct ll_rw_extents_info  ll_rw_extents_info;   /*   264  5896 */
        /* --- cacheline 96 boundary (6144 bytes) was 16 bytes ago --- */
        int                        ll_extent_process_count; /*  6160     4 */

        /* XXX 4 bytes hole, try to pack */

        struct ll_rw_process_info  ll_rw_process_info[10]; /*  6168   640 */
        /* --- cacheline 106 boundary (6784 bytes) was 24 bytes ago --- */
        unsigned int               ll_offset_process_count; /*  6808     4 */
        enum stats_track_type      ll_stats_track_type;  /* 13224     4 */
        int                        ll_rw_stats_on;       /* 13228     4 */
        unsigned int               ll_sa_running_max;    /* 13232     4 */
        unsigned int               ll_sa_max;            /* 13236     4 */
        atomic_t                   ll_sa_total;          /* 13240     4 */
        atomic_t                   ll_sa_wrong;          /* 13244     4 */
        /* --- cacheline 207 boundary (13248 bytes) --- */
        atomic_t                   ll_sa_running;        /* 13248     4 */
        atomic_t                   ll_agl_total;         /* 13252     4 */
        dev_t                      ll_sdev_orig;         /* 13256     4 */

        /* XXX 4 bytes hole, try to pack */

        struct root_squash_info    ll_squash;            /* 13264    56 */
        /* --- cacheline 208 boundary (13312 bytes) was 8 bytes ago --- */
        struct path                ll_mnt;               /* 13320    16 */
        unsigned int               ll_stat_blksize;      /* 13336     4 */
        unsigned int               ll_statfs_max_age;    /* 13340     4 */
        struct kset                ll_kset;              /* 13344    96 */
        /* --- cacheline 210 boundary (13440 bytes) --- */
        struct completion          ll_kobj_unregister;   /* 13440    32 */

        /* size: 13472, cachelines: 211, members: 47 */
        /* sum members: 13452, holes: 5, sum holes: 20 */
        /* bit holes: 1, sum bit holes: 27 bits */
        /* last cacheline: 32 bytes */
};

What about dynamically allocating those stats structures when they are first used, and freeing them at unmount time? That would bring the ll_sb_info allocation size down to 1176 bytes for almost all uses, and it would never fail allocation at that size.

Comment by Alexander Zarochentsev [ 26/May/20 ]

Andreas, we have https://review.whamcloud.com/#/c/31236/ landed in our branch, and it increases the size of ll_sb_info to 52kB, while master's version is only:

(gdb) p sizeof(struct ll_sb_info)
$1 = 13568
(gdb)

I am going to abandon the patch.

Comment by Andreas Dilger [ 26/May/20 ]

I think that, especially in this case, it would make sense to allocate the stats structs in the superblock dynamically when these stats are first enabled.

Comment by Peter Jones [ 18/Oct/20 ]

So should this ticket be closed as Will Not Fix?

Comment by Andreas Dilger [ 08/Dec/20 ]

I'd rather fix the problem than close the ticket. Allocating ll_rw_extents_info (5896 bytes), ll_rw_offset_info (6400 bytes), and ll_rw_process_info (640 bytes) only when these stats are enabled via ll_rw_extents_stats_pp_seq_write(), ll_rw_extents_stats_seq_write(), or ll_rw_offset_stats_seq_write() is straightforward to implement, and should really be done before patch https://review.whamcloud.com/31236 lands.

The struct obd_histogram in struct client_obd and struct lmv_obd are also very large. I'll push a prototype patch which allocates the histograms on demand.

Comment by Gerrit Updater [ 08/Dec/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40901
Subject: LU-13601 llite: avoid needless large allocations
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f1cf5c17ddd59a32bd6b0f0de1bb4e65f1009424

Comment by Gerrit Updater [ 03/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/40901/
Subject: LU-13601 llite: avoid needless large stats alloc
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9490fd9bb84dc277bd103bf16286fc26882e5b5e

Generated at Sat Feb 10 03:02:38 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.