[LU-13601] page allocation failure during mount Created: 25/May/20 Updated: 03/Dec/21 Resolved: 03/Dec/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Zarochentsev | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A customer mounts several Lustre filesystems; the 10th fails to mount due to a memory allocation failure.
[7335638.554981] mount.lustre: page allocation failure: order:4, mode:0xc050
[7335638.554990] CPU: 0 PID: 303506 Comm: mount.lustre Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.41.1.el7.x86_64 #1
[7335638.554992] Hardware name: Penguin Computing XE2142e-OEM/S2600BPS, BIOS SE5C620.86B.00.01.0016.020120190930 02/01/2019
[7335638.554994] Call Trace:
[7335638.555012] [<ffffffffae565ac0>] dump_stack+0x19/0x1b
[7335638.555022] [<ffffffffadfbe200>] warn_alloc_failed+0x110/0x180
[7335638.555025] [<ffffffffadfc2cbf>] __alloc_pages_nodemask+0x9df/0xbe0
[7335638.555029] [<ffffffffae00fce8>] alloc_pages_current+0x98/0x110
[7335638.555035] [<ffffffffadfddb68>] kmalloc_order+0x18/0x40
[7335638.555042] [<ffffffffae01b066>] kmalloc_order_trace+0x26/0xa0
[7335638.555095] [<ffffffffc0f94eec>] ll_init_sbi+0x4c/0x660 [lustre]
[7335638.555154] [<ffffffffc0ae7be2>] ? lustre_start_mgc+0x4d2/0x2b00 [obdclass]
[7335638.555157] [<ffffffffae01df36>] ? kmem_cache_alloc_trace+0x1d6/0x200
[7335638.555169] [<ffffffffc0fa145a>] ? ll_fill_super+0x7a/0x14e0 [lustre]
[7335638.555179] [<ffffffffc0fa14b5>] ll_fill_super+0xd5/0x14e0 [lustre]
[7335638.555196] [<ffffffffc0aeac04>] lustre_fill_super+0x264/0xb70 [obdclass]
[7335638.555210] [<ffffffffc0aea9a0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
[7335638.555216] [<ffffffffae046e3f>] mount_nodev+0x4f/0xb0
[7335638.555230] [<ffffffffc0ae07c8>] lustre_mount+0x38/0x60 [obdclass]
[7335638.555232] [<ffffffffae0479be>] mount_fs+0x3e/0x1b0
[7335638.555237] [<ffffffffae065607>] vfs_kern_mount+0x67/0x110
[7335638.555238] [<ffffffffae067c2f>] do_mount+0x1ef/0xce0
[7335638.555242] [<ffffffffae03fe1a>] ? __check_object_size+0x1ca/0x250
[7335638.555244] [<ffffffffae01dd9c>] ? kmem_cache_alloc_trace+0x3c/0x200
[7335638.555246] [<ffffffffae068a63>] SyS_mount+0x83/0xd0
[7335638.555249] [<ffffffffae578ddb>] system_call_fastpath+0x22/0x27
[7335638.555251] Mem-Info:
[7335638.555264] active_anon:85113660 inactive_anon:4989547 isolated_anon:0 active_file:1793179 inactive_file:2080266 isolated_file:0 unevictable:0 dirty:26440 writeback:34 unstable:0 slab_reclaimable:836114 slab_unreclaimable:1111382 mapped:810319 shmem:25487119 pagetables:149686 bounce:0 free:243878 free_pcp:287 free_cma:0 |
| Comments |
| Comment by Alexander Zarochentsev [ 25/May/20 ] |
|
Allocation of struct ll_sb_info fails in ll_init_sbi() because memory is fragmented and ll_sb_info is more than 50 KB in size, which requires an order-4 allocation.
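For illustration, a small stand-alone C sketch (not Lustre code) of how an allocation size maps to a buddy-allocator order on 4 KiB pages, mirroring what the kernel's get_order() does:
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest power-of-two number of pages (the "order") that can hold
 * an allocation of the given size, mirroring the kernel's get_order(). */
static int alloc_order(unsigned long size)
{
	unsigned long pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

int main(void)
{
	printf("13568 bytes -> order %d\n", alloc_order(13568)); /* master ll_sb_info: order 2 */
	printf("53248 bytes -> order %d\n", alloc_order(53248)); /* ~52 KB ll_sb_info: order 4 */
	return 0;
}
With the extra stats embedded, the ~52 KB ll_sb_info falls into the order-4 bucket (16 physically contiguous pages), while the ~13 KB master version would only need an order-2 allocation. |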
| Comment by Gerrit Updater [ 25/May/20 ] |
|
Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/38713 |
| Comment by Andreas Dilger [ 25/May/20 ] |
|
It looks like the root cause of this very large allocation is the ll_rw_extents_info and ll_rw_process_info stats, which are almost never used:
$ pahole lustre/llite/lustre.ko | grep -A 80 '^struct ll_sb_info'
struct ll_sb_info {
spinlock_t ll_lock; /* 0 4 */
spinlock_t ll_pp_extent_lock; /* 4 4 */
spinlock_t ll_process_lock; /* 8 4 */
struct obd_uuid ll_sb_uuid; /* 12 40 */
/* XXX 4 bytes hole, try to pack */
struct obd_export * ll_md_exp; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct obd_export * ll_dt_exp; /* 64 8 */
struct obd_device * ll_md_obd; /* 72 8 */
struct obd_device * ll_dt_obd; /* 80 8 */
struct dentry * ll_debugfs_entry; /* 88 8 */
struct lu_fid ll_root_fid; /* 96 16 */
int ll_flags; /* 112 4 */
unsigned int ll_umounting:1; /* 116:31 4 */
unsigned int ll_xattr_cache_enabled:1; /* 116:30 4 */
unsigned int ll_xattr_cache_set:1; /* 116:29 4 */
unsigned int ll_client_common_fill_super_succeeded:1; /* 116:28 4 */
unsigned int ll_checksum_set:1; /* 116:27 4 */
/* XXX 27 bits hole, try to pack */
struct lustre_client_ocd ll_lco; /* 120 56 */
/* --- cacheline 2 boundary (128 bytes) was 48 bytes ago --- */
struct lprocfs_stats * ll_stats; /* 176 8 */
struct cl_client_cache * ll_cache; /* 184 8 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct lprocfs_stats * ll_ra_stats; /* 192 8 */
struct ll_ra_info ll_ra_info; /* 200 32 */
unsigned int ll_namelen; /* 232 4 */
/* XXX 4 bytes hole, try to pack */
struct file_operations * ll_fop; /* 240 8 */
struct lu_site * ll_site; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct cl_device * ll_cl; /* 256 8 */
struct ll_rw_extents_info ll_rw_extents_info; /* 264 5896 */
/* --- cacheline 96 boundary (6144 bytes) was 16 bytes ago --- */
int ll_extent_process_count; /* 6160 4 */
/* XXX 4 bytes hole, try to pack */
struct ll_rw_process_info ll_rw_process_info[10]; /* 6168 640 */
/* --- cacheline 106 boundary (6784 bytes) was 24 bytes ago --- */
unsigned int ll_offset_process_count; /* 6808 4 */
enum stats_track_type ll_stats_track_type; /* 13224 4 */
int ll_rw_stats_on; /* 13228 4 */
unsigned int ll_sa_running_max; /* 13232 4 */
unsigned int ll_sa_max; /* 13236 4 */
atomic_t ll_sa_total; /* 13240 4 */
atomic_t ll_sa_wrong; /* 13244 4 */
/* --- cacheline 207 boundary (13248 bytes) --- */
atomic_t ll_sa_running; /* 13248 4 */
atomic_t ll_agl_total; /* 13252 4 */
dev_t ll_sdev_orig; /* 13256 4 */
/* XXX 4 bytes hole, try to pack */
struct root_squash_info ll_squash; /* 13264 56 */
/* --- cacheline 208 boundary (13312 bytes) was 8 bytes ago --- */
struct path ll_mnt; /* 13320 16 */
unsigned int ll_stat_blksize; /* 13336 4 */
unsigned int ll_statfs_max_age; /* 13340 4 */
struct kset ll_kset; /* 13344 96 */
/* --- cacheline 210 boundary (13440 bytes) --- */
struct completion ll_kobj_unregister; /* 13440 32 */
/* size: 13472, cachelines: 211, members: 47 */
/* sum members: 13452, holes: 5, sum holes: 20 */
/* bit holes: 1, sum bit holes: 27 bits */
/* last cacheline: 32 bytes */
};
What about dynamically allocating those stats structures when they are first used, and freeing them at unmount time? That would bring the ll_sb_info allocation size down to 1176 bytes for almost all uses, and an allocation of that size would never fail.
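A minimal sketch of that approach, with illustrative field and helper names rather than the ones used in the eventual patch: the large stats structs become pointers in ll_sb_info, allocated only when the stats are first enabled and freed unconditionally at unmount.
/* Sketch only: field and helper names are illustrative. The embedded
 * stats structs become pointers, so the common ll_sb_info allocation
 * stays small (roughly order 0 instead of order 4). */
struct ll_sb_info {
	/* ... existing small members ... */
	struct ll_rw_extents_info *ll_rw_extents_info; /* was embedded, 5896 bytes */
	struct ll_rw_process_info *ll_rw_process_info; /* was embedded, 640 bytes */
	struct ll_rw_process_info *ll_rw_offset_info;  /* was embedded, 6400 bytes */
	/* ... */
};

/* Allocate the extents stats only when they are first enabled. */
static int ll_rw_extents_info_alloc(struct ll_sb_info *sbi) /* hypothetical helper */
{
	if (sbi->ll_rw_extents_info)
		return 0;

	sbi->ll_rw_extents_info = kzalloc(sizeof(*sbi->ll_rw_extents_info),
					  GFP_NOFS);
	return sbi->ll_rw_extents_info ? 0 : -ENOMEM;
}

/* Called at unmount; kfree(NULL) is a no-op, so this is always safe. */
static void ll_rw_stats_free(struct ll_sb_info *sbi)
{
	kfree(sbi->ll_rw_extents_info);
	kfree(sbi->ll_rw_process_info);
	kfree(sbi->ll_rw_offset_info);
}
The per-process and per-offset arrays would be handled the same way, sized at allocation time rather than embedded in the superblock info. |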
| Comment by Alexander Zarochentsev [ 26/May/20 ] |
|
Andreas, we have https://review.whamcloud.com/#/c/31236/ landed on our branch and it increases the size of ll_sb_info to 52 KB, while master's version is only:
(gdb) p sizeof(struct ll_sb_info)
$1 = 13568
(gdb)
I am going to abandon the patch. |
| Comment by Andreas Dilger [ 26/May/20 ] |
|
I think especially in this case it would make sense to allocate the stats structs in the superblock dynamically when these stats are first enabled. |
| Comment by Peter Jones [ 18/Oct/20 ] |
|
So should this ticket be closed as Will Not Fix? |
| Comment by Andreas Dilger [ 08/Dec/20 ] |
|
I'd rather fix the problem than close the ticket. Allocating ll_rw_extents_info (5896 bytes), ll_rw_offset_info (6400 bytes), and ll_rw_process_info (640 bytes) only when these stats are enabled via ll_rw_extents_stats_pp_seq_write(), ll_rw_extents_stats_seq_write(), or ll_rw_offset_stats_seq_write() is straightforward to implement, and should really be done before patch https://review.whamcloud.com/31236 lands. The struct obd_histogram fields in struct client_obd and struct lmv_obd are also very large. I'll push a prototype patch which allocates the histograms on demand.
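A rough sketch of where the on-demand allocation could hook into the enable path, reusing the hypothetical ll_rw_extents_info_alloc() helper sketched above; the real ll_rw_extents_stats_seq_write() also parses the written value and handles clearing the stats, which is omitted here, and the landed patch may differ in detail.
static ssize_t ll_rw_extents_stats_seq_write(struct file *file,
					     const char __user *buf,
					     size_t len, loff_t *off)
{
	struct seq_file *seq = file->private_data;
	struct ll_sb_info *sbi = seq->private;
	int rc;

	/* Allocate the 5896-byte extents histogram only when the stats
	 * are first switched on, instead of embedding it in ll_sb_info. */
	rc = ll_rw_extents_info_alloc(sbi); /* hypothetical helper */
	if (rc != 0)
		return rc;

	sbi->ll_rw_stats_on = 1;
	return len;
}
The offset and per-process stats would get the same treatment in their respective seq_write handlers. |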
| Comment by Gerrit Updater [ 08/Dec/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40901 |
| Comment by Gerrit Updater [ 03/Dec/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/40901/ |