[LU-11691] lfs getstripe buffer overflows with very large stripe counts Created: 22/Nov/18  Updated: 30/Apr/19  Resolved: 30/Apr/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

 Description   

When running lfs getstripe on files with very large stripe counts (1000+), it sometimes hits various pointer errors. I'm having a little trouble pinning these down, because they go away if lfs getstripe is run again at the same stripe count. I suspect that glibc is sometimes allocating a larger block of memory than we asked for, so on the rerun we no longer run off the end of it; I can't come up with another reason why this would happen.

I'm opening this bug partly to track these; I will try to update it with more details once I can reliably reproduce the problem.
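The glibc hypothesis above can be probed directly. glibc's `malloc_usable_size()` (a GNU extension declared in `<malloc.h>`) reports how many bytes of an allocation are actually usable, which can be more than was requested because chunk sizes are rounded up. A read that overruns a buffer by no more than that slack stays inside the allocator's own block: it is still a bug, and valgrind still flags it, but it is unlikely to cause visible damage. The sketch below is illustrative only; `slack_bytes()` is a hypothetical helper, not part of liblustreapi.

```c
#include <assert.h>
#include <malloc.h>	/* malloc_usable_size(): glibc extension */
#include <stdlib.h>

/*
 * Hypothetical helper (not part of liblustreapi): allocate `requested`
 * bytes with calloc, as common_param_init() does in the trace below,
 * and report how many extra bytes glibc made usable past the request.
 * An overrun no larger than this slack stays inside the same block.
 */
static size_t slack_bytes(size_t requested)
{
	void *p = calloc(1, requested);
	size_t usable;

	if (p == NULL)
		return 0;
	usable = malloc_usable_size(p);
	free(p);
	assert(usable >= requested);	/* usable size is never smaller */
	return usable - requested;
}
```

The exact slack depends on the allocator and the request size, so this only demonstrates the mechanism; it does not show that this is what happens inside lfs.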

getstripe doesn't crash below ~1000 stripes, but valgrind reports memory access errors starting at a few hundred stripes.

This problem is not especially serious (if anyone were using counts this high, they would've reported it ages ago), and I should be able to create a patch, just not quite yet.

Here's some sample valgrind output for getstripe on a file with 200 stripes. I haven't dug into the details yet; it may be straightforward to fix:

0 96 0x60 0
==29383== Invalid read of size 4
==29383== at 0x4E4343C: lov_dump_user_lmm_v1v3 (liblustreapi.c:2723)
==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383== Address 0x618751c is 11 bytes after a block of size 4,241 alloc'd
==29383== at 0x4C2A9B5: calloc (vg_replace_malloc.c:711)
==29383== by 0x4E3ED6A: common_param_init (liblustreapi.c:1654)
==29383== by 0x4E3ED6A: param_callback (liblustreapi.c:1969)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383==
==29383== Conditional jump or move depends on uninitialised value(s)
==29383== at 0x4E43444: ostid_id (lustre_ostid.h:101)
==29383== by 0x4E43444: lov_dump_user_lmm_v1v3 (liblustreapi.c:2724)
==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383==
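The trace shows lov_dump_user_lmm_v1v3() reading past the end of the buffer that common_param_init() allocated with calloc, followed by an uninitialised-value warning in ostid_id() on the next line of the same function, presumably on data from the same overrun. A generic defensive pattern for this class of bug is to bound the per-stripe loop by the buffer size as well as by the stripe count stored in the layout header. The sketch below uses hypothetical `demo_*` types in place of the real Lustre structures and is not the fix that landed; per its subject line, the merged patch instead limits the layout size to the maximum EA size.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a layout header followed by
 * per-stripe object entries. These are NOT the real lov_user_md_v1/v3
 * or lov_user_ost_data_v1 definitions from the Lustre headers. */
struct demo_ost_entry {
	uint64_t objid;
	uint32_t gen;
	uint32_t idx;
};

struct demo_layout {
	uint32_t magic;
	uint16_t stripe_count;
	uint16_t padding;
	struct demo_ost_entry objects[];	/* stripe_count entries follow */
};

/* Number of entries that can be read without leaving the buffer: the
 * smaller of the header's stripe_count and what buf_size can hold. */
static size_t demo_safe_stripe_count(const struct demo_layout *lay,
				     size_t buf_size)
{
	const size_t hdr = offsetof(struct demo_layout, objects);
	size_t fit;

	assert(lay != NULL);
	if (buf_size < hdr)
		return 0;	/* cannot even read stripe_count safely */
	fit = (buf_size - hdr) / sizeof(struct demo_ost_entry);
	return (size_t)lay->stripe_count < fit ? lay->stripe_count : fit;
}

/* Print one "idx objid" line per stripe, never reading past buf_size.
 * Returns how many entries were printed. */
static size_t demo_dump(const struct demo_layout *lay, size_t buf_size)
{
	size_t n = demo_safe_stripe_count(lay, buf_size);
	size_t i;

	for (i = 0; i < n; i++)
		printf("%" PRIu32 " %" PRIu64 "\n",
		       lay->objects[i].idx, lay->objects[i].objid);
	return n;
}
```

With a buffer sized for three entries but a header claiming 200 stripes, demo_safe_stripe_count() returns 3, so the loop stops at the end of the allocation instead of reading past it as the trace shows.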



 Comments   
Comment by Gerrit Updater [ 29/Dec/18 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/33941
Subject: LU-11691 llapi: PFL layout size limit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f6c351b0113adc1531a6f49fb7fe8c5500e4573e

Comment by Gerrit Updater [ 03/Feb/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34171
Subject: LU-11691 lov: Limit layout size to max ea size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4352c7a6fb52533f6135bf6056bfda15e0caed81

Comment by Gerrit Updater [ 30/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34171/
Subject: LU-11691 lov: Limit layout size to max ea size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: aa72de32ba76943d9c96a962a0eb6b5503fad7a6
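The merged patch limits the layout size to the maximum EA size. The arithmetic behind such a limit is simple. A plain layout is a fixed header plus one entry per stripe, so the largest stripe count that fits is the space left after the header divided by the entry size. The sketch below assumes a 48-byte header and 24-byte entries, chosen to resemble lov_user_md_v3 and lov_user_ost_data_v1; treat both sizes as assumptions. The `demo_*` names are hypothetical, and this is not the code from patch 34171.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only (assumptions, not taken from this ticket). */
#define DEMO_HDR_SIZE	((size_t)48)	/* layout header */
#define DEMO_ENTRY_SIZE	((size_t)24)	/* one per-stripe object entry */

/* Bytes needed for a plain layout with `stripes` entries. */
static size_t demo_layout_size(size_t stripes)
{
	/* guard against size_t overflow for absurd stripe counts */
	assert(stripes <= (SIZE_MAX - DEMO_HDR_SIZE) / DEMO_ENTRY_SIZE);
	return DEMO_HDR_SIZE + stripes * DEMO_ENTRY_SIZE;
}

/* Largest stripe count whose layout still fits in max_ea_size bytes. */
static size_t demo_max_stripes(size_t max_ea_size)
{
	if (max_ea_size < DEMO_HDR_SIZE)
		return 0;
	return (max_ea_size - DEMO_HDR_SIZE) / DEMO_ENTRY_SIZE;
}
```

With these assumed sizes, a 200-stripe layout needs 4,848 bytes and would not fit in a 4,096-byte buffer, while up to 168 stripes would. The buffer sizes here are illustrative and are not the actual max EA size of any particular target.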

Comment by Peter Jones [ 30/Apr/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:46:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.