Details
Bug - Resolution: Fixed - Minor - None - None - 3
Description
When running lfs getstripe on files with very large stripe counts (1000+), it sometimes hits various pointer errors. I'm having a little trouble pinning these down, because they go away if lfs getstripe is run again at the same stripe count. (I suspect we're seeing something like glibc allocating a larger block of memory on the later run, so that we no longer run off the end of it. I can't come up with another reason why this would happen.)
I'm opening this bug partly to track these errors. I will try to update it with more details once I can reliably reproduce the problem.
getstripe doesn't crash at anything under ~1000 stripes, but valgrind reports memory access errors starting at a few hundred stripes.
This problem is not especially serious (if anyone were using counts this high, they would've reported it ages ago), and I should be able to create a patch. Just not quite yet.
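The symptoms above are consistent with code walking a per-stripe array past the end of a fixed-size buffer. The following is a minimal sketch of a guard against that kind of overrun, not the actual fix. Every name in it (demo_layout, demo_stripe_entry, demo_safe_count and so on) is hypothetical, and the struct layouts are stand-ins rather than the liblustreapi definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; NOT the liblustreapi structs. */
struct demo_stripe_entry {
	uint64_t object_id;
	uint64_t object_seq;
	uint32_t generation;
	uint32_t ost_index;
};

struct demo_layout {
	uint32_t magic;
	uint16_t stripe_count;
	uint16_t reserved;
	struct demo_stripe_entry entries[];	/* stripe_count entries follow */
};

/* Bytes needed to hold a layout with `count` stripe entries. */
static size_t demo_layout_size(size_t count)
{
	return offsetof(struct demo_layout, entries) +
	       count * sizeof(struct demo_stripe_entry);
}

/* Largest number of whole entries readable from a buffer of buf_size bytes. */
static size_t demo_entries_that_fit(size_t buf_size)
{
	size_t header = offsetof(struct demo_layout, entries);

	if (buf_size < header)
		return 0;
	return (buf_size - header) / sizeof(struct demo_stripe_entry);
}

/* Clamp the count claimed in the header to what the buffer can hold. */
static size_t demo_safe_count(const struct demo_layout *lo, size_t buf_size)
{
	size_t fit;

	assert(lo != NULL);
	fit = demo_entries_that_fit(buf_size);
	return lo->stripe_count < fit ? lo->stripe_count : fit;
}
```

A dump loop would then iterate up to demo_safe_count() instead of trusting stripe_count directly. Alternatively, the caller could allocate demo_layout_size(count) bytes up front so that nothing has to be clamped.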
Here's some sample valgrind output for getstripe at 200 stripes. I haven't dug into the details yet; it may be straightforward to fix:
0 96 0x60 0
==29383== Invalid read of size 4
==29383== at 0x4E4343C: lov_dump_user_lmm_v1v3 (liblustreapi.c:2723)
==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383== Address 0x618751c is 11 bytes after a block of size 4,241 alloc'd
==29383== at 0x4C2A9B5: calloc (vg_replace_malloc.c:711)
==29383== by 0x4E3ED6A: common_param_init (liblustreapi.c:1654)
==29383== by 0x4E3ED6A: param_callback (liblustreapi.c:1969)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383==
==29383== Conditional jump or move depends on uninitialised value(s)
==29383== at 0x4E43444: ostid_id (lustre_ostid.h:101)
==29383== by 0x4E43444: lov_dump_user_lmm_v1v3 (liblustreapi.c:2724)
==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
==29383== by 0x40433D: main (lfs.c:9583)
==29383==
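The "11 bytes after a block of size 4,241" line can be turned into an array position with some simple arithmetic. The sketch below shows one way to do that. The helper names are hypothetical, and the prefix and entry sizes passed in are assumptions the reader has to supply (for example from sizeof on the real structs); none of them come from this report.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hit {
	size_t entry_index;	/* which array entry the access lands in */
	size_t byte_in_entry;	/* offset of the access inside that entry */
};

/* Offset from the block start of an address reported N bytes after it. */
static size_t demo_offset_from_start(size_t block_size, size_t bytes_after)
{
	return block_size + bytes_after;
}

/*
 * Map an offset to an entry in an array that starts `prefix` bytes into
 * the block and has fixed-size entries. Both sizes are caller-supplied
 * assumptions.
 */
static struct demo_hit demo_locate(size_t offset, size_t prefix,
				   size_t entry_size)
{
	struct demo_hit hit;

	assert(entry_size > 0 && offset >= prefix);
	hit.entry_index = (offset - prefix) / entry_size;
	hit.byte_in_entry = (offset - prefix) % entry_size;
	return hit;
}
```

For instance, if one assumes 176 bytes precede the stripe array and each entry is 24 bytes, then demo_locate(demo_offset_from_start(4241, 11), 176, 24) gives entry 169 at byte 20, which is well short of the 200 stripes being dumped. Those two sizes are guesses for illustration and should be checked against the real headers.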