Lustre / LU-11691

lfs getstripe buffer overflows with very large stripe counts

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0

    Description

      When running lfs getstripe on files with very large stripe counts (1000+), it sometimes fails with various pointer errors.  I'm having a little trouble pinning these down, because they go away if lfs getstripe is run again at the same stripe count.  (I suspect we're seeing something like glibc allocating a larger block of memory, so that we don't run off the end of it again.  I can't come up with another reason why this would happen.)

      I'm opening this bug partly to track these errors.  I will try to update it with more details once I can reliably reproduce the problem.

      getstripe doesn't crash at anything under ~1000 stripes, but valgrind reports memory access errors starting at a few hundred stripes.

      This problem is not especially serious (if anyone were using counts this high, they would've reported it ages ago), and I should be able to create a patch.  Just not quite yet.

      Here's some sample valgrind output for getstripe on a file with 200 stripes.  I haven't dug into the details yet, but it may be straightforward to fix:

      0 96 0x60 0
      ==29383== Invalid read of size 4
      ==29383== at 0x4E4343C: lov_dump_user_lmm_v1v3 (liblustreapi.c:2723)
      ==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
      ==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
      ==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
      ==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
      ==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
      ==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
      ==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
      ==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
      ==29383== by 0x40433D: main (lfs.c:9583)
      ==29383== Address 0x618751c is 11 bytes after a block of size 4,241 alloc'd
      ==29383== at 0x4C2A9B5: calloc (vg_replace_malloc.c:711)
      ==29383== by 0x4E3ED6A: common_param_init (liblustreapi.c:1654)
      ==29383== by 0x4E3ED6A: param_callback (liblustreapi.c:1969)
      ==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
      ==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
      ==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
      ==29383== by 0x40433D: main (lfs.c:9583)
      ==29383==
      ==29383== Conditional jump or move depends on uninitialised value(s)
      ==29383== at 0x4E43444: ostid_id (lustre_ostid.h:101)
      ==29383== by 0x4E43444: lov_dump_user_lmm_v1v3 (liblustreapi.c:2724)
      ==29383== by 0x4E44B21: lov_dump_plain_user_lmm (liblustreapi.c:3473)
      ==29383== by 0x4E44B21: llapi_lov_dump_user_lmm (liblustreapi.c:3512)
      ==29383== by 0x4E44B21: cb_getstripe (liblustreapi.c:4731)
      ==29383== by 0x4E3E5C6: llapi_semantic_traverse.constprop.29 (liblustreapi.c:1877)
      ==29383== by 0x4E3EDEC: param_callback (liblustreapi.c:1975)
      ==29383== by 0x40F9E2: lfs_getstripe_internal (lfs.c:4507)
      ==29383== by 0x40F9E2: lfs_getstripe (lfs.c:4559)
      ==29383== by 0x4E4F2F0: Parser_execarg (parser.c:115)
      ==29383== by 0x40433D: main (lfs.c:9583)
      ==29383==
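
A rough back-of-the-envelope check of the first valgrind error, as a sketch only. The only number taken from the report is the block size (4,241 bytes). Every structure size below is an assumption that should be verified against the Lustre headers: a 144-byte stat structure ahead of the layout, a 32-byte layout header, and 24-byte per-stripe entries with l_ost_idx at offset 20. The helper names are hypothetical and exist only for this illustration.

```python
# All sizes except BLOCK_SIZE are ASSUMPTIONS, not taken from the report.
BLOCK_SIZE = 4241        # from valgrind: "a block of size 4,241 alloc'd"
STAT_SIZE = 144          # assumed size of the stat data stored before the layout
LUM_HEADER_SIZE = 32     # assumed size of the layout header (no stripe entries)
OST_ENTRY_SIZE = 24      # assumed size of one per-stripe entry
L_OST_IDX_OFFSET = 20    # assumed offset of l_ost_idx inside one entry


def layout_bytes_needed(stripe_count):
    """Bytes the layout would occupy for the given stripe count."""
    return LUM_HEADER_SIZE + stripe_count * OST_ENTRY_SIZE


def layout_bytes_available(block_size=BLOCK_SIZE):
    """Bytes left for the layout once the stat data is accounted for."""
    return block_size - STAT_SIZE


def max_stripes_that_fit(block_size=BLOCK_SIZE):
    """Number of whole per-stripe entries that fit inside the block."""
    return (layout_bytes_available(block_size) - LUM_HEADER_SIZE) // OST_ENTRY_SIZE


def first_bad_read_offset(block_size=BLOCK_SIZE):
    """Distance past the end of the block at which l_ost_idx of the first
    entry that does not fully fit begins."""
    first_bad_entry = max_stripes_that_fit(block_size)
    field_start = (STAT_SIZE + LUM_HEADER_SIZE
                   + first_bad_entry * OST_ENTRY_SIZE + L_OST_IDX_OFFSET)
    return field_start - block_size


if __name__ == "__main__":
    print("needed for 200 stripes:", layout_bytes_needed(200))
    print("available in block:    ", layout_bytes_available())
    print("whole entries that fit:", max_stripes_that_fit())
    print("first bad read offset: ", first_bad_read_offset())
```

Under these assumptions, 200 stripes would need 4,832 bytes where only 4,097 are available, about 169 entries fit, and the first out-of-range l_ost_idx read lands 11 bytes past the block. That matches the "11 bytes after a block of size 4,241" line above, which is suggestive of a fixed-size buffer but does not by itself confirm the root cause.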

          People

            paf Patrick Farrell