
client ran out of memory when diffing two 2GB files

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • None
    • clients: trevis-60vm1 & 2
      mds: trevis-62
      ost: trevis-65

      before upgrade: el7.3, zfs 6.5.9, b2_10 branch, v2.10.0, b5
      after upgrade: el7.4, zfs 7.1, master branch, v2.10.52, b3631
    • 3

    Description

      Systems were imaged to el7.3/Lustre 2.10.0. A zfs mount point (v6.5.9) was
      created on the Lustre file system. A 2GB file was then copied to a directory on
      the zfs mount point.

      After systems were imaged to el7.4/2.10.52, an import of the 6.5.9 zpool was
      performed. A 2GB file was then copied onto the zfs mount point (same file as
      above - different directory). Diff was then used to compare the two files.

      While diff was running, top showed it consuming 80-90% of memory. When
      usage approached 90%, the diff process was killed on the client (see the
      attached OOM stack).
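A minimal sketch of how the memory share reported above could be sampled without watching top by hand. It assumes a Linux client with /proc mounted and approximates top's %MEM as VmRSS divided by MemTotal; the function names (parse_kb, mem_percent, sample) are hypothetical and not part of any Lustre tooling.

```python
def parse_kb(text, key):
    """Return the kB value from a 'Key:   123 kB' line, or None if the key is absent."""
    for line in text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return None


def mem_percent(status_text, meminfo_text):
    """Resident memory of a process as a percentage of total RAM.

    status_text  -- contents of /proc/<pid>/status
    meminfo_text -- contents of /proc/meminfo
    """
    rss_kb = parse_kb(status_text, "VmRSS")
    total_kb = parse_kb(meminfo_text, "MemTotal")
    if rss_kb is None or not total_kb:
        return None  # process has no resident memory line, or meminfo is unusable
    return 100.0 * rss_kb / total_kb


def sample(pid):
    """Read the two /proc files for a live process (Linux only, assumed layout)."""
    with open(f"/proc/{pid}/status") as f:
        status_text = f.read()
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    return mem_percent(status_text, meminfo_text)
```

Calling sample() with the PID of the running diff once a second and logging the result would give a timeline to set beside the attached top screenshots.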

      I've found two ways to avoid this:

      1) Keep everything above the same except work with a newly created zfs pool
      rather than an imported pool.

      2) Instead of diffing two 2GB files, diff two 2GB sets of several smaller
      files (largest file in set <60MB).

      Note: When diff is used to compare several smaller files, it uses much less
      memory (<10%).
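If the goal is only to confirm that the two copies are byte-identical, the comparison can be done in fixed-size pieces, in the spirit of workaround 2. The sketch below is an illustration under that assumption, not the reporter's procedure; files_identical and chunk_size are hypothetical names. It bounds only the comparing process's own buffers and says nothing about page cache or other client-side memory use.

```python
import os


def files_identical(path_a, path_b, chunk_size=1 << 20):
    """Return True if both regular files have identical contents.

    Reads at most chunk_size bytes from each file at a time, so the
    process never holds more than two chunks of file data at once.
    """
    # Files of different length cannot be identical; skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached end-of-file together
                return True
```

Unlike diff, this reports only whether the files match, not where they differ.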

      Note: This has also been seen with ldiskfs, but is easier to reproduce with zfs.

      Attachments

        1. before diff.JPG (149 kB)
        2. during diff.JPG (170 kB)
        3. slabtop and top screens during diff.JPG (161 kB)
        4. stack for OOM during diff.txt (19 kB)
        5. vmcore (36.63 MB)

        Issue Links

          Activity

            [LU-9977] client ran out of memory when diffing two 2GB files
            adilger Andreas Dilger made changes -
            Fix Version/s Original: Lustre 2.11.0 [ 13091 ]
            Resolution New: Not a Bug [ 6 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-9601 [ LU-9601 ]
            jamesanunez James Nunez (Inactive) made changes -
            Affects Version/s New: Lustre 2.10.2 [ 13494 ]
            pjones Peter Jones made changes -
            Assignee Original: Nathaniel Clark [ utopiabound ] New: Zhenyu Xu [ bobijam ]
            pjones Peter Jones made changes -
            Priority Original: Minor [ 4 ] New: Critical [ 2 ]
            jcasper James Casper (Inactive) made changes -
            Description (final note changed; the rest of the text is unchanged):
            Original: Note: This is not seen when zfs is not in the config.
            New: Note: This has also been seen with ldiskfs, but is easier to repro with zfs.
            jcasper James Casper (Inactive) made changes -
            Summary Original: client ran out of memory when using an imported zfs pool New: client ran out of memory when diffing two 2GB files
            jcasper James Casper (Inactive) made changes -
            Attachment New: vmcore [ 28282 ]
            jcasper James Casper (Inactive) made changes -
            Attachment New: slabtop and top screens during diff.JPG [ 28280 ]
            Attachment New: stack for OOM during diff.txt [ 28281 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.11.0 [ 13091 ]

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jcasper James Casper (Inactive)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: