  1. Lustre
  2. LU-4507

Server hangs and terrible performance - ZFS IOR

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • None
    • Lustre 2.6.0
    • None
    • Hyperion/LLNL
    • 3
    • 12333

      For some time now we have been observing terrible read performance when running IOR file-per-process on ZFS. The system sees ~7 GB/s reading with ldiskfs; at higher client counts, ZFS read performance on this test drops to ~400 MB/s, which is roughly what a single client achieves.
      Observing the OSTs, we typically see one or two of the 12 OSTs with a very high load and the rest idle. The busy OST will then time out, frequently evict several clients, and move forward. Stack dumps and errors from two servers are attached. These tests are ongoing; please advise what further data needs to be collected.

        1. h-agb15.errors.txt
          9 kB
        2. h-agb15.log.dump.txt
          1.66 MB
        3. h-agb21.zfs.read.txt
          1.41 MB
        4. MDTEST performance.xlsx
          35 kB

            isaac Isaac Huang (Inactive)
            cliffw Cliff White (Inactive)
            Votes: 0
            Watchers: 7
