[LU-7393] OSS hung with high load and blocked ll_{*} threads

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Blocker
    • None
    • None
    • lola
      build: build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild '20151104.1'
    • 3

    Description

      The error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are set up in an HA failover configuration.
      OSS nodes are neither restarted nor failed over.

      Symptom:

      • OSS node (lola-3) shows high load due to a large number of blocked processes. No iowait, high disk load, or long queue and wait times can be seen.
      • The list of blocked processes can be seen from the 'w' and 't' sysrq-trigger output initiated at Nov 5 08:19:12 PST 2015 and 08:23:3 PST 2015, respectively (see the attached messages file).
      • The problems most likely started at Nov 4, 18:50;
        see the attached messages file and debug log file (lustre-log.1446691819.85273.bz2).
      • 220 additional debug log files have been written and can be provided on demand.
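
      On Linux, the load average counts tasks in uninterruptible sleep (state 'D') as well as runnable ones, which is consistent with a high load without iowait or disk activity. As a minimal sketch (not part of the original report), such tasks could be listed by reading /proc/<pid>/stat. The function names task_state, comm_name and blocked_tasks are hypothetical helpers introduced only for this illustration.

```python
import os


def task_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the LAST ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    return rest[0]


def comm_name(stat_line):
    """Return the command name found between the outermost parentheses."""
    return stat_line[stat_line.find("(") + 1:stat_line.rfind(")")]


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for tasks in uninterruptible sleep (state 'D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if task_state(line) == "D":
            blocked.append((int(entry), comm_name(line)))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

      This only shows which tasks are blocked at the moment of the scan. The kernel stack traces needed to see where they are stuck still come from the sysrq output in the messages file.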

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Frank Heckes (heckes, Inactive)