
Maloo test report should include zfs debugging data when FSTYPE=zfs

Details

    • 1854

    Description

      If I haven't missed something, zfs debugging data hasn't been included in test reports, e.g.:
      https://maloo.whamcloud.com/test_sets/42573266-9f17-11e3-934b-52540035b04c

      It'd be very useful to have a tarball of /proc/spl/. Lots of useful data to troubleshoot ZFS problems can be found under that directory, e.g. dmu_tx_assign delay histogram.
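
A minimal sketch of how such a tarball could be produced, written in Python purely for illustration. The function name `archive_tree` and the output path are hypothetical and are not part of the Lustre test framework or Maloo. The sketch reads each file into memory before archiving it, because procfs entries usually report a size of 0 and would otherwise be stored as empty members. It assumes that every file under the tree can be read without blocking.

```python
import io
import os
import tarfile


def archive_tree(src_dir, tar_path):
    """Archive every readable regular file under src_dir into a gzip tarball.

    Each file is read into memory first, because procfs entries usually
    report a size of 0 and would otherwise be archived as empty files.
    Returns the list of archived member names.
    """
    archived = []
    # Member names are stored relative to the parent of src_dir,
    # so /proc/spl/kstat/... becomes spl/kstat/... in the archive.
    parent = os.path.dirname(os.path.abspath(src_dir))
    with tarfile.open(tar_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    with open(path, "rb") as fh:
                        data = fh.read()
                except OSError:
                    # Skip entries that cannot be read (permissions, races).
                    continue
                info = tarfile.TarInfo(os.path.relpath(path, parent))
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
                archived.append(info.name)
    return archived
```

On a server node this might be called as `archive_tree("/proc/spl", "/tmp/spl-debug.tar.gz")`, where the output path is only an example.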

      Attachments

        Activity

          [LU-5674] Maloo test report should include zfs debugging data when FSTYPE=zfs
          mdiep Minh Diep added a comment -

          This ticket is already in 2.7, etc. We cannot include any log if a node has crashed (i.e., timed out). This ticket should be closed.

          mdiep Minh Diep added a comment -

          Since the test timed out, there isn't any way to collect the zfs log.


          isaac Isaac Huang (Inactive) added a comment -

          https://testing.hpdd.intel.com/test_sets/deca9712-9bc1-11e4-857a-5254006e85c2

          In the test report above, I couldn't find any of the ZFS data requested here. Since dmesg and other data that require a working user space were all there, I believe the ZFS data should have been available as well. Please take a look - the absence of this data made debugging harder. Thanks!

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12590/
          Subject: LU-5674 test: print spl debug info
          Project: fs/lustre-release
          Branch: b2_5
          Current Patch Set:
          Commit: f3ecfa69ecbfaa3e28b50c2849ffc99ca6bebf6a

          isaac Isaac Huang (Inactive) added a comment -

          When you said "the test timed out caused the zfs log to not collected", did you mean:

          • on test timeout, the scripts do not try to collect the ZFS logs, or
          • the scripts try to get the logs, but the servers don't respond?

          In the report I mentioned above, the OSS was in a good state; there was a deadlock on the MDS, which made some service threads unresponsive, but user-space processes should still have worked. In addition, dmesg and Lustre debug logs were all available for both the OSS and the MDS, so why weren't the ZFS logs available as well?
          mdiep Minh Diep added a comment -

          Sorry for not being clear. After looking at this, I think it likely that the test timed out caused the zfs log to not collected.

          If you find a case where a test failed but no log was collected, please open a new ticket instead of reopening this one. I believe this enhancement is complete.


          People

            mdiep Minh Diep
            isaac Isaac Huang (Inactive)