
Maloo test report should include zfs debugging data when FSTYPE=zfs

Details

    • 1854

    Description

      Unless I have missed something, ZFS debugging data is not included in test reports, e.g.:
      https://maloo.whamcloud.com/test_sets/42573266-9f17-11e3-934b-52540035b04c

      It would be very useful to have a tarball of /proc/spl/. That directory holds a lot of data useful for troubleshooting ZFS problems, e.g. the dmu_tx_assign delay histogram.
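
      A minimal sketch of how such a tarball could be produced. This is not the Maloo or test-framework code; the function name collect_debug_tree and the output path are hypothetical. Files under /proc typically report a size of zero, so the sketch reads each file and stores the length of what was actually read rather than relying on the reported size.

```python
import io
import os
import tarfile
import time


def collect_debug_tree(src_dir, tar_path):
    """Archive every readable file under src_dir into a gzip tarball.

    Each file is read into memory first because procfs entries usually
    report a size of 0. Returns the list of archived member names.
    """
    archived = []
    with tarfile.open(tar_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    with open(path, "rb") as fh:
                        data = fh.read()
                except OSError:
                    continue  # unreadable or vanished entry: skip it
                arcname = os.path.relpath(path, src_dir)
                info = tarfile.TarInfo(arcname)
                info.size = len(data)  # real length, not the stat size
                info.mtime = int(time.time())
                tar.addfile(info, io.BytesIO(data))
                archived.append(arcname)
    return archived


if __name__ == "__main__":
    # Assumption: FSTYPE is exported in the environment, as in the ticket title.
    if os.environ.get("FSTYPE") == "zfs" and os.path.isdir("/proc/spl"):
        # Hypothetical output location.
        print(collect_debug_tree("/proc/spl", "/tmp/spl-debug.tar.gz"))
```

      Reading into memory keeps the sketch short; it assumes the files under the tree are small enough for that to be acceptable.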

      Attachments

        Activity

          [LU-5674] Maloo test report should include zfs debugging data when FSTYPE=zfs
          mdiep Minh Diep added a comment -

          The fix for this ticket is already in 2.7, etc. We cannot include any logs if a node has crashed (i.e. timeout). This ticket should be closed.

          mdiep Minh Diep added a comment -

          Since the test timed out, there is no way to collect the ZFS log.

          isaac Isaac Huang (Inactive) added a comment -

          https://testing.hpdd.intel.com/test_sets/deca9712-9bc1-11e4-857a-5254006e85c2

          In the test report above, I couldn't find any of the ZFS data requested here. Since dmesg and other data that require a working user space were all there, I believe the ZFS data should have been available as well. Please take a look; the absence of this data made debugging harder. Thanks!

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12590/
          Subject: LU-5674 test: print spl debug info
          Project: fs/lustre-release
          Branch: b2_5
          Current Patch Set:
          Commit: f3ecfa69ecbfaa3e28b50c2849ffc99ca6bebf6a

          isaac Isaac Huang (Inactive) added a comment -

          When you said "the test timing out caused the ZFS log not to be collected", did you mean:

          • on test timeout, the scripts do not try to collect the ZFS logs, or
          • the scripts try to get the logs, but the servers do not respond?

          In the report I mentioned above, the OSS was in a good state. There was a deadlock on the MDS that made some service threads unresponsive, but user-space processes should still have worked. In addition, dmesg and Lustre debug logs were available for both the OSS and the MDS, so why weren't the ZFS logs available as well?
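
          The two cases in the question above (collection never attempted versus attempted but unanswered) can be told apart if each attempt records its own outcome. A minimal sketch under stated assumptions: the function name try_collect and the status strings are hypothetical and do not come from the Lustre test framework.

```python
import subprocess
import sys


def try_collect(cmd, timeout_s):
    """Run one collection command and classify the outcome.

    Returns (status, output), where status is one of:
      "collected"    - the command exited with status 0
      "failed"       - the command could not start or exited non-zero
      "unresponsive" - no result within timeout_s seconds
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "unresponsive", ""
    except OSError as exc:
        return "failed", str(exc)
    if proc.returncode != 0:
        return "failed", proc.stderr
    return "collected", proc.stdout


# Hypothetical usage; the host name "mds1" is a placeholder:
#   status, out = try_collect(["ssh", "mds1", "ls", "-R", "/proc/spl"], 30)
#   print("mds1:", status)
```

          Logging the status for every node, including the nodes that were skipped, would make it visible in a report whether missing ZFS data means "not attempted" or "no response".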
          mdiep Minh Diep added a comment -

          Sorry for not being clear. After looking at this, I think it likely that the test timing out caused the ZFS log not to be collected.

          If you find a case where a test failed but no log was collected, please open a new ticket instead of reopening this one. I believe this enhancement is complete.

          isaac Isaac Huang (Inactive) added a comment -

          Did you mean that even if the test I mentioned had failed in a different way (i.e. not a timeout, so it would be possible to collect the logs), the ZFS logs would still not be collected? If so, does that apply to all Maloo tests triggered from Gerrit?

          mdiep Minh Diep added a comment -

          The test that you mentioned doesn't follow the test-framework way of starting a test, so the ZFS log collection was not called. Additionally, the test timed out, which could also mean that the log would not be collected at the end if the client crashed.

          isaac Isaac Huang (Inactive) added a comment -

          Looks like ZFS info is missing from this report:
          https://testing.hpdd.intel.com/test_sets/9e3a7c26-769b-11e4-ad19-5254006e85c2

          Or have I missed something?

          yujian Jian Yu added a comment -

          Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/12590

          People

            mdiep Minh Diep
            isaac Isaac Huang (Inactive)
            Votes: 0
            Watchers: 7
