Details

    • Improvement
    • Resolution: Fixed
    • Blocker
    • Lustre 2.3.0
    • Lustre 2.3.0
    • None
    • 22,639
    • 4569

    Description

      We should send the timestamp data from the node along with the performance
      counters. This will give us much more accurate performance data on a per-node
      and per-group basis.
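To show why node-side timestamps give more accurate rates, here is a minimal Python sketch. It is not the LNet or lst code, and the `CounterSample` type and its field names are hypothetical. It computes a per-node rate from two counter samples. When the interval is measured with local receive times, any delay in delivering a reply stretches the apparent interval and understates the rate. The node's own timestamps measure the interval over which the counters actually changed.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One snapshot of a node's counters (hypothetical layout)."""
    timestamp_ms: int   # time at which the counters were stamped
    bytes_sent: int     # cumulative byte counter on that node


def rate_bytes_per_sec(prev: CounterSample, curr: CounterSample) -> float:
    """Average rate between two samples, using the samples' own timestamps."""
    elapsed_ms = curr.timestamp_ms - prev.timestamp_ms
    if elapsed_ms <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.bytes_sent - prev.bytes_sent) * 1000.0 / elapsed_ms


# Node-reported timestamps: the counters advanced 1 MiB in exactly 1 second.
node_prev = CounterSample(timestamp_ms=1000, bytes_sent=0)
node_curr = CounterSample(timestamp_ms=2000, bytes_sent=1048576)

# Same counters, stamped with the local time at which each reply arrived;
# the second reply was delayed by 250 ms in transit.
local_prev = CounterSample(timestamp_ms=1000, bytes_sent=0)
local_curr = CounterSample(timestamp_ms=2250, bytes_sent=1048576)

print(rate_bytes_per_sec(node_prev, node_curr))    # 1048576.0
print(rate_bytes_per_sec(local_prev, local_curr))  # 838860.8
```

The 250 ms delay is an illustrative assumption. In practice the error varies from reply to reply, which is what makes locally timed per-node figures noisy.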

      This was originally discussed in Oracle Bug 22639.


        Activity

          [LU-445] Send timestamps with LNet counters

          wang Wally Wang (Inactive) added a comment - http://review.whamcloud.com/#change,3514 is the first patch for b2_2. I will submit the second patch after this one lands.
          doug Doug Oucharek (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]

          doug Doug Oucharek (Inactive) added a comment - Change http://review.whamcloud.com/#change,3192 has landed, thereby addressing the backwards compatibility issue.
          liang Liang Zhen (Inactive) made changes -
          Affects Version/s New: Lustre 2.3.0 [ 10117 ]
          Priority Original: Major [ 3 ] New: Blocker [ 1 ]

          liang Liang Zhen (Inactive) added a comment - Changing the priority to Blocker.

          liang Liang Zhen (Inactive) added a comment - Please be more cautious here and use at least 500 as the threshold value. The stat interval can't be shorter than one second, so even if the millisecond timestamp is not perfectly accurate, it should never be less than 500.
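The value-based check discussed in these comments can be sketched as follows. This is a minimal Python illustration, not the actual LNet self-test code. It assumes that an older node puts a small count (the number of running tests) in the field where a newer node puts a millisecond value. The names `TIMESTAMP_THRESHOLD` and `choose_timestamp` are hypothetical. The threshold of 500 follows Liang's request to use at least 500.

```python
# Hypothetical threshold in milliseconds; Liang's comment asks for at least 500.
TIMESTAMP_THRESHOLD = 500


def choose_timestamp(reported_value: int, local_ms: int) -> tuple[str, int]:
    """Decide which timestamp to trust for one stats reply.

    Assumption (not confirmed by the ticket): older nodes put the number of
    running tests in this field, newer nodes put a millisecond value there.
    """
    if reported_value >= TIMESTAMP_THRESHOLD:
        # Too large to be a test count: treat it as the node's own timestamp.
        return ("remote", reported_value)
    # Small value: an old node sent a test count, so fall back to local time.
    return ("local", local_ms)


print(choose_timestamp(3, local_ms=1200))     # ('local', 1200)
print(choose_timestamp(1000, local_ms=1200))  # ('remote', 1000)
```

Under these assumptions, test counts stay below 100 and millisecond values stay at or above roughly 1000, so either proposed cut-off separates the two cases. The higher value of 500 leaves more margin for an inaccurate millisecond stamp, which is the concern Liang raises.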

          doug Doug Oucharek (Inactive) added a comment - OK, checking programmatically makes sense. Based on Liang's comment, I'm assuming I can use 100 as the threshold to compare the timestamp against when deciding whether to use local or remote timestamps.
          liang Liang Zhen (Inactive) added a comment (edited) - Yes, we can distinguish the two cases by value: the minimum interval between stat requests is 1 second, which is 1000 milliseconds, whereas it is almost impossible for the number of tests running on a node to exceed 100.

          I thought about this at the beginning, but felt it could be confusing for code maintainers (OK, we are the maintainers), which is why I suggested that Doug fix it the current way.

          However, I agree that users will prefer it if the tool automatically decides which timestamp to use, so if you think this is acceptable style, I will not object to taking the easier approach.


          adilger Andreas Dilger added a comment - Doug, how are the old timestamps invalid? Looking at the old patch, it seems it would be possible to check this programmatically. The number of tests is always going to be a small number, but the millisecond values will typically be much larger. Is it reasonable to say that if they are so small as to be indistinguishable from the test count, it doesn't really matter?

          People

            doug Doug Oucharek (Inactive)
            wang Wally Wang (Inactive)
            Votes: 0
            Watchers: 6
