Details

    • Improvement
    • Resolution: Fixed
    • Blocker
    • Lustre 2.3.0
    • Lustre 2.3.0
    • None
    • 22,639
    • 4569

    Description

      We should send the timestamp data from the node along with the performance
      counters. This will give us much more accurate performance data on a per-node
      and per-group basis.

      This is originally discussed in Oracle Bug 22639.
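
As a rough illustration of why node-side timestamps help, the sketch below computes a throughput rate from two counter snapshots. The `Sample` type and its field names are hypothetical and are not taken from the LNet selftest structures; the sketch only assumes that each snapshot carries a cumulative byte counter and the millisecond time at which the node read it.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One counter snapshot (hypothetical layout, not the lnetst.h structures)."""
    bytes_sent: int     # cumulative byte counter reported by the node
    node_time_ms: int   # time at which the node read the counter, in milliseconds


def rate_mb_per_s(prev: Sample, curr: Sample) -> float:
    """Throughput between two snapshots, using the node's own timestamps."""
    elapsed_ms = curr.node_time_ms - prev.node_time_ms
    if elapsed_ms <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_bytes = curr.bytes_sent - prev.bytes_sent
    # Convert bytes to MB (10^6 bytes) and milliseconds to seconds.
    return (delta_bytes / 1_000_000) / (elapsed_ms / 1000)
```

If the console polls every 1000 ms by its own clock but the node actually took its two snapshots 1250 ms apart, dividing by the local interval would overstate the rate. Dividing by the node-reported interval avoids that skew.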

        Activity

          [LU-445] Send timestamps with LNet counters

          liang Liang Zhen (Inactive) added a comment - Please be more cautious about this and use at least 500 as the value. We can't use an interval of less than one second, so even though a millisecond timestamp may not be that accurate, the value shouldn't be less than 500.

          doug Doug Oucharek (Inactive) added a comment - Ok, checking programmatically makes sense. Based on Liang's comment, I'm assuming I can use 100 as the value to check the timestamp against to determine whether to use local or remote timestamps.

          liang Liang Zhen (Inactive) added a comment (edited) - Yes, we can distinguish this by value: the minimum interval of a stat request is 1 second, which is 1000 milliseconds, while the number of running tests on a node is almost never going to be larger than 100.

          I thought about this at the beginning, but felt it could be confusing for code maintainers (OK, we are the maintainers); that's why I suggested that Doug fix it the current way.

          However, I agree users will like it more if it can automatically decide which timestamp to choose, so if you think it's an acceptable style, I will not object to choosing the easier way.
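
The value-based check discussed in this thread could look roughly like the sketch below. It is only an illustration: the function and parameter names are hypothetical, the real change is the patch linked in this ticket and may differ, and treating a value of exactly 500 as a timestamp is an assumption. The threshold of 500 is the one Liang asks for above.

```python
# Values below this are assumed to be a legacy "number of running tests"
# count rather than a millisecond timestamp (threshold from the discussion).
TIMESTAMP_THRESHOLD = 500


def pick_timestamp_ms(remote_field: int, local_time_ms: int) -> int:
    """Use the remote value if it looks like a timestamp, else the local clock."""
    if remote_field >= TIMESTAMP_THRESHOLD:
        return remote_field
    # Small values are taken to come from an older node that still sends
    # a test count in this field, so fall back to the local timestamp.
    return local_time_ms
```

With this check, a field value of 3 from an older node falls back to the local clock, while a value of 1000 or more is used as the remote timestamp.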

          adilger Andreas Dilger added a comment - Doug, how are the old timestamps invalid? Looking at the old patch, it seems it would be possible to check this programmatically. The number of tests is always going to be some small number, but the milliseconds will typically be larger values. Is it reasonable to say that if they are so small as to be indistinguishable from the test count, it doesn't really matter?

          doug Doug Oucharek (Inactive) added a comment - The patch for making the original work backward compatible is: http://review.whamcloud.com/#change,3192

          doug Doug Oucharek (Inactive) added a comment - Following Liang's recommendation, I am reopening this ticket to add a patch that makes the earlier change backward compatible.

          With the change as is, a 2.3 system running against a 2.2 or 2.1 system will have an invalid timestamp for doing bandwidth calculations.

          I plan to add a new flag to "lst stat" which will trigger the use of the remote timestamps. If the flag is not given, the previous behaviour of using the local timestamp is kept.

          The default will switch from local timestamps to remote timestamps when the Lustre version hits 2.8.

          hudson Build Master (Inactive) added a comment - Integrated in lustre-dev » x86_64,client,el6,inkernel #340
          LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

          Result = SUCCESS
          Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
          Files :

          • lnet/include/lnet/lnetst.h
          • lnet/utils/lst.c
          • lnet/selftest/framework.c
          • lnet/selftest/selftest.h

          hudson Build Master (Inactive) added a comment - Integrated in lustre-dev » x86_64,server,el5,inkernel #340
          LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

          Result = SUCCESS
          Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
          Files :

          • lnet/selftest/framework.c
          • lnet/utils/lst.c
          • lnet/selftest/selftest.h
          • lnet/include/lnet/lnetst.h

          hudson Build Master (Inactive) added a comment - Integrated in lustre-dev » i686,client,el5,inkernel #340
          LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

          Result = SUCCESS
          Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
          Files :

          • lnet/utils/lst.c
          • lnet/selftest/selftest.h
          • lnet/include/lnet/lnetst.h
          • lnet/selftest/framework.c

          hudson Build Master (Inactive) added a comment - Integrated in lustre-dev » x86_64,server,el6,inkernel #340
          LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

          Result = SUCCESS
          Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
          Files :

          • lnet/include/lnet/lnetst.h
          • lnet/selftest/selftest.h
          • lnet/utils/lst.c
          • lnet/selftest/framework.c

          People

            doug Doug Oucharek (Inactive)
            wang Wally Wang (Inactive)
            Votes: 0
            Watchers: 6
