Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.12.2
    • None
    • 2

    Description

      We have been setting the ko2iblnd timeout to 150 (the default is 50) on our cluster. From reading the code, this parameter appears to be no longer used; lnet_lnd_timeout is used instead.

      For example, in kiblnd_queue_tx_locked():

          timeout_ns = lnet_get_lnd_timeout() * NSEC_PER_SEC;
          tx->tx_queued = 1;
          tx->tx_deadline = ktime_add_ns(ktime_get(), timeout_ns);
      

      and lnet_get_lnd_timeout() returns the new default of 5. Does this mean our timeout went from 150 to 5?
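
      To make the concern concrete, here is a minimal user-space sketch of the deadline arithmetic in the snippet above. It is not the kernel code: tx_deadline_ns() is a hypothetical helper, plain 64-bit integers stand in for ktime_t, and NSEC_PER_SEC is redefined locally.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical stand-in for the deadline computation quoted above:
 * deadline = now + (LND timeout in seconds, converted to nanoseconds).
 */
uint64_t tx_deadline_ns(uint64_t now_ns, uint32_t lnd_timeout_sec)
{
	uint64_t timeout_ns = (uint64_t)lnd_timeout_sec * NSEC_PER_SEC;

	/* This sketch does not model clock wraparound. */
	assert(now_ns <= UINT64_MAX - timeout_ns);
	return now_ns + timeout_ns;
}
```

      Under these assumptions, a tx queued at time 0 would expire after 5 seconds with the new default, versus 150 seconds with the value we used to set, a difference of 145 seconds per queued tx.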

      The documentation says that lnet_lnd_timeout is derived from lnet_transaction_timeout and retry_count, but that derivation does not appear to be reflected in how tx->tx_deadline is set.
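
      The exact derivation is not quoted in this ticket, so the following is only an illustrative model of what "derived from lnet_transaction_timeout and retry_count" could mean. It assumes the transaction timeout is split evenly across the initial attempt plus each retry; derive_lnd_timeout() is a hypothetical name, and the real LNet formula may differ.

```c
#include <assert.h>

/*
 * Hypothetical model, NOT taken from the Lustre source: divide the
 * overall transaction timeout evenly over (retry_count + 1) attempts
 * to obtain a per-attempt LND timeout, in seconds.
 */
unsigned int derive_lnd_timeout(unsigned int transaction_timeout,
				unsigned int retry_count)
{
	unsigned int attempts = retry_count + 1;

	/* retry_count == UINT_MAX would wrap to zero attempts. */
	assert(attempts != 0);
	return transaction_timeout / attempts;
}
```

      Under this assumed model, a transaction timeout of 50 with no retries would give an LND timeout of 50, while 50 with 9 retries would give 5. Whether the observed value of 5 comes from such a derivation or from a fixed default is exactly the question being asked here.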

      Am I reading the code correctly?

          Activity

            [LU-13020] ko2iblnd tuning
            pjones Peter Jones made changes -
            Resolution New: Cannot Reproduce [ 5 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-10 [ JFC-10 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-21 [ JFC-21 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-13145 [ LU-13145 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-27 [ JFC-27 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-27 [ JFC-27 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to JFC-10 [ JFC-10 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-21 [ JFC-21 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Amir Shehata [ ashehata ]

            People

              ashehata Amir Shehata (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 8
