Lustre / LU-12212

Frequent request timeouts during dbench run

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.13.0
    • Labels:
    • Severity:
      3

      Description

      An ordinary dbench run starts showing many messages like this:

      Apr 21 03:11:05 nodez kernel: Lustre: 4236:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1555830658/real 1555830658] req@ffff8800a273cc00 x1631406558218272/t0(0) o101->lustre-MDT0000-mdc-ffff8800af2fb800@0@lo:12/10 lens 616/4752 e 0 to 1 dl 1555830665 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
      Apr 21 03:11:05 nodez kernel: Lustre: lustre-MDT0000-mdc-ffff8800af2fb800: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Apr 21 03:11:05 nodez kernel: Lustre: lustre-MDT0000: Client 2f22eecb-5055-b4d6-16b5-a958700fcbda (at 0@lo) reconnecting
      Apr 21 03:11:05 nodez kernel: Lustre: lustre-MDT0000: Connection restored to 4e8c3a45-3b8a-cd6a-047f-22363e8171e6 (at 0@lo)
      Apr 21 03:11:05 nodez kernel: Lustre: Skipped 3 previous similar messages
      Apr 21 03:11:48 nodez kernel: Lustre: 4237:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1555830701/real 1555830701] req@ffff8800862f3440 x1631406567790080/t0(0) o101->lustre-MDT0000-mdc-ffff8800af2fb800@0@lo:12/10 lens 616/4752 e 0 to 1 dl 1555830708 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
      Apr 21 03:11:48 nodez kernel: Lustre: lustre-MDT0000-mdc-ffff8800af2fb800: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Apr 21 03:11:48 nodez kernel: Lustre: lustre-MDT0000: Client 2f22eecb-5055-b4d6-16b5-a958700fcbda (at 0@lo) reconnecting
      Apr 21 03:12:12 nodez kernel: Lustre: 4238:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1555830725/real 1555830725] req@ffff8800861750c0 x1631406572864272/t0(0) o101->lustre-MDT0000-mdc-ffff8800af2fb800@0@lo:12/10 lens 616/4752 e 0 to 1 dl 1555830732 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
      Apr 21 03:12:12 nodez kernel: Lustre: lustre-MDT0000-mdc-ffff8800af2fb800: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Apr 21 03:12:12 nodez kernel: Lustre: lustre-MDT0000: Client 2f22eecb-5055-b4d6-16b5-a958700fcbda (at 0@lo) reconnecting
      Apr 21 03:12:12 nodez kernel: Lustre: lustre-MDT0000: Connection restored to 4e8c3a45-3b8a-cd6a-047f-22363e8171e6 (at 0@lo)
      Apr 21 03:12:12 nodez kernel: Lustre: Skipped 3 previous similar messages
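When triaging messages like these, it can help to pull the numeric fields out of each ptlrpc_expire_one_request() line. The following is a minimal sketch, not part of Lustre or its tooling: the function name parse_expired and the regular expression are hypothetical, and it assumes that "sent" and "dl" are the send time and deadline in epoch seconds, "x" is the request XID, and "o" is the RPC opcode. For the three timeouts quoted above, dl minus sent is 7 seconds in each case.

```python
import re

# Hypothetical pattern for the fields of interest in a
# ptlrpc_expire_one_request() console message. The field meanings
# (sent/dl as epoch seconds, x as XID, o as opcode) are assumptions.
EXPIRE_RE = re.compile(
    r"ptlrpc_expire_one_request\(\)\).*?"
    r"\[sent (?P<sent>\d+)/real (?P<real>\d+)\]"
    r".*? x(?P<xid>\d+)/"
    r".*? o(?P<opc>\d+)->"
    r".*? dl (?P<dl>\d+) "
)


def parse_expired(line):
    """Return the timeout fields from one log line, or None if it does not match."""
    m = EXPIRE_RE.search(line)
    if m is None:
        return None
    sent = int(m.group("sent"))
    deadline = int(m.group("dl"))
    return {
        "xid": int(m.group("xid")),
        "opcode": int(m.group("opc")),
        "sent": sent,
        "deadline": deadline,
        # Seconds between sending the request and its deadline.
        "window": deadline - sent,
    }


# First timeout message from the log above, copied verbatim.
SAMPLE = (
    "Apr 21 03:11:05 nodez kernel: Lustre: 4236:0:(client.c:2134:"
    "ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow "
    "reply: [sent 1555830658/real 1555830658] req@ffff8800a273cc00 "
    "x1631406558218272/t0(0) o101->lustre-MDT0000-mdc-ffff8800af2fb800@0@lo:12/10 "
    "lens 616/4752 e 0 to 1 dl 1555830665 ref 2 fl Rpc:X/2/ffffffff rc 0/-1"
)

if __name__ == "__main__":
    print(parse_expired(SAMPLE))
```

Lines that are not request expirations (for example the "Connection restored" messages) return None, so the function can be applied to a whole log and the results filtered.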

      This started after the LU-9193 patch landed (found by git bisect). There is nothing special about the test setup: no SELinux, just a local run with dbench -D /mnt/lustre/testdir 4

              People

              • Assignee:
                Mikhail Pershin (tappro)
              • Reporter:
                Mikhail Pershin (tappro)
              • Votes:
                0
              • Watchers:
                6
