Lustre / LU-9861

Client not reconnecting to OST

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.8.0
    • lustre-2.8.0_9.chaos
    • 3

    Description

      An OSS evicted a client on Aug 1 during a planned network outage.

       

      [Tue Aug 1 16:43:54 2017] Lustre: lsh-OST0005: haven't heard from client d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) in 227 seconds. I think it's dead, and I am evicting it. exp ffff881d1aaa8c00, cur 1501631035 expire 1501630885 last 1501630808
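
      The numbers at the end of the eviction message can be cross-checked against the reported silence interval. The following is a minimal sketch, not a Lustre tool: the regular expression is a hypothetical pattern written for this one message, and reading `cur`, `expire`, and `last` as epoch seconds is an assumption based on their names and magnitudes.

```python
import re

# Eviction message copied from the OSS console log above (timestamp prefix dropped).
EVICTION_LINE = (
    "Lustre: lsh-OST0005: haven't heard from client "
    "d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) "
    "in 227 seconds. I think it's dead, and I am evicting it. "
    "exp ffff881d1aaa8c00, cur 1501631035 expire 1501630885 last 1501630808"
)

# Hypothetical pattern for this message only; not a general Lustre log grammar.
EVICTION_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<uuid>\S+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<silent>\d+) seconds\..*"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return the fields of an eviction message, with numeric fields as ints."""
    match = EVICTION_RE.search(line)
    if match is None:
        raise ValueError("not an eviction message")
    fields = match.groupdict()
    for key in ("silent", "cur", "expire", "last"):
        fields[key] = int(fields[key])
    return fields


info = parse_eviction(EVICTION_LINE)
print(info["target"], "evicted", info["nid"])
print("cur - last   =", info["cur"] - info["last"])    # 227, the reported silence
print("cur - expire =", info["cur"] - info["expire"])  # 150
```

      For this message, `cur - last` equals the 227 seconds quoted in the text of the log line, which suggests that `last` records the most recent contact from the client.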
      

      Two days later the client had still not reconnected, although both sides could lctl ping each other. The client logged the following on its console:

      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) lsh-OST0005_UUID: rc = -110 waiting for callback (1 != 0)
      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) @@@ still on sending list  req@ffff880168fc3800 x1574573141267560/t0(0) o3->lsh-OST0005-osc-ffff88203c63f800@172.19.3.22@o2ib600:6/4 lens 488/432 e 0 to 0 dl 1501630582 ref 2 fl Unregistering:ES/0/ffffffff rc -5/-1
      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) lsh-OST0005_UUID: RPCs in "Unregistering" phase found (1). Network is sluggish? Waiting them to error out.
      [Thu Aug  3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
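
      To see what the stuck request looks like, the request dump in the "still on sending list" message can be split into fields. This is a rough sketch under stated assumptions: the regular expression and field names are hypothetical and fitted to this single line, `dl` is assumed to be a deadline in epoch seconds, and the error names are the standard Linux errno values for 5 and 110. The comparison with the eviction time also assumes the client and OSS clocks agree.

```python
import re

# Request dump copied from the client console log above.
REQ_LINE = (
    "@@@ still on sending list  req@ffff880168fc3800 "
    "x1574573141267560/t0(0) "
    "o3->lsh-OST0005-osc-ffff88203c63f800@172.19.3.22@o2ib600:6/4 "
    "lens 488/432 e 0 to 0 dl 1501630582 ref 2 "
    "fl Unregistering:ES/0/ffffffff rc -5/-1"
)

# Hypothetical pattern for this line only; field names are illustrative.
REQ_RE = re.compile(
    r"req@(?P<req>[0-9a-f]+) x(?P<xid>\d+)/t(?P<transno>\d+)\(\d+\) "
    r"o(?P<opc>\d+)->(?P<imp>\S+?)@(?P<nid>\S+):\d+/\d+ "
    r".*dl (?P<deadline>\d+) ref (?P<ref>\d+) "
    r"fl (?P<phase>\w+):(?P<flags>\S*) rc (?P<rc>-?\d+)/(?P<reply_rc>-?\d+)"
)

# Linux errno values seen in the client messages above.
ERRNO_NAMES = {5: "EIO", 110: "ETIMEDOUT"}

# "cur" value copied from the eviction message logged by the OSS.
EVICT_CUR = 1501631035


def parse_request(line):
    """Return the fields of a request dump, with numeric fields as ints."""
    match = REQ_RE.search(line)
    if match is None:
        raise ValueError("not a request dump")
    fields = match.groupdict()
    for key in ("xid", "transno", "opc", "deadline", "ref", "rc", "reply_rc"):
        fields[key] = int(fields[key])
    return fields


req = parse_request(REQ_LINE)
print("import:", req["imp"], "via", req["nid"])
print("phase:", req["phase"], "rc:", ERRNO_NAMES.get(-req["rc"], req["rc"]))
print("invalidate rc:", ERRNO_NAMES[110])  # the "rc = -110" in the first message
print("seconds from deadline to eviction:", EVICT_CUR - req["deadline"])
```

      Under those assumptions, the request's deadline had already passed several minutes before the OSS logged the eviction, yet two days later the request was still in the "Unregistering" phase.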
      

      This seems quite similar to LU-8511, which was closed as a duplicate of LU-7434. That issue had two associated patches, but only https://review.whamcloud.com/#/c/18934/ was landed to 2.8 FE, whereas https://review.whamcloud.com/#/c/19953/ was not.

            People

              jay Jinshan Xiong (Inactive)
              nedbass Ned Bass (Inactive)