Lustre / LU-953

OST connection lost

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.2.0, Lustre 1.8.6
    • None
    • lustre-1.8.6.81
      OFED1.5.3.1
      NASA AMES
    • 3
    • 7037

    Description

      Since upgrading to Lustre 1.8.6 and OFED 1.5.3.1, we have started to see OST<->MDT connection issues.
      We have checked the IB fabric for errors and have found none.
      Are there any known issues with Lustre 1.8.6 and OFED 1.5.3?

      === ERROR ON MDS ===
      Dec 28 07:04:56 service100 kernel: Lustre: 6149:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1389011653751232 sent from nbp6-OST0002-osc to NID 10.151.25.157@o2ib 7s ago has timed out (7s prior to deadline).
      Dec 28 07:04:56 service100 kernel: req@ffff81071b30ac00 x1389011653751232/t0 o13->nbp6-OST0002_UUID@10.151.25.157@o2ib:7/4 lens 192/528 e 0 to 1 dl 1325084696 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 28 07:04:56 service100 kernel: Lustre: 6149:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 258 previous similar messages
      Dec 28 07:04:56 service100 kernel: Lustre: nbp6-OST0002-osc: Connection to service nbp6-OST0002 via nid 10.151.25.157@o2ib was lost; in progress operations using this service will wait for recovery to complete.
      Dec 28 07:04:56 service100 kernel: Lustre: Skipped 2 previous similar messages
      Dec 28 07:05:04 service100 kernel: Lustre: 6151:0:(import.c:517:import_select_connection()) nbp6-OST000a-osc: tried all connections, increasing latency to 11s
      Dec 28 07:05:04 service100 kernel: Lustre: 6151:0:(import.c:517:import_select_connection()) Skipped 220 previous similar messages
      Dec 28 07:05:06 service100 kernel: Lustre: nbp6-OST0042-osc: Connection restored to service nbp6-OST0042 using nid 10.151.25.157@o2ib.
      Dec 28 07:05:06 service100 kernel: Lustre: Skipped 14 previous similar messages
      Dec 28 07:05:06 service100 kernel: LustreError: 30626:0:(quota_ctl.c:473:lov_quota_ctl()) ost 75 is inactive
      Dec 28 07:05:06 service100 kernel: LustreError: 30626:0:(quota_ctl.c:473:lov_quota_ctl()) Skipped 5 previous similar messages
      Dec 28 07:05:06 service100 kernel: Lustre: MDS nbp6-MDT0000: nbp6-OST0042_UUID now active, resetting orphans
      Dec 28 07:05:06 service100 kernel: Lustre: Skipped 29 previous similar messages
      Dec 28 07:05:07 service100 kernel: LustreError: 30630:0:(quota_master.c:1698:qmaster_recovery_main()) nbp6-MDT0000: qmaster recovery failed for uid 11631 rc:-11)
      Dec 28 07:05:07 service100 kernel: LustreError: 30630:0:(quota_master.c:1698:qmaster_recovery_main()) Skipped 52 previous similar messages
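
When triaging an excerpt like the one above, it can help to tally which OST services lost and regained their connection. The following is a minimal sketch and not part of the original report: the function name `summarize` and the regular expressions are assumptions derived only from the message wording in this excerpt, and other Lustre versions may word these messages differently. Because the kernel rate-limits repeated messages ("Skipped N previous similar messages"), the counts are lower bounds.

```python
import re
from collections import Counter

# Patterns derived from the message wording in the MDS excerpt (an assumption,
# not an official log format specification).
LOST_RE = re.compile(
    r"Lustre: \S+: Connection to service (\S+) via nid (\S+) was lost")
RESTORED_RE = re.compile(
    r"Lustre: \S+: Connection restored to service (\S+) using nid (\S+)")
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages")


def summarize(lines):
    """Count lost/restored events per (service, NID) and sum skipped messages."""
    lost = Counter()
    restored = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        m = LOST_RE.search(line)
        if m:
            lost[(m.group(1), m.group(2))] += 1
            continue
        m = RESTORED_RE.search(line)
        if m:
            # The restored message ends with a period right after the NID.
            restored[(m.group(1), m.group(2).rstrip("."))] += 1
            continue
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m.group(1))
    return lost, restored, skipped
```

Passing the lines of a syslog file (for example, `summarize(open(path))`, where `path` is whatever file holds the MDS messages) returns the two counters and the total number of suppressed messages, which gives a quick view of whether the drops cluster on one NID or are spread across the fabric.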

            People

              liang Liang Zhen (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 9