  1. Lustre
  2. LU-11601

IR doesn't handle EAGAIN after initial connect when pinger_recov is 0


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0
    • None
    • 3
    • 9223372036854775807

    Description

      A client can attempt to connect to an OST before recovery, while the OST is not yet configured. In that case the OST returns EAGAIN:

      if (target->obd_no_conn) {
              spin_unlock(&target->obd_dev_lock);

              CDEBUG(D_INFO, "%s: Temporarily refusing client connection "
                     "from %s\n", target->obd_name,
                     libcfs_nid2str(req->rq_peer.nid));
              GOTO(out, rc = -EAGAIN);
      }
      

      This is not a problem when pinger_recov is enabled, because ptlrpc_pinger_main will reconnect later.
      When pinger_recov is 0, however, no reconnect is attempted, and the import stays in DISCONN:
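
The difference between the two cases can be illustrated with a toy model. This is only a sketch of the behavior described above, not Lustre code: every `toy_*` name is hypothetical, and the model assumes that the pinger is the only thing that ever retries a failed connect.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the import states that appear in the log. */
enum toy_state { TOY_DISCONN, TOY_CONNECTING, TOY_FULL };

struct toy_target {
        bool no_conn;           /* models target->obd_no_conn */
};

struct toy_import {
        enum toy_state state;
        bool pinger_recov;      /* models the pinger_recov setting */
};

/* Server side: refuse connections while the target is not ready. */
static int toy_handle_connect(const struct toy_target *t)
{
        return t->no_conn ? -EAGAIN : 0;
}

/* Client side: one connect attempt; a failure leaves the import in DISCONN. */
static int toy_connect(struct toy_import *imp, const struct toy_target *t)
{
        int rc;

        assert(imp != NULL && t != NULL);
        imp->state = TOY_CONNECTING;
        rc = toy_handle_connect(t);
        imp->state = (rc == 0) ? TOY_FULL : TOY_DISCONN;
        return rc;
}

/* One pinger pass: retry only when pinger recovery is enabled. */
static void toy_pinger_tick(struct toy_import *imp, const struct toy_target *t)
{
        if (imp->state == TOY_DISCONN && imp->pinger_recov)
                (void)toy_connect(imp, t);
}

/* Failed initial connect, then the target becomes ready, then one pinger pass.
 * Returns the import state at the end of that sequence. */
static enum toy_state toy_scenario(bool pinger_recov)
{
        struct toy_target t = { .no_conn = true };
        struct toy_import imp = { .state = TOY_DISCONN,
                                  .pinger_recov = pinger_recov };

        (void)toy_connect(&imp, &t);    /* refused with -EAGAIN */
        t.no_conn = false;              /* target finishes setup */
        toy_pinger_tick(&imp, &t);
        return imp.state;
}
```

In this model `toy_scenario(true)` ends in `TOY_FULL`, while `toy_scenario(false)` stays in `TOY_DISCONN` even though the target is ready, which mirrors the stuck import reported here.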

      00002000:00000001:0.0:1459250035.710100:0:56316:0:(ofd_dev.c:2083:ofd_init0()) Process entered
      00002000:00000001:0.0:1459250035.772688:0:56316:0:(ofd_dev.c:2221:ofd_init0()) Process leaving (rc=0 : 0 : 0)
      00010000:00000001:2.0:1459250035.813892:0:34564:0:(ldlm_lib.c:944:target_handle_connect()) Process leaving via out (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
      00002000:00000001:3.0:1459250035.820015:0:56305:0:(ofd_dev.c:416:ofd_prepare()) Process entered
      00002000:00000001:3.0:1459250035.822878:0:56305:0:(ofd_dev.c:452:ofd_prepare()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:1.0:1459250035.820231:0:33635:0:(import.c:985:ptlrpc_connect_interpret()) Process leaving via out (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
      00000100:00080000:1.0:1459250035.820232:0:33635:0:(import.c:1217:ptlrpc_connect_interpret()) ffff88004003d800 lustre-OST0000_UUID: changing import state from CONNECTING to DISCONN
      00000100:00080000:1.0:1459250035.820233:0:33635:0:(import.c:1263:ptlrpc_connect_interpret()) recovery of lustre-OST0000_UUID on 192.168.1.34@tcp failed (-11)
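
The three forms of `rc` in the "Process leaving via out" lines are one value printed as unsigned decimal, signed decimal, and hex. A minimal sketch of that rendering, assuming a 64-bit two's-complement value; `rc_as_unsigned` and `format_rc` are hypothetical helpers rather than Lustre functions, and -11 corresponds to -EAGAIN on Linux:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a signed return code as an unsigned 64-bit value. */
static uint64_t rc_as_unsigned(int64_t rc)
{
        return (uint64_t)rc;
}

/* Hypothetical helper: render rc as "unsigned : signed : hex".
 * Returns the value returned by snprintf. */
static int format_rc(char *buf, size_t len, int64_t rc)
{
        assert(buf != NULL);
        return snprintf(buf, len, "%" PRIu64 " : %" PRId64 " : 0x%" PRIx64,
                        rc_as_unsigned(rc), rc, rc_as_unsigned(rc));
}
```

For `rc = -11` the unsigned form is 2^64 - 11 = 18446744073709551605, matching the value in the trace.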

            People

              Assignee: scherementsev Sergey Cheremencev
              Reporter: scherementsev Sergey Cheremencev
              Votes: 0
              Watchers: 3
