  1. Lustre
  2. LU-1142

MDS recovery fails due to client evictions

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.1.1
    • None
    • Hyperion - RHEL 5
    • 3
    • 6445

    Description

      MDS recovery fails: a single client is evicted.
      Client:
      ---------------

      Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
      Lustre: DEBUG MARKER: mds has failed over 2 times, and counting...
      LustreError: 11-0: an error occurred while communicating with 192.168.120.126@o2ib. The ldlm_enqueue operation failed with -107
      Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection to service lustre-MDT0000 via nid 192.168.120.126@o2ib was lost; in progress operations using this service will wait for recovery to complete.
      LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.7)
      LustreError: 20567:0:(mdc_locks.c:652:mdc_enqueue()) ldlm_cli_enqueue error: -4
      LustreError: 20567:0:(file.c:3329:ll_inode_revalidate_fini()) failure -4 inode 222298113
      LustreError: 20742:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff810217f9ec00 x1394757178766278/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
      Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
      Lustre: DEBUG MARKER: Duration: 86400
      LustreError: 17920:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) 192.168.117.3@o2ib rejected: o2iblnd fatal error
      LustreError: 17920:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) Skipped 39 previous similar messages
      

      MDS:
      ---------------

      Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
      Lustre: lustre-MDT0000: sending delayed replies to recovered clients
      Lustre: 25439:0:(mds_lov.c:1024:mds_notify()) MDS mdd_obd-lustre-MDT0000: in recovery, not resetting orphans on lustre-OST0000_UUID
      Lustre: 25439:0:(mds_lov.c:1024:mds_notify()) Skipped 7 previous similar messages
      Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0004_UUID now active, resetting orphans
      Lustre: Skipped 7 previous similar messages
      Lustre: DEBUG MARKER: mds has failed over 2 times, and counting...
      md: rebuild md1 throttled due to IO
      LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 192.168.114.116@o2ib  ns: mdt-ffff81091498f800 lock: ffff810fbf9f66c0/0xcb280298ce1d3c25 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 217 type: IBT flags: 0x20 remote: 0x3c5e7588abacbec3 expref: 8 pid: 25553 timeout: 4299068451
      LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 192.168.114.51@o2ib  ns: mdt-ffff81091498f800 lock: ffff810fbf9f6480/0xcb280298ce1d3c17 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 217 type: IBT flags: 0x20 remote: 0x5711f697b9a89693 expref: 8 pid: 25553 timeout: 4299068451
      LustreError: 25588:0:(ldlm_lockd.c:1210:ldlm_handle_enqueue0()) ### lock on destroyed export ffff81054ec6c000 ns: mdt-ffff81091498f800 lock: ffff810cef6a4480/0xcb280298ce1d3f2e lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 193 type: IBT flags: 0x4000000 remote: 0xfb40c962a891f585 expref: 3 pid: 25588 timeout: 0
      LustreError: 25588:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-107)  req@ffff810397453000 x1394757210221710/t0(0) o-1->7a66717e-dbe2-1092-ecee-6263c3bca713@NET_0x50000c0a8728f_UUID:0/0 lens 544/536 e 2 to 0 dl 1330146049 ref 1 fl Interpret:/ffffffff/ffffffff rc -107/-1
      LustreError: 25616:0:(ldlm_lockd.c:1210:ldlm_handle_enqueue0()) ### lock on destroyed export ffff810550486000 ns: mdt-ffff81091498f800 lock: ffff8105542e5d80/0xcb280298ce1d40d2 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 168 type: IBT flags: 0x4000000 remote: 0x3c5e7588abacbed1 expref: 3 pid: 25616 timeout: 0
      LustreError: 25588:0:(ldlm_lib.c:2129:target_send_reply_msg()) Skipped 96 previous similar messages
      Lustre: 25570:0:(ldlm_lib.c:877:target_handle_connect()) lustre-MDT0000: connection from bb5f6103-fd47-8201-1084-9a41a87168fe@192.168.114.116@o2ib t8590090887 exp 0000000000000000 cur 1330145971 last 0
      Lustre: 25570:0:(ldlm_lib.c:877:target_handle_connect()) Skipped 127 previous similar messages
      Lustre: 25582:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import lustre-MDT0000->NET_0x50000c0a87291_UUID netid 50000: select flavor null
      Lustre: 25582:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 136 previous similar messages
      Lustre: DEBUG MARKER: Duration: 86400
      md: rebuild md1 throttled due to IO
      md: rebuild md1 throttled due to IO
      md: rebuild md1 throttled due to IO
      

          People

            green Oleg Drokin
            cliffw Cliff White (Inactive)
