  1. Lustre
  2. LU-2271

recovery-small test 10 does not properly reconnect

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.4.0
    • 3
    • 5430

    Description

      It appears that recovery-small test 10 can cause this client to be evicted not only from the MDS, but also from all OSTs.
      As a result, the subsequent test also fails when it tries to touch one of the disconnected OSTs.
      (This often manifests as test 11 failing, but if you skip test 11, test 12 fails, or test 13 fails if you skip tests 11 and 12.)

      [15398.819135] Lustre: DEBUG MARKER: == recovery-small test 10: finish request on server after client eviction (bug 1521) == 00:21:04 (1351916464)
      [15398.902152] Lustre: *** cfs_fail_loc=305, val=0***
      [15398.904516] Lustre: *** cfs_fail_loc=305, val=0***
      [15399.574089] Lustre: *** cfs_fail_loc=305, val=0***
      [15399.575254] Lustre: Skipped 2 previous similar messages
      [15406.572138] Lustre: 21155:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1351916465/real 1351916465]  req@ffff8801acc1cbf0 x1417586946343301/t0(0) o104->lustre-OST0000@0@lo:15/16 lens 296/224 e 0 to 1 dl 1351916472 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [15406.572324] LustreError: 138-a: lustre-OST0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
      [15406.578774] Lustre: 21155:0:(client.c:1912:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      [15406.580254] LustreError: 21155:0:(ldlm_lockd.c:684:ldlm_handle_ast_error()) ### client (nid 0@lo) returned 0 from blocking AST ns: filter-ffff88011e018000 lock: ffff880203b29db8/0xb21738563fa1aa1f lrc: 1/0,0 mode: --/PW res: 4/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->2097151) flags: 0x200100a0 nid: 0@lo remote: 0xb21738563fa1aa18 expref: 2 pid: 21155 timeout 4298779689
      [15409.904300] LustreError: 138-a: lustre-MDT0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
      [15409.910598] LustreError: Skipped 1 previous similar message
      [15410.061266] LustreError: 21616:0:(mdt_handler.c:3031:mdt_recovery()) operation 101 on unconnected MDS from 12345-0@lo
      [15410.063011] LustreError: 11-0: an error occurred while communicating with 0@lo. The ldlm_enqueue operation failed with -107
      [15410.066075] LustreError: Skipped 1 previous similar message
      [15410.067186] Lustre: lustre-MDT0000-mdc-ffff8801c5fefbf0: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [15410.069999] Lustre: Skipped 4 previous similar messages
      [15410.073860] LustreError: 167-0: lustre-MDT0000-mdc-ffff8801c5fefbf0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [15410.077472] LustreError: 22383:0:(mdc_locks.c:773:mdc_enqueue()) ldlm_cli_enqueue: -5
      [15410.078167] Lustre: lustre-MDT0000-mdc-ffff8801c5fefbf0: Connection restored to lustre-MDT0000 (at 0@lo)
      [15410.078169] Lustre: Skipped 4 previous similar messages
      [15410.088663] LustreError: 167-0: lustre-OST0001-osc-ffff8801c5fefbf0: This client was evicted by lustre-OST0001; in progress operations using this service will fail.
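
To see at a glance which targets evicted the client in a log like the one above, a small script can scan for the two eviction messages that appear in it (the server-side `138-a` notice and the client-side "This client was evicted by" notice). This is only a rough sketch: the function name `evicting_targets` and the regular expressions are illustrative assumptions based on the message wording in this excerpt, not part of Lustre or its test framework, and other Lustre versions may word these messages differently.

```python
import re

# Client-side notice, as seen in the excerpt:
#   "... This client was evicted by lustre-OST0001; in progress operations ..."
CLIENT_EVICTED = re.compile(r"This client was evicted by (?P<target>[\w-]+);")

# Server-side notice, as seen in the excerpt:
#   "LustreError: 138-a: lustre-OST0000: A client on nid 0@lo was evicted ..."
SERVER_EVICTED = re.compile(
    r"138-a: (?P<target>[\w-]+): A client on nid (?P<nid>\S+) was evicted"
)


def evicting_targets(log_text):
    """Return a sorted list of target names that report a client eviction.

    Hypothetical helper: it only recognizes the two message shapes above.
    """
    targets = set()
    for line in log_text.splitlines():
        for pattern in (CLIENT_EVICTED, SERVER_EVICTED):
            match = pattern.search(line)
            if match:
                targets.add(match.group("target"))
    return sorted(targets)


# Four lines copied from the log excerpt in this report.
SAMPLE = """\
[15406.572324] LustreError: 138-a: lustre-OST0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
[15409.904300] LustreError: 138-a: lustre-MDT0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
[15410.073860] LustreError: 167-0: lustre-MDT0000-mdc-ffff8801c5fefbf0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[15410.088663] LustreError: 167-0: lustre-OST0001-osc-ffff8801c5fefbf0: This client was evicted by lustre-OST0001; in progress operations using this service will fail.
"""

if __name__ == "__main__":
    for target in evicting_targets(SAMPLE):
        print(target)
```

Run against the four sample lines, this lists the MDT and both OSTs named in them, which matches the observation that test 10 leaves the client evicted from more than just the MDS.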
      

            People

              utopiabound Nathaniel Clark
              green Oleg Drokin