[LU-7097] conf-sanity test_84 (check recovery_time_hard) fails on DNE setup

Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major
    • Severity: 3

    Description

      conf-sanity test_84 fails on a DNE setup.
      Simple reproducer:
      MDSCOUNT=2 ONLY=84 sh ./conf-sanity.sh

          Activity
            adilger Andreas Dilger added a comment

            Can reopen ticket if a new patch is submitted.

            scherementsev Sergey Cheremencev added a comment (edited)

            It seems the issue is already fixed by "LU-7222 tests: add Mulitple MDTs to test_84".
            On the other hand, my patch introduces mdsfailover_HOST support for test_84.
            It also starts all possible clients and waits until just one of them is evicted.

            scherementsev Sergey Cheremencev added a comment

            Test failed due to timeout. There is no DNE support in the test.

            Lustre: DEBUG MARKER: == conf-sanity test 84: check recovery_time_hard == 12:21:27 (1417263687)
            LustreError: 11-0: lustre-MDT0000-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -11.
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 1 previous similar message
            Lustre: Mounted lustre-client
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005aa65400: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 1 previous similar message
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 1 previous similar message
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 1 previous similar message
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 3 previous similar messages
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 7 previous similar messages
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 13 previous similar messages
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 25 previous similar messages
            LustreError: 11-0: lustre-MDT0001-mdc-ffff88005beb5000: Communicating with 192.168.112.5@tcp, operation mds_connect failed with -19.
            LustreError: Skipped 51 previous similar messages
            Lustre: lustre-MDT0000-mdc-ffff88005aa65400: Connection to lustre-MDT0000 (at 192.168.112.5@tcp) was lost; in progress operations using this service will wait for recovery to complete
            LustreError: 167-0: lustre-MDT0000-mdc-ffff88005aa65400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
            LustreError: 5312:0:(ldlm_resource.c:781:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff88005aa65400: namespace resource [0x240000401:0x1:0x0].0 (ffff88003fd94e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
            LustreError: 5312:0:(ldlm_resource.c:1421:ldlm_resource_dump()) --- Resource: [0x240000401:0x1:0x0].0 (ffff88003fd94e40) refcount = 2
            LustreError: 5312:0:(ldlm_resource.c:1424:ldlm_resource_dump()) Granted locks (in reverse order):
            LustreError: 5312:0:(ldlm_resource.c:1427:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-mdc-ffff88005aa65400 lock: ffff88005c258b00/0xf5fcc28fef022372 lrc: 3/0,0 mode: PR/PR res: [0x240000401:0x1:0x0].0 bits 0x1b rrc: 2 type: IBT flags: 0x52f400000000 nid: local remote: 0x73481ecc0160356c expref: -99 pid: 4521 timeout: 0 lvb_type: 0
            Lustre: lustre-MDT0000-mdc-ffff88005aa65400: Connection restored to lustre-MDT0000 (at 192.168.112.5@tcp)

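The console log above is dominated by repeated mds_connect failures. As a reading aid only, here is a minimal shell sketch (not part of the Lustre test suite) that tallies those failures per MDT target and return code from a saved console log. The function name summarize_mds_connect is hypothetical, and the field positions are assumed from the log lines quoted in this ticket. On Linux, -11 corresponds to EAGAIN and -19 to ENODEV. Rate-limited repeats ("Skipped N previous similar messages") are summed separately, regardless of message type, because the log does not say which target they belong to.

```shell
#!/bin/sh
# Hypothetical helper: tally "mds_connect failed" console messages read from stdin.
# Output: one "<target> <rc> <count>" line per target/return-code pair,
# plus a "skipped <n>" line summing the rate-limited repeats.
summarize_mds_connect() {
    awk '
    /operation mds_connect failed with/ {
        # Assumed layout: "LustreError: 11-0: <target>-mdc-<id>: ... failed with <rc>."
        target = $3
        sub(/-mdc-.*/, "", target)   # keep only the MDT name, e.g. lustre-MDT0001
        rc = $NF
        sub(/\.$/, "", rc)           # drop the trailing period after the code
        count[target " " rc]++
        next
    }
    /Skipped [0-9]+ previous similar message/ {
        skipped += $3                # third field is the skipped-message count
    }
    END {
        for (key in count) print key, count[key]
        print "skipped", skipped + 0
    }' | LC_ALL=C sort
}
```

A possible invocation, assuming the console output was saved to a file named console.log (a hypothetical name), is `summarize_mds_connect < console.log`. A summary in which only the second MDT reports -19 would be consistent with the comment that the test does not handle more than one MDT, though it does not prove it.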
            jamesanunez James Nunez (Inactive) added a comment (edited)

            Sergey - What failure message do you get when this test fails? Could you upload any relevant logs?

            gerrit Gerrit Updater added a comment

            Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/16217
            Subject: LU-7097 tests: add DNE support to conf-sanity_84
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 836e0145ed22bba1b2107523c74f322f630b9588

            People

              hongchao.zhang Hongchao Zhang
              scherementsev Sergey Cheremencev