Lustre / LU-7344

sanity test_154g test30 fail on cleanup: FAIL: test_154g failed with 1

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0, Lustre 2.10.0
    • Labels: None
    • Environment: autotest
    • Severity: 3

    Description

      sanity test_154g subtest test30 fails during cleanup, when it removes the links the test created. Logs are at https://testing.hpdd.intel.com/test_sets/9608c94e-7c22-11e5-9851-5254006e85c2

      From the test_log:

      Finishing test test30 at 1445869186
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0330': Input/output error
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0329': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0254': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0678': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0986': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0309': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0608': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0286': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0479': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0231': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0798': Cannot send after transport endpoint shutdown
      rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0824': Cannot send after transport endpoint shutdown
      llapi_fid_test: llapi_fid_test.c:98: cleanup: assertion 'WEXITSTATUS(rc) == 0' failed: rm command returned 1
       sanity test_154g: @@@@@@ FAIL: test_154g failed with 1 
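
The rm failures above can be tallied by error message to see how many removals failed and why. This is a minimal sketch, not part of the test suite: the function name `summarize_rm_failures` and the regular expression are illustrative assumptions, and the sample input is three lines copied from the test_log excerpt above.

```python
import re
from collections import Counter

# Matches GNU rm failure lines of the form:
#   rm: cannot remove `/path/to/file': <reason>
# (hypothetical helper pattern, written for the log format shown above)
RM_FAILURE = re.compile(r"rm: cannot remove [`'](?P<path>[^']+)': (?P<reason>.+?)\s*$")


def summarize_rm_failures(log_text):
    """Count rm failures in log_text, grouped by the reported reason."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RM_FAILURE.search(line)
        if match:
            counts[match.group("reason")] += 1
    return counts


# Three lines copied from the test_log excerpt in this ticket.
sample = """\
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0330': Input/output error
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0329': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0254': Cannot send after transport endpoint shutdown
"""

for reason, count in summarize_rm_failures(sample).most_common():
    print(count, reason)
```

Running the same function over the full test_log should make it easy to compare failure patterns across the linked test sets.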
      

      From the client console logs, the client lost its connection to lustre-MDT0000 and was evicted:

      14:19:55:LustreError: 11-0: lustre-MDT0000-mdc-ffff880077e11c00: operation ldlm_enqueue to node 10.1.5.239@tcp failed: rc = -107
      14:19:55:Lustre: lustre-MDT0000-mdc-ffff880077e11c00: Connection to lustre-MDT0000 (at 10.1.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
      14:19:55:LustreError: 167-0: lustre-MDT0000-mdc-ffff880077e11c00: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      14:19:55:LustreError: 23082:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -5
      14:19:55:LustreError: 23082:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) Skipped 4 previous similar messages
      14:19:55:Lustre: lustre-MDT0000-mdc-ffff880077e11c00: Connection restored to 10.1.5.239@tcp (at 10.1.5.239@tcp)
      14:19:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_154g: @@@@@@ FAIL: test_154g failed with 1 
      14:19:55:Lustre: DEBUG MARKER: sanity test_154g: @@@@@@ FAIL: test_154g failed with 1
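
The numeric return codes in the console log (rc = -107 and -5) are negated errno values. As a quick aid when reading these logs, the sketch below maps them to symbolic names and messages. It assumes it is run on Linux, where the errno numbering matches the client kernel; `decode_rc` is a hypothetical helper name, not a Lustre tool.

```python
import errno
import os


def decode_rc(rc):
    """Return (symbolic name, message) for a negated errno such as -107."""
    code = abs(rc)
    # errno.errorcode maps this platform's errno numbers to their names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Return codes copied from the client console log above.
for rc in (-107, -5):
    name, message = decode_rc(rc)
    print(rc, name, message)
```

On Linux, -107 is ENOTCONN and -5 is EIO. The "Cannot send after transport endpoint shutdown" text printed by rm is the message for ESHUTDOWN (108 on Linux), and "Input/output error" is the message for EIO.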
      

      We’ve seen this failure a couple of times this month. Logs are at https://testing.hpdd.intel.com/test_sets/8b07cd46-70a2-11e5-9bcc-5254006e85c2 and
      https://testing.hpdd.intel.com/test_sets/957630d2-75a8-11e5-bac5-5254006e85c2. In the last client console log, we see an additional error message about a nonzero refcount:

      09:23:05:LustreError: 11-0: lustre-MDT0000-mdc-ffff88007daeb800: operation ldlm_enqueue to node 10.1.4.105@tcp failed: rc = -107
      09:23:05:Lustre: lustre-MDT0000-mdc-ffff88007daeb800: Connection to lustre-MDT0000 (at 10.1.4.105@tcp) was lost; in progress operations using this service will wait for recovery to complete
      09:23:05:LustreError: 167-0: lustre-MDT0000-mdc-ffff88007daeb800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      09:23:05:LustreError: 23311:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -5
      09:23:05:LustreError: 12432:0:(ldlm_resource.c:887:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff88007daeb800: namespace resource [0x200004282:0x82c:0x0].0x0 (ffff88007c0a72c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
      09:23:05:LustreError: 12432:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x200004282:0x82c:0x0].0x0 (ffff88007c0a72c0) refcount = 2
      09:23:05:Lustre: lustre-MDT0000-mdc-ffff88007daeb800: Connection restored to 10.1.4.105@tcp (at 10.1.4.105@tcp)
      09:23:05:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_154g: @@@@@@ FAIL: test_154g failed with 1 
      

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)