[LU-7344] sanity test_154g test30 fail on cleanup: FAIL: test_154g failed with 1 Created: 27/Oct/15 Updated: 11/Apr/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0, Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | autotest |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test 154g subtest 30 fails on removing the links the test created. Logs are at https://testing.hpdd.intel.com/test_sets/9608c94e-7c22-11e5-9851-5254006e85c2

From the test_log:

Finishing test test30 at 1445869186
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0330': Input/output error
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0329': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0254': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0678': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0986': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0309': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0608': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0286': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0479': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0231': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0798': Cannot send after transport endpoint shutdown
rm: cannot remove `/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766/link0824': Cannot send after transport endpoint shutdown
llapi_fid_test: llapi_fid_test.c:98: cleanup: assertion 'WEXITSTATUS(rc) == 0' failed: rm command returned 1
sanity test_154g: @@@@@@ FAIL: test_154g failed with 1

From the client console logs, the client is having connection problems:

14:19:55:LustreError: 11-0: lustre-MDT0000-mdc-ffff880077e11c00: operation ldlm_enqueue to node 10.1.5.239@tcp failed: rc = -107
14:19:55:Lustre: lustre-MDT0000-mdc-ffff880077e11c00: Connection to lustre-MDT0000 (at 10.1.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
14:19:55:LustreError: 167-0: lustre-MDT0000-mdc-ffff880077e11c00: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
14:19:55:LustreError: 23082:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -5
14:19:55:LustreError: 23082:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) Skipped 4 previous similar messages
14:19:55:Lustre: lustre-MDT0000-mdc-ffff880077e11c00: Connection restored to 10.1.5.239@tcp (at 10.1.5.239@tcp)
14:19:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_154g: @@@@@@ FAIL: test_154g failed with 1
14:19:55:Lustre: DEBUG MARKER: sanity test_154g: @@@@@@ FAIL: test_154g failed with 1

We have seen this failure a couple of times this month. Logs are at https://testing.hpdd.intel.com/test_sets/8b07cd46-70a2-11e5-9bcc-5254006e85c2 and:

09:23:05:LustreError: 11-0: lustre-MDT0000-mdc-ffff88007daeb800: operation ldlm_enqueue to node 10.1.4.105@tcp failed: rc = -107
09:23:05:Lustre: lustre-MDT0000-mdc-ffff88007daeb800: Connection to lustre-MDT0000 (at 10.1.4.105@tcp) was lost; in progress operations using this service will wait for recovery to complete
09:23:05:LustreError: 167-0: lustre-MDT0000-mdc-ffff88007daeb800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
09:23:05:LustreError: 23311:0:(mdc_locks.c:1176:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -5
09:23:05:LustreError: 12432:0:(ldlm_resource.c:887:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff88007daeb800: namespace resource [0x200004282:0x82c:0x0].0x0 (ffff88007c0a72c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
09:23:05:LustreError: 12432:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x200004282:0x82c:0x0].0x0 (ffff88007c0a72c0) refcount = 2
09:23:05:Lustre: lustre-MDT0000-mdc-ffff88007daeb800: Connection restored to 10.1.4.105@tcp (at 10.1.4.105@tcp)
09:23:05:Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_154g: @@@@@@ FAIL: test_154g failed with 1 |
| Comments |
| Comment by Saurabh Tandan (Inactive) [ 24/Dec/15 ] |
|
Another instance of this failure was found with the following configuration: |