[LU-1542] Failure on sanity.sh, subtest test_132 Created: 19/Jun/12 Updated: 03/Sep/13 Resolved: 03/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4076 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/dd62db8a-b9da-11e1-86c2-52540035b04c.
The sub-test test_132 failed with the following error:
This may relate to the startup issue in
Info required for matching: sanity 132 |
| Comments |
| Comment by Ian Colle (Inactive) [ 28/Jun/12 ] |
|
23:48:19:Lustre: MGS has stopped. |
| Comment by Ian Colle (Inactive) [ 28/Jun/12 ] |
|
https://maloo.whamcloud.com/test_sets/747c424e-c166-11e1-9055-52540035b04c From Client Console |
| Comment by Li Wei (Inactive) [ 31/Jul/12 ] |
|
https://maloo.whamcloud.com/test_sets/fb5f0dea-daf8-11e1-9ebb-52540035b04c |
| Comment by Ian Colle (Inactive) [ 13/Aug/12 ] |
|
https://maloo.whamcloud.com/test_sets/53d27ff8-e561-11e1-ae4e-52540035b04c |
| Comment by Keith Mannthey (Inactive) [ 06/Feb/13 ] |
|
https://maloo.whamcloud.com/test_sessions/13798a36-6f5a-11e2-93c1-52540035b04c
Well, this may not be 100% the same issue, but it is an assertion failure in the same spot that causes the MDS to reboot while test_132 times out. The logs tell me 4/100 failures, Feb 06.
14:06:17:LustreError: 11-0: lustre-OST0004-osc-MDT0000: Communicating with 10.10.4.195@tcp, operation ost_connect failed with -19.
14:06:18:Lustre: DEBUG MARKER: lctl get_param -n timeout
14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
14:06:19:Lustre: DEBUG MARKER: Using TIMEOUT=20
14:06:19:Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.sys.jobid_var=procname_uid
14:07:12:Lustre: MGS: haven't heard from client 2b7f8516-fc0a-afb9-790c-1965aaaa46c2 (at 10.10.4.197@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff880078f2e800, cur 1360015629 expire 1360015599 last 1360015579
14:07:23:Lustre: lustre-MDT0000: haven't heard from client 5fcc94dc-d9c0-7c5c-7665-6b8afe791bb0 (at 10.10.4.197@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff88007832ec00, cur 1360015634 expire 1360015604 last 1360015584
14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed:
14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) LBUG
14:07:23:Pid: 17820, comm: ll_evictor
14:07:23:
14:07:23:Call Trace:
14:07:23: [<ffffffffa04d7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
14:07:23: [<ffffffffa04d7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
14:07:23: [<ffffffffa0664755>] lu_ucred_assert+0x45/0x50 [obdclass]
14:07:23: [<ffffffffa0c52c66>] mdd_xattr_sanity_check+0x36/0x1f0 [mdd]
14:07:23: [<ffffffffa0c58221>] mdd_xattr_del+0xf1/0x540 [mdd]
14:07:23: [<ffffffffa0e3fe0a>] mdt_som_attr_set+0xfa/0x390 [mdt]
14:07:23: [<ffffffffa0e401ec>] mdt_ioepoch_close_on_eviction+0x14c/0x170 [mdt]
14:07:23: [<ffffffffa0f100c9>] ? osp_key_init+0x59/0x1a0 [osp]
14:07:23: [<ffffffffa0e40c4b>] mdt_ioepoch_close+0x2ab/0x3b0 [mdt]
14:07:23: [<ffffffffa0e411fe>] mdt_mfd_close+0x4ae/0x6e0 [mdt]
14:07:23: [<ffffffffa0e1297e>] mdt_obd_disconnect+0x3ae/0x4d0 [mdt]
14:07:23: [<ffffffffa061cd78>] class_fail_export+0x248/0x580 [obdclass]
14:07:23: [<ffffffffa07f9079>] ping_evictor_main+0x249/0x640 [ptlrpc]
14:07:23: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [<ffffffff8100c0ca>] child_rip+0xa/0x20
14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
14:07:23:
14:07:23:Kernel panic - not syncing: LBUG ..... |
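As a cross-check on the two eviction messages in the console log above, here is a minimal Python sketch that pulls the counters out of such a line and compares `cur - last` with the reported idle time. It is not part of the original report or of any Lustre tooling: the regular expression and the name `parse_eviction` are illustrative assumptions based only on the message text quoted in this comment, and the sketch makes no claim about what the `expire` field means internally.

```python
import re

# Pattern for the "haven't heard from client" message as quoted in this ticket.
# This is an assumption derived from the two sample lines, not a Lustre-defined format.
EVICT_RE = re.compile(
    r"haven't heard from client (?P<uuid>[0-9a-f-]+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<secs>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return the fields of an eviction message as a dict, or None if no match."""
    match = EVICT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("secs", "cur", "expire", "last"):
        fields[key] = int(fields[key])
    # Seconds between the last contact and the time the message was logged.
    fields["idle"] = fields["cur"] - fields["last"]
    return fields


# The MGS eviction line, copied from the log in this comment.
SAMPLE = (
    "14:07:12:Lustre: MGS: haven't heard from client "
    "2b7f8516-fc0a-afb9-790c-1965aaaa46c2 (at 10.10.4.197@tcp) in 50 seconds. "
    "I think it's dead, and I am evicting it. exp ffff880078f2e800, "
    "cur 1360015629 expire 1360015599 last 1360015579"
)

if __name__ == "__main__":
    info = parse_eviction(SAMPLE)
    print(info["nid"], info["secs"], info["idle"])
```

For both eviction lines in the log, `cur - last` equals the 50 seconds stated in the message, so the two messages are internally consistent.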
| Comment by Keith Mannthey (Inactive) [ 06/Feb/13 ] |
|
It seems the above may be caused by the patch being tested: http://review.whamcloud.com/5222 |
| Comment by Andreas Dilger [ 03/Sep/13 ] |
|
Closing this old Orion bug for now. I don't think the last comments were related to this problem. |