[LU-6147] sanity-lfsck test_4: '(7) unexpected status' Created: 22/Jan/15 Updated: 15/Jul/15 Resolved: 03/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 17167 | ||||
| Description |
|
This issue was created by maloo for John Hammond <john.hammond@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/7626441a-a14b-11e4-807a-5254006e85c2. The sub-test test_4 failed with the following error: (7) unexpected status Please provide additional information about the failure here. Info required for matching: sanity-lfsck 4 |
| Comments |
| Comment by John Hammond [ 22/Jan/15 ] |
|
When I runn this locally I see an lfsck thread asleep in lu_object_find_at(): [ 515.234216] Lustre: DEBUG MARKER: == sanity-lfsck test 4: FID-in-dirent can be rebuilt after MDT file-level backup/restore == 10:23:00 (1421943780) [ 521.780085] Lustre: Failing over lustre-MDT0000 [ 524.984187] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 192.168.122.201@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. [ 524.987971] LustreError: Skipped 13 previous similar messages [ 527.956175] Lustre: 5093:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1421943787/real 1421943787] req@ffff8 80064738188 x1491015708973216/t0(0) o251->MGC192.168.122.200@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1421943793 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [ 527.961754] Lustre: 5093:0:(client.c:1942:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 528.510534] Lustre: server umount lustre-MDT0000 complete [ 529.308146] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: [ 531.180857] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: [ 531.902794] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: [ 537.656353] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: [ 539.324451] Lustre: *** cfs_fail_loc=1601, val=0*** [ 547.009622] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 3 clients reconnect [ 547.011917] Lustre: lustre-MDT0000: Denying connection for new client lustre-MDT0000-lwp-MDT0001_UUID (at 192.168.122.201@tcp), waiting for all 3 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 1:06 [ 547.128862] Lustre: lustre-MDT0000: Recovery over after 0:01, of 3 clients 3 recovered and 0 were evicted. [ 578.733248] Lustre: DEBUG MARKER: sanity-lfsck test_4: @@@@@@ FAIL: (7) unexpected status [ 724.197393] INFO: task lfsck:5461 blocked for more than 120 seconds. [ 724.198800] Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 [ 724.200136] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 724.201748] lfsck D 0000000000000000 0 5461 2 0x00000080 [ 724.203226] ffff880067f37c00 0000000000000046 0000000000000000 ffff880065abe188 [ 724.204927] ffff880067f37b70 ffffffffa03183fe ffff880067f37c00 ffffffffa04a5cd9 [ 724.206596] ffff880062298cc0 ffff880067f37fd8 000000000000fcc8 ffff880062298cc0 [ 724.208431] Call Trace: [ 724.208981] [<ffffffffa03183fe>] ? cfs_hash_spin_unlock+0xe/0x10 [libcfs] [ 724.210466] [<ffffffffa04a5cd9>] ? lu_object_find_try+0x99/0x2b0 [obdclass] [ 724.211890] [<ffffffffa04a5f2d>] lu_object_find_at+0x3d/0xe0 [obdclass] [ 724.212765] [<ffffffff8155625b>] ? _read_unlock+0x2b/0x40 [ 724.213476] [<ffffffff81061d90>] ? default_wake_function+0x0/0x20 [ 724.214280] [<ffffffffa04a600f>] lu_object_find_slice+0x1f/0x80 [obdclass] [ 724.215198] [<ffffffffa09e0838>] lfsck_master_oit_engine+0x608/0x1f40 [lfsck] [ 724.216154] [<ffffffff810b74bd>] ? trace_hardirqs_on+0xd/0x10 [ 724.216999] [<ffffffffa09e2da1>] lfsck_master_engine+0xc31/0x14b0 [lfsck] [ 724.217933] [<ffffffff81553065>] ? thread_return+0x4e/0x7e9 [ 724.218670] [<ffffffff81061d90>] ? default_wake_function+0x0/0x20 [ 724.219474] [<ffffffffa09e2170>] ? lfsck_master_engine+0x0/0x14b0 [lfsck] [ 724.220382] [<ffffffff8109e856>] kthread+0x96/0xa0 [ 724.220999] [<ffffffff8100c30a>] child_rip+0xa/0x20 [ 724.221665] [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40 [ 724.222430] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 724.223110] [<ffffffff8109e7c0>] ? kthread+0x0/0xa0 [ 724.223796] [<ffffffff8100c300>] ? child_rip+0x0/0x20 |
| Comment by nasf (Inactive) [ 22/Jan/15 ] |
|
It is another failure instance of |
| Comment by nasf (Inactive) [ 22/Jan/15 ] |
|
John, would you please to verify whether the patch http://review.whamcloud.com/#/c/13493/ works for you or not? Thanks! |
| Comment by John Hammond [ 22/Jan/15 ] |
|
Yes. It works for me. |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
Thanks John. |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
The |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
Here is the patch: http://review.whamcloud.com/#/c/13493/ |
| Comment by nasf (Inactive) [ 02/Feb/15 ] |
|
Many sanity-lfsck test_4 failed on Maloo recently: So we need to land this patch ASAP to avoid blocking others. |
| Comment by Jian Yu [ 02/Feb/15 ] |
|
This is blocking patch review testing on master branch: |
| Comment by Gerrit Updater [ 03/Feb/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13493/ |
| Comment by Peter Jones [ 03/Feb/15 ] |
|
Landed for 2.7 |
| Comment by Gerrit Updater [ 23/Feb/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13840 |
| Comment by Bruno Faccini (Inactive) [ 27/Feb/15 ] |
|
+1 at https://testing.hpdd.intel.com/test_sets/6e9169d6-a179-11e4-9db6-5254006e85c2 |