[LU-1534] Test failure on test suite lfsck Created: 16/Jun/12 Updated: 27/Sep/12 Resolved: 27/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.3.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Li Wei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4586 | ||||||||
| Description |
|
This issue was created by maloo for Li Wei <liwei@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/dc0efbba-b804-11e1-937b-52540035b04c. No logs are available other than the test output. The test seems to have completed, though the Maloo report contains "TIMEOUT". |
| Comments |
| Comment by Li Wei (Inactive) [ 17/Jun/12 ] |
|
Digging into the console log archive revealed the following on the MDS. I'm not 100% sure that this corresponds to the failure above, but the timestamp suggests so.

Lustre: DEBUG MARKER: == lfsck lfsck.sh test complete, duration 101 sec == 13:59:41 (1339880381)
LustreError: 28147:0:(osd_internal.h:665:osd_fid2oi()) ASSERTION( !fid_is_igif(fid) ) failed:
LustreError: 28147:0:(osd_internal.h:665:osd_fid2oi()) LBUG
Pid: 28147, comm: mdt_00
Call Trace:
[<ffffffffa043a905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa043af17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0c3c4a5>] osd_oi_delete+0x2e5/0x470 [osd_ldiskfs]
[<ffffffffa0c33933>] osd_object_destroy+0x233/0x420 [osd_ldiskfs]
[<ffffffffa0b20e80>] mdd_object_kill+0xb0/0x290 [mdd]
[<ffffffffa0b376c9>] mdd_finish_unlink+0x1f9/0x2f0 [mdd]
[<ffffffffa0b3d609>] mdd_unlink+0xa09/0xd60 [mdd]
[<ffffffffa06e6820>] ? ldlm_completion_ast+0x0/0x730 [ptlrpc]
[<ffffffffa0b86a30>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[<ffffffffa07101d4>] ? lustre_msg_get_versions+0xa4/0x120 [ptlrpc]
[<ffffffffa0847027>] cml_unlink+0x97/0x200 [cmm]
[<ffffffffa0ba2b2f>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
[<ffffffffa0ba47b4>] mdt_reint_unlink+0x634/0x9e0 [mdt]
[<ffffffffa0ba1b51>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0b9b3aa>] mdt_reint_internal+0x50a/0x810 [mdt]
[<ffffffffa0b9b6f4>] mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa0b8d2a2>] mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0b8e195>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa071d7e2>] ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
[<ffffffffa043b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa044bd9f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
[<ffffffffa0716612>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa071ea57>] ptlrpc_main+0x7d7/0x1610 [ptlrpc]
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 28147, comm: mdt_00 Not tainted 2.6.32-220.17.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff814eccea>] ? panic+0x78/0x143
[<ffffffffa043af6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0c3c4a5>] ? osd_oi_delete+0x2e5/0x470 [osd_ldiskfs]
[<ffffffffa0c33933>] ? osd_object_destroy+0x233/0x420 [osd_ldiskfs]
[<ffffffffa0b20e80>] ? mdd_object_kill+0xb0/0x290 [mdd]
[<ffffffffa0b376c9>] ? mdd_finish_unlink+0x1f9/0x2f0 [mdd]
[<ffffffffa0b3d609>] ? mdd_unlink+0xa09/0xd60 [mdd]
[<ffffffffa06e6820>] ? ldlm_completion_ast+0x0/0x730 [ptlrpc]
[<ffffffffa0b86a30>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[<ffffffffa07101d4>] ? lustre_msg_get_versions+0xa4/0x120 [ptlrpc]
[<ffffffffa0847027>] ? cml_unlink+0x97/0x200 [cmm]
[<ffffffffa0ba2b2f>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
[<ffffffffa0ba47b4>] ? mdt_reint_unlink+0x634/0x9e0 [mdt]
[<ffffffffa0ba1b51>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0b9b3aa>] ? mdt_reint_internal+0x50a/0x810 [mdt]
[<ffffffffa0b9b6f4>] ? mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa0b8d2a2>] ? mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0b8e195>] ? mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa071d7e2>] ? ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
[<ffffffffa043b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa044bd9f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
[<ffffffffa0716612>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa071ea57>] ? ptlrpc_main+0x7d7/0x1610 [ptlrpc]
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffffa071e280>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

I do not yet know why there was a file with an IGIF FID, but this change in osd_oi_delete() apparently introduced a regression:

@@ -586,9 +581,6 @@ int osd_oi_delete(struct osd_thread_info *info,
struct lu_fid *oi_fid = &info->oti_fid;
const struct dt_key *key;
- if (!fid_is_norm(fid))
- return 0;
-
LASSERT(fid_seq(fid) != FID_SEQ_LOCAL_FILE);
if (fid_is_idif(fid) || fid_seq(fid) == FID_SEQ_LLOG)
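
To illustrate why dropping the early return matters, here is a minimal, self-contained sketch of the FID classification involved. It is not the Lustre source: the `sketch_` names are hypothetical, and the sequence ranges are assumed from Lustre's FID layout and should be checked against the tree. It only shows that an IGIF FID is "not normal", so the removed guard would have returned before any code that asserts `!fid_is_igif(fid)` was reached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sequence ranges (hypothetical constants modeled on Lustre's
 * FID layout; verify against the actual headers before relying on them). */
#define SKETCH_SEQ_IGIF      12ULL
#define SKETCH_SEQ_IGIF_MAX  0xffffffffULL
#define SKETCH_SEQ_NORMAL    0x200000400ULL

/* Simplified stand-in for struct lu_fid. */
struct sketch_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* True when the sequence falls in the assumed IGIF range. */
static bool sketch_fid_is_igif(const struct sketch_fid *fid)
{
    return fid->f_seq >= SKETCH_SEQ_IGIF &&
           fid->f_seq <= SKETCH_SEQ_IGIF_MAX;
}

/* True when the sequence is at or above the assumed normal range. */
static bool sketch_fid_is_norm(const struct sketch_fid *fid)
{
    return fid->f_seq >= SKETCH_SEQ_NORMAL;
}

/* Models only the removed guard: returns 0 when the delete is skipped,
 * 1 when it would go on to the object index lookup. */
static int sketch_oi_delete(const struct sketch_fid *fid)
{
    if (!sketch_fid_is_norm(fid))
        return 0;
    return 1;
}
```

Under these assumptions, calling `sketch_oi_delete()` with an IGIF FID returns 0 without touching the index, which is the behavior the hunk above removed. Whether the landed fix restores this exact guard is not stated in this ticket.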
|
| Comment by Li Wei (Inactive) [ 17/Jun/12 ] |
| Comment by Andreas Dilger [ 20/Jun/12 ] |
|
Patch has been landed to master. I'm not sure if you want to keep this issue open to investigate why an IGIF FID was involved. It would be useful to have the conf-sanity.sh test_32 upgrade test create some files with the pre-upgrade version and then delete them afterward, to catch issues like this. |
| Comment by Li Wei (Inactive) [ 20/Jun/12 ] |
|
Thanks, Andreas. I did some investigation on the origin of the IGIF FID. No conclusion yet, but the recent e2fsprogs refresh looks suspicious. A new ticket is almost definitely needed. However, please keep this open until the new one is filed (as soon as I get back from |
| Comment by Andreas Dilger [ 20/Jun/12 ] |
|
Actually, if this was for lfsck.sh (just noticed this), then it may be OK. The lfsck script is creating files "by hand" on the MDT (locally mounted) so they will not have a FID assigned. Separately, is this (or a similar) fix needed for osd-zfs? |
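
As a rough illustration of how a file with no assigned FID could end up being identified by an IGIF FID, the sketch below builds a FID from an inode number and generation. This is an assumption about the mechanism, not something the ticket confirms; `sketch_fid` and `sketch_igif_build()` are hypothetical names, and the field mapping (sequence from inode number, object ID from generation) is assumed from Lustre's IGIF convention.

```c
#include <stdint.h>

/* Simplified stand-in for struct lu_fid. */
struct sketch_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* Assumed IGIF mapping: the inode number becomes the sequence and the
 * inode generation becomes the object ID; the version is left at zero. */
static struct sketch_fid sketch_igif_build(uint32_t ino, uint32_t gen)
{
    struct sketch_fid fid = { .f_seq = ino, .f_oid = gen, .f_ver = 0 };
    return fid;
}
```

Because the sequence is a 32-bit inode number, a FID built this way sits well below the range used for normally allocated FIDs, which would be consistent with the `!fid_is_igif(fid)` assertion firing when such a file is unlinked.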
| Comment by Jodi Levi (Inactive) [ 27/Sep/12 ] |
|
Please reopen ticket if more work is needed. |