[LU-5003] sanity-hsm test_302 llog_write()) ASSERTION( loghandle->lgh_obj != ((void *)0) Created: 03/May/14 Updated: 06/Oct/14 Resolved: 06/Oct/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 13852 | ||||||||
| Description |
|
This issue was created by maloo for wangdi <di.wang@intel.com> This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/d021bd8e-d2a0-11e3-a102-52540035b04c. The sub-test test_302 failed with the following error:
Info required for matching: sanity-hsm 302 |
| Comments |
| Comment by Mikhail Pershin [ 05/May/14 ] |
|
I saw the same issue while testing another patch, it happens in test_301. I tend to think that the reason might be commit http://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=6760f383f6cd0964e70250723b9d24bce13b5318 |
| Comment by Jian Yu [ 28/May/14 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-master/2065/ conf-sanity test 57a also hit the same failure: |
| Comment by nasf (Inactive) [ 11/Jul/14 ] |
|
Another failure instance: https://testing.hpdd.intel.com/test_sets/8c42ce96-085b-11e4-8b2d-5254006e85c2 |
| Comment by John Hammond [ 19/Aug/14 ] |
|
Another: https://testing.hpdd.intel.com/test_logs/ef9a8e1e-27e8-11e4-8e75-5254006e85c2 |
| Comment by Alex Zhuravlev [ 17/Sep/14 ] |
|
https://testing.hpdd.intel.com/test_sets/6d49dfa6-3e26-11e4-a0a3-5254006e85c2 |
| Comment by Alexander Zarochentsev [ 17/Sep/14 ] |
|
I have a crash looking exactly the same at replay-vbr test_4k , the llog being processed is a config llog. lgh_hdr = 0xffff88005a8d0000,
lgh_obj = 0x0 <per_cpu__irq_stack_union>,
lgh_last_idx = 249,
lgh_cur_idx = 0,
lgh_cur_offset = 0,
lgh_ctxt = 0xffff88005b433e40,
...
lgh_name = 0x0 <per_cpu__irq_stack_union>,
private_data = 0x0 <per_cpu__irq_stack_union>,
lgh_logops = 0xffffffffa078f260 <llog_client_ops>,
lgh_refcount = {
counter = 1
}
lgh_obj = 0x0 is what caused llog_write() to LBUG and lgh_logops = 0xffffffffa078f260 <llog_client_ops> indicates that the llog_handle is a client one. |
| Comment by Alexander Zarochentsev [ 17/Sep/14 ] |
| Comment by Alexander Zarochentsev [ 18/Sep/14 ] |
|
Seems it is not right to compare error code with -EIO, llog block read functions returns -EINVAL if llog fle format check fail: static int llog_osd_next_block(const struct lu_env *env, struct llog_handle *loghandle, int *cur_idx, int next_idx, __u64 *cur_offset, void *buf, int len) { struct llog_thread_info *lgi = llog_info(env); ... if (rc < sizeof(*tail)) { CERROR("%s: invalid llog block at log id "DOSTID"/%u " "offset "LPU64"\n", o->do_lu.lo_dev->ld_obd->obd_name, POSTID(&loghandle->lgh_id.lgl_oi), loghandle->lgh_id.lgl_ogen, *cur_offset); GOTO(out, rc = -EINVAL); } ... /* this shouldn't happen */ if (tail->lrt_index == 0) { CERROR("%s: invalid llog tail at log id "DOSTID"/%u " "offset "LPU64"\n", o->do_lu.lo_dev->ld_obd->obd_name, POSTID(&loghandle->lgh_id.lgl_oi), loghandle->lgh_id.lgl_ogen, *cur_offset); GOTO(out, rc = -EINVAL); } ... EIO is likely a temporary error (network, disk access, ...), EINVAL – points to a llog corruption. |
| Comment by Peter Jones [ 18/Sep/14 ] |
|
Thanks Zam! James Could you please review this patch? Thanks Peter |
| Comment by Peter Jones [ 06/Oct/14 ] |
|
Landed for 2.7 |