[LU-3294] osp_sync_llog_init(): ASSERTION( lgh != ((void *)0) ) failed Created: 08/May/13 Updated: 11/Jun/13 Resolved: 11/Jun/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Li Wei (Inactive) | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | endianness, sparc |
| Severity: | 3 |
| Rank (Obsolete): | 8157 |
| Description |
|
When starting an MDT on a SPARC MDS, this assertion failure occurred:

Lustre: lustre-MDT0000: used disk, loading
Lustre: lustre-OST0000-osc-MDT0000: Init llog for 0 - catid 0x2:0:0
LustreError: 11309:0:(osp_sync.c:963:osp_sync_llog_init()) ASSERTION( lgh != ((void *)0) ) failed:
LustreError: 11309:0:(osp_sync.c:963:osp_sync_llog_init()) LBUG
Pid: 11309, comm: llog_process_th
Call Trace:
Kernel panic - not syncing: LBUG
Call Trace:
[0000000010181194] lbug_with_loc+0x94/0xc0 [libcfs]
[0000000010dc28c8] osp_sync_llog_init+0xa28/0xc00 [osp]
[0000000010dc6d78] osp_sync_init+0x1f8/0xbe0 [osp]
[0000000010daf51c] osp_device_alloc+0x4d7c/0x5c40 [osp]
[000000001033a500] class_setup+0x6e0/0xf00 [obdclass]
[000000001033da58] class_process_config+0x1738/0x5180 [obdclass]
[...]

According to the "catid" printed, I guess the FID of the log must be [1:2:0]. The problem is in the definition of ost_id:

struct ost_id {
        union {
                struct ostid {
                        __u64 oi_id;
                        __u64 oi_seq;
                } oi;
                struct lu_fid oi_fid;
        };
};
When fid_to_logid() assigns a 64-bit sequence number to oi_seq, which 32 bits end up in f_oid and which in f_ver depends on the endianness of the MDS. On the SPARC MDS, FID_SEQ_LLOG goes to f_ver, causing ostid_id() to return 0, while the log ID as a whole is nonzero. Together, these caused osp_sync_llog_init() to neither open nor re-create the log. |
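The aliasing described above can be reproduced outside Lustre. What follows is a minimal sketch, not the actual Lustre headers: the demo_ types and helper names are hypothetical stand-ins, the value 1 for FID_SEQ_LLOG is an assumption, and struct lu_fid is assumed to be a 64-bit f_seq followed by 32-bit f_oid and f_ver. It stores a sequence number through oi_seq and lets the caller read it back through the FID view of the same union.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct lu_fid (assumed layout). */
struct demo_lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical stand-in for struct ost_id; uses a C11 anonymous union. */
struct demo_ost_id {
	union {
		struct {
			uint64_t oi_id;
			uint64_t oi_seq;
		} oi;
		struct demo_lu_fid oi_fid;
	};
};

#define DEMO_FID_SEQ_LLOG 1ULL	/* assumed value of FID_SEQ_LLOG */

/* Returns nonzero when the host stores the most significant byte first. */
static int demo_host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Fill the id/seq view, as a fid_to_logid()-style conversion would. */
static void demo_store(struct demo_ost_id *id, uint64_t oid, uint64_t seq)
{
	/* Both views must cover exactly the same 16 bytes. */
	assert(sizeof(id->oi) == sizeof(id->oi_fid));

	memset(id, 0, sizeof(*id));
	id->oi.oi_id = oid;
	id->oi.oi_seq = seq;
}
```

After demo_store(&id, 2, DEMO_FID_SEQ_LLOG), a little-endian host reads id.oi_fid.f_oid as 1 and id.oi_fid.f_ver as 0. A big-endian host reads f_oid as 0 and f_ver as 1, which is the situation reported on the SPARC MDS: any accessor that returns f_oid sees 0 even though the log ID is nonzero.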
| Comments |
| Comment by John Hammond [ 08/May/13 ] |
|
After 725f3f8e it looks like we should use logid_id(&osi->osi_cid.lci_logid) instead of ostid_id(&osi->osi_cid.lci_logid.lgl_oi) in osp_sync_llog_init() and similarly elsewhere. Di, can you comment? |
| Comment by Di Wang [ 09/May/13 ] |
|
John, yes, you are right. It should be logid_id() instead of ostid_id(). Sigh, I had thought all of it had been reverted to logid_id in |
| Comment by John Hammond [ 10/May/13 ] |
|
The proposed change for osp_sync_llog_init() has been rolled into http://review.whamcloud.com/#change,6305. However, there are still some spots where ostid_id() is being applied to lgl_oi, and similar combinations. This only affects big-endian servers and so is not an issue for common setups (including LLNL's x86_64 servers with ppc64 clients).

If I understand this code correctly, it may be useful to add some assertions/trace that:
- In logid_id() the seq is FID_SEQ_LLOG or FID_SEQ_LLOG_NAME. Same for logid_set_id().
- In ostid_id() the seq is not FID_SEQ_LLOG or FID_SEQ_LLOG_NAME. Same for ostid_set_id().
- A double swab of an ost_id/llog_logid for various valid seqs and oids (big and small) is the identity.

This may require fixing up POSTID, DOSTID, ... or replacing them with PLOGID(), DLOGID(), ... |
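The checks suggested above might look roughly like the following. This is only a sketch under stated assumptions and not the code that landed: the sk_ types and functions are hypothetical, the sequence values 1 and 10 for FID_SEQ_LLOG and FID_SEQ_LLOG_NAME are assumptions, and the swab here is a plain field-wise byte swap rather than Lustre's own swab routines.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FID_SEQ_LLOG      1ULL	/* assumed value */
#define SK_FID_SEQ_LLOG_NAME 10ULL	/* assumed value */

/* Hypothetical, simplified log ID: object id, sequence, generation. */
struct sk_logid {
	uint64_t id;
	uint64_t seq;
	uint32_t gen;
};

/* Reverse the byte order of a 64-bit value. */
static uint64_t sk_swab64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t sk_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00u) |
	       ((v << 8) & 0xff0000u) | (v << 24);
}

static int sk_seq_is_llog(uint64_t seq)
{
	return seq == SK_FID_SEQ_LLOG || seq == SK_FID_SEQ_LLOG_NAME;
}

/* Accessor that refuses to run on anything but a log sequence. */
static uint64_t sk_logid_id(const struct sk_logid *l)
{
	assert(sk_seq_is_llog(l->seq));
	return l->id;
}

/* Field-wise swab, applied to each member independently. */
static void sk_logid_swab(struct sk_logid *l)
{
	l->id = sk_swab64(l->id);
	l->seq = sk_swab64(l->seq);
	l->gen = sk_swab32(l->gen);
}

/* Returns nonzero when swabbing twice restores every field. */
static int sk_double_swab_is_identity(struct sk_logid l)
{
	struct sk_logid copy = l;

	sk_logid_swab(&copy);
	sk_logid_swab(&copy);
	return copy.id == l.id && copy.seq == l.seq && copy.gen == l.gen;
}
```

A test loop could call sk_double_swab_is_identity() over a range of small and large ids for each valid sequence. The assertion in sk_logid_id() would catch a caller passing an object ID whose sequence is not a log sequence, which is the mirror image of the misuse found in this ticket.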
| Comment by John Hammond [ 13/May/13 ] |
|
The proposed change to osp_sync_llog_init() was landed to master as part of http://review.whamcloud.com/6305. |
| Comment by John Hammond [ 11/Jun/13 ] |
|
Patch landed. |