[LU-414] error looking up logfile Created: 14/Jun/11 Updated: 11/Oct/17 Resolved: 11/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Christopher Morrone | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
CHAOS4.4 (RHEL5.4), lustre 1.8.5.0-5chaos |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 10221 |
| Description |
|
Our admins tried to add 8 new OSS nodes to an existing Lustre server cluster running 1.8.5.0-5chaos. There were 16 existing OSS with 15 OSTs each, for a total of 240 old OSTs. There are also 15 OSTs on each of the new OSS, for a total of 120 new OSTs. When the new OSTs were brought up, it looks like at least 54 of the OSTs failed to be configured correctly on the MDS, and they are stuck in the IN (inactive) state according to "lctl dl". I don't see a pattern to which OSTs on which new OSS failed. This looks similar to bug 22658, which we have seen in the past:

2011-06-14 11:57:17 LustreError: 1432:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x10612404:0x0: rc -2
2011-06-14 11:57:17 LustreError: 1432:0:(llog_obd.c:200:llog_setup()) obd lsd-OST012e-osc ctxt 2 lop_setup=ffffffff885b3dc0 failed -2
2011-06-14 11:57:17 LustreError: 1432:0:(osc_request.c:4242:osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
2011-06-14 11:57:17 LustreError: 1432:0:(osc_request.c:4258:osc_llog_init()) osc 'lsd-OST012e-osc' tgt 'lsd-MDT0000' rc=-2
2011-06-14 11:57:17 LustreError: 1432:0:(osc_request.c:4260:osc_llog_init()) logid 0x10612404:0x0
2011-06-14 11:57:17 LustreError: 1432:0:(lov_log.c:253:lov_llog_init()) error osc_llog_init idx 302 osc 'lsd-OST012e-osc' tgt 'lsd-MDT0000' (rc=-2)
2011-06-14 11:57:17 LustreError: 1432:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x62800000028:0x10612404: rc -2
2011-06-14 11:57:17 LustreError: 1432:0:(llog_obd.c:200:llog_setup()) obd lsd-OST0130-osc ctxt 2 lop_setup=ffffffff885b3dc0 failed -2
2011-06-14 11:57:17 LustreError: 1444:0:(lov_log.c:161:lov_llog_origin_connect()) error osc_llog_connect tgt 302 (-107)
2011-06-14 11:57:17 LustreError: 1444:0:(mds_lov.c:1044:__mds_lov_synchronize()) lsd-MDT0000: lsd-OST012e_UUID failed at llog_origin_connect: -107
2011-06-14 11:57:17 Lustre: lsd-OST012e_UUID: Sync failed deactivating: rc -107

The admins decided to reboot the MDS, but the MDS is still unable to activate those OSTs (at least, I assume that it is the same set of OSTs):

2011-06-14 12:49:32 LustreError: 9611:0:(lov_log.c:161:lov_llog_origin_connect()) error osc_llog_connect tgt 258 (-107)
2011-06-14 12:49:32 LustreError: 9611:0:(mds_lov.c:1044:__mds_lov_synchronize()) lsd-MDT0000: lsd-OST0102_UUID failed at llog_origin_connect: -107
2011-06-14 12:49:32 Lustre: lsd-OST0102_UUID: Sync failed deactivating: rc -107
2011-06-14 12:49:32 LustreError: 9612:0:(lov_log.c:161:lov_llog_origin_connect()) error osc_llog_connect tgt 259 (-107)
2011-06-14 12:49:32 LustreError: 9646:0:(mds_lov.c:1044:__mds_lov_synchronize()) lsd-MDT0000: lsd-OST0125_UUID failed at llog_origin_connect: -107
2011-06-14 12:49:32 LustreError: 9646:0:(mds_lov.c:1044:__mds_lov_synchronize()) Skipped 20 previous similar messages
2011-06-14 12:49:32 Lustre: lsd-OST0125_UUID: Sync failed deactivating: rc -107

Notice there is no warning about "error looking up logfile", but lov_llog_origin_connect() is still failing. I suspect that lov_llog_origin_connect() is getting error code -107, ENOTCONN, from llog_obd2ops(), meaning that the llog_ctxt *ctxt is NULL. I say that because, watching the logs, I see an RPC between the MDS and OSS nodes complete successfully, but I can't see an RPC being sent after the "lov_llog_origin_connect()) connect 256/360" lines in the log. It appears that at the ptlrpc level, the MDT and OST are in fact fully connected. The import/export appear to be set up.
I am beginning to suspect that the "fix" for bug 22658 that allows the MDS to start up when there are missing log files just lets the server get stuck at this next point in the code. Also, I think there is pretty clearly some bug in Lustre's initial creation of OST llog files on the MDS. I am attaching the MDS console log for now. I can package up some more detailed Lustre logs tomorrow. |
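For reference, a minimal sketch (not the Lustre source; names and types are stand-ins) of the suspected local failure path described above: a missing llog context makes the connect path return -ENOTCONN without any RPC being sent.

/* Illustrative sketch only: if llog_setup() failed earlier ("error looking
 * up logfile"), the per-OSC llog context was never registered, so the later
 * connect path bails out locally with -ENOTCONN (-107). */
#include <errno.h>
#include <stddef.h>

struct llog_operations;                         /* stand-in for the real type */
struct llog_ctxt { struct llog_operations *loc_logops; };

/* roughly how llog_obd2ops() is believed to behave */
static int llog_ctxt_to_ops(struct llog_ctxt *ctxt, struct llog_operations **lop)
{
        if (ctxt == NULL)
                return -ENOTCONN;               /* the -107 seen in the logs */
        *lop = ctxt->loc_logops;
        return *lop != NULL ? 0 : -EOPNOTSUPP;
}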
| Comments |
| Comment by Peter Jones [ 15/Jun/11 ] |
|
Hongchao Could you please look at this one? Thanks Peter |
| Comment by Hongchao Zhang [ 15/Jun/11 ] |
|
This error is a little different from bug 22658, and there are still two "error looking up logfile" messages after the MDS reboot in your attached console log. Is the same number of OSTs stuck in the IN (inactive) state after you rebooted the MDS? As for the llog, I wonder whether there is a problem where some uninitialized data is read from the MDS's CATALOGS file. |
| Comment by Christopher Morrone [ 15/Jun/11 ] |
|
Yes, as far as I can tell they are all larger than the existing ones. OSTs 0000-00ef are on the original 16 nodes, and 00f0-0167 are on the new 8 nodes. I do believe that it was the same inactive OSTs after reboot, but the logs squash some messages, making it a little hard to verify 100%. The sysadmin is pretty certain that the same count of OSTs was inactive before and after the reboot, and some spot checking of the logs verifies that at least those OST messages that are not squashed are the same. |
| Comment by Christopher Morrone [ 15/Jun/11 ] |
|
See the new attachment. I translated the MDS's CATALOGS file into a readable form, and it is pretty clear which entries are incorrect. Any that look like "10612404:0" or "64000000028:1580334" are bad. They have no matching file in the OBJECTS directory, and they correspond with the OSTs that are stuck in the inactive state. |
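For reference, a hypothetical userspace dumper in the spirit of that translation. It assumes a 1.8-era on-disk layout of fixed-size, packed llog_catid records (one per OST index); the struct definitions below are an assumption for illustration, not taken from this ticket.

/* Hypothetical dumper for the MDS CATALOGS file: entries whose logid does
 * not correspond to a file under OBJECTS/ are the corrupted ones described
 * above. Struct layout is assumed, not authoritative. */
#include <stdio.h>
#include <stdint.h>

struct llog_logid {
        uint64_t lgl_oid;
        uint64_t lgl_ogr;
        uint32_t lgl_ogen;
} __attribute__((packed));

struct llog_catid {
        struct llog_logid lci_logid;
        uint32_t lci_padding1;
        uint32_t lci_padding2;
        uint32_t lci_padding3;
} __attribute__((packed));

int main(int argc, char **argv)
{
        struct llog_catid cid;
        FILE *f = fopen(argc > 1 ? argv[1] : "CATALOGS", "rb");
        int idx = 0;

        if (f == NULL)
                return 1;
        while (fread(&cid, sizeof(cid), 1, f) == 1) {
                printf("idx %4d  logid %#llx:%#llx gen %u\n", idx++,
                       (unsigned long long)cid.lci_logid.lgl_oid,
                       (unsigned long long)cid.lci_logid.lgl_ogr,
                       (unsigned)cid.lci_logid.lgl_ogen);
        }
        fclose(f);
        return 0;
}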
| Comment by Hongchao Zhang [ 15/Jun/11 ] |
|
Yes, the MDS's CATALOGS file is corrupted! I suspect it is caused by out-of-order calls of mds_lov_update_desc for the |
| Comment by Hongchao Zhang [ 16/Jun/11 ] |
|
Hi Chris, btw, I have written the following patch and will push it to Gerrit after testing it locally:

diff --git a/lustre/obdclass/llog_lvfs.c b/lustre/obdclass/llog_lvfs.c
+       filesize = i_size_read(file->f_dentry->d_inode);
+
|
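The fragment above only shows the i_size_read() call. A plausible reading (an assumption here, not the actual change) is that the patch bounds llog reads by the real file size so that data past EOF is never interpreted as catalog entries; a minimal sketch of such a guard, with hypothetical names:

/* Hypothetical guard, not the actual patch: refuse to use llog data that
 * lies beyond the file's real size (obtained in-kernel via i_size_read()). */
#include <errno.h>
#include <stdint.h>

static int llog_blob_in_bounds(uint64_t filesize, uint64_t offset, uint32_t len)
{
        /* anything starting or ending past EOF would be uninitialized */
        if (offset >= filesize || len > filesize - offset)
                return -EIO;
        return 0;
}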
| Comment by Christopher Morrone [ 16/Jun/11 ] |
|
Yes, I'll zero the records and see if it will work. We have a downtime scheduled for Tuesday, June 21, so I will try it then. |
| Comment by Hongchao Zhang [ 20/Jun/11 ] |
|
The RHEL5 kernel version for b1_8 has been updated recently (to 2.6.18-238.12.1), and some time is spent on |
| Comment by Hongchao Zhang [ 21/Jun/11 ] |
|
the patch is at http://review.whamcloud.com/#change,987 |
| Comment by Christopher Morrone [ 21/Jun/11 ] |
|
Zeroing the CATALOGS file got everything working this morning. |
| Comment by Hongchao Zhang [ 08/Aug/11 ] |
|
Hi Chris, as per the code, the newly allocated block in the CATALOGS file is initialized with zero in the current b1_8 and master. |
| Comment by Christopher Morrone [ 08/Aug/11 ] |
|
https://github.com/chaos/lustre/tree/1.8.5.0-5chaos I will need to check the block size. |
| Comment by Hongchao Zhang [ 10/Aug/11 ] |
|
Hi Chris, in osc_llog_init, "llog_get_cat_list" and "llog_put_cat_list" are protected by the MDT's "obd_llog_cat_process" mutex. |
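A minimal userspace sketch of the serialization being described; the pthread mutex and function names are illustrative stand-ins for the kernel-side locking, not the 1.8 source.

/* Illustrative only: reads and updates of the catalog list are meant to be
 * serialized by a single per-MDT lock, so two OSTs registering concurrently
 * cannot interleave their CATALOGS updates. */
#include <pthread.h>

static pthread_mutex_t obd_llog_cat_process = PTHREAD_MUTEX_INITIALIZER;

static void register_ost_catalog_entry(int ost_index)
{
        (void)ost_index;
        pthread_mutex_lock(&obd_llog_cat_process);
        /* llog_get_cat_list():  read the CATALOGS slot for ost_index     */
        /* (create the per-OST llog here if the slot is still empty)      */
        /* llog_put_cat_list():  write the updated slot back              */
        pthread_mutex_unlock(&obd_llog_cat_process);
}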
| Comment by Christopher Morrone [ 15/Aug/11 ] |
|
No, it is CHAOS4.4, so there are some changes from RHEL5.4. The MDS is using a 3ware card and the array is configured as RAID10. Stripe size is 64k. |
| Comment by Hongchao Zhang [ 23/Aug/11 ] |
|
Hi Chris, could you please tell me where I can get CHAOS4.4? Thanks! |
| Comment by Hongchao Zhang [ 23/Aug/11 ] |
|
I have tested 1.8.5.0-5chaos against 2.6.18-194.17.1, using 2 OSS (one with 252 OSTs, the other with 50 OSTs). FYI, I encountered an error "EXTRA_DIST: variable 'bin_SCRIPTS' is used but 'bin_SCRIPTS' is undefined" during the build. |
| Comment by Christopher Morrone [ 26/Aug/11 ] |
|
ftp://gdo-lc.ucllnl.org/pub/projects/chaos/

Can you tell me more about your test? Why did you have the OST count so imbalanced? We saw this with only 15 OSTs on each of 8 OSS, so I wouldn't think such a high count would be necessary. Did you statically allocate the index numbers to the OSTs, and alternate them between nodes? Did you issue the mount commands in parallel? |
| Comment by Hongchao Zhang [ 30/Aug/11 ] |
|
The test was run on my two virtual nodes using loop devices; one node runs the MDT plus 252 OSTs, the other runs 50 OSTs. The reason for testing this configuration is the CATALOGS operations "llog_get_cat_list" and "llog_put_cat_list", |
| Comment by Christopher Morrone [ 30/Aug/11 ] |
|
Ah, I think that's part of your problem. We DO specify our index values. If you aren't doing that, then the index numbers will just be sequentially allocated as the OSTs connect for the first time. And if you are not mounting them for the first time in parallel, then there is no chance to reproduce the issue we are seeing. |
| Comment by Hongchao Zhang [ 02/Sep/11 ] |
|
I specified the indices of the OSTs and mounted them non-sequentially, but the problem still does not occur, and I can't mount these OSTs |
| Comment by John Fuchs-Chesney (Inactive) [ 04/Mar/14 ] |
|
Chris – I'm doing some cleanup work on JIRA issues. |
| Comment by Christopher Morrone [ 04/Mar/14 ] |
|
I think it is very likely that this is still a problem. We seem to hit it in every version of Lustre. Our admins now have a permanent rule of manually adding OSTs sequentially to Lustre because this code is so broken. That is not acceptable. It turns an operation that should take under a minute into one that takes well over an hour. |
| Comment by Hongchao Zhang [ 18/Jul/14 ] |
|
As of b2_4 (which starts to use LOD/OSP instead of LOV/OSC at the MDT), this issue was fixed in "osp_sync_llog_init": it will re-create the llog if it is missing.

static int osp_sync_llog_init(const struct lu_env *env, struct osp_device *d)
{
        ...
        ctxt = llog_get_context(obd, LLOG_MDS_OST_ORIG_CTXT);
        LASSERT(ctxt);

        if (likely(logid_id(&osi->osi_cid.lci_logid) != 0)) {
                rc = llog_open(env, ctxt, &lgh, &osi->osi_cid.lci_logid,
                               NULL, LLOG_OPEN_EXISTS);

                /* re-create llog if it is missing */
                if (rc == -ENOENT)
                        logid_set_id(&osi->osi_cid.lci_logid, 0);
                else if (rc < 0)
                        GOTO(out_cleanup, rc);
        }

        if (unlikely(logid_id(&osi->osi_cid.lci_logid) == 0)) {
                rc = llog_open_create(env, ctxt, &lgh, NULL, NULL);
                if (rc < 0)
                        GOTO(out_cleanup, rc);
                osi->osi_cid.lci_logid = lgh->lgh_id;
        }

        LASSERT(lgh != NULL);
        ctxt->loc_handle = lgh;

        rc = llog_cat_init_and_process(env, lgh);
        if (rc)
                GOTO(out_close, rc);

        rc = llog_osd_put_cat_list(env, d->opd_storage, d->opd_index, 1,
                                   &osi->osi_cid);
        ...
} |