Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.2.0, Lustre 2.3.0
-
None
-
Lustre Tag: v2_2_0_0_RC1
Lustre Build: https://build.whamcloud.com/job/lustre-b2_2/11
Distro/Arch: RHEL6.2/x86_64 (kernel version: 2.6.32-220.4.2.el6)
OSSCOUNT=2
OSTCOUNT=2000 (with 1000 OSTs per OSS)
NETTYPE=o2ib
ENABLE_QUOTA=yes
-
3
-
10883
Description
While running ost-pools test 5 with 2000 OSTs, after adding 2000 OSTs to one OST pool and then removing the OSTs from the pool, the test failed as follows:
<~snip~>
client-19-ib: Warning, OST lustre-OST041f_UUID still found in pool lustre.testpool
client-19-ib: Warning, OST lustre-OST0420_UUID still found in pool lustre.testpool
<~snip~>
The console log on the combined MGS/MDS node showed:
LustreError: 16312:0:(mgs_llog.c:752:record_lcfg()) failed -28
LustreError: 16340:0:(mgs_llog.c:752:record_lcfg()) failed -28
LustreError: 16340:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-MDT0000-mdtlov 0xce022 lustre testpool lustre-OST041f_UUID (null)
LustreError: 16340:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-clilov 0xce022 lustre testpool lustre-OST041f_UUID (null)
LustreError: 16369:0:(mgs_llog.c:752:record_lcfg()) failed -28
LustreError: 16369:0:(mgs_llog.c:752:record_lcfg()) Skipped 5 previous similar messages
LustreError: 16369:0:(mgs_llog.c:788:record_base()) error -28: lcfg lustre-MDT0000-mdtlov 0xce022 lustre testpool lustre-OST0420_UUID (null)
Maloo report: https://maloo.whamcloud.com/test_sets/a610c4b2-71cd-11e1-9716-5254004bbbd3
By running llog_reader on the CONFIGS/lustre-MDT0000 file on the MGS/MDS node, I found that the file contained 63293 records and that 1474 bits were not set. The last several records are:
#64763 (224)marker 2043193 (flags=0x01, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041d_UUID' Mon Mar 19 03:01:52 2012-
#64764 (144)pool remove 0:lustre-MDT0000-mdtlov 1:lustre 2:testpool 3:lustre-OST041d_UUID
#64765 (224)marker 2043193 (flags=0x02, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041d_UUID' Mon Mar 19 03:01:52 2012-
#64766 (224)marker 2043195 (flags=0x01, v2.2.0.0) lustre-MDT0000-mdtlov 'rem lustre.testpool.lustre-OST041e_UUID' Mon Mar 19 03:02:02 2012-
#64767 (144)pool remove 0:lustre-MDT0000-mdtlov 1:lustre 2:testpool 3:lustre-OST041e_UUID
The OST pool operations consumed most of the records, so the record index reached the following limit:
/* if it's the last idx in log file, then return -ENOSPC */
if (loghandle->lgh_last_idx >= LLOG_BITMAP_SIZE(llh) - 1)
        RETURN(-ENOSPC);

/* (8192 - 88 - 8) * 8 = 64768 */
#define LLOG_BITMAP_SIZE(llh)  ((llh->llh_hdr.lrh_len -        \
                                 llh->llh_bitmap_offset -      \
                                 sizeof(llh->llh_tail)) * 8)
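To make the arithmetic behind this limit explicit, here is a minimal standalone sketch of the same calculation. It is not the Lustre code: the `sketch_*` names are hypothetical, and the three sizes (8192-byte header, bitmap at byte offset 88, 8-byte tail) are taken only from the comment in the snippet above. On Linux, ENOSPC is 28, which matches the "failed -28" messages in the console log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Sizes assumed from the comment in the quoted snippet:
 * (8192 - 88 - 8) * 8. These are illustrative constants,
 * not values read from a real llog header. */
enum {
    SKETCH_HDR_LEN       = 8192, /* llh_hdr.lrh_len       */
    SKETCH_BITMAP_OFFSET = 88,   /* llh_bitmap_offset     */
    SKETCH_TAIL_LEN      = 8     /* sizeof(llh->llh_tail) */
};

/* Number of bitmap bits (one per record slot) in a single log file. */
static inline long sketch_bitmap_bits(long hdr_len, long bitmap_offset,
                                      long tail_len)
{
    assert(hdr_len >= bitmap_offset + tail_len);
    return (hdr_len - bitmap_offset - tail_len) * 8;
}

/* Mirrors the quoted check: returns 0 while another record still fits,
 * and -ENOSPC once last_idx has reached the final usable index. */
static inline int sketch_check_space(long last_idx, long bitmap_bits)
{
    return last_idx >= bitmap_bits - 1 ? -ENOSPC : 0;
}
```

With these sizes the bitmap holds 64768 bits, so `sketch_check_space()` starts returning -ENOSPC once the last index reaches 64767. That is the index of the last record in the llog_reader output above (#64767), which is consistent with the pool removals for lustre-OST041f_UUID and lustre-OST0420_UUID failing.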
The output of "llog_reader lustre-MDT0000" is attached as lustre-MDT0000.log. Please advise on how to resolve this issue.