Details
Bug
Resolution: Cannot Reproduce
Minor
None
Lustre 2.4.1
None
3
11621
Description
At TACC we are running into an issue where the config log is overflowing because of the large number of OSTs, and conf_param on the MDT and clients fails with ENOSPC. I tracked the issue down to llog_osd_write_rec():
00000040:00000001:3.0:1384532145.376864:0:12518:0:(llog_osd.c:336:llog_osd_write_rec()) Process entered
00000040:00001000:3.0:1384532145.376865:0:12518:0:(llog_osd.c:346:llog_osd_write_rec()) new record 10620000 to [0xa:0x419:0x0]
00000040:00000001:3.0:1384532145.376867:0:12518:0:(llog_osd.c:440:llog_osd_write_rec()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
It looks like it's hitting this:
/* if it's the last idx in log file, then return -ENOSPC */
if (loghandle->lgh_last_idx >= LLOG_BITMAP_SIZE(llh) - 1)
        RETURN(-ENOSPC);
Here is the CONFIGs directory for reference:
total 31212
-rw-r--r-- 1 root root 11674048 Sep 24 13:39 gsfs-client
-rw-r--r-- 1 root root 11980704 Sep 24 13:39 gsfs-MDT0000
-rw-r--r-- 1 root root 9432 Sep 24 13:48 gsfs-OST0000
-rw-r--r-- 1 root root 9432 Sep 24 13:49 gsfs-OST0001
...
-rw-r--r-- 1 root root 9432 Oct 1 12:15 gsfs-OST029f
-rw-r--r-- 1 root root 12288 Sep 24 13:39 mountdata
Is there a way to increase LLOG_BITMAP_SIZE? The bitmap itself appears to be sized from LLOG_CHUNK_SIZE, but the LLOG_BITMAP_SIZE macro doesn't reference LLOG_CHUNK_SIZE at all.
Thanks.
ok thanks Manish