[LU-1051] removing file failed due to no space left on device Created: 29/Jan/12 Updated: 23/Feb/12 Resolved: 23/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4717 | ||||||||
| Description |
|
Hi, I'm testing 2.1.55 and just created small MDT (500MB) and 28 OSTs (16TB per OST) on it. Here is what I did on the client.
We can't see any error messages in the client's log file, but there are some error messages on the MDS, shown below.
Jan 29 16:17:35 dl01 kernel: Lustre: 30022:0:(ldlm_lib.c:909:target_handle_connect()) MGS: connection from e388fe06-b86b-228b-ea1c-0834ea5009ae@192.168.20.126@tcp t0 exp (null) cur 1327871855 last 0
With a larger MDT, creating and removing files was no problem. Did the minimum MDT size limit change in 2.x? |
| Comments |
| Comment by Andreas Dilger [ 29/Jan/12 ] |
|
This relates to the OSD API change. I'm not sure why it thinks unlinking the file needs 3300 credits, but this is too large for the minimum-sized journal that was created on this small MDT. Normally, I would reply that the MDT will never be so small in production compared to the number of OSTs, but with the large number of credits being reserved I'm concerned that this could cause problems during normal operation as well. The high OST count is likely a factor, but it is definitely worthwhile to investigate why so many credits are reserved. At a minimum, the number of reserved bitmap and group descriptor blocks needs to be capped at the actual number of bitmap and descriptor blocks in the filesystem. This will probably reduce the reservation for such small filesystems noticeably. There may be other places where the credit calculation can be improved. |
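The cap suggested above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical (not Lustre or ldiskfs symbols), and the group count assumes the ext4-style default of 8 * block size blocks per group.

```c
/*
 * Sketch only: block_groups() and cap_credits() are hypothetical helper
 * names, not functions from Lustre or ldiskfs.
 */

/* Number of block groups, assuming 8 * block_size blocks per group. */
unsigned long block_groups(unsigned long long fs_bytes, unsigned block_size)
{
    unsigned long long blocks = fs_bytes / block_size;
    unsigned long long per_group = 8ULL * block_size;

    /* Round up so that a partial last group is counted. */
    return (unsigned long)((blocks + per_group - 1) / per_group);
}

/* Never reserve more blocks of one kind than the filesystem contains. */
unsigned long cap_credits(unsigned long wanted, unsigned long present)
{
    return wanted < present ? wanted : present;
}
```

Under these assumptions a 500MB MDT with 4096-byte blocks has only a handful of block groups, so any reservation for more bitmap blocks than that cannot be needed.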
| Comment by Peter Jones [ 30/Jan/12 ] |
|
Niu, could you please look into this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 01/Feb/12 ] |
|
Most of the reserved credits come from unlink llog records (3164). There are 28 stripes, and for each unlink llog record we need to reserve credits for 4 block writes and 1 object create:
so the total number for unlink llog records is (14 * 4 + 57) * 28 = 3164. Actually, not every llog write needs to create a new plain log; we just reserve the maximum conservatively. In the old master (before the OSD API landing), we reserved only 14 blocks for each unlink record. (Even with the old code, the unlink llog needs 14 * 28 = 392 credits, which is greater than 256 too.) Since the credit calculation is in the OSD layer, there doesn't seem to be a clean way to optimize the calculation for llog (OSD isn't aware of llog). Maybe we should just change mdd_declare_llog_record() to reserve a less conservative value for the moment (as the old master code does)? |
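The arithmetic in this comment can be checked with a short sketch. The constants 14, 4 and 28 are quoted from the comment. The object-create cost of 57 is an assumption, chosen because it is the value consistent with the reported total of 3164 (113 credits per stripe times 28 stripes). The function names are hypothetical.

```c
/* Figures from the comment above; CREATE_CREDITS is inferred, not quoted. */
enum {
    BLOCK_WRITE_CREDITS = 14, /* credits reserved per llog block write */
    WRITES_PER_RECORD   = 4,  /* block writes declared per unlink llog record */
    CREATE_CREDITS      = 57  /* assumption: value that yields the reported 3164 */
};

/* Credits declared for one unlink llog record after the OSD API change. */
int llog_record_credits(void)
{
    return BLOCK_WRITE_CREDITS * WRITES_PER_RECORD + CREATE_CREDITS;
}

/* Total unlink llog credits for a file with the given stripe count. */
int unlink_llog_credits(int stripes)
{
    return llog_record_credits() * stripes;
}

/* The old master reserved a flat 14 blocks per unlink record. */
int old_unlink_llog_credits(int stripes)
{
    return BLOCK_WRITE_CREDITS * stripes;
}
```

With 28 stripes this gives 3164 credits for the new code and 392 for the old code, both above the 256 limit mentioned in the comment.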
| Comment by Alex Zhuravlev [ 01/Feb/12 ] |
|
Like I said before, there are many ways to improve this within OSD: |
| Comment by Andreas Dilger [ 01/Feb/12 ] |
|
Similarly, the insert declarations for the llog records should all be for the same parent directory, so the number of credits needed for all of the inserts can be reduced because the parent inode only needs to be modified once. This can be done by tracking the inode numbers that are already included in the transaction, and not accounting for them twice. The number of parent leaf blocks modified is min(2x current leaf blocks, number of insertions). In the worst case each one would split, if this is less than the number of insertions. There should only be either a new llog created OR an llog header update (assuming that a new llog creation already includes credits for the header). |
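A minimal sketch of the two ideas in this comment: charge each inode only once per transaction, and bound the parent leaf blocks by min(2x current leaf blocks, number of insertions). The structure and function names are hypothetical and are not part of the OSD API. The fixed-size array is only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 64 /* illustrative bound, not a Lustre constant */

/* Inode numbers already accounted for in the current transaction. */
struct declared_inodes {
    unsigned long ino[MAX_TRACKED];
    size_t count;
};

/*
 * Returns true if credits for this inode still need to be charged,
 * false if the inode was already declared in this transaction.
 */
bool declare_inode_once(struct declared_inodes *set, unsigned long ino)
{
    for (size_t i = 0; i < set->count; i++)
        if (set->ino[i] == ino)
            return false;

    /* If the set is full the inode is not recorded and is charged again. */
    if (set->count < MAX_TRACKED)
        set->ino[set->count++] = ino;
    return true;
}

/* Worst case: every current leaf splits, but never more than one per insert. */
unsigned leaf_blocks_to_reserve(unsigned current_leaf_blocks, unsigned insertions)
{
    unsigned worst_split = 2 * current_leaf_blocks;

    return worst_split < insertions ? worst_split : insertions;
}
```

For 28 llog inserts into one parent directory, the parent inode would be charged once instead of 28 times, and a directory with 3 leaf blocks would reserve 6 leaf blocks instead of 28.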
| Comment by Andreas Dilger [ 01/Feb/12 ] |
|
Note that in the orion branch there is an additional line in osd_trans_start() that shrinks the transaction to the maximum allowed size to avoid failing outright in this case:
@@ -579,6 +579,8 @@ int osd_trans_start(const struct lu_env *env, struct dt_device *d,
CWARN("%s: too many transaction credits (%d > %d)\n",
d->dd_lu_dev.ld_obd->obd_name, oh->ot_credits,
osd_journal(dev)->j_max_transaction_buffers);
+ /* XXX */
+ oh->ot_credits = osd_journal(dev)->j_max_transaction_buffers;
#ifdef OSD_TRACK_DECLARES
CERROR(" attr_set: %d, punch: %d, xattr_set: %d,\n",
oh->ot_declare_attr_set, oh->ot_declare_punch,
This won't actually solve the problem if the transaction is too large (unlikely, even for a small MDT journal), and doesn't fix the performance impact of the code over-allocating the number of transaction credits, but it would at least allow the system to continue. I would recommend that we land such a patch to 2.1.1 so that we don't cause the MDT to fail just based on speculation. The underlying journal code would catch this error (with an assertion) if the actual number of credits used exceeds the transaction size. Since production MDTs will always have large enough journals, I don't think this is a serious risk. I'm of course NOT suggesting that this will resolve the issue, since the huge number of transaction credits reserved can significantly impact the performance, even if no error message is printed for large journals, so the credit reservation still needs to be improved. |
| Comment by Niu Yawei (Inactive) [ 01/Feb/12 ] |
I'm not sure if I follow you correctly. Actually, it looks like we now only reserve LDISKFS_QUOTA_INIT_BLOCKS for the whole transaction (in osd_trans_start()), which doesn't look sufficient to me; I think we need to reserve at least LDISKFS_MAXQUOTAS_INIT_BLOCKS (or even more for a rename transaction). We may optimize for root uid/gid, unless we know exactly how many uids/gids are affected in the whole transaction (like the new quota design in Orion), which requires us to record which uids/gids are affected in the declare stage and calculate the total quota credits in osd_trans_start(). Is that fine with you? Anyway, the number of credits for quota is much smaller than the huge amount for unlink llog.
Yes, we know for sure that the catlog header is already allocated. The write to catlog header could be optimized. |
| Comment by Niu Yawei (Inactive) [ 01/Feb/12 ] |
ok, thanks.
llog creation just calls the normal declare_create to reserve credits, so I'm afraid it doesn't include the credits for the header. |
| Comment by Niu Yawei (Inactive) [ 02/Feb/12 ] |
|
http://review.whamcloud.com/2082 I couldn't figure out an easy way to get the dt_object of the llog file, so that patch only reduces the credits for llog in the following aspects:
It reduces the credits for unlink llog by ~40% so far. This patch changes the quota credit calculation as well. |
| Comment by Build Master (Inactive) [ 13/Feb/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Peter Jones [ 14/Feb/12 ] |
|
Landed for 2.2 |
| Comment by Niu Yawei (Inactive) [ 14/Feb/12 ] |
|
Patch for reducing llog credits: http://review.whamcloud.com/#change,2100 |
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = FAILURE
|
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = ABORTED
|
| Comment by Build Master (Inactive) [ 22/Feb/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Peter Jones [ 23/Feb/12 ] |
|
Landed for 2.2 |