    Description

      It seems the MDT catalog file may be damaged on our test filesystem. We were doing recovery testing with the patch for LU-1352. Sometime after power-cycling the MDS and letting it go through recovery, clients started getting EFAULT writing to lustre. These failures are accompanied by the following console errors on the MDS.

      Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_cat.c:81:llog_cat_new_log()) no free catalog slots for log...
      Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_cat.c:81:llog_cat_new_log()) Skipped 3 previous similar messages
      Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_obd.c:454:llog_obd_origin_add()) write one catalog record failed: -28
      Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_obd.c:454:llog_obd_origin_add()) Skipped 3 previous similar messages
      Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(mdd_object.c:1330:mdd_changelog_data_store()) changelog failed: rc=-28 op17 t[0x200de60af:0x17913:0x0]
      

      I mentioned this in LU-1570, but I figured a new ticket was needed.
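
      One way to confirm whether the catalog itself is full (offered only as a sketch, assuming an ldiskfs MDT on /dev/sda as in the test setup below; adjust the device name) is to dump the changelog_catalog file with debugfs and decode it with llog_reader:

      # ideally with the MDT stopped, dump the catalog file and decode its records
      debugfs -c -R 'dump /changelog_catalog /tmp/changelog_catalog' /dev/sda
      llog_reader /tmp/changelog_catalog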

          Activity

            [LU-1586] no free catalog slots for log

            adilger Andreas Dilger added a comment -

            Close as a duplicate of LU-7340.
            kilian Kilian Cavalotti added a comment - edited

            As a matter of fact, it happened to us on a production filesystem. I wouldn't say the workload is non-pathological, though.

            Anyway, we noticed at some point that a metadata operation such as "chown" could lead to ENOSPC:

            # chown userA /scratch/users/userA
            chown: changing ownership of `/scratch/users/userA/': No space left on device
            

            The related MDS messages are:

            LustreError: 8130:0:(llog_cat.c:82:llog_cat_new_log()) no free catalog slots for log...
            LustreError: 8130:0:(mdd_dir.c:783:mdd_changelog_ns_store()) changelog failed: rc=-28, op1 test c[0x20000b197:0x108d0:0x0] p[0x200002efb:0x155d5:0x0]
            

            Any tip on how to solve this? Would consuming (or clearing) the changelogs be sufficient?

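            As a rough sketch of those two options, assuming the registered user is cl1 and the MDT is lustre-MDT0000 (both hypothetical here; check mdd.*.changelog_users for the real ID and index):

            # list registered changelog users and how far behind they are
            lctl get_param mdd.lustre-MDT0000.changelog_users

            # consume: acknowledge records for the user (an endrec of 0 means the current last record)
            lfs changelog_clear lustre-MDT0000 cl1 0

            # or, if the user is no longer needed, deregistering it releases all of its pending records
            lctl --device lustre-MDT0000 changelog_deregister cl1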

            nedbass Ned Bass (Inactive) added a comment -

            Aurelien, we're concerned about filling the changelog catalog, not the device. We actually had that happen on our test system when Robinhood was down and I was testing metadata performance (hence this Jira issue). It's far less likely on a production system with non-pathological workloads, but not outside the realm of possibility.

            adegremont Aurelien Degremont (Inactive) added a comment -

            FYI, we had Robinhood set up on a filesystem with 100 million inodes, and an MDS RPC rate between 1k/s and 30k/s at peak. Robinhood was stopped for days and there were millions of changelog records to be consumed. It also took days to close the gap, but the MDS was very, very far from being filled (the MDS size was 2 TB). I think we did not consume even 1% of the device.
            Do not worry.

            nedbass Ned Bass (Inactive) added a comment -

            Sorry, I was filling the device, not the changelog catalog. I specified MDSDEV1=/dev/sda thinking it would use the whole device, but I also need to set MDSSIZE. So it will take days, not minutes, to hit this limit, making it less worrisome but still something that should be addressed.

            The reason I'm now picking this thread up again is that we have plans to enable changelogs on our production systems for use by Robinhood. We're concerned about being exposed to the problems under discussion here if Robinhood goes down for an extended period.

            nedbass Ned Bass (Inactive) added a comment -

            It only took about 1.3 million changelog entries to fill the catalog. My test case was something like

            MDSDEV1=/dev/sda llmount.sh
            lctl --device lustre-MDT0000 changelog_register
            while createmany -m /mnt/lustre/%d 1000 ; do
                unlinkmany /mnt/lustre/%d 1000
            done

            and it made it through about 670 iterations before failing.

            adilger Andreas Dilger added a comment -

            Ned, I agree this should be handled more gracefully. I think it is preferable to unregister the oldest consumer as the catalog approaches full, which should cause old records to be released (need to check this). That is IMHO better than setting the mask to zero and no longer recording new events.

            In both cases the consumer will have to do some scanning to find new changes. However, in the first case, it is more likely that the old consumer is no longer in use and no harm is done, while in the second case even a well-behaved consumer is punished.

            On a related note, do you know how many files were created before the catalog was full? In theory about 4B Changelog entries should be possible (approx 64000^2), but this might be reduced by some small factor if there are multiple records per file (e.g. create + setattr).
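
            Until something like that exists in the MDS, a rough manual equivalent might look like the sketch below; the fs name, the threshold, and the changelog_users output format are all assumptions to be verified against the running version:

            # deregister changelog users more than THRESHOLD records behind the current index
            THRESHOLD=1000000
            USERS=$(lctl get_param -n mdd.lustre-MDT0000.changelog_users)
            CURRENT=$(echo "$USERS" | awk '/current index/ {print $NF}')
            echo "$USERS" | awk '$1 ~ /^cl[0-9]+$/ {print $1, $2}' | while read ID INDEX; do
                if [ $((CURRENT - INDEX)) -gt "$THRESHOLD" ]; then
                    echo "deregistering $ID, $((CURRENT - INDEX)) records behind"
                    lctl --device lustre-MDT0000 changelog_deregister "$ID"
                fi
            done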

            nedbass Ned Bass (Inactive) added a comment -

            It seems like lots of bad things can happen if the changelog catalog is allowed to become full: LU-2843, LU-2844, LU-2845. Besides these crashes, the MDS service fails to start due to EINVAL errors from mdd_changelog_llog_init(), and the only way I've found to recover is manually deleting the changelog_catalog file.

            I'm interested in adding safety mechanisms to prevent this situation. Perhaps the MDS could automatically unregister changelog users or set the changelog mask to zero based on a tunable threshold of unprocessed records. Does anyone have other ideas for how to handle this more gracefully?
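
            For reference, that manual recovery looks roughly like the following on an ldiskfs MDT; the device and mount point are hypothetical, and this discards all unconsumed changelog records:

            umount /mnt/mdt                      # stop the MDT
            mount -t ldiskfs /dev/sda /mnt/mdt   # mount the backing filesystem directly
            rm /mnt/mdt/changelog_catalog        # remove the full catalog
            umount /mnt/mdt
            mount -t lustre /dev/sda /mnt/mdt    # restart the MDT
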
            pjones Peter Jones added a comment -

            Adding those involved with HSM for comment


            adilger Andreas Dilger added a comment -

            Yes, the changelogs could definitely be a factor. Once there is a registered changelog user, the changelogs are kept on disk until they are consumed. That ensures that if e.g. Robinhood crashes, or has some other problem for a day or four, it won't have to do a full scan just to recover the state again.

            However, if the ChangeLog user is not unregistered, the changelogs will be kept until they run out of space. I suspect that is the root cause here, and it should be investigated further. This bug should be CC'd to Jinshan and Aurelien Degremont, who are working on HSM these days.

            Cheers, Andreas
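
            As a concrete sketch of that consumer lifecycle (the fs name and the returned user ID, e.g. cl2, are hypothetical):

            lctl --device lustre-MDT0000 changelog_register     # start retaining records for a new user
            lfs changelog lustre-MDT0000                        # read the retained records
            lfs changelog_clear lustre-MDT0000 cl2 0            # consume them so the MDT can reclaim the space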

            People

              bogl Bob Glossman (Inactive)
              nedbass Ned Bass (Inactive)