Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version: Lustre 1.8.6
- Components: None
- Environment: RHEL 5.5 and Lustre 1.8.0.1 on J4400's
- Severity: 3
- Bugzilla ID: 10266
Description
OST 10 /dev/md30 resident on OSS3
From /var/log/messages:
LDISKFS-fs warning (device md30): ldiskfs_multi_mount_protect: fsck is running on the filesystem
LDISKFS-fs warning (device md30): ldiskfs_multi_mount_protect: MMP failure info: last update time: <time in Unix seconds>, last update node: OSS3, last update device: /dev/md30
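(For reference, the on-disk MMP state can also be inspected directly. This is a sketch assuming an MMP-aware e2fsprogs build, such as the Lustre-patched one, and the device name above:)
# print the superblock header, including the MMP block number and update interval
dumpe2fs -h /dev/md30 | grep -i mmp
# dump the contents of the MMP block itself (last updating node, device, sequence)
debugfs -c -R dump_mmp /dev/md30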
This is a scenario that keeps sending the customer in circles. They know for certain that no fsck is running, so they try to turn the MMP feature off via the following commands:
To manually disable MMP, run:
tune2fs -O ^mmp <device>
To manually enable MMP, run:
tune2fs -O mmp <device>
These commands fail, reporting that a valid superblock does not exist, yet the customer can see a valid superblock (with the mmp feature set) by running the following command:
tune2fs -l /dev/md30
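If tune2fs refuses the feature change even though the superblock is readable, one possible workaround is to force it. This is a sketch, not verified on this system, using the standard e2fsprogs -f (force) flag:
# force the feature change past tune2fs's error checks, then confirm mmp is gone
tune2fs -f -O ^mmp /dev/md30
tune2fs -l /dev/md30 | grep -i mmp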
It is their understanding that a fix for this issue was released in a later version of Lustre, but short of upgrading, is there a way to disable MMP here?
Customer contact is tyler.s.wiegers@lmco.com
Tyler, I left a voicemail for you at the number you provided in email.
For the oops message, the easiest way to handle that would be to take a photo of the screen and attach it. Otherwise, having the actual error message (e.g. NULL pointer dereference at ...), the process name, and the list of function names from the top of the stack (i.e. those functions most recently called) would help debug that problem.
Normally, if e2fsck completes successfully for the OST, then Lustre should be able to mount the filesystem and run with it, regardless of what corruption there was in the past, but of course I can't know what other kinds of corruption might be causing strange problems.
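For completeness, the kind of full check referred to above would be run with the OST unmounted, along these lines (a sketch; -f forces a check even if the filesystem looks clean, -y answers yes to all repairs):
e2fsck -fy /dev/md30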
I definitely would not classify such problems as something that happens often, so while understanding what is going wrong and fixing it is useful to us, you need to weigh the value of the data in the filesystem to the users against the downtime it is taking to debug this problem. Of course it would be easier and faster to debug with direct access to the logs, but many sites that are disconnected from the internet run Lustre, so this is nothing new.
Depending on the site's tolerance for letting data out, there are a number of ways we've worked with such sites in the past. One way is to print the logs and then scan them on an internet-connected system and attach them to the bug. This maintains an "air gap" for the system while still being relatively high bandwidth, if there is nothing sensitive in the log files themselves.
If you are not already in a production situation, I would strongly recommend upgrading to Lustre 1.8.5. It is running stably on many systems, and given the difficulty of diagnosing some of the problems you have already seen, it would be unfortunate to have to diagnose problems that were already fixed, under even more difficult circumstances. In fact, I know of very few 1.8.x sites that are still running 1.8.0.1.