[LU-10646] Lost access to storage hardware during fsck Created: 08/Feb/18  Updated: 11/Feb/18  Resolved: 11/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request Priority: Critical
Reporter: Joe Mervini Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Dell/DDN hardware


Rank (Obsolete): 9223372036854775807

 Description   

After updating the firmware and OS on the hardware, I was running an fsck -fy against the MDT and OSTs in a large file system when the multipath devices became inaccessible. Currently the processes are in a "D" state, but I believe they had progressed past pass 5 and were all in the process of updating quota inconsistencies.

I am working with the vendors to determine the reason for the multipath failures, but my real concern at the moment is the state of the file systems. I am reluctant to simply reboot the systems because I don't want to risk damage.

I'm hoping that Andreas can weigh in here and give me some advice.



 Comments   
Comment by Peter Jones [ 09/Feb/18 ]

Fan Yong

Can you please advise?

Thanks

Peter

Comment by nasf (Inactive) [ 09/Feb/18 ]

jamervi,

Andreas is on vacation. I hope I can help some.
Generally, I would NOT suggest forcibly breaking an in-progress e2fsck, to avoid damage. For your case, let's check the system status first. Please run "echo t > /proc/sysrq-trigger", then attach the "dmesg" output.
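For example (assuming the sysrq facility is enabled on the server; the output file name is only an example):

echo t > /proc/sysrq-trigger
dmesg > /tmp/task-dump.txt

That will dump the state of all tasks into the kernel log, so we can see where the e2fsck and multipath threads are stuck.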

Comment by Joe Mervini [ 09/Feb/18 ]

We pretty much moved past the "breaking the in-process e2fsck" stage. There was no way we could re-establish the paths to the devices.

On reboot we were able to successfully run fsck -n on all devices and they all came up clean but on one server it reported that it was skipping journal recovery because of the read-only nature of the fsck.

The problem that we are encountering is that any time we run an fsck -fy it causes a disruption with multipath, causing all paths to the device being checked to fail. It is unclear whether the problem is with e2fsck, dm-multipath, the underlying ib_srp subsystem, or a combination of all three.

Currently I am running a generic fsck against one of the OSTs (fsck /dev/mapper/<device>) and it doesn't appear to be doing anything. Although the path is still active, I am not seeing any IO via iostat on the server or via the monitoring services on the storage controllers.
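(For reference, the server-side check is just iostat extended statistics, e.g.:

iostat -xm 5

alongside the controller-side monitoring.)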

Although I have only checked one other OST on that same system, I was able to mount that device as ldiskfs.
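(That check was just an ldiskfs mount and unmount along these lines; the mount point is only an example:

mount -t ldiskfs /dev/mapper/<device> /mnt/ost_check
umount /mnt/ost_check
)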

Any assistance would be greatly appreciated.

Comment by nasf (Inactive) [ 09/Feb/18 ]

On reboot we were able to successfully run fsck -n on all devices and they all came up clean but on one server it reported that it was skipping journal recovery because of the read-only nature of the fsck.

I am not sure whether I understand the issue correctly. You said that you ran "fsck -n" on all OSTs successfully, and only one OST reported "skipping journal recovery". That means the data paths are available (at least for read) for all OSTs, right? Otherwise, the read-only fsck would fail.

Then you said that once you run "fsck -fy" on some OST, the fsck breaks because all data paths to that device become unavailable, right? If so, it seems that the data paths to that OST have been downgraded to read-only, not writable (just a suspicion). Is that OST the one with "skipping journal recovery"? Or an OST that reported clean with "fsck -n"? Or do all OSTs fail with "fsck -fy"?

Currently I am running a generic fsck against one of the OSTs (fsck /dev/mapper/<device>) and it doesn't appear to be doing anything. Although the path is still active, I am not seeing any IO via iostat on the server or via the monitoring services on the storage controllers.

You mean neither reads nor writes are detected during the "fsck -fy", right? Does the monitoring service see nothing from the very beginning of the "fsck -fy"? And what does the monitoring service see if you run "fsck -n"? Normal activity?

Although I have only checked one other OST on that same system, I was able to mount that device as ldiskfs.

Sorry, I am not clear about that. You mean you can mount the device as ldiskfs? If so, then that OST's data path is still available, at least for read.

Anyway, it seems that you have already interrupted the "fsck -fy" via reboot, right?

Have you configured HA for the OSTs? If so, what is the status when you access an OST via the other OSS node?

Comment by Joe Mervini [ 09/Feb/18 ]

Yes - All the systems were rebooted yesterday because the ib_srp paths could not be re-established.

After rebooting all the systems, 'fsck -n' was run on all OSTs and the MDT. All targets reported clean in the read-only fsck. On one OSS, all OSTs reported skipping journal recovery during the 'fsck -n'.

When 'fsck -fy' was run on one OST, after a period of time both paths to the device failed. After that, the device was completely inaccessible from the host running the fsck. There is a possibility that access through the individual /dev/sd devices is still available. The reason I say this is that if I go to the failover node and try to access the device, I get errors due to multimount protection. (Now that you mention it, that seems to point more toward a dm-multipath problem.) The experience on Wednesday night was that all targets failed when fsck -fy was run.
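(For the record, the multimount protection state can be confirmed from the superblock with something like the following; the device name is a placeholder:

dumpe2fs -h /dev/mapper/<device> | grep -i mmp
)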

When I run fsck -n against one of the OSTs I can see read activity on the storage controller.

So to reiterate, it appears that the file systems are intact on all targets. fsck in read-only mode will run without error on all targets. fsck -fy will eventually hang in "D" state on all targets due to failure of the paths to the devices under multipath control.

Hopefully this offers a little more clarity.

Comment by Joe Mervini [ 09/Feb/18 ]

I am bypassing dm-multipath and running fsck directly against the /dev/sd device. It is not disrupting the path.

The device that I am running fsck against is one that I could not mount as ldiskfs. I ran fsck without any flags and it came back after flushing the journal. I then ran fsck -fp and got this message:
[root@goss13 ~]# fsck -fp /dev/sde
fsck from util-linux 2.23.2
gscratch-OST007c: Interior extent node level 0 of inode 3:
Logical start 0 does not match logical start 73 at next level.

gscratch-OST007c: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

I am now running the fsck without flags and am seeing this:

[root@goss13 ~]# fsck /dev/sde
fsck from util-linux 2.23.2
e2fsck 1.42.13.wc5 (15-Apr-2016)
gscratch-OST007c contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Interior extent node level 0 of inode 3:
Logical start 0 does not match logical start 73 at next level. Fix<y>? no
Inode 3, i_blocks is 600, should be 16. Fix<y>? no

Please advise.

Comment by nasf (Inactive) [ 10/Feb/18 ]

Inode <3> is an ldiskfs internal inode, used for the user quota file. That means the user quota file is broken. There seems to be no better way; you have to choose "yes" to fix the corruption. The worst case is that the user quota becomes inconsistent, but that is not fatal; we can rebuild it later.

To be safe, if possible, please make a device-level backup of the OST before repairing it (e.g. with dd).
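A minimal sketch of what that could look like (the device name and backup path are only examples):

dd if=/dev/sde of=/backup/gscratch-OST007c.img bs=4M conv=sync,noerror

After the repair, the quota files can be regenerated by toggling the quota feature with tune2fs while the target is unmounted:

tune2fs -O ^quota /dev/sde
tune2fs -O quota /dev/sde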

Comment by Joe Mervini [ 10/Feb/18 ]

The other affected OST had the same error and fsck was unwittingly run on it with the -fy option. Everything came up fine. I repaired the OST mentioned above and was able to mount both OSTs as ldiskfs.

We discovered the reason why the fsck was causing the multipath paths to disappear: there was an inconsistency in the values of max_sectors_kb between the dm device and the associated sd devices. This inconsistency had been lurking for quite some time, but somehow the fsck tickled the system in a way that caused it to rear its head.
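For anyone hitting the same thing, the mismatch shows up when comparing the sysfs queue limits of the dm device with those of its underlying sd paths (the device names below are only examples):

multipath -ll /dev/mapper/<device>
cat /sys/block/dm-3/queue/max_sectors_kb
cat /sys/block/sde/queue/max_sectors_kb
cat /sys/block/sdf/queue/max_sectors_kb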

We are holding off mounting Lustre until we replace a faulty storage controller, but I believe that everything is on track to bring the file system back online.

Thanks for your assistance. Please close this ticket.

Comment by nasf (Inactive) [ 11/Feb/18 ]

jamervi,
Glad to know the system has recovered.
