[LU-9569] Unable to mount OSTs after power event Created: 28/May/17  Updated: 28/May/17  Resolved: 28/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Jesse Hanley Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre 2.8. Redhat 6 - 2.6.32-642.6.2.el6.


Severity: 1
Rank (Obsolete): 9223372036854775807

 Description   

We had a power event last night that impacted Atlas. We're currently unable to mount some of the OSTs as part of the bringup. We tried to mount as ldiskfs and that is not working either.

{{
[root@atlas-oss4b7 ~]# mount -t lustre /dev/mapper/atlas-ddn4b-l43 /tmp/tmpost
mount.lustre: mount /dev/mapper/atlas-ddn4b-l43 at /tmp/tmpost failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
[root@atlas-oss4b7 ~]# mount -t ldiskfs /dev/mapper/atlas-ddn4b-l43 /tmp/tmpost
mount: wrong fs type, bad option, bad superblock on /dev/mapper/atlas-ddn4b-l43,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[root@atlas-oss4b7 ~]# dmesg | tail
[ 1589.056034] LustreError: 51009:0:(osd_handler.c:6365:osd_mount()) atlas2-OST00e6-osd: can't mount /dev/mapper/atlas-ddn4b-l43: -22
[ 1589.081213] LustreError: 51009:0:(obd_config.c:578:class_setup()) setup atlas2-OST00e6-osd failed (-22)
[ 1589.120021] LustreError: 51009:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST00e6-osd setup error -22
[ 1589.150512] LustreError: 51009:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4b-l43: -22
[ 1589.190570] LustreError: 51009:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-22)
[ 1609.130825] LustreError: 51284:0:(osd_handler.c:6365:osd_mount()) atlas2-OST00e6-osd: can't mount /dev/mapper/atlas-ddn4b-l43: -22
[ 1609.158986] LustreError: 51284:0:(obd_config.c:578:class_setup()) setup atlas2-OST00e6-osd failed (-22)
[ 1609.197816] LustreError: 51284:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST00e6-osd setup error -22
[ 1609.228156] LustreError: 51284:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4b-l43: -22
[ 1609.268254] LustreError: 51284:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-22)
[root@atlas-oss4b7 ~]#
}}

The lun is still readable:
{{
[root@atlas-oss4b7 ~]# dumpe2fs -h /dev/mapper/atlas-ddn4b-l43 | head
dumpe2fs 1.42.13.wc5 (15-Apr-2016)
Filesystem volume name: atlas2-OST00e6
Last mounted on: /
Filesystem UUID: d4045c22-5929-4c58-bf58-4251bb21e07b
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
}}

We encountered issues when trying to do an `e2fsck`:

{{
[root@atlas-oss4b7 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l43
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST00e6: ********** WARNING: Filesystem still has errors **********

}}

Systems are currently down due to this. There are multiple OSTs in this state.

Thanks,

Jesse



 Comments   
Comment by Peter Jones [ 28/May/17 ]

Jesse

From a first pass this seems to indicate that some internal structures have been left in an inconsistent state due to the power event. I have reached out to our experts in this area to see who can most quickly advise.

Peter

Comment by Bruno Faccini (Inactive) [ 28/May/17 ]

Hello Jesse,
Did you post the full dmesg output from both your Lustre and ldiskfs mount attempts? If not, can you attach the full dmesg?

Also, how many OSTs are impacted? Did you run "e2fsck -n" on all of them? If so, is "Superblock has invalid MMP magic." the only error?

Comment by Oleg Drokin [ 28/May/17 ]

Also, does it mount if you just mount it with mount -t ldiskfs? If that works, unmount it, enable full Lustre debug, attempt a Lustre mount, and then collect the Lustre debug logs.
Something like this:

echo -1 >/proc/sys/lnet/debug
mount ostdevice /ostmountpoint -t lustre
lctl dk >/tmp/lustrelog.txt
Comment by Jesse Hanley [ 28/May/17 ]

I just tried again and got a different message in dmesg (ldiskfs, then lustre):

{{[44696.245449] LDISKFS-fs (dm-3): INFO: recovery required on readonly filesystem
[44696.263415] LDISKFS-fs (dm-3): write access unavailable, cannot proceed
[44701.394832] LustreError: 61316:0:(osd_handler.c:6365:osd_mount()) atlas2-OST0197-osd: can't mount /dev/mapper/atlas-ddn4f-l51: -13
[44701.425617] LustreError: 61316:0:(obd_config.c:578:class_setup()) setup atlas2-OST0197-osd failed (-13)
[44701.455521] LustreError: 61316:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST0197-osd setup error -13
[44701.494570] LustreError: 61316:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4f-l51: -13
[44701.534686] LustreError: 61316:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-13)
}}
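
The numeric codes in these dmesg lines are negated Linux errno values. A minimal sketch for decoding them, assuming it is run on a Linux host (the helper name `describe_kernel_rc` is hypothetical):

```python
import errno
import os


def describe_kernel_rc(rc):
    """Translate a negative kernel return code into (errno name, message)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# -22 and -13 are the two codes that appear in the osd_mount() errors.
for rc in (-22, -13):
    name, message = describe_kernel_rc(rc)
    print(rc, name, message)
```

On Linux these map to EINVAL and EACCES, which matches the "Invalid argument" and "Permission denied" messages printed by mount.lustre for the two groups of OSTs.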

Collecting the output from the remaining luns, but so far they're consistent:
{{[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l1
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST00e0: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b4 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l22
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST00e3: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b5 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l29
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST00e4: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b7 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l43
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST00e6: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l2
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0170: ********** WARNING: Filesystem still has errors **********
}}

Comment by Jesse Hanley [ 28/May/17 ]

Here's some more:

{{
[root@atlas-oss4b3 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l16
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0172: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b4 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l23
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0173: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b5 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l30
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0174: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b6 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l37
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0175: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4f8 ~]# mount -t ldiskfs /dev/mapper/atlas-ddn4f-l51 /tmp/testing
mount: block device /dev/mapper/atlas-ddn4f-l51 is write-protected, mounting read-only
mount: wrong fs type, bad option, bad superblock on /dev/mapper/atlas-ddn4f-l51,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[root@atlas-oss4f8 ~]# mount -t lustre /dev/mapper/atlas-ddn4f-l51 /tmp/testing
mount.lustre: mount /dev/mapper/atlas-ddn4f-l51 at /tmp/testing failed: Permission denied
[45594.893488] LustreError: 83174:0:(osd_handler.c:6365:osd_mount()) atlas2-OST0197-osd: can't mount /dev/mapper/atlas-ddn4f-l51: -13
[45594.925046] LustreError: 83174:0:(obd_config.c:578:class_setup()) setup atlas2-OST0197-osd failed (-13)
[45594.955601] LustreError: 83174:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST0197-osd setup error -13
[45594.985770] LustreError: 83174:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4f-l51: -13
[45595.025817] LustreError: 83174:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-13)
[root@atlas-oss4f8 ~]# e2fsck -n /dev/mapper/atlas-ddn4f-l51
e2fsck 1.42.13.wc5 (15-Apr-2016)
MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...
Warning: skipping journal recovery because doing a read-only filesystem check.
atlas2-OST0197: clean, 1495476/29343744 files, 1860393612/3755999232 blocks

[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l3
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0200: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b2 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l10
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0201: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b5 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l31
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0204: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l4
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0290: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b4 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l25
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0293: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4f4 ~]# mount -t ldiskfs /dev/mapper/atlas-ddn4f-l25 /tmp/testmount/
mount: block device /dev/mapper/atlas-ddn4f-l25 is write-protected, mounting read-only
mount: wrong fs type, bad option, bad superblock on /dev/mapper/atlas-ddn4f-l25,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[root@atlas-oss4f4 ~]# mount -t lustre /dev/mapper/atlas-ddn4f-l25 /tmp/testmount/
mount.lustre: mount /dev/mapper/atlas-ddn4f-l25 at /tmp/testmount failed: Permission denied
[root@atlas-oss4f4 ~]# dmesg | tail
[44836.958507] LustreError: Skipped 7583 previous similar messages
[45437.169759] LustreError: 137-5: atlas2-OST0197_UUID: not available for connect from 126@gni4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[45437.220944] LustreError: Skipped 7556 previous similar messages
[45881.960245] LDISKFS-fs (dm-4): INFO: recovery required on readonly filesystem
[45881.975748] LDISKFS-fs (dm-4): write access unavailable, cannot proceed
[45889.482829] LustreError: 84427:0:(osd_handler.c:6365:osd_mount()) atlas2-OST02b3-osd: can't mount /dev/mapper/atlas-ddn4f-l25: -13
[45889.509308] LustreError: 84427:0:(obd_config.c:578:class_setup()) setup atlas2-OST02b3-osd failed (-13)
[45889.539407] LustreError: 84427:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST02b3-osd setup error -13
[45889.578254] LustreError: 84427:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4f-l25: -13
[45889.618048] LustreError: 84427:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-13)
[root@atlas-oss4f4 ~]# e2fsck -n /dev/mapper/atlas-ddn4f-l25
e2fsck 1.42.13.wc5 (15-Apr-2016)
MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...
Warning: skipping journal recovery because doing a read-only filesystem check.
atlas2-OST02b3: clean, 1455480/29343744 files, 1955258331/3755999232 blocks

[root@atlas-oss4i4 ~]# mount -t lustre /dev/mapper/atlas-ddn4i-l25 /tmp/testmount
mount.lustre: mount /dev/mapper/atlas-ddn4i-l25 at /tmp/testmount failed: Permission denied
[root@atlas-oss4i4 ~]# dmesg | tail
[44848.274048] LustreError: Skipped 3515 previous similar messages
[45448.824340] LustreError: 137-5: atlas2-OST02cb_UUID: not available for connect from 161@gni4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[45448.871994] LustreError: Skipped 3503 previous similar messages
[45947.748609] LDISKFS-fs (dm-0): INFO: recovery required on readonly filesystem
[45947.761293] LDISKFS-fs (dm-0): write access unavailable, cannot proceed
[45952.060037] LustreError: 89095:0:(osd_handler.c:6365:osd_mount()) atlas2-OST02cb-osd: can't mount /dev/mapper/atlas-ddn4i-l25: -13
[45952.083629] LustreError: 89095:0:(obd_config.c:578:class_setup()) setup atlas2-OST02cb-osd failed (-13)
[45952.122171] LustreError: 89095:0:(obd_mount.c:203:lustre_start_simple()) atlas2-OST02cb-osd setup error -13
[45952.152586] LustreError: 89095:0:(obd_mount_server.c:1764:server_fill_super()) Unable to start osd on /dev/mapper/atlas-ddn4i-l25: -13
[45952.192600] LustreError: 89095:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-13)
[root@atlas-oss4i4 ~]# e2fsck -n /dev/mapper/atlas-ddn4i-l25
e2fsck 1.42.13.wc5 (15-Apr-2016)
MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...
Warning: skipping journal recovery because doing a read-only filesystem check.
atlas2-OST02cb: clean, 1385323/29343744 files, 1952052226/3755999232 blocks
}}

Comment by Jesse Hanley [ 28/May/17 ]

And the remaining 3:

 

[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l5
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST0320: ********** WARNING: Filesystem still has errors **********

[root@atlas-oss4b1 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l6
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST03b0: ********** WARNING: Filesystem still has errors **********
[root@atlas-oss4b4 ~]# e2fsck -n /dev/mapper/atlas-ddn4b-l27
e2fsck 1.42.13.wc5 (15-Apr-2016)
Superblock has invalid MMP magic. Fix? no

atlas2-OST03b3: ********** WARNING: Filesystem still has errors **********

Comment by Jesse Hanley [ 28/May/17 ]

Oleg, I'm unable to mount it as ldiskfs.  It looks like something happened to the superblock containing the multimount protection info.

Comment by Oleg Drokin [ 28/May/17 ]

Is the LUN actually write-accessible now?

"mount: block device /dev/mapper/atlas-ddn4f-l25 is write-protected, mounting read-only" just sounds like the block device itself is read-only, above the mmp check.

Comment by Jesse Hanley [ 28/May/17 ]

Oleg, I'm only seeing that message on 3 of the luns.  I'm investigating at the block layer, but they look healthy at the moment.

 

 

Comment by Jian Yu [ 28/May/17 ]

Hi Jesse,
The following command can be used to reset the MMP block back to the clean state:

# tune2fs -f -E clear-mmp <device>
Comment by Andreas Dilger [ 28/May/17 ]

I assume that you are 100% sure the LUNs are not mounted on the backup OSS nodes? I find it very odd that there is a bad MMP magic on all of the OSTs. It seems either the kernel or e2fsck have an incorrect value for the MMP magic, rather than them all being corrupted, unless you also can't dump any of them with dumpe2fs.

One thing to try is "debugfs -c -R 'dump_mmp' /dev/XXX". This will dump the MMP block that is listed in the superblock. You could verify that block number is correct by checking the dumpe2fs output for a few different superblocks using "-b" with values 32768 * {3,5,7}^n.
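
The candidate backup superblock locations can be enumerated with a short script. This is a minimal sketch, assuming a 4 KiB block size with 32768 blocks per group (the values implied by "32768 * {3,5,7}^n") and the sparse_super feature shown in the dumpe2fs output; the function name `backup_superblock_blocks` is hypothetical.

```python
def backup_superblock_blocks(total_blocks, blocks_per_group=32768):
    """Return block numbers of sparse_super backup superblocks.

    With sparse_super, backups live in block group 1 and in groups that
    are powers of 3, 5 and 7. Assumes 4 KiB blocks, where a group's
    superblock copy sits in the first block of the group.
    """
    total_groups = -(-total_blocks // blocks_per_group)  # ceiling division
    groups = set()
    for base in (3, 5, 7):
        group = 1  # base**0, i.e. block group 1
        while group < total_groups:
            groups.add(group)
            group *= base
    return [group * blocks_per_group for group in sorted(groups)]


# 3755999232 is the block count e2fsck reported for these OSTs.
for block in backup_superblock_blocks(3755999232)[:6]:
    print(block)
```

Each printed value is a candidate block number for an alternate superblock whose "MMP block number" field can be compared against the primary.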

Comment by Jesse Hanley [ 28/May/17 ]

Yes, these are not mounted on the backups.  We had quite a power event last night.  I tried on one OST and it was able to successfully mount.  I'll check the MMP block on a few more OSTs before running the tune2fs command.  Thanks guys.

Comment by Jesse Hanley [ 28/May/17 ]

Doesn't look like I can list that block:

 

[root@atlas-oss4b6 ~]# debugfs -c -R 'dump_mmp' /dev/mapper/atlas-ddn4b-l37
debugfs 1.42.13.wc5 (15-Apr-2016)
/dev/mapper/atlas-ddn4b-l37: catastrophic mode - not reading inode or group bitmaps
dump_mmp: MMP: invalid magic number reading MMP block.

 

[root@atlas-oss4b6 ~]# dumpe2fs -h /dev/mapper/atlas-ddn4b-l37 | grep -i mmp
dumpe2fs 1.42.13.wc5 (15-Apr-2016)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
MMP block number: 5638
MMP update interval: 5
[root@atlas-oss4b6 ~]#

Comment by Andreas Dilger [ 28/May/17 ]

I'd like to get a dump of a couple of the MMP blocks to see what the magic is before you clear them. Either the MMP blocks are garbage (maybe because of loss of cache in the DDN?), or the superblock is garbage (pointing to a bad MMP block), or there is a bug in either ldiskfs or e2fsck that gives it the wrong magic value. I can't do anything about the cache issues, but I want to eliminate the chance of the last issue, unlikely as it is.

Comment by Jesse Hanley [ 28/May/17 ]

Hey Andreas,

 

Any pointers or command examples you could point me to?  I typically just use dumpe2fs for the header info, so I'm unfamiliar with the superblock options.

Comment by Andreas Dilger [ 28/May/17 ]

Can you then "dd if=/dev/XXX of=/tmp/mmp.ostname bs=4096 skip=5638 count=1" (and similar for a couple other OSTs). Please check if the MMP block is reported to be the same in multiple superblocks and is not corrupted (though 5638 looks like a sane block number).
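
For reference, the dd invocation above reads one 4096-byte filesystem block starting at byte 5638 * 4096. A minimal sketch of the same read, assuming a 4 KiB block size (the helper names `block_offset` and `read_fs_block` are hypothetical):

```python
def block_offset(block_number, block_size=4096):
    """Byte offset of a filesystem block, as dd computes it from bs and skip."""
    return block_number * block_size


def read_fs_block(path, block_number, block_size=4096):
    """Read one filesystem block from a device or image file (read-only)."""
    with open(path, "rb") as dev:
        dev.seek(block_offset(block_number, block_size))
        return dev.read(block_size)


print(block_offset(5638))  # 23093248
```

The device is opened read-only, so this is safe to run before the MMP block is cleared.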

Ideally you should also run a full e2fsck on those filesystems. Is it possible that the MMP block was lost because of a loss of cache on the DDN? There may still be other inconsistencies in the filesystem due to other cached writes being lost. I guess this means there is no UPS on the DDNs that would have allowed them to flush their cache?

Comment by Andreas Dilger [ 28/May/17 ]

To debug the MMP block contents, you would need to run "od -Ax -tx4 -a /tmp/mmp.ostname" and compare that to struct mmp_struct:

#define EXT4_MMP_MAGIC     0x004D4D50U /* ASCII for MMP */
#define EXT4_MMP_SEQ_CLEAN 0xFF4D4D50U /* mmp_seq value for clean unmount */
#define EXT4_MMP_SEQ_FSCK  0xE24D4D50U /* mmp_seq value when being fscked */
#define EXT4_MMP_SEQ_MAX   0xE24D4D4FU /* maximum valid mmp_seq value */

struct mmp_struct {
	__le32	mmp_magic;		/* Magic number for MMP */
	__le32	mmp_seq;		/* Sequence no. updated periodically */

	/*
	 * mmp_time, mmp_nodename & mmp_bdevname are only used for information
	 * purposes and do not affect the correctness of the algorithm
	 */
	__le64	mmp_time;		/* Time last updated */
	char	mmp_nodename[64];	/* Node which last updated MMP block */
	char	mmp_bdevname[32];	/* Bdev which last updated MMP block */

	/*
	 * mmp_check_interval is used to verify if the MMP block has been
	 * updated on the block device. The value is updated based on the
	 * maximum time to write the MMP block during an update cycle.
	 */
	__le16	mmp_check_interval;

	__le16	mmp_pad1;
	__le32	mmp_pad2[226];
	__le32	mmp_checksum;		/* crc32c(uuid+mmp_block) */
};
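
As an alternative to reading the od output by eye, the leading fields of the struct can be decoded programmatically. This is a minimal sketch that follows the field order and sizes of struct mmp_struct above; the function name `parse_mmp_block` and the returned dictionary keys are hypothetical.

```python
import struct

EXT4_MMP_MAGIC = 0x004D4D50
EXT4_MMP_SEQ_CLEAN = 0xFF4D4D50
EXT4_MMP_SEQ_FSCK = 0xE24D4D50

# Little-endian: magic, seq, time, nodename[64], bdevname[32], check_interval
_MMP_HEADER = struct.Struct("<IIQ64s32sH")


def _c_string(raw):
    """Decode a NUL-padded char array."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_mmp_block(block):
    """Decode the leading fields of a dumped MMP block."""
    magic, seq, mtime, node, bdev, interval = _MMP_HEADER.unpack_from(block)
    return {
        "magic": magic,
        "magic_ok": magic == EXT4_MMP_MAGIC,
        "seq": seq,
        "clean": seq == EXT4_MMP_SEQ_CLEAN,
        "time": mtime,
        "nodename": _c_string(node),
        "bdevname": _c_string(bdev),
        "check_interval": interval,
    }
```

It could be fed the file written by the dd command, for example `parse_mmp_block(open("/tmp/mmp.ostname", "rb").read())`; a block whose `magic_ok` is false is not a valid MMP block.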
Comment by Jesse Hanley [ 28/May/17 ]
 
0000000   @   ;   9 can nul nul nul soh soh   ]   V   } stx nul ack   7
           3bc0    9839    0000    0100    5d01    7dd6    0002    3706
               98393bc0        01000000        7dd65d01        37060002
          @   ;   9 can nul nul nul soh soh   ]   V   } stx nul ack   7
0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
           0000    0000    0000    0000    0000    0000    0000    0000
               00000000        00000000        00000000        00000000
        nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000040 nul nul nul nul   ` nul   %  si nul nul nul stx dc1 nul  ff ack
           0000    0000    8060    0f25    0000    0200    0091    068c
               00000000        0f258060        02000000        068c0091
        nul nul nul nul   ` nul   %  si nul nul nul stx dc1 nul  ff ack
0000060 nul nul nul stx nul nul nul nul nul nul nul stx etx nul nul   C
           0000    0200    0000    0000    0000    0200    0003    4300
               02000000        00000000        02000000        43000003
        nul nul nul stx nul nul nul nul nul nul nul stx etx nul nul   C
0000100 nul nul nul stx nul nul nul  cr nul nul nul stx nul nul nul  ht
           0000    0200    0000    0d00    0000    0200    0000    0900
               02000000        0d000000        02000000        09000000
        nul nul nul stx nul nul nul  cr nul nul nul stx nul nul nul  ht
0000120 nul nul nul stx stx nul soh   C nul nul nul stx stx nul ack   j
           0000    0200    0002    4301    0000    0200    0002    ea06
               02000000        43010002        02000000        ea060002
        nul nul nul stx stx nul soh   C nul nul nul stx stx nul ack   j
0000140 nul nul nul stx dc1 nul syn etx nul nul nul stx stx nul ack  em
           0000    0200    0091    0396    0000    0200    0002    9906
               02000000        03960091        02000000        99060002
        nul nul nul stx dc1 nul syn etx nul nul nul stx stx nul ack  em
0000160 nul nul nul stx   ` nul sub   } nul nul nul stx stx nul soh   I
           0000    0200    0060    7d1a    0000    0200    0002    4901
               02000000        7d1a0060        02000000        49010002
        nul nul nul stx   ` nul sub   } nul nul nul stx stx nul soh   I
0000200 nul nul nul stx stx nul ack   T nul nul nul stx   ` nul   #   x
           0000    0200    0002    d406    0000    0200    8060    f823
               02000000        d4060002        02000000        f8238060
        nul nul nul stx stx nul ack   T nul nul nul stx   ` nul   #   x
0000220 nul nul nul stx stx nul soh   M nul nul nul stx stx nul ack   m
           0000    0200    0002    4d01    0000    0200    0002    ed06
               02000000        4d010002        02000000        ed060002
        nul nul nul stx stx nul soh   M nul nul nul stx stx nul ack   m
0000240 nul nul nul stx   ` nul   ,   c nul nul nul stx etx nul nul etb
           0000    0200    0060    e32c    0000    0200    0003    9700
               02000000        e32c0060        02000000        97000003
        nul nul nul stx   ` nul   ,   c nul nul nul stx etx nul nul etb
0000260 nul nul nul stx nul nul nul  so nul nul nul stx stx nul soh   N
           0000    0200    0000    0e00    0000    0200    0002    4e01
               02000000        0e000000        02000000        4e010002
        nul nul nul stx nul nul nul  so nul nul nul stx stx nul soh   N
0000300 nul nul nul stx stx nul ack   S nul nul nul stx   ` nul  sp   C
           0000    0200    0002    d306    0000    0200    8060    4320
               02000000        d3060002        02000000        43208060
        nul nul nul stx stx nul ack   S nul nul nul stx   ` nul  sp   C
0000320 nul nul nul stx etx nul nul esc nul nul nul stx stx nul ack   C
           0000    0200    0003    9b00    0000    0200    0002    4306
               02000000        9b000003        02000000        43060002
        nul nul nul stx etx nul nul esc nul nul nul stx stx nul ack   C
0000340 nul nul nul stx  em nul nul soh nul nul nul stx nul nul nul   f
           0000    0200    0019    8100    0000    0200    0000    6600
               02000000        81000019        02000000        66000000
        nul nul nul stx  em nul nul soh nul nul nul stx nul nul nul   f
0000360 nul nul nul stx dc1 nul sub  vt nul nul nul stx stx nul ack   :
           0000    0200    0091    0b9a    0000    0200    0002    3a06
               02000000        0b9a0091        02000000        3a060002
        nul nul nul stx dc1 nul sub  vt nul nul nul stx stx nul ack   :
0000400 nul nul nul stx nul dle   _   l nul nul nul stx stx nul ack   I
           0000    0200    1000    ec5f    0000    0200    0002    4906
               02000000        ec5f1000        02000000        49060002
        nul nul nul stx nul dle   _   l nul nul nul stx stx nul ack   I
0000420 nul nul nul stx dc1 nul   " etx nul nul nul stx stx nul ack dc4
           0000    0200    0091    03a2    0000    0200    0002    9406
               02000000        03a20091        02000000        94060002
        nul nul nul stx dc1 nul   " etx nul nul nul stx stx nul ack dc4
0000440 nul nul nul stx etx nul nul   B nul nul nul stx dc1 nul syn eot
           0000    0200    0003    4200    0000    0200    0091    0496
               02000000        42000003        02000000        04960091
        nul nul nul stx etx nul nul   B nul nul nul stx dc1 nul syn eot
0000460 nul nul nul  nl nul nul nul nul nul nul nul nul nul nul nul nul
           0000    0a00    0000    0000    0000    0000    0000    0000
               0a000000        00000000        00000000        00000000
        nul nul nul  nl nul nul nul nul nul nul nul nul nul nul nul nul
0000500 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
           0000    0000    0000    0000    0000    0000    0000    0000
               00000000        00000000        00000000        00000000
        nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0010000
Comment by Jesse Hanley [ 28/May/17 ]

There are UPSs on the DDNs, but we're not sure what the extent of the power outage was. Some of them went down hard.

Comment by Jesse Hanley [ 28/May/17 ]
[root@atlas-oss4b1 ~]# dumpe2fs -h /dev/mapper/atlas-ddn4b-l4 | grep -i mmp
dumpe2fs 1.42.13.wc5 (15-Apr-2016)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
MMP block number:         5638
MMP update interval:      5
[root@atlas-oss4b1 ~]# dd if=/dev/mapper/atlas-ddn4b-l4 of=/tmp/mmp.ost656 bs=4096 skip=5638 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000219282 s, 18.7 MB/s
[root@atlas-oss4b1 ~]# od -ax -tx4 -a /tmp/mmp.ost656
0000000   z  em   [ nul dle nul ack soh   9   6   8   6   0   5 dle nul
           19fa    005b    0010    0106    3639    3638    3530    0010
               005b19fa        01060010        36383639        00103530
          z  em   [ nul dle nul ack soh   9   6   8   6   0   5 dle nul
0000020   u   z eot nul dle nul  bs soh   3   5   2   3   4   9   4   1
           7af5    0004    0010    0108    3533    3332    3934    3134
               00047af5        01080010        33323533        31343934
          u   z eot nul dle nul  bs soh   3   5   2   3   4   9   4   1
0000040   1  so   ; nul dle nul  bs soh   3   5   9   2   1   7   2   5
           0eb1    003b    0010    0108    3533    3239    3731    3532
               003b0eb1        01080010        32393533        35323731
          1  so   ; nul dle nul  bs soh   3   5   9   2   1   7   2   5
0000060   ;   J   ) nul dle nul  bs soh   4   3   9   5   6   9   8   9
           4abb    0029    0010    0108    3334    3539    3936    3938
               00294abb        01080010        35393334        39383936
          ;   J   ) nul dle nul  bs soh   4   3   9   5   6   9   8   9
0000100   `   0 stx nul dle nul  bs soh   4   6   1   7   5   3   2   5
           b0e0    0002    0010    0108    3634    3731    3335    3532
               0002b0e0        01080010        37313634        35323335
          `   0 stx nul dle nul  bs soh   4   6   1   7   5   3   2   5
0000120 ack   :  em nul dle nul  bs soh   4   5   2   3   4   7   1   7
           3a06    0019    0010    0108    3534    3332    3734    3731
               00193a06        01080010        33323534        37313734
        ack   :  em nul dle nul  bs soh   4   5   2   3   4   7   1   7
0000140   p  ff   ` nul dle nul bel soh   1   0   6   5   8   8   5 nul
           0c70    0060    0010    0107    3031    3536    3838    0035
               00600c70        01070010        35363031        00353838
          p  ff   ` nul dle nul bel soh   1   0   6   5   8   8   5 nul
0000160   z soh stx nul dle nul bel soh   2   4   8   0   5   4   1   8
           01fa    0002    0010    0107    3432    3038    3435    3831
               000201fa        01070010        30383432        38313435
          z soh stx nul dle nul bel soh   2   4   8   0   5   4   1   8
0000200   ;   (  gs nul dle nul  bs soh   3   9   8   3   6   9   8   9
           283b    009d    0010    0108    3933    3338    3936    3938
               009d283b        01080010        33383933        39383936
          ;   (  gs nul dle nul  bs soh   3   9   8   3   6   9   8   9
0000220   u etx   " nul dle nul  bs soh   4   0   0   0   2   3   0   1
           03f5    00a2    0010    0108    3034    3030    3332    3130
               00a203f5        01080010        30303034        31303332
          u etx   " nul dle nul  bs soh   4   0   0   0   2   3   0   1
0000240   x   a  em nul dle nul  bs soh   4   5   4   4   6   5   2   5
           6178    0019    0010    0108    3534    3434    3536    3532
               00196178        01080010        34343534        35323536
          x   a  em nul dle nul  bs soh   4   5   4   4   6   5   2   5
0000260  em  fs  rs nul dle nul bel soh   2   9   2   9   3   7   3 nul
           1c19    001e    0010    0107    3932    3932    3733    0033
               001e1c19        01070010        39323932        00333733
         em  fs  rs nul dle nul bel soh   2   9   2   9   3   7   3 nul
0000300   ?   < stx nul dle nul  bs soh   4   6   2   5   9   1   9   7
           bcbf    0002    0010    0108    3634    3532    3139    3739
               0002bcbf        01080010        35323634        37393139
          ?   < stx nul dle nul  bs soh   4   6   2   5   9   1   9   7
0000320   [ nak etx nul dle nul  bs soh   4   6   5   4   1   1   8   1
           15db    0003    0010    0108    3634    3435    3131    3138
               000315db        01080010        34353634        31383131
          [ nak etx nul dle nul  bs soh   4   6   5   4   1   1   8   1
0000340 esc soh enq nul dle nul bel soh   2   6   7   3   8   5   3   9
           019b    0005    0010    0107    3632    3337    3538    3933
               0005019b        01070010        33373632        39333538
        esc soh enq nul dle nul bel soh   2   6   7   3   8   5   3   9
0000360   F dc3   \ nul dle nul  bs soh   3   9   5   3   0   0   4   5
           13c6    005c    0010    0108    3933    3335    3030    3534
               005c13c6        01080010        33353933        35343030
          F dc3   \ nul dle nul  bs soh   3   9   5   3   0   0   4   5
0000400   S  vt etx nul dle nul  bs soh   4   7   4   0   7   6   1   3
           8b53    0003    0010    0108    3734    3034    3637    3331
               00038b53        01080010        30343734        33313637
          S  vt etx nul dle nul  bs soh   4   7   4   0   7   6   1   3
0000420   <   " etx nul dle nul  bs soh   4   6   5   4   3   1   6   5
           22bc    0003    0010    0108    3634    3435    3133    3536
               000322bc        01080010        34353634        35363133
          <   " etx nul dle nul  bs soh   4   6   5   4   3   1   6   5
...
...
...

There's quite a bit more to that one if you want me to post these somewhere.

Comment by Jesse Hanley [ 28/May/17 ]

I've collected this on ~7 of the OSTs so far. Is this something we can investigate at a later point? I'd like to run the tune2fs command and return these OSTs to service if possible.

Comment by Andreas Dilger [ 28/May/17 ]

By all means, run tune2fs and reset the MMP block. It looks like the MMP block is total garbage from what I can see.

A valid MMP block should have "MMP" in the first word, since that is part of the magic value.
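
To make that concrete, here is a minimal sketch of how the magic is laid out on disk and how the first word of the two posted dumps compares. The byte strings are the first four bytes of each od dump in this ticket; the helper name `first_word` is hypothetical.

```python
import struct

EXT4_MMP_MAGIC = 0x004D4D50


def first_word(block):
    """Return the first little-endian 32-bit word, as od -tx4 displays it."""
    return struct.unpack_from("<I", block)[0]


# The magic is stored little-endian, so the raw bytes read "PMM" plus NUL.
on_disk = struct.pack("<I", EXT4_MMP_MAGIC)
print(on_disk)                             # b'PMM\x00'
print(format(first_word(on_disk), "08x"))  # 004d4d50

# First four bytes of the two dumps posted above.
for dumped in (bytes.fromhex("c03b3998"), bytes.fromhex("fa195b00")):
    word = first_word(dumped)
    print(format(word, "08x"), word == EXT4_MMP_MAGIC)
```

A healthy block would therefore show 004d4d50 in the first 4-byte column of the od output, where the dumps show 98393bc0 and 005b19fa.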

Comment by Andreas Dilger [ 28/May/17 ]

If I had to guess, it looks like the MMP block was overwritten by a directory leaf block. I'm not sure why that would happen, since the MMP block is allocated once at filesystem format time, and then never changed again, so it should never be used for anything else.
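
To show why the dump reads like a directory block, here is a minimal sketch that decodes one 16-byte record using the ext4 directory entry layout (32-bit inode, 16-bit record length, 8-bit name length, 8-bit file type, then the name). The sample bytes are transcribed from offset 0000320 of the od output above. The names DirEntry and parse_dir_entry are hypothetical helpers for this illustration only.

```python
import struct
from typing import NamedTuple


class DirEntry(NamedTuple):
    inode: int
    rec_len: int
    name_len: int
    file_type: int
    name: bytes


def parse_dir_entry(buf: bytes, offset: int = 0) -> DirEntry:
    """Decode one little-endian ext4 directory entry starting at offset."""
    inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", buf, offset)
    # The name follows the 8-byte fixed header and is name_len bytes long.
    name = buf[offset + 8 : offset + 8 + name_len]
    return DirEntry(inode, rec_len, name_len, file_type, name)


# The 16 bytes at offset 0000320 in the dump.
sample = bytes.fromhex("db15030010000801") + b"46541181"
entry = parse_dir_entry(sample)
# entry.rec_len == 16, entry.name_len == 8, entry.file_type == 1 (regular file),
# entry.name == b"46541181"
```

Every record in the posted excerpt decodes the same way: a 16-byte record length, a file type of 1, and a numeric file name, which is consistent with a directory leaf block rather than an MMP block.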

Comment by Jesse Hanley [ 28/May/17 ]

The good news is that it looks like the tune2fs command fixed most of the OSTs. I'm debugging the remaining 3 OSTs (the ones that appear to be read only).

[root@atlas-oss4i4 ~]# dumpe2fs -h /dev/mapper/atlas-ddn4i-l25 | head
dumpe2fs 1.42.13.wc5 (15-Apr-2016)
Filesystem volume name:   atlas2-OST02cb
Last mounted on:          /
Filesystem UUID:          3f800de9-de2d-4cc3-b820-c01a906d8054
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
[root@atlas-oss4i4 ~]# tune2fs -f -E clear-mmp /dev/mapper/atlas-ddn4i-l25
tune2fs 1.42.13.wc5 (15-Apr-2016)
tune2fs: Operation not permitted while trying to open /dev/mapper/atlas-ddn4i-l25
Couldn't find valid filesystem superblock.

Comment by Andreas Dilger [ 28/May/17 ]

Can you see if the block device is writable at all? The first 1KB of the block device is typically unused (unless you have a DOS partition, which I don't think you do), so you could do something like:

dd if=/dev/XXX of=/tmp/sector bs=512 count=1
dd of=/dev/XXX if=/tmp/sector bs=512 count=1

to copy and then rewrite the first sector of the drive (I'm assuming you have 512-byte sector devices). If you get an error from the second "dd" then there is a problem with the controller or device. It may be that a reboot would help that, but that is just a guess.
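
The same read-then-rewrite probe can be sketched in Python. This is only an illustration of the two dd steps, assuming a 512-byte sector as above. The function name probe_first_sector is made up for this example, and the path passed in is whatever device or file you choose to test.

```python
import os
from typing import Optional


def probe_first_sector(path: str, sector_size: int = 512) -> Optional[str]:
    """Read the first sector and write the same bytes back.

    Returns None on success, or a short error string if any step fails.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        return f"open failed: {exc.strerror}"
    try:
        data = os.pread(fd, sector_size, 0)   # copy the first sector
        written = os.pwrite(fd, data, 0)      # rewrite it unchanged
        os.fsync(fd)                          # push the write to the device
        if written != len(data):
            return f"short write: {written} of {len(data)} bytes"
        return None
    except OSError as exc:
        return f"I/O failed: {exc.strerror}"
    finally:
        os.close(fd)
```

Because the bytes written are exactly the bytes just read, the probe leaves the data as it was. An error from the open or the write points at the device or controller rather than the filesystem.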

Comment by Andreas Dilger [ 28/May/17 ]

Are the read-only OSTs all attached to the same DDN controller?

Comment by Jesse Hanley [ 28/May/17 ]

It looks like that OST is presenting as read-only. I'm going to give the controller a reboot.

[root@atlas-oss4i4 ~]# dd of=/dev/mapper/atlas-ddn4i-l25 if=/root/jahanley/20170528/atlas-ddn4i-l25.sector bs=512 count=1
dd: writing `/dev/mapper/atlas-ddn4i-l25': Operation not permitted
1+0 records in
0+0 records out
0 bytes (0 B) copied, 6.3729e-05 s, 0.0 kB/s

It's 3 LUNs across 2 controllers. Thanks, Andreas.

Comment by Jesse Hanley [ 28/May/17 ]

That didn't help the read-only part so I'm continuing to look for the cause while I engage DDN.

Comment by Andreas Dilger [ 28/May/17 ]

Did you also reboot the OSS?

Comment by Jesse Hanley [ 28/May/17 ]

Doing that now.

Comment by Oleg Drokin [ 28/May/17 ]

If it's still read-only after the reboot, check the kernel messages for a possible explanation. I'm not sure what DDN-level diagnostic messages you might get as well.
Also, since this is a multiported setup, you might want to check whether the failover node has access.
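
One quick way to compare the two nodes, in addition to reading the kernel messages, is to check the read-only flag that Linux exposes in sysfs for each block device. This is a sketch of a different check from the one suggested above, not a replacement for it. The function name is_read_only and the device name used in the usage note are hypothetical.

```python
import os


def is_read_only(dev_name: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports the block device as read-only."""
    # /sys/block/<dev>/ro contains "1" for a read-only device, "0" otherwise.
    with open(os.path.join(sysfs_root, dev_name, "ro")) as flag:
        return flag.read().strip() == "1"
```

For a device-mapper LUN the name would be of the form dm-N (for example is_read_only("dm-3"), where dm-3 is only a placeholder). Running it on both the primary and the failover OSS would show whether only one path sees the LUN as read-only.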

Comment by Jesse Hanley [ 28/May/17 ]

The reboot was able to restore connectivity to the LUNs. We've got a couple OSTs that we're going to run e2fsck's against. I'll send an update when those finish. Thank you all so much for the help.

Comment by Jesse Hanley [ 28/May/17 ]

Atlas2 is back up! Thanks for the pointers and help everyone.

Comment by Peter Jones [ 28/May/17 ]

That's excellent news Jesse. Do you need any further analysis or can we close out this ticket?

Comment by Jesse Hanley [ 28/May/17 ]

I think we're good. Please resolve it. Thanks again.

Comment by Oleg Drokin [ 28/May/17 ]

Did you determine if DDN lost the dirty caches or not?
If all caches were flushed correctly then, technically speaking, there should be no fs inconsistencies, so it would be interesting to see those fsck logs to see what was broken and how.

The MMP block does not go through the journal, so I imagine that one is off limits, but the rest of the metadata is what is interesting.

Comment by Andreas Dilger [ 28/May/17 ]

It is true that the MMP block avoids the journal, but this is on purpose to avoid blocking MMP writes to disk if there is a bug in the journal code, or it is overloaded.

That said, since the MMP block is written continuously to the same location on disk, at least an old version of that data should be there on disk instead of garbage (or a directory block, as it appears).

I vaguely recall that DDN may have optimized the MMP block write so that it just goes to cache, to avoid causing a full stroke disk seek every second or five, but even then it should have made it out to disk at some point since the filesystem was mounted, or at least at format time.

Comment by Peter Jones [ 28/May/17 ]

OK, I will close the ticket, but we can always chat in person this week if there are any more open questions.
