[LU-4450] Not able to mount mdt device
Created: 07/Jan/14
Updated: 14/Jan/14
Resolved: 14/Jan/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Mahmoud Hanafi
Assignee: Niu Yawei (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 4
Rank (Obsolete): 12200

 Description   

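I ran the following commands to disable and then re-enable the quota feature:
tune2fs -O ^quota /dev/nbp7-vg/mdt7
tune2fs -O quota /dev/nbp7-vg/mdt7

After that, the MDT device no longer mounts. Console output from the mount attempt: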

nbp7-mds1 login: LDISKFS-fs (dm-1): recovery complete
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-2): warning: mounting fs with errors, running e2fsck is recommended
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 137-5: nbp7-MDT0000_UUID: not available for connect from 10.151.51.135@o2ib (no target)
LustreError: 137-5: nbp7-MDT0000_UUID: not available for connect from 10.151.32.217@o2ib (no target)
LustreError: Skipped 3 previous similar messages
Lustre: nbp7-MDT0000: Not available for connect from 10.151.56.154@o2ib (not set up)
Lustre: nbp7-MDT0000: Not available for connect from 10.151.29.224@o2ib (not set up)
Lustre: Skipped 9 previous similar messages
Lustre: nbp7-MDT0000: used disk, loading
Lustre: 5049:0:(mdt_handler.c:4960:mdt_process_config()) For interoperability, skip this mdt.group_upcall. It is obsolete.
Lustre: 5049:0:(mdt_handler.c:4960:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
LDISKFS-fs error (device dm-2): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 20 corrupted: 2107 blocks free in bitmap, 2105 - in gd

Aborting journal on device dm-2-8.
LDISKFS-fs (dm-2): Remounting filesystem read-only
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: IO failure
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_reserve_inode_write: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_truncate: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_reserve_inode_write: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_orphan_del: Journal has aborted
LDISKFS-fs error (device dm-2) in ldiskfs_reserve_inode_write: Journal has aborted
LustreError: 5049:0:(llog.c:159:llog_cancel_rec()) nbp7-OST0020-osc-MDT0000: fail to write header for llog #0x2:1#00000000: rc = -30
LustreError: 4960:0:(osd_handler.c:738:osd_trans_commit_cb()) transaction @0xffff880ff9a32ac0 commit error: 2
LustreError: 5049:0:(osp_sync.c:1031:osp_sync_init()) nbp7-OST0020-osc-MDT0000: can't initialize llog: rc = -30
LustreError: 5049:0:(obd_config.c:572:class_setup()) setup nbp7-OST0020-osc-MDT0000 failed (-30)
LustreError: 5049:0:(obd_config.c:1550:class_config_llog_handler()) MGC10.151.27.38@o2ib: cfg command failed: rc = -30
Lustre: cmd=cf003 0:nbp7-OST0020-osc-MDT0000 1:nbp7-OST0020_UUID 2:10.151.27.45@o2ib
LustreError: 15c-8: MGC10.151.27.38@o2ib: The configuration from log 'nbp7-MDT0000' failed (-30). This may be the result of communication errors between this node and the MGS, a b.
LustreError: 4950:0:(obd_mount_server.c:1253:server_start_targets()) failed to start server nbp7-MDT0000: -30
LustreError: 4950:0:(obd_mount_server.c:1695:server_fill_super()) Unable to start targets: -30
LustreError: 4950:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) nbp7-MDT0000-lwp-MDT0000: Can't end config log nbp7-client.
LustreError: 4950:0:(obd_mount_server.c:1422:server_put_super()) nbp7-MDT0000: failed to disconnect lwp. (rc=-2)
Lustre: Failing over nbp7-MDT0000
Lustre: nbp7-MDT0000: Not available for connect from 10.151.43.177@o2ib (stopping)
Lustre: Skipped 151 previous similar messages
LustreError: 137-5: nbp7-MDT0000_UUID: not available for connect from 10.151.32.62@o2ib (no target)
LustreError: Skipped 81 previous similar messages
VFS: cannot write quota structure on device dm-2 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device dm-2 (error -30). Quota may get out of sync!
LDISKFS-fs error (device dm-2): ldiskfs_put_super: Couldn't clean up the journal
Lustre: server umount nbp7-MDT0000 complete
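
For reference when reading the log above: the Lustre "rc = -N" values are negated Linux errno numbers. The sketch below is not part of the original report; it only decodes the two codes that appear in the log. On Linux, 30 is EROFS (read-only file system) and 2 is ENOENT, which matches the sequence shown: the journal aborts, dm-2 is remounted read-only, and the subsequent llog and configuration steps fail with -30. The helper names `return_codes` and `decode_rc` are made up for this illustration.

```python
import errno
import os
import re

# Matches "rc = -30" and "(rc=-2)" as they appear in the console log.
RC_RE = re.compile(r"rc\s*=\s*(-\d+)")


def return_codes(log_text):
    """Return the distinct rc values found in log_text, in first-seen order."""
    seen = []
    for match in RC_RE.finditer(log_text):
        rc = int(match.group(1))
        if rc not in seen:
            seen.append(rc)
    return seen


def decode_rc(rc):
    """Translate a negative kernel-style return code into 'NAME: message'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return "{}: {}".format(name, os.strerror(code))


# Two lines copied from the console log in this ticket.
SAMPLE = """\
LustreError: 5049:0:(osp_sync.c:1031:osp_sync_init()) nbp7-OST0020-osc-MDT0000: can't initialize llog: rc = -30
LustreError: 4950:0:(obd_mount_server.c:1422:server_put_super()) nbp7-MDT0000: failed to disconnect lwp. (rc=-2)
"""

for rc in return_codes(SAMPLE):
    print(rc, decode_rc(rc))
```

The message text after the errno name comes from the local C library, so its exact wording may vary between systems.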

Please advise on the next step.



 Comments   
Comment by Mahmoud Hanafi [ 07/Jan/14 ]

SHOULD WE GO AHEAD AND RUN FSCK?

DRY RUN FSCK OUTPUT FOLLOWS
nbp7-mds1 ~ # e2fsck -v -n /dev/nbp7-vg/mdt7
e2fsck 1.42.7.wc1 (12-Apr-2013)
Warning: skipping journal recovery because doing a read-only filesystem check.
nbp7-MDT0000 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 223551113 has zero dtime. Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -714837 -255423491
Fix? no

Free blocks count wrong for group #0 (2880, counted=2888).
Fix? no

Free blocks count wrong for group #20 (2105, counted=2107).
Fix? no

Free blocks count wrong for group #15590 (16382, counted=16384).
Fix? no

Free blocks count wrong for group #15595 (16383, counted=16384).
Fix? no

Free blocks count wrong for group #15610 (16383, counted=16384).
Fix? no

Free blocks count wrong for group #15612 (16383, counted=16384).
Fix? no

Free blocks count wrong (197822759, counted=197822774).
Fix? no

Inode bitmap differences: -223551113
Fix? no

nbp7-MDT0000: ********** WARNING: Filesystem still has errors **********

53143594 inodes used (9.90%, out of 536870912)
542 non-contiguous files (0.0%)
35450 non-contiguous directories (0.1%)

# of inodes with ind/dind/tind blocks: 8529/37/0
70612697 blocks used (26.31%, out of 268435456)
0 bad blocks
8072 large files

51906395 regular files
437044 directories
0 character device files
0 block device files
0 fifos
1961 links
800143 symbolic links (360530 fast symbolic links)
2 sockets
------------
53145545 files
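
As a cross-check on the dry-run output above, the per-group free-block discrepancies can be summed and compared with the filesystem-wide discrepancy. The sketch below is an illustration added for reference, not something that was run on the ticket; the function name `free_block_deltas` is hypothetical, and the regular expressions assume the message format exactly as printed by this e2fsck run. The group #20 figures (2105 recorded, 2107 counted) are the same pair reported in the ldiskfs_mb_check_ondisk_bitmap error during the mount attempt.

```python
import re

# "Free blocks count wrong for group #20 (2105, counted=2107)."
GROUP_RE = re.compile(
    r"Free blocks count wrong for group #(\d+) \((\d+), counted=(\d+)\)"
)
# "Free blocks count wrong (197822759, counted=197822774)."
TOTAL_RE = re.compile(r"Free blocks count wrong \((\d+), counted=(\d+)\)")


def free_block_deltas(report):
    """Return ({group: counted - recorded}, total_delta or None)."""
    groups = {
        int(group): int(counted) - int(recorded)
        for group, recorded, counted in GROUP_RE.findall(report)
    }
    total = TOTAL_RE.search(report)
    total_delta = int(total.group(2)) - int(total.group(1)) if total else None
    return groups, total_delta


# Lines copied from the e2fsck -n output in this ticket.
REPORT = """\
Free blocks count wrong for group #0 (2880, counted=2888).
Free blocks count wrong for group #20 (2105, counted=2107).
Free blocks count wrong for group #15590 (16382, counted=16384).
Free blocks count wrong for group #15595 (16383, counted=16384).
Free blocks count wrong for group #15610 (16383, counted=16384).
Free blocks count wrong for group #15612 (16383, counted=16384).
Free blocks count wrong (197822759, counted=197822774).
"""

groups, total_delta = free_block_deltas(REPORT)
print(groups)
print(sum(groups.values()), total_delta)
```

The six group deltas (8, 2, 2, 1, 1, 1) add up to 15, which equals the filesystem-wide difference 197822774 - 197822759, so the summary counters in the report are at least internally consistent.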

Comment by Oleg Drokin [ 07/Jan/14 ]

Yes, safe to run e2fsck.

Comment by Mahmoud Hanafi [ 07/Jan/14 ]

Ran e2fsck; it fixed the issue. Please close the case.

Comment by Peter Jones [ 07/Jan/14 ]

Thanks Mahmoud. Before we close this ticket we should probably assess whether there is some kind of bug in tune2fs that we should address.

Niu, could you please comment on this?

Comment by Niu Yawei (Inactive) [ 08/Jan/14 ]

The "tune2fs -O quota" command just uses standard interfaces to unlink, create, and write the quota files, so it is unlikely to have corrupted the block bitmap.

Comment by John Fuchs-Chesney (Inactive) [ 08/Jan/14 ]

Can I mark this as resolved? Thanks.

Comment by Mahmoud Hanafi [ 08/Jan/14 ]

I think so. The block bitmap errors were most likely already present before we ran the tune2fs quota commands.
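
One way to catch pre-existing errors of this kind is to run a read-only check (e2fsck -fn) before changing feature flags and to look at its exit status. Per the e2fsck(8) man page the status is a sum of condition bits. The sketch below only decodes that status; it is an illustration, not a procedure that was followed on this ticket, and `safe_to_modify_features` encodes an assumed policy (proceed only on a clean check) rather than anything mandated by e2fsprogs or Lustre.

```python
# Exit status bits documented in e2fsck(8); the status is their sum.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_e2fsck_status(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_EXIT_BITS.items() if status & bit]


def safe_to_modify_features(status):
    """Assumed policy: only change feature flags after a clean read-only check."""
    return status == 0


# A read-only run that finds problems and fixes nothing is expected to report bit 4.
print(describe_e2fsck_status(4), safe_to_modify_features(4))
```

The ticket does not record the exit status of the dry run, so the value 4 in the usage line is only an example input.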

Comment by Peter Jones [ 14/Jan/14 ]

OK, thanks Mahmoud.
