[LU-4373] MDS corruption - e2fsck issues Created: 10/Dec/13 Updated: 28/Jul/14 Resolved: 28/Jul/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Daire Byrne (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 11972 |
| Description |
|
Hi, We had some corruption appear on our MDS (possibly as a result of testing an SSD caching solution - Virident) so we took it offline to run an e2fsck. The check ran out of memory (we have 256Gigs + 90G swap) but not before it managed to do something screwy with the quotas. We could at least mount the FS with the corruption before the e2fsck but now it just bombed out instantly with errors on the quota. I then disabled the ext4 quota feature (we don't use quotas) and tried to mount but it goes read-only instantly now (originally it took a while to hit the corrupt area of the disk). I will probably try e2fsck one more time adding another disk as swap (with ~500 million 4k inodes - how much do we need?) and then resort to reformatting (we can repopulate the fs in a week or so). I'm attaching the relevant output in case there is anything useful here for you. First see the corruption: Dec 5 02:55:49 bmds1 kernel: LDISKFS-fs error (device vgca0_vcache1): ldiskfs_xattr_block_get: inode 299912086: bad block 299969421 Dec 5 02:55:49 bmds1 kernel: Aborting journal on device sdb. Dec 5 02:55:49 bmds1 kernel: LDISKFS-fs error (device vgca0_vcache1): ldiskfs_journal_start_sb: Detected aborted journal Dec 5 02:55:49 bmds1 kernel: LDISKFS-fs (vgca0_vcache1): Remounting filesystem read-only .. .. Dec 5 05:23:10 bmds1 kernel: VFS: cannot write quota structure on device vgca0_vcache1 (error -30). Quota may get out of sync! Dec 5 05:23:16 bmds1 kernel: LDISKFS-fs (vgca0_vcache1): Quota write (off=99328, len=1024) cancelled because transaction is not started Dec 5 05:23:16 bmds1 kernel: VFS: Can't insert quota data block (97) to free entry list. .. .. Dec 5 06:39:49 bmds1 kernel: LDISKFS-fs warning (device vgca0_vcache1): dx_probe: Unrecognised inode hash code 32 for directory #244352420 Dec 5 06:39:49 bmds1 kernel: LDISKFS-fs warning (device vgca0_vcache1): dx_probe: Corrupt dir inode 244352420, running e2fsck is recommended. Dec 5 06:39:49 bmds1 kernel: LDISKFS-fs warning (device vgca0_vcache1): dx_probe: Unrecognised inode hash code 32 for directory #244352420 Remount without e2fsck and wait until: Dec 5 18:53:49 bmds1 kernel: LDISKFS-fs error (device md0): mb_free_blocks: double-free of inode 0's block 857326879(bit 17695 in group 26163)
Dec 5 18:53:49 bmds1 kernel: Aborting journal on device sdb.
Dec 5 18:53:49 bmds1 kernel: LDISKFS-fs error (device md0): ldiskfs_journal_start_sb: Detected aborted journal
Dec 5 18:53:49 bmds1 kernel: LDISKFS-fs (md0): Remounting filesystem read-only
Dec 5 18:53:49 bmds1 kernel: LDISKFS-fs (md0): Remounting filesystem read-only
So I guess we better run e2fsck then: bmds1 /root # e2fsck -fy /dev/md0 e2fsck 1.42.7.wc1 (12-Apr-2013) Pass 1: Checking inodes, blocks, and sizes Deleted inode 857268317 has zero dtime. Fix? yes Deleted inode 2613636819 has zero dtime. Fix? yes Deleted inode 2961739541 has zero dtime. Fix? yes Running additional passes to resolve blocks claimed by more than one inode... Pass 1B: Rescanning for multiply-claimed blocks Multiply-claimed block(s) in inode 268975905: 269046865 Multiply-claimed block(s) in inode 268975912: 269046838 Multiply-claimed block(s) in inode 268998892: 269046838 Multiply-claimed block(s) in inode 268998896: 269046865 Pass 1C: Scanning directories for inodes with multiply-claimed blocks Pass 1D: Reconciling multiply-claimed blocks (There are 4 inodes containing multiply-claimed blocks.) File /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/photoshop (inode #268975905, mod time Wed Apr 24 15:31:27 2013) has 1 multiply-claimed block(s), shared with 1 file(s): ... (inode #268998896, mod time Wed Jan 9 12:59:03 2013) Clone multiply-claimed blocks? yes File /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/REF/paperwork (inode #268975912, mod time Wed Apr 24 15:31:26 2013) has 1 multiply-claimed block(s), shared with 1 file(s): ... (inode #268998892, mod time Wed Jan 9 12:59:03 2013) Clone multiply-claimed blocks? yes File ... (inode #268998892, mod time Wed Jan 9 12:59:03 2013) has 1 multiply-claimed block(s), shared with 1 file(s): /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/REF/paperwork (inode #268975912, mod time Wed Apr 24 15:31:26 2013) Multiply-claimed blocks already reassigned or cloned. File ... (inode #268998896, mod time Wed Jan 9 12:59:03 2013) has 1 multiply-claimed block(s), shared with 1 file(s): /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/photoshop (inode #268975905, mod time Wed Apr 24 15:31:27 2013) Multiply-claimed blocks already reassigned or cloned. Pass 2: Checking directory structure Error allocating icount structure: Memory allocation failed bravo-MDT0000: ***** FILE SYSTEM WAS MODIFIED ***** [QUOTA WARNING] Usage inconsistent for ID 0:actual (352486866944, 75720160) != expected (287050371072, 62379940) [QUOTA WARNING] Usage inconsistent for ID 4078:actual (5853184, 1468) != expected (0, 1193) [QUOTA WARNING] Usage inconsistent for ID 5255:actual (32276480, 7735) != expected (24416256, 5884) [QUOTA WARNING] Usage inconsistent for ID 5305:actual (578752512, 2100806) != expected (3155456000, 3376042) [QUOTA WARNING] Usage inconsistent for ID 4080:actual (56586240, 13815) != expected (55222272, 13750) [QUOTA WARNING] Usage inconsistent for ID 6731:actual (241618944, 56584) != expected (263524352, 59180) [QUOTA WARNING] Usage inconsistent for ID 3449:actual (3298160640, 679894) != expected (2682540032, 596461) [QUOTA WARNING] Usage inconsistent for ID 4052:actual (1331068928, 273724) != expected (1242320896, 218991) [QUOTA WARNING] Usage inconsistent for ID 4088:actual (246837248, 60468) != expected (27856896, 19173) [QUOTA WARNING] Usage inconsistent for ID 5339:actual (1207443456, 254756) != expected (1176674304, 270763) [QUOTA WARNING] Usage inconsistent for ID 6699:actual (140206080, 37540) != expected (10919936, 12748) [QUOTA WARNING] Usage inconsistent for ID 3695:actual (68915200, 15943) != expected (56963072, 13213) .. .. [QUOTA WARNING] Usage inconsistent for ID 7349:actual (64790528, 19964) != expected (52305920, 17445) [QUOTA WARNING] Usage inconsistent for ID 188:actual (20480, 5) != expected (16384, 4) [QUOTA WARNING] Usage inconsistent for ID 1056:actual (15908864, 3640) != expected (15458304, 3552) [QUOTA WARNING] Usage inconsistent for ID 615:actual (180224, 44) != expected (172032, 42) [QUOTA WARNING] Usage inconsistent for ID 513:actual (364544, 89) != expected (32768, 8) [QUOTA WARNING] Usage inconsistent for ID 999:actual (16240640, 3965) != expected (15622144, 3814) [QUOTA WARNING] Usage inconsistent for ID 5001:actual (21745664, 5229) != expected (21049344, 5063) [QUOTA WARNING] Usage inconsistent for ID 1003:actual (0, 0) != expected (36864, 9) [QUOTA WARNING] Usage inconsistent for ID 462:actual (233472, 132) != expected (225280, 123) Update quota info for quota type 1? yes [ERROR] quotaio_tree.c:357:free_dqentry:: Quota structure has offset to other block (0) than it should (34). e2fsck: aborted bravo-MDT0000: ***** FILE SYSTEM WAS MODIFIED ***** Try to mount now: Dec 9 09:56:42 bmds1 kernel: LDISKFS-fs (md0): warning: mounting fs with errors, running e2fsck is recommended Dec 9 09:56:42 bmds1 kernel: LDISKFS-fs (md0): Ignoring delalloc option - requested data journaling mode Dec 9 09:56:45 bmds1 kernel: LDISKFS-fs (md0): recovery complete Dec 9 09:56:45 bmds1 kernel: LDISKFS-fs (md0): Can't enable usage tracking on a filesystem with the QUOTA feature set Dec 9 09:56:45 bmds1 kernel: LDISKFS-fs (md0): mount failed Dec 9 09:56:45 bmds1 kernel: ------------[ cut here ]------------ Dec 9 09:56:45 bmds1 kernel: WARNING: at fs/proc/generic.c:847 remove_proc_entry+0x24f/0x260() (Tainted: P --------------- ) Dec 9 09:56:45 bmds1 kernel: Hardware name: PowerEdge R620 Dec 9 09:56:45 bmds1 kernel: remove_proc_entry: removing non-empty directory 'ldiskfs/md0', leaking at least 'prealloc_table' Dec 9 09:56:45 bmds1 kernel: Modules linked in: ldiskfs(U) jbd2 mptctl mptbase ipmi_devintf dell_rbu nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc uinput ipv6 raid1 power_meter sg vgcinit(P)(U) vgcdebug(P)(U) shpchp bnx2x libcrc32c mdio dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif mpt2sas scsi_transport_sas raid_class ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Dec 9 09:56:45 bmds1 kernel: Pid: 8092, comm: mount Tainted: P --------------- 2.6.32-358.18.1.el6_lustre.x86_64 #1 Dec 9 09:56:45 bmds1 kernel: Call Trace: Dec 9 09:56:45 bmds1 kernel: [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50 Dec 9 09:56:45 bmds1 kernel: [<ffffffff811ef7bd>] ? xlate_proc_name+0x4d/0xd0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff811efb1f>] ? remove_proc_entry+0x24f/0x260 Dec 9 09:56:45 bmds1 kernel: [<ffffffff8116e625>] ? pcpu_free_area+0x165/0x1e0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff8116e755>] ? free_percpu+0xb5/0x140 Dec 9 09:56:45 bmds1 kernel: [<ffffffffa049acec>] ? ldiskfs_fill_super+0x23c/0x2a10 [ldiskfs] Dec 9 09:56:45 bmds1 kernel: [<ffffffff8118477e>] ? get_sb_bdev+0x18e/0x1d0 Dec 9 09:56:45 bmds1 kernel: [<ffffffffa049aab0>] ? ldiskfs_fill_super+0x0/0x2a10 [ldiskfs] Dec 9 09:56:45 bmds1 kernel: [<ffffffffa0495388>] ? ldiskfs_get_sb+0x18/0x20 [ldiskfs] Dec 9 09:56:45 bmds1 kernel: [<ffffffff81183beb>] ? vfs_kern_mount+0x7b/0x1b0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff81183d92>] ? do_kern_mount+0x52/0x130 Dec 9 09:56:45 bmds1 kernel: [<ffffffff811a3f52>] ? do_mount+0x2d2/0x8d0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff81139ff4>] ? strndup_user+0x64/0xc0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff811a45e0>] ? sys_mount+0x90/0xe0 Dec 9 09:56:45 bmds1 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b Dec 9 09:56:45 bmds1 kernel: ---[ end trace dfa074843fd85142 ]--- Okay try without the quota feature set: Dec 9 15:47:56 bmds1 kernel: LDISKFS-fs (md0): warning: mounting fs with errors, running e2fsck is recommended Dec 9 15:47:56 bmds1 kernel: LDISKFS-fs (md0): Ignoring delalloc option - requested data journaling mode Dec 9 15:47:59 bmds1 kernel: LDISKFS-fs (md0): recovery complete Dec 9 15:47:59 bmds1 kernel: LDISKFS-fs (md0): mounted filesystem with journalled data mode. quota=off. Opts: Dec 9 15:48:30 bmds1 kernel: LNet: HW CPU cores: 32, npartitions: 4 Dec 9 15:48:30 bmds1 kernel: alg: No test for crc32 (crc32-table) Dec 9 15:48:30 bmds1 kernel: alg: No test for adler32 (adler32-zlib) Dec 9 15:48:30 bmds1 kernel: alg: No test for crc32 (crc32-pclmul) Dec 9 15:48:34 bmds1 kernel: padlock: VIA PadLock Hash Engine not detected. Dec 9 15:48:34 bmds1 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device Dec 9 15:48:38 bmds1 kernel: Lustre: Lustre: Build Version: 2.4.1-RC2--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64 Dec 9 15:48:38 bmds1 kernel: LNet: Added LNI 10.21.22.50@tcp [8/256/0/180] Dec 9 15:48:38 bmds1 kernel: LNet: Accept secure, port 988 Dec 9 15:48:39 bmds1 kernel: LDISKFS-fs (md0): barriers disabled Dec 9 15:48:39 bmds1 kernel: LDISKFS-fs (md0): warning: mounting fs with errors, running e2fsck is recommended Dec 9 15:48:39 bmds1 kernel: LDISKFS-fs (md0): Ignoring delalloc option - requested data journaling mode Dec 9 15:48:42 bmds1 kernel: LDISKFS-fs (md0): mounted filesystem with journalled data mode. quota=off. Opts: Dec 9 15:48:43 bmds1 kernel: Lustre: bravo-MDT0000: used disk, loading Dec 9 15:48:43 bmds1 kernel: LDISKFS-fs error (device md0): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 37corrupted: 525 blocks free in bitmap, 524 - in gd Dec 9 15:48:43 bmds1 kernel: Dec 9 15:48:43 bmds1 kernel: Aborting journal on device sdb. Dec 9 15:48:43 bmds1 kernel: LDISKFS-fs (md0): Remounting filesystem read-only |
| Comments |
| Comment by Daire Byrne (Inactive) [ 10/Dec/13 ] |
|
And another e2fsck run: bmds1 /root # e2fsck -fvy /dev/md0 e2fsck 1.42.7.wc1 (12-Apr-2013) bravo-MDT0000: recovering journal Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Invalid inode number for '.' in directory inode 268998892. Fix? yes Entry 'lineup' in /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/REF/??? (268998892) is a link to directory /ROOT/ARCHIVE/dirvish/filers/nfs11/20131203/tree/user_data/FL/rdev_tars/REF/paperwork/lineup (268975913). Clear? yes Invalid inode number for '.' in directory inode 268998896. Fix? yes Entry '..' in .../??? (268998889) has deleted/unused inode 266936598. Clear? yes Entry 'prop-base' in .../??? (268998889) has deleted/unused inode 269521254. Clear? yes Entry 'props' in .../??? (268998889) is a link to directory /ROOT/ARCHIVE/dirvish/servers/jenkins1/20131204/tree/usr/src/debug/kernel-2.6.32-279.5.2.el6/linux-2.6.32-279.5.2.el6.x86_64/include/drm (270036736). Clear? yes Entry '..' in .../??? (902387214) has deleted/unused inode 882443083. Clear? yes ext2fs_get_mem: Cannot allocate memory |
| Comment by Peter Jones [ 04/Mar/14 ] |
|
Niu Could you please look into this one? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 05/Mar/14 ] |
|
Looks your memory (256 + 90)G is insufficient to hold all directory structures, you need to enable the "scratch_files" to make e2fsck stores the dir structures in a file instead of memory. (see man e2fsck.conf), hope that'll complete the fsck. |
| Comment by Daire Byrne (Inactive) [ 05/Mar/14 ] |
|
I believe I tried scratch_files too at the time but still ran out of memory (with a 900G drive?). I guess the question is roughly how much RAM do we need to fsck 300 million dirs/files? I'd rather not find this out by trial and error (900Gigs... 4TB?). 4k X 300 million (dirs) =~ 1.2 TB? |
| Comment by Niu Yawei (Inactive) [ 06/Mar/14 ] |
|
I don't know exactly if it requires 1.2TB or more, I suggest you use the largest drive you have (4TB?), and monitor the disk usage while running fsck. ('tt' option prints the memory usage for each phase) |
| Comment by Peter Jones [ 28/Apr/14 ] |
|
Hi Daire Have you tried this procedure yet? Peter |
| Comment by Daire Byrne (Inactive) [ 29/Apr/14 ] |
|
Peter, No I haven't had the chance yet but we still have the RAID and data so I still plan to do the work soon. |
| Comment by Peter Jones [ 29/Apr/14 ] |
|
ok Daire. Thanks for the update. Please keep us posted. |
| Comment by Daire Byrne (Inactive) [ 24/Jul/14 ] |
|
Hey, I finally got around to setting up some hardware (5TB disk for e2fsck scratch) to re-run the e2fsck on this corrupted RAID. I had to remove the external journal but now it looks like e2fsck isn't getting as far as it did before: [root@localhost ~]# e2fsck -tt -fvy /dev/md0 e2fsck 1.42.7.wc1 (12-Apr-2013) Pass 1: Checking inodes, blocks, and sizes Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x7ff5bac5fffc e2fsck[0x42dbcd] /lib64/libc.so.6(+0x329a0)[0x7ff4b991d9a0] /lib64/libc.so.6(memcpy+0x35)[0x7ff4b9974985] /lib64/libext2fs.so.2(+0x2c046)[0x7ff4ba84e046] /lib64/libext2fs.so.2(+0x2cc3e)[0x7ff4ba84ec3e] /lib64/libext2fs.so.2(ext2fs_tdb_store+0x747)[0x7ff4ba8510d7] /lib64/libext2fs.so.2(+0x20c79)[0x7ff4ba842c79] /lib64/libext2fs.so.2(ext2fs_icount_store+0x46)[0x7ff4ba842d26] e2fsck(e2fsck_pass1+0xb58)[0x415468] e2fsck(e2fsck_run+0x4f)[0x40efff] e2fsck(main+0xeac)[0x40d1cc] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff4b9909d1d] e2fsck[0x409eb9] |
| Comment by Niu Yawei (Inactive) [ 25/Jul/14 ] |
|
It could be |
| Comment by Daire Byrne (Inactive) [ 28/Jul/14 ] |
|
I guess it may have been an issue with using e2fsck's scratch files option. It ran to completion using 4TB of swap space. We are happy that we can at least fsck our MDT in future with enough swap configured. We can close this ticket now. Pass 5: Memory used: 18014398508741340k/4240k (18014398507398050k/1343291k), time: 442.38/111.08/ 0.00
Pass 5: I/O read: 270MB, write: 1MB, rate: 0.61MB/s
bravo-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
429147617 inodes used (13.32%, out of 3221225472)
189 non-contiguous files (0.0%)
4422586 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 712347/4105/0
833979888 blocks used (25.89%, out of 3221225472)
0 bad blocks
1 large file
280344591 regular files
146575289 directories
21 character device files
32 block device files
123 fifos
1959990622 links
2223722 symbolic links (1652788 fast symbolic links)
3830 sockets
------------
2389138230 files
Memory used: 18014398508741340k/4240k (18014398507398050k/1343291k), time: 57393.48/10071.86/3111.58
I/O read: 2102044MB, write: 265MB, rate: 36.63MB/s
|
| Comment by Peter Jones [ 28/Jul/14 ] |
|
ok thanks Daire |