[LU-7169] conf-sanity 84 restart mds1 failed Created: 16/Sep/15 Updated: 30/Nov/15 Resolved: 30/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Niu Yawei (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/e089506c-5bf0-11e5-9dac-5254006e85c2

LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Skipped 79 previous similar messages
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
Lustre: Skipped 26 previous similar messages
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
LustreError: 14276:0:(osd_handler.c:1380:osd_ro()) *** setting lustre-MDT0000 read-only ***
Turning device dm-0 (0xfd00000) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x20000709 fail_val=5
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds1
Lustre: Failing over lustre-MDT0000
Removing read-only on unknown block (0xfd00000)
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o recovery_time_hard=60,recovery_time_soft=60 /dev/lvm-Role_MDS/P1 /mnt/mds1
LDISKFS-fs (dm-0): recovery complete
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
Aborting journal on device dm-0-8.
LDISKFS-fs (dm-0): Remounting filesystem read-only
LDISKFS-fs error (device dm-0): ldiskfs_put_super: Couldn't clean up the journal
LustreError: 14732:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-30)
LustreError: 14732:0:(obd_mount.c:203:lustre_start_simple()) lustre-MDT0000-osd setup error -30
LustreError: 14732:0:(obd_mount_server.c:1760:server_fill_super()) Unable to start osd on /dev/mapper/lvm--Role_MDS-P1: -30
LustreError: 14732:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-30)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_84: @@@@@@ FAIL: Restart of mds1 failed!

It looks like the filesystem is corrupted somehow. |
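For reference, a minimal shell sketch of the replay-barrier sequence reflected by the console markers above; this is an approximation reconstructed from this log, not the literal conf-sanity.sh test_84 code, and the device path, mount options, and fail_loc value are simply the ones from this run:

# On the MDS node; approximate sequence reconstructed from the log above.
sync; sync; sync                                    # flush dirty data before the barrier
/usr/sbin/lctl --device lustre-MDT0000 notransno    # stop assigning new transaction numbers
/usr/sbin/lctl --device lustre-MDT0000 readonly     # freeze the MDT backing device
/usr/sbin/lctl mark "mds1 REPLAY BARRIER on lustre-MDT0000"
lctl set_param fail_loc=0x20000709 fail_val=5       # inject the test's failure point
umount -d /mnt/mds1                                 # fail over MDT0000
mkdir -p /mnt/mds1
mount -t lustre -o recovery_time_hard=60,recovery_time_soft=60 /dev/lvm-Role_MDS/P1 /mnt/mds1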
| Comments |
| Comment by Andreas Dilger [ 16/Sep/15 ] |
|
The corruption here may be related to the problem seen in |
| Comment by James Nunez (Inactive) [ 16/Sep/15 ] |
|
Another occurrence |
| Comment by Joseph Gmitter (Inactive) [ 16/Sep/15 ] |
|
Fan Yong, |
| Comment by Andreas Dilger [ 16/Sep/15 ] |
|
The MDS console log reports:

18:38:45:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
18:38:45:LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
18:38:45:Aborting journal on device dm-0-8.
18:38:45:LDISKFS-fs (dm-0): Remounting filesystem read-only

On a later test when the MDS was remounted it crashed:

18:38:45:LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023
18:38:45:Aborting journal on device dm-0-8.
18:38:45:LDISKFS-fs (dm-0): Remounting filesystem read-only
18:38:45:BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
18:38:45:IP: [<ffffffffa0b9833d>] osd_scrub_refresh_mapping+0x39d/0x410 [osd_ldiskfs]
18:38:45:Oops: 0000 [#1] SMP
18:38:56:Pid: 15532, comm: mount.lustre Not tainted 2.6.32-573.3.1.el6_lustre.g3eb333f.x86_64 #1 Red Hat KVM
18:38:56:Call Trace:
18:38:56: [<ffffffffa0b9c616>] osd_scrub_setup+0xfb6/0x1170 [osd_ldiskfs]
18:38:56: [<ffffffffa0b707e9>] osd_device_alloc+0x6c9/0x960 [osd_ldiskfs]
18:38:56: [<ffffffffa05bf8df>] obd_setup+0x1bf/0x290 [obdclass]
18:38:56: [<ffffffffa05bfc0f>] class_setup+0x25f/0x940 [obdclass]
18:38:56: [<ffffffffa05c6df1>] class_process_config+0x1151/0x2840 [obdclass]
18:38:56: [<ffffffffa05d0abb>] do_lcfg+0x2cb/0x640 [obdclass]
18:38:56: [<ffffffffa05d0ec4>] lustre_start_simple+0x94/0x200 [obdclass]
18:38:56: [<ffffffffa0605ce1>] server_fill_super+0xfd1/0x1a70 [obdclass]
18:38:56: [<ffffffffa05d33e8>] lustre_fill_super+0x348/0x990 [obdclass]
18:38:56: [<ffffffff8119532f>] get_sb_nodev+0x5f/0xa0
18:38:56: [<ffffffffa05cb8a5>] lustre_get_sb+0x25/0x30 [obdclass]
18:38:56: [<ffffffff8119496b>] vfs_kern_mount+0x7b/0x1b0
18:38:56: [<ffffffff81194b12>] do_kern_mount+0x52/0x130
18:38:56: [<ffffffff811b697b>] do_mount+0x2fb/0x930
18:38:56: [<ffffffff811b7040>] sys_mount+0x90/0xe0 |
| Comment by nasf (Inactive) [ 17/Sep/15 ] |
|
The MDS crash at "osd_scrub_refresh_mapping+0x39d" is a duplicate of |
| Comment by Joseph Gmitter (Inactive) [ 17/Sep/15 ] |
|
Fan Yong, |
| Comment by nasf (Inactive) [ 19/Sep/15 ] |
|
It is NOT the same issue.

LDISKFS-fs error (device dm-0): ldiskfs_lookup: deleted inode referenced: 75023

ldiskfs_lookup() found a deleted inode, then its error handler set the filesystem read-only, which caused cascading failures for the subsequent operations. In fact, the filesystem was already corrupted before the ldiskfs_lookup() failure. Here is the MDS debug log:

00100000:00000001:0.0:1442342253.462893:0:14732:0:(osd_scrub.c:2423:osd_scrub_setup()) Process entered
00080000:00000001:0.0:1442342253.462936:0:14732:0:(osd_handler.c:2401:osd_ea_fid_set()) Process entered
00080000:00000001:0.0:1442342253.462944:0:14732:0:(osd_handler.c:2429:osd_ea_fid_set()) Process leaving (rc=0 : 0 : 0)
00100000:10000000:0.0:1442342253.463278:0:14732:0:(osd_scrub.c:277:osd_scrub_file_reset()) lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x2

The related source code is as follows:

int osd_scrub_setup(const struct lu_env *env, struct osd_device *dev)
{
	...
	} else {
		if (memcmp(sf->sf_uuid, es->s_uuid, 16) != 0) {
			osd_scrub_file_reset(scrub, es->s_uuid, SF_INCONSISTENT);
			dirty = 1;
	...
}

The log shows that during the MDT0000 mount, osd_scrub_setup() found that the super block's uuid had changed. Usually that only happens after an MDT file-level backup/restore, but there were no backup/restore operations in our conf-sanity test cases. So the local file system must have become corrupted during the MDT failover. As for what caused the super block corruption, I have no idea yet. |
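If this happens again, one cheap check would be to compare the on-disk superblock UUID before and after the failover. A minimal sketch, assuming the MDT device path from this run and that dumpe2fs can read the ldiskfs superblock while the target is quiescent:

# Record the superblock UUID right after the replay barrier...
do_facet mds1 "dumpe2fs -h /dev/lvm-Role_MDS/P1 2>/dev/null | grep 'Filesystem UUID'"
# ...and again after the failed restart. A changed UUID would point at the
# superblock itself being rewritten; an unchanged UUID that still mismatches
# the OI scrub file (the memcmp above) would point at the scrub file instead.
do_facet mds1 "dumpe2fs -h /dev/lvm-Role_MDS/P1 2>/dev/null | grep 'Filesystem UUID'"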
| Comment by Gerrit Updater [ 29/Sep/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16664 |
| Comment by Peter Jones [ 28/Oct/15 ] |
|
Fan Yong, what are the next steps here? It looks like the debug patch has not triggered the failure. Should we land the debug patch, or does it just need a higher number of runs to improve the chances of hitting it? Thanks, Peter |
| Comment by nasf (Inactive) [ 29/Oct/15 ] |
|
The issue can no longer be reproduced with the debug patch applied, even after repeating the failed test case several times. So I suggest landing the patch on master and giving it a chance in normal Maloo runs. I will update the patch to drop "fortestonly" and ask for landing permission. |
| Comment by Peter Jones [ 29/Oct/15 ] |
|
Excellent. Thanks Fan Yong. |
| Comment by nasf (Inactive) [ 10/Nov/15 ] |
|
We hit the failure with the patch applied. The log shows that there really is some disk inconsistency, as follows:

CMD: onyx-44vm3 e2fsck -d -v -t -t -f -n /dev/lvm-Role_MDS/P1
onyx-44vm3: e2fsck 1.42.13.wc3 (28-Aug-2015)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 292k/0k (87k/206k), time: 0.00/ 0.00/ 0.00
Pass 1: I/O read: 1MB, write: 0MB, rate: 553.40MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 292k/0k (98k/195k), time: 0.00/ 0.00/ 0.00
Pass 2: I/O read: 1MB, write: 0MB, rate: 655.74MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 292k/0k (98k/195k), time: 0.01/ 0.00/ 0.00
Pass 3: Memory used: 292k/0k (96k/197k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Free blocks count wrong (33296, counted=32947).
Fix? no
Free inodes count wrong (99987, counted=99750).
Fix? no
Pass 5: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00
Pass 5: I/O read: 1MB, write: 0MB, rate: 333.78MB/s
13 inodes used (0.01%, out of 100000)
6 non-contiguous files (46.2%)
0 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
16704 blocks used (33.41%, out of 50000)
0 bad blocks
1 large file
125 regular files
116 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
------------
241 files
Memory used: 292k/0k (61k/232k), time: 0.01/ 0.01/ 0.00
I/O read: 1MB, write: 0MB, rate: 92.42MB/s
reboot facets: mds1
Such an inconsistency does not mean the super block was corrupted, because without disk checksums e2fsck cannot detect per-block data corruption. From these logs alone we cannot say this is the root cause of the test_84 failure. I will update the patch with more debug information. |
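For completeness, a minimal sketch of how the same read-only check can be repeated by hand on the MDS node and its exit status interpreted (device path taken from this run; e2fsck's documented exit codes apply):

# On the MDS node (onyx-44vm3 in this run), with the MDT unmounted:
e2fsck -f -n /dev/lvm-Role_MDS/P1
rc=$?
echo "e2fsck exit status: $rc"
# rc == 0          -> filesystem is clean
# rc with bit 2 set (value 4) -> errors found but left uncorrected because of -n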
| Comment by Gerrit Updater [ 17/Nov/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16664/ |
| Comment by Joseph Gmitter (Inactive) [ 23/Nov/15 ] |
|
Landed for 2.8 |
| Comment by Andreas Dilger [ 27/Nov/15 ] |
|
The landed patch was only for debugging. This issue is not resolved. |
| Comment by nasf (Inactive) [ 27/Nov/15 ] |
|
Right, we are still waiting for more failure instances after landing the patch. |
| Comment by Andreas Dilger [ 27/Nov/15 ] |
|
There are many, many failures of this test, but unfortunately they have all been assigned to different bugs because the error messages differ. In the tests I've seen, the e2fsck run is clean except for the superblock inode and block counts, which is expected. I pushed a patch under |
| Comment by nasf (Inactive) [ 28/Nov/15 ] |
|
You are right. The preceding test_83 reformats the system when it is done, then we move on to test_84; after the replay_barrier(), the e2fsck found some inconsistency:

Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 292k/0k (86k/207k), time: 0.01/ 0.00/ 0.00
Pass 1: I/O read: 1MB, write: 0MB, rate: 104.44MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 292k/0k (98k/195k), time: 0.02/ 0.00/ 0.01
Pass 2: I/O read: 1MB, write: 0MB, rate: 47.50MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 292k/0k (98k/195k), time: 0.04/ 0.01/ 0.01
Pass 3: Memory used: 292k/0k (96k/197k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 292k/0k (62k/231k), time: 0.00/ 0.00/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Free blocks count wrong (33296, counted=32960).
Fix? no
Free inodes count wrong (99987, counted=99754).
Fix? no
Pass 5: Memory used: 292k/0k (61k/232k), time: 0.01/ 0.01/ 0.00
...

It is reasonable to suspect that replay_barrier() marked the system read-only too early, before the super block was synced to disk. But it seems replay_barrier() already accounts for that:

replay_barrier() {
	local facet=$1
	do_facet $facet "sync; sync; sync"
	df $MOUNT
	..

So it seems the "sync" does not work. As for your patch http://review.whamcloud.com/#/c/17371/, it should be helpful. But why not put the sync_all_data inside replay_barrier() to replace the 'do_facet $facet "sync; sync; sync"'? Then it would help the other tests as well. |
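A minimal sketch of that suggestion, assuming sync_all_data keeps its current test-framework.sh behaviour (the real change would be made to replay_barrier() itself):

replay_barrier() {
	local facet=$1

	# Flush dirty data everywhere before freezing the target, instead of
	# relying only on 'sync; sync; sync' run on the facet.
	sync_all_data

	df $MOUNT
	..
}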
| Comment by nasf (Inactive) [ 28/Nov/15 ] |
|
The patch from Andreas: |
| Comment by Andreas Dilger [ 28/Nov/15 ] |
|
The "Free blocks/inodes count wrong" errors do not indicate any kind of problem. These fields in the superblock are not updated during normal usage, so after replay_barrier() they will be out of date. I thought about putting the sync_all_data(); sleep 5; call inside replay_barrier(), but I didn't want to slow down the other tests that use this function and don't immediately follow a reformat of the filesystem. We'll see if this patch actually fixes the problem. |
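A minimal sketch of that alternative, assuming the change is applied directly in conf-sanity.sh test_84 rather than in replay_barrier() (the exact change in http://review.whamcloud.com/#/c/17371/ may differ):

# In test_84, just before the replay barrier that follows the reformat:
sync_all_data     # make sure the freshly created metadata reaches disk
sleep 5           # give writeback a moment to settle
replay_barrier mds1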
| Comment by Peter Jones [ 30/Nov/15 ] |
|
Believed to be a duplicate of |