Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.7.0
- Labels: None
- Severity: 3
Description
Trying to create an MDT snapshot via LVM (for backups) with the following command:
lvcreate --size 20G --snapshot --name lv-mdt0_snap /dev/vg-3015/lv-mdt0
Immediately afterwards, errors like the following are logged:
Dec 28 09:24:32 mds01 kernel: ------------[ cut here ]------------
Dec 28 09:24:32 mds01 kernel: WARNING: at /var/lib/jenkins/workspace/lustre-b2_7/arch/x86_64/build_type/server/distro/el6.6/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.0/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Tainted: P W --------------- )
Dec 28 09:24:32 mds01 kernel: Hardware name: ProLiant DL380p Gen8
Dec 28 09:24:32 mds01 kernel: Modules linked in: ext2 osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ip6table_filter ip6_tables ebtable_nat ebtables bridge stp llc ipt_REJECT ipt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate dm_snapshot dm_bufio dm_round_robin dm_multipath vhost_net macvtap macvlan tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode hpilo hpwdt power_meter acpi_ipmi ipmi_si ipmi_msghandler tg3 sg serio_raw lpc_ich mfd_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core hpsa(U) qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_
Dec 28 09:24:32 mds01 kernel: log dm_mod [last unloaded: scsi_wait_scan]
Dec 28 09:24:32 mds01 kernel: Pid: 5300, comm: mdt00_005 Tainted: P W --------------- 2.6.32-504.8.1.el6_lustre.x86_64 #1
Dec 28 09:24:32 mds01 kernel: Call Trace:
Dec 28 09:24:32 mds01 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Dec 28 09:24:32 mds01 kernel: [<ffffffff81074e4a>] ? warn_slowpath_null+0x1a/0x20
Dec 28 09:24:32 mds01 kernel: [<ffffffffa0e9234e>] ? ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]
Dec 28 09:24:32 mds01 kernel: [<ffffffffa0e6c23a>] ? ldiskfs_dirty_inode+0x2a/0x60 [ldiskfs]
Dec 28 09:24:32 mds01 kernel: [<ffffffffa1050966>] ? osd_attr_set+0x166/0x460 [osd_ldiskfs]
Dec 28 09:25:03 mds01 kernel: ------------[ cut here ]------------
The lvcreate command does return if MDT activity is halted (for example, by unmounting the clients).
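To check how often a snapshot attempt triggered the warning shown above, the log can be scanned for the WARNING header line. The following is a minimal sketch, not part of the original report: `count_journal_warnings` is a hypothetical helper name, and it counts only the header line of each warning (the one containing both `WARNING: at` and `ldiskfs_journal_start_sb`), not the matching call-trace frames.

```shell
#!/bin/sh
# count_journal_warnings FILE
# Hypothetical helper (illustration only): print how many lines in FILE
# contain both "WARNING: at" and "ldiskfs_journal_start_sb".
# Call-trace frames that mention the function but are not the WARNING
# header line are deliberately not counted.
count_journal_warnings() {
    awk '/WARNING: at/ && /ldiskfs_journal_start_sb/ { n++ }
         END { print n + 0 }' "$1"
}
```

It could be run as `count_journal_warnings /var/log/messages`, assuming kernel messages are written to that file on the MDS; the path is an assumption and depends on the local syslog configuration.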
Issue Links
- is related to LU-8071 lvcreate --snapshot of MDT hangs in ldiskfs_journal_start_sb (Resolved)
Not sure how closely tied this is, but after the reboot we ran e2fsck on the MDT. The fsck was able to replay the journal, but during the scan it found lots of unattached inodes and moved them to /lost+found. Interestingly, most of those inode numbers were almost sequential:
We then mounted the MDT as ldiskfs and saw that all but about 20 files were owned by root and had surprisingly old date stamps:
nbp9-mds ~ # ls -l /mnt/lustre/nbp9-mdt/lost+found/
total 6816
-rw------- 1 root root 5792 Jul 26 1970 #17831719
-rw------- 1 root root 5792 Jul 26 1970 #17831723
-rw------- 1 root root 5792 Feb 4 1988 #571005637
-rw------- 1 root root 5792 Feb 4 1988 #571016437
-rw------- 1 root root 5792 Feb 4 1988 #571029133
...
-rw------- 1 root root 4832 Jul 5 1988 #584109225
-rw------- 1 root root 4832 Jul 5 1988 #584109227
-rw------- 1 root root 4832 Jul 5 1988 #584109231
-rw------- 1 root root 4832 Jul 5 1988 #584109233
-rw------- 1 root root 4832 Jul 5 1988 #584109235
nbp9-mds ~ # ls -l /mnt/lustre/nbp9-mdt/lost+found | grep -v " root "
total 47328
-rwxr----- 1 dmenemen g26209 0 Feb 10 2014 #570436464
-rw-r--r-- 1 dmenemen g26209 0 Feb 10 2014 #570436465
-rwxr----- 1 dmenemen g26209 0 Feb 10 2014 #570436466
-rw-r--r-- 1 dmenemen g26209 0 Feb 10 2014 #570436467
-rw-r--r-- 1 dmenemen g26209 0 Feb 10 2014 #570436468
-rw-r--r-- 1 dmenemen g26209 0 Feb 10 2014 #570436469
... (another dozen or so more similar files)
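For a quick per-owner tally of a listing like the ones above, rather than filtering with grep, something along these lines may help. This is a rough sketch, not something that was run on the system in question: `summarize_owners` is a hypothetical helper that reads `ls -l` output on standard input and assumes file names contain no whitespace, which holds for the `#<inode>` names e2fsck creates in lost+found.

```shell
#!/bin/sh
# summarize_owners
# Hypothetical helper (illustration only): read `ls -l` output on stdin
# and print one "owner count" line per owner, sorted by owner name.
# Lines that do not look like file entries ("total ...", "...", shell
# prompts) are skipped. Assumes no whitespace in file names.
summarize_owners() {
    awk '$1 ~ /^[-dl]/ && NF >= 9 { count[$3]++ }
         END { for (o in count) print o, count[o] }' | sort
}
```

For example: `ls -l /mnt/lustre/nbp9-mdt/lost+found | summarize_owners`.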
Ran "ll_recover_lost_found_objs" on the MDT mounted as ldiskfs, and it reported results like:
(It recovered all of the files not owned by root.)
I believe that the rest of the files in lost+found were probably either useless or were never pointing to user data... they have no xattrs:
# getfattr -d -m ".*" -e hex /mnt/lustre/nbp9-mdt/lost+found/*
(returned nothing!)
So, we ran another e2fsck (which came back clean!) and put the filesystem back into production.