[LU-7616] creation of LVM snapshot on ldiskfs based MDT hangs until MDT activity/use is halted Created: 29/Dec/15  Updated: 27/Apr/16

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: tom crowe Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS release 6.5 (Final)

2.6.32-504.8.1.el6_lustre.x86_64 #1 SMP Thu Feb 26 22:39:27 PST 2015 x86_64 x86_64 x86_64 GNU/Linux

lustre: 2.7.0
kernel: patchless_client
build: 2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64


Issue Links:
Related
is related to LU-8071 lvcreate --snapshot of MDT hangs in l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Trying to create an MDT snapshot via LVM (for backups) with the following command:

lvcreate --size 20G --snapshot --name lv-mdt0_snap /dev/vg-3015/lv-mdt0

Immediately afterwards, errors like the following are logged:

Dec 28 09:24:32 mds01 kernel: ------------[ cut here ]------------
Dec 28 09:24:32 mds01 kernel: WARNING: at /var/lib/jenkins/workspace/lustre-b2_7/arch/x86_64/build_type/server/distro/el6.6/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.0/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Tainted: P        W  ---------------   )
Dec 28 09:24:32 mds01 kernel: Hardware name: ProLiant DL380p Gen8
Dec 28 09:24:32 mds01 kernel: Modules linked in: ext2 osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ip6table_filter ip6_tables ebtable_nat ebtables bridge stp llc ipt_REJECT ipt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate dm_snapshot dm_bufio dm_round_robin dm_multipath vhost_net macvtap macvlan tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode hpilo hpwdt power_meter acpi_ipmi ipmi_si ipmi_msghandler tg3 sg serio_raw lpc_ich mfd_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core hpsa(U) qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_
Dec 28 09:24:32 mds01 kernel: log dm_mod [last unloaded: scsi_wait_scan]
Dec 28 09:24:32 mds01 kernel: Pid: 5300, comm: mdt00_005 Tainted: P        W  ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
Dec 28 09:24:32 mds01 kernel: Call Trace:
Dec 28 09:24:32 mds01 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Dec 28 09:24:32 mds01 kernel: [<ffffffff81074e4a>] ? warn_slowpath_null+0x1a/0x20
Dec 28 09:24:32 mds01 kernel: [<ffffffffa0e9234e>] ? ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]
Dec 28 09:24:32 mds01 kernel: [<ffffffffa0e6c23a>] ? ldiskfs_dirty_inode+0x2a/0x60 [ldiskfs]
Dec 28 09:24:32 mds01 kernel: [<ffffffffa1050966>] ? osd_attr_set+0x166/0x460 [osd_ldiskfs]
Dec 28 09:25:03 mds01 kernel: ------------[ cut here ]------------

The lvcreate command does eventually return if MDT activity is halted (by unmounting the clients).
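For what it's worth, lvcreate freezes the origin filesystem while it inserts the snapshot target, and the WARNING from ldiskfs_journal_start_sb suggests MDT service threads keep opening journal handles against the frozen device (see the related LU-8071). A conservative workaround sketch, assuming the MDT can briefly be taken out of service (device path from the command above; the mount point /mnt/mdt0 is hypothetical):

# stop the MDT so no journal handles are open while LVM freezes the device
umount /mnt/mdt0
lvcreate --size 20G --snapshot --name lv-mdt0_snap /dev/vg-3015/lv-mdt0
# restart the MDT once the snapshot exists
mount -t lustre /dev/vg-3015/lv-mdt0 /mnt/mdt0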



 Comments   
Comment by Nathan Dauchy (Inactive) [ 26/Apr/16 ]

We saw a very similar issue at NASA yesterday... and we believe there may be some bugs in journal locking when creating snapshots.

Using CentOS-6.7, lustre-2.5.3, lvm2-2.02.118-3.el6_7.4. Also note that the MDT uses an external journal device.

The MDS became extremely sluggish and all MDT threads went into D state when running the following command:

lvcreate -l95%FREE -s -p r -n mdt_snap /dev/nbp9-vg/mdt9
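(here -l95%FREE sizes the snapshot using 95% of the free extents in the volume group, -s requests a snapshot, -p r makes it read-only, and -n names it)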

(the command never returned, and any further lv* commands hung as well)

In the logs...

Apr 25 17:09:35 nbp9-mds kernel: WARNING: at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Not tainted)
Apr 25 17:14:45 nbp9-mds kernel: [... trace truncated in the original log capture ...]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa0e5c4e5>] ? mds_readpage_handle+0x15/0x20 [mdt]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08a90c5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa05d18d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08a1a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08ab89d>] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08aada0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
Apr 25 17:14:45 nbp9-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Apr 25 17:14:45 nbp9-mds kernel: ---[ end trace c9f3339c0e103edf ]---

Apr 25 17:14:57 nbp9-mds kernel: WARNING: at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Tainted: G        W  ---------------   )
Apr 25 17:14:57 nbp9-mds kernel: Hardware name: AltixXE270
Apr 25 17:14:57 nbp9-mds kernel: Modules linked in: dm_snapshot dm_bufio osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) sunrpc bonding ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) dm_round_robin scsi_dh_rdac dm_multipath microcode iTCO_wdt iTCO_vendor_support i2c_i801 lpc_ich mfd_core shpchp sg igb dca ptp pps_core tcp_bic ext3 jbd sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic pata_jmicron mptfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) mlx_compat(U) memtrack(U) usb_storage radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod gru [last unloaded: scsi_wait_scan]
Apr 25 17:14:57 nbp9-mds kernel: Pid: 85906, comm: mdt_rdpg00_042 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Apr 25 17:14:57 nbp9-mds kernel: Call Trace:
Apr 25 17:14:57 nbp9-mds kernel: [<ffffffff81074127>] ? warn_slowpath_common+0x87/0xc0
Apr 25 17:14:57 nbp9-mds kernel: [<ffffffff8107417a>] ? warn_slowpath_null+0x1a/0x20
Apr 25 17:14:57 nbp9-mds kernel: [<ffffffffa0a1c33e>] ? ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]
Apr 25 17:14:57 nbp9-mds kernel: [<ffffffffa0d6069f>] ? osd_trans_start+0x1df/0x660 [osd_ldiskfs]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0ef3619>] ? lod_trans_start+0x1b9/0x250 [lod]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0f7af07>] ? mdd_trans_start+0x17/0x20 [mdd]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0f61ece>] ? mdd_close+0x6be/0xb80 [mdd]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e48be9>] ? mdt_mfd_close+0x4a9/0x1bc0 [mdt]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0899525>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa08c07f6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa089a53e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa06ebf05>] ? class_handle2object+0x95/0x190 [obdclass]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e4b6a2>] ? mdt_close+0x642/0xa80 [mdt]
Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e1fada>] ? mdt_handle_common+0x52a/0x1470 [mdt]
Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: sdc - rdac checker reports path is down
Apr 25 17:15:10 nbp9-mds multipathd: checker failed path 8:32 in map nbp9_MGS_MDS
Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: remaining active paths: 1
Apr 25 17:15:10 nbp9-mds multipathd: sdd: remove path (uevent)
Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: failed in domap for removal of path sdd
Apr 25 17:15:10 nbp9-mds multipathd: uevent trigger error
Apr 25 17:15:10 nbp9-mds multipathd: sdc: remove path (uevent)
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa0e5c4e5>] ? mds_readpage_handle+0x15/0x20 [mdt]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08a90c5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa05d18d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08a1a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08ab89d>] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08aada0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
Apr 25 17:15:10 nbp9-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

Apr 25 17:15:16 nbp9-mds multipathd: nbp9_MGS_MDS: map in use
Apr 25 17:15:16 nbp9-mds multipathd: nbp9_MGS_MDS: can't flush

The server had to be rebooted and e2fsck run to get it back into production.

Comment by Nathan Dauchy (Inactive) [ 26/Apr/16 ]

Not sure how closely tied this is, but... after the reboot we ran e2fsck on the MDT. The fsck was able to replay the journal, but during the scan it found lots of unattached inodes and moved them to /lost+found. Interestingly, most of those inodes were almost sequential:

Unattached inode 584092565
Connect to /lost+found<y>? yes
Unattached inode 584092567
Connect to /lost+found<y>? yes
Unattached inode 584092637
Connect to /lost+found<y>? yes
Unattached inode 584092640
Connect to /lost+found<y>? yes
Unattached inode 584092643
Connect to /lost+found<y>? yes
Unattached inode 584092645
Connect to /lost+found<y>? yes
Unattached inode 584092648
Connect to /lost+found<y>? yes
Unattached inode 584092715
Connect to /lost+found<y>? yes
Unattached inode 584092717
Connect to /lost+found<y>? yes
Unattached inode 584092720
Connect to /lost+found<y>? yes
Unattached inode 584092723
Connect to /lost+found<y>? yes
Unattached inode 584092784
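The exact invocation isn't recorded here, but prompts like these come from an interactive e2fsck; a run along the following lines would produce them (device path as in the lvcreate command above; -f forces a full check even though the journal replay marks the filesystem clean, -y answers yes to every prompt):

e2fsck -f /dev/nbp9-vg/mdt9     # interactive, prompts "Connect to /lost+found<y>?"
e2fsck -fy /dev/nbp9-vg/mdt9    # non-interactive, answers yes to everything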

We then mounted the MDT as ldiskfs and saw that all but about 20 files were owned by root and had surprisingly old date stamps:

nbp9-mds ~ # ls -l /mnt/lustre/nbp9-mdt/lost+found/
total 6816
-rw------- 1 root     root   5792 Jul 26  1970 #17831719
-rw------- 1 root     root   5792 Jul 26  1970 #17831723
-rw------- 1 root     root   5792 Feb  4  1988 #571005637
-rw------- 1 root     root   5792 Feb  4  1988 #571016437
-rw------- 1 root     root   5792 Feb  4  1988 #571029133
...
-rw------- 1 root     root   4832 Jul  5  1988 #584109225
-rw------- 1 root     root   4832 Jul  5  1988 #584109227
-rw------- 1 root     root   4832 Jul  5  1988 #584109231
-rw------- 1 root     root   4832 Jul  5  1988 #584109233
-rw------- 1 root     root   4832 Jul  5  1988 #584109235

nbp9-mds ~ # ls -l /mnt/lustre/nbp9-mdt/lost+found | grep -v " root "
total 47328
-rwxr----- 1 dmenemen g26209    0 Feb 10  2014 #570436464
-rw-r--r-- 1 dmenemen g26209    0 Feb 10  2014 #570436465
-rwxr----- 1 dmenemen g26209    0 Feb 10  2014 #570436466
-rw-r--r-- 1 dmenemen g26209    0 Feb 10  2014 #570436467
-rw-r--r-- 1 dmenemen g26209    0 Feb 10  2014 #570436468
-rw-r--r-- 1 dmenemen g26209    0 Feb 10  2014 #570436469
...
(another dozen or so more similar files)

Ran "ll_recover_lost_found_objs" on the MDT mounted as ldiskfs, and it reported results like:

Object /mnt/lustre/nbp9-mdt/O/22d0c13d/d23/cb410817 restored.
Object /mnt/lustre/nbp9-mdt/O/22d0c13f/d27/cb41083b restored.
Object /mnt/lustre/nbp9-mdt/O/22d0c145/d1/cb504661 restored.
Object /mnt/lustre/nbp9-mdt/O/22d0c147/d27/cb50467b restored.
Object /mnt/lustre/nbp9-mdt/O/22d0c153/d16/cb504810 restored.
Object /mnt/lustre/nbp9-mdt/O/22d0c155/d20/cb504814 restored.

(It recovered all of the non-root-owned files.)
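The invocation was presumably along these lines, with -d pointing the tool at the lost+found directory of the ldiskfs-mounted MDT (path as in the listings above):

ll_recover_lost_found_objs -d /mnt/lustre/nbp9-mdt/lost+found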

I believe the rest of the files in lost+found were either useless or never pointed to user data; they have no xattrs:

# getfattr -d -m ".*" -e hex /mnt/lustre/nbp9-mdt/lost+found/*

(returned nothing!)
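(For comparison, a live MDT inode would normally carry at least a trusted.lma xattr, and usually trusted.link or trusted.lov as well, so objects with no xattrs at all are very unlikely to map back to user files.)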

So, we ran another e2fsck (which came back clean!) and put the filesystem back into production.
