Lustre / LU-8071

lvcreate --snapshot of MDT hangs in ldiskfs_journal_start_sb

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3
    • Labels: None
    • Environment: CentOS-6.7
      lustre-2.5.3
      lvm2-2.02.118-3.el6_7.4
      Also note that the MDT uses an external journal device.
    • Severity: 3

    Description

      Similar to LU-7616 "creation of LVM snapshot on ldiskfs based MDT hangs until MDT activity/use is halted", but opening a new case for tracking.

      The goal is to use LVM snapshots and tar to make file-level MDT backups. The procedure worked fine two or three times, but a recent attempt triggered the following problem.

      The MDS became extremely sluggish and all MDT threads went into D state while the following command was running:

      lvcreate -l95%FREE -s -p r -n mdt_snap /dev/nbp9-vg/mdt9
      

      (the command never returned, and any further lv* commands hung as well)
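      For reference, the intended backup flow can be sketched as below. This is an illustrative outline only: the snapshot and mount-point names, the backup destination, and the use of `tar --xattrs` (which requires a reasonably recent GNU tar) are assumptions, not the exact procedure used at this site.

```shell
#!/usr/bin/env bash
# Hedged sketch of a snapshot-based file-level MDT backup.
# VG/LV names are taken from this report; everything else is illustrative.
set -u

mdt_snapshot_backup() {
    local vg="nbp9-vg" lv="mdt9" snap="mdt_snap" mnt="/mnt/mdt_snap"

    # 1. Create a read-only snapshot from the remaining free extents
    #    (this is the step that hung in this ticket).
    lvcreate -l95%FREE -s -p r -n "$snap" "/dev/$vg/$lv" || return 1

    # 2. Mount the snapshot read-only; an ldiskfs MDT mounts via ldiskfs.
    mkdir -p "$mnt"
    mount -t ldiskfs -o ro "/dev/$vg/$snap" "$mnt" || return 1

    # 3. File-level backup, preserving the extended attributes that hold
    #    Lustre metadata (requires a tar with --xattrs support).
    tar czf "/backup/mdt_$(date +%F).tgz" --xattrs -C "$mnt" . || return 1

    # 4. Clean up the snapshot.
    umount "$mnt"
    lvremove -f "/dev/$vg/$snap"
}
```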

      In the logs...

      Apr 25 17:09:35 nbp9-mds kernel: WARNING: at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Not tainted)
      Apr 25 17:14:45 nbp9-mds ]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa0e5c4e5>] ? mds_readpage_handle+0x15/0x20 [mdt]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08a90c5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa05d18d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08a1a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08ab89d>] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffffa08aada0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      Apr 25 17:14:45 nbp9-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Apr 25 17:14:45 nbp9-mds kernel: ---[ end trace c9f3339c0e103edf ]---
      
      Apr 25 17:14:57 nbp9-mds kernel: WARNING: at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/super.c:280 ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]() (Tainted: G        W  ---------------   )
      Apr 25 17:14:57 nbp9-mds kernel: Hardware name: AltixXE270
      Apr 25 17:14:57 nbp9-mds kernel: Modules linked in: dm_snapshot dm_bufio osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) sunrpc bonding ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) dm_round_robin scsi_dh_rdac dm_multipath microcode iTCO_wdt iTCO_vendor_support i2c_i801 lpc_ich mfd_core shpchp sg igb dca ptp pps_core tcp_bic ext3 jbd sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic pata_jmicron mptfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) mlx_compat(U) memtrack(U) usb_storage radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod gru [last unloaded: scsi_wait_scan]
      Apr 25 17:14:57 nbp9-mds kernel: Pid: 85906, comm: mdt_rdpg00_042 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
      Apr 25 17:14:57 nbp9-mds kernel: Call Trace:
      Apr 25 17:14:57 nbp9-mds kernel: [<ffffffff81074127>] ? warn_slowpath_common+0x87/0xc0
      Apr 25 17:14:57 nbp9-mds kernel: [<ffffffff8107417a>] ? warn_slowpath_null+0x1a/0x20
      Apr 25 17:14:57 nbp9-mds kernel: [<ffffffffa0a1c33e>] ? ldiskfs_journal_start_sb+0xce/0xe0 [ldiskfs]
      Apr 25 17:14:57 nbp9-mds kernel: [<ffffffffa0d6069f>] ? osd_trans_start+0x1df/0x660 [osd_ldiskfs]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0ef3619>] ? lod_trans_start+0x1b9/0x250 [lod]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0f7af07>] ? mdd_trans_start+0x17/0x20 [mdd]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0f61ece>] ? mdd_close+0x6be/0xb80 [mdd]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e48be9>] ? mdt_mfd_close+0x4a9/0x1bc0 [mdt]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0899525>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa08c07f6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa089a53e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa06ebf05>] ? class_handle2object+0x95/0x190 [obdclass]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e4b6a2>] ? mdt_close+0x642/0xa80 [mdt]
      Apr 25 17:15:06 nbp9-mds kernel: [<ffffffffa0e1fada>] ? mdt_handle_common+0x52a/0x1470 [mdt]
      Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: sdc - rdac checker reports path is down
      Apr 25 17:15:10 nbp9-mds multipathd: checker failed path 8:32 in map nbp9_MGS_MDS
      Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: remaining active paths: 1
      Apr 25 17:15:10 nbp9-mds multipathd: sdd: remove path (uevent)
      Apr 25 17:15:10 nbp9-mds multipathd: nbp9_MGS_MDS: failed in domap for removal of path sdd
      Apr 25 17:15:10 nbp9-mds multipathd: uevent trigger error
      Apr 25 17:15:10 nbp9-mds multipathd: sdc: remove path (uevent)
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa0e5c4e5>] ? mds_readpage_handle+0x15/0x20 [mdt]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08a90c5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa05d18d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08a1a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08ab89d>] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffffa08aada0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      Apr 25 17:15:10 nbp9-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      
      Apr 25 17:15:16 nbp9-mds multipathd: nbp9_MGS_MDS: map in use
      Apr 25 17:15:16 nbp9-mds multipathd: nbp9_MGS_MDS: can't flush
      

      The server had to be rebooted and e2fsck run to get it back into production.

      Attachments

        1. bt.all
          917 kB
        2. dmesg.out
          494 kB
        3. lostfound_nonzero.lst
          80 kB
        4. nagtest.toobig.stripes
          36 kB


          Activity

            pjones Peter Jones added a comment -

            Thanks Jay!


            jaylan Jay Lan (Inactive) added a comment -

            Yes, this ticket can be closed. Thanks!
            pjones Peter Jones added a comment -

            Now landed for 2.9 and queued up for maintenance releases. Is there anything further you need on this ticket or can it be closed?


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20062/
            Subject: LU-8071 ldiskfs: handle system freeze protection
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bd40ca206881eefeeb6ad7586f93afd685bb8120

            adilger Andreas Dilger added a comment -

            I would recommend against using the ext4-give-warning-with-dir-htree-growing.patch, as it also requires other changes to the Lustre code. The other patches are OK to use on other kernels.

            Also, are the following patches from the upstream kernel already applied on your systems?

            commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f
            Author:     Eric Sandeen <sandeen@sandeen.net>
            AuthorDate: Sun Aug 1 17:33:29 2010 -0400
            Commit:     Theodore Ts'o <tytso@mit.edu>
            CommitDate: Sun Aug 1 17:33:29 2010 -0400
            
                ext4: fix freeze deadlock under IO
                
                Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
                when freezing a filesystem which had active IO; the vfs_check_frozen
                level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
                through.  Duh.
                
                Changing the test to FREEZE_TRANS should let the normal freeze
                syncing get through the fs, but still block any transactions from
                starting once the fs is completely frozen.
                
                I tested this by running fsstress in the background while periodically
                snapshotting the fs and running fsck on the result.  I ran into
                occasional deadlocks, but different ones.  I think this is a
                fine fix for the problem at hand, and the other deadlocky things
                will need more investigation.
                
                Reported-by: Phillip Susi <psusi@cfl.rr.com>
                Signed-off-by: Eric Sandeen <sandeen@redhat.com>
                Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
            
            commit 6b0310fbf087ad6e9e3b8392adca97cd77184084
            Author:     Eric Sandeen <sandeen@redhat.com>
            AuthorDate: Sun May 16 02:00:00 2010 -0400
            Commit:     Theodore Ts'o <tytso@mit.edu>
            CommitDate: Sun May 16 02:00:00 2010 -0400
            
                ext4: don't return to userspace after freezing the fs with a mutex held
                
                ext4_freeze() used jbd2_journal_lock_updates() which takes
                the j_barrier mutex, and then returns to userspace.  The
                kernel does not like this:
                
                ================================================
                [ BUG: lock held when returning to user space! ]
                ------------------------------------------------
                lvcreate/1075 is leaving the kernel with locks still held!
                1 lock held by lvcreate/1075:
                 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
                jbd2_journal_lock_updates+0xe1/0xf0
                
                Use vfs_check_frozen() added to ext4_journal_start_sb() and
                ext4_force_commit() instead.
                
                Addresses-Red-Hat-Bugzilla: #568503
                
                Signed-off-by: Eric Sandeen <sandeen@redhat.com>
                Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
            

            I would guess yes, since they originated from Red Hat, but just wanted to confirm.
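            As a practical check, something like the following could confirm whether those two fixes are present. The package name and source-tree path are assumptions; adjust them for the local setup.

```shell
# Hedged sketch: two ways to check whether the ext4 freeze fixes quoted
# above are already in a vendor kernel.
check_changelog() {
    # Vendor kernels usually carry upstream commit subjects in the RPM changelog.
    local subject="$1"   # e.g. "ext4: fix freeze deadlock under IO"
    rpm -q --changelog kernel | grep -qiF "$subject"
}

check_source() {
    # If the patched source tree is available, look for the guard the fix adds
    # to ext4_journal_start_sb().
    local tree="${1:-/usr/src/kernels/$(uname -r)}"
    grep -q "vfs_check_frozen" "$tree/fs/ext4/super.c"
}
```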


            jaylan Jay Lan (Inactive) added a comment -

            The last six patches in ldiskfs-2.6-rhel6.6.series of the master branch are:

            rhel6.3/ext4-drop-inode-from-orphan-list-if-ext4_delete_inode-fails.patch
            rhel6.6/ext4-remove-truncate-warning.patch
            rhel6.6/ext4-corrupted-inode-block-bitmaps-handling-patches.patch
            rhel6.3/ext4-notalloc_under_idatasem.patch
            rhel6.5/ext4-give-warning-with-dir-htree-growing.patch
            rhel6.6/ext4_s_max_ext_tree_depth.patch

            Only the first two patches have been picked into b2_7_fe; none of the six has been picked into b2_5_fe.

            We are running CentOS 6.6, and it seems to me these patches are important for us to have as well. Some of our servers run 2.5.3 and the rest run 2.7.1. Is it safe for us to pick up the missing ldiskfs kernel patches? Please advise.
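            To compare what each branch actually carries, the tail of the series file can be listed per branch. The repository path and series-file location below are assumptions based on the lustre-release tree layout.

```shell
# Hedged sketch: show the last six entries of the ldiskfs patch series
# on a given branch of a local lustre-release clone.
series_tail() {
    local branch="$1"
    git -C lustre-release show \
        "$branch:ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel6.6.series" \
        | tail -n 6
}

# Usage (branch names from the comment above):
#   series_tail master
#   series_tail b2_7_fe
#   series_tail b2_5_fe
```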


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: ndauchy Nathan Dauchy (Inactive)
              Votes: 0
              Watchers: 12