Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
RHEL7 Server
Lustre 2.7.x series
-
3
Description
On one customer site, we hit the following deadlock:

Thread 1:
  ofd_object_punch
  osd_punch
  ldiskfs_truncate
  ldiskfs_inode_attach_jinode
  ...
  do_try_to_free_pages
  lu_cache_shrink
  mutex_lock --> tries to take @lu_sites_guard

Thread 2 (kswapd):
  kthread
  shrink_slab
  lu_cache_shrink
  mutex_lock --> already held
  ...
  dqget
  ldiskfs_acquire_dquot
  jbd2__journal_start --> blocked waiting for more credits

Thread 3:
  kthread
  kjournald2
  jbd2_journal_commit_transaction --> blocked waiting for Thread 1 to finish, since Thread 1 has added a handle to the transaction

The deadlock occurs because Thread 1 waits for Thread 2, Thread 2 waits for Thread 3, and Thread 3 waits for Thread 1. The problem persists even after @lu_sites_guard was switched to a read/write lock, since lu_cache_shrink() takes the write lock. The fix is to make ldiskfs_inode_attach_jinode() allocate with GFP_NOFS.
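The circular wait described above can be illustrated with a toy wait-for graph. This is only a sketch in Python, not Lustre or kernel code: the names find_cycle and build_waits are hypothetical, and the model assumes that an allocation made with GFP_NOFS does not enter lu_cache_shrink() during direct reclaim, so the Thread 1 -> Thread 2 edge disappears.

```python
# Toy model of the three-thread wait-for cycle. Illustration only;
# find_cycle and build_waits are hypothetical names, not Lustre APIs.

def find_cycle(waits_for):
    """Return the nodes of a wait-for cycle, or None if there is none.

    Each thread waits for at most one other thread, so the graph is a
    simple mapping from waiter to the thread it waits for.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path
            if node in seen:
                break  # a cycle not containing start; found from another start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


def build_waits(alloc_may_enter_fs_reclaim):
    """Build the wait-for edges from the description.

    alloc_may_enter_fs_reclaim=True models the original allocation in
    ldiskfs_inode_attach_jinode(); False models the GFP_NOFS fix
    (assumption: the shrinker is then not entered from Thread 1).
    """
    waits = {
        "Thread2": "Thread3",  # holds lu_sites_guard, waits for journal credits
        "Thread3": "Thread1",  # commit waits for Thread 1's open handle
    }
    if alloc_may_enter_fs_reclaim:
        waits["Thread1"] = "Thread2"  # direct reclaim blocks on lu_sites_guard
    return waits


before = find_cycle(build_waits(True))
after = find_cycle(build_waits(False))
print("before fix:", before)
print("after fix:", after)
```

With the reclaim edge present the three threads form a cycle; without it, Thread 1 can finish its handle, which lets the commit proceed and credits become available.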
This patch was abandoned, but its changes were never rolled into the primary patch for this ticket (https://review.whamcloud.com/31806/). Should Bob's patch be revived?
Edit: Never mind, I misread the patches.