Tracking bug for static code analysis fixes. (LU-2753)

[LU-3055] Possible deadlock in put_pages_on_daemon_list() Created: 28/Mar/13  Updated: 25/Oct/13  Resolved: 25/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.6.0

Type: Technical task Priority: Minor
Reporter: Sebastien Buisson (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: coverity

Rank (Obsolete): 7449

 Description   

By running the Coverity tool on the Lustre code, we may have found a possible thread deadlock.

In function put_pages_on_daemon_list(), the loop with cfs_tcd_for_each_type_lock() acquires the lock "cfs_trace_cpu_data.tcd_lock". And inside this loop, the call to put_pages_on_tcd_daemon_list() acquires the lock "page_collection.pc_lock".

However, we have two other cases where the locks tcd_lock and pc_lock are taken in reverse order.
In collect_pages_on_all_cpus(), spin_lock(&pc->pc_lock) acquires the lock "pc->pc_lock", and then the loop with cfs_tcd_for_each_type_lock() acquires the lock "cfs_trace_cpu_data.tcd_lock".
We have the same situation in put_pages_back_on_all_cpus().
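
The two nesting orders described above can be modelled as a minimal sketch. The enum values, struct, and function names below are hypothetical stand-ins, not Lustre code; the sketch only records which lock is taken while another is held and checks whether both orders occur.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers standing in for the two spinlocks in the report. */
enum lock_id { TCD_LOCK, PC_LOCK, NR_LOCKS };

/* order_seen[a][b] is true once some path has taken lock b while holding a. */
struct lock_order_graph {
    bool order_seen[NR_LOCKS][NR_LOCKS];
};

/* Record that a code path acquires `inner` while already holding `outer`. */
void record_nested_acquire(struct lock_order_graph *g,
                           enum lock_id outer, enum lock_id inner)
{
    assert(outer != inner); /* the sketch does not model recursive locking */
    g->order_seen[outer][inner] = true;
}

/* Two locks form an ABBA inversion if both nesting orders were observed. */
bool has_inversion(const struct lock_order_graph *g,
                   enum lock_id a, enum lock_id b)
{
    return g->order_seen[a][b] && g->order_seen[b][a];
}

/* Replay the nesting orders from the report and say whether they invert. */
bool report_paths_invert(void)
{
    struct lock_order_graph g = { 0 };

    /* put_pages_on_daemon_list(): tcd_lock first, then pc_lock. */
    record_nested_acquire(&g, TCD_LOCK, PC_LOCK);

    /* collect_pages_on_all_cpus() and put_pages_back_on_all_cpus():
     * pc_lock first, then tcd_lock. */
    record_nested_acquire(&g, PC_LOCK, TCD_LOCK);

    return has_inversion(&g, TCD_LOCK, PC_LOCK);
}
```

An inversion of this kind is what a static checker reports. Whether it can actually deadlock depends on whether two threads can contend for the same pair of lock instances.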

In the end it seems a thread deadlock could occur. What do you think?

Sebastien.



 Comments   
Comment by Peter Jones [ 28/Mar/13 ]

Liang

Are you able to comment?

Thanks

Peter

Comment by Liang Zhen (Inactive) [ 15/Sep/13 ]

This code is not very clean, but it cannot deadlock:

  • put_pages_on_daemon_list() is only called by the kthread tracefiled(), and @pc here is a structure on that thread's stack
  • the only way the same @pc can reach collect_pages_on_all_cpus() is when that same thread calls collect_pages() -> collect_pages_on_all_cpus()

So @pc is never shared between different threads.
Actually, after looking into the code, I think pc_lock is entirely unnecessary. It exists because we used to have a function trace_call_on_all_cpus(), and the lock was supposed to protect against races between CPUs (see BZ15878). Now all use cases of page_collection are per-thread and in-stack, so we can simply remove this lock.
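
The ownership argument above can be illustrated with a minimal sketch. All names ending in _sketch are hypothetical and the structures are reduced to page counters; this is not the actual Lustre patch. It only shows a collection that lives on the caller's stack and therefore carries no lock of its own.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count for this sketch */

/* Stand-in for per-CPU trace data; the real structure is guarded by tcd_lock. */
struct tcd_sketch {
    int nr_pages;
};

/* Stand-in for page_collection without pc_lock: owned by exactly one thread. */
struct page_collection_sketch {
    int nr_pages;
};

/* Move every per-CPU page count into the caller-owned collection. */
void collect_pages_sketch(struct tcd_sketch *tcds, size_t n,
                          struct page_collection_sketch *pc)
{
    for (size_t i = 0; i < n; i++) {
        /* The real code would hold the tcd_lock of tcds[i] here. */
        pc->nr_pages += tcds[i].nr_pages;
        tcds[i].nr_pages = 0;
    }
}

/* One tracefiled()-style pass: the collection lives in this stack frame,
 * so no other thread can hold a pointer to it. */
int drain_once_sketch(struct tcd_sketch *tcds, size_t n)
{
    struct page_collection_sketch pc = { 0 }; /* in-stack, never shared */

    assert(tcds != NULL);
    collect_pages_sketch(tcds, n, &pc);
    return pc.nr_pages;
}
```

Because the collection never leaves the frame that created it, only the per-CPU data needs locking in this model.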

Comment by Liang Zhen (Inactive) [ 05/Oct/13 ]

Patch is ready to land: http://review.whamcloud.com/#/c/7660/

Comment by Peter Jones [ 25/Oct/13 ]

Landed for 2.6
