Details
Technical task
Resolution: Fixed
Minor
Lustre 2.4.0
7449
Description
Running the Coverity tool on the Lustre code flagged what looks like a possible thread deadlock caused by inconsistent lock ordering.
In function put_pages_on_daemon_list(), the loop with cfs_tcd_for_each_type_lock() acquires the lock "cfs_trace_cpu_data.tcd_lock". Inside this loop, while tcd_lock is still held, the call to put_pages_on_tcd_daemon_list() acquires the lock "page_collection.pc_lock".
However, we have two other cases where the locks tcd_lock and pc_lock are taken in reverse order.
In collect_pages_on_all_cpus(), spin_lock(&pc->pc_lock) acquires the lock "pc->pc_lock", and then the loop with cfs_tcd_for_each_type_lock() acquires the lock "cfs_trace_cpu_data.tcd_lock".
The same pc_lock-then-tcd_lock ordering appears in put_pages_back_on_all_cpus().
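To make the two orderings concrete, here is a minimal single-threaded sketch in C. It is not Lustre code: toy_lock, toy_trylock(), would_spin() and the two scenario functions are hypothetical names invented for illustration, and the locks are modelled with a C11 atomic_flag rather than the kernel spinlock. It simulates the steps of two threads, "A" and "B", and uses a try-lock to report where a real spin_lock() would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock standing in for tcd_lock / pc_lock. */
typedef struct { atomic_flag held; } toy_lock;

static toy_lock tcd_lock = { ATOMIC_FLAG_INIT };
static toy_lock pc_lock  = { ATOMIC_FLAG_INIT };

/* Try once to take the lock; returns true on success. */
static bool toy_trylock(toy_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* True if a caller of a blocking spin lock would be stuck right now. */
static bool would_spin(toy_lock *l)
{
    if (toy_trylock(l)) {
        toy_unlock(l);
        return false;
    }
    return true;
}

/* A uses the put_pages_on_daemon_list() order (tcd_lock, then pc_lock);
 * B uses the collect_pages_on_all_cpus() order (pc_lock, then tcd_lock). */
static bool inversion_would_deadlock(void)
{
    bool a_has_tcd = toy_trylock(&tcd_lock); /* A: first lock */
    bool b_has_pc  = toy_trylock(&pc_lock);  /* B: first lock */
    assert(a_has_tcd && b_has_pc);
    (void)a_has_tcd;
    (void)b_has_pc;

    bool a_stuck = would_spin(&pc_lock);     /* A: second lock, held by B */
    bool b_stuck = would_spin(&tcd_lock);    /* B: second lock, held by A */

    toy_unlock(&pc_lock);
    toy_unlock(&tcd_lock);
    return a_stuck && b_stuck;
}

/* Both threads use the same order (tcd_lock, then pc_lock). */
static bool consistent_order_makes_progress(void)
{
    bool a_has_tcd = toy_trylock(&tcd_lock); /* A: first lock */
    assert(a_has_tcd);
    (void)a_has_tcd;

    bool b_waits    = would_spin(&tcd_lock); /* B waits on its first lock */
    bool a_proceeds = !would_spin(&pc_lock); /* A's second lock is free */

    toy_unlock(&tcd_lock);
    return b_waits && a_proceeds;
}
```

In this model, inversion_would_deadlock() returns true because each simulated thread is waiting for the lock the other holds, while consistent_order_makes_progress() returns true because the second thread only waits on the first lock and the first thread can finish. Whether the real code paths can actually run concurrently on the same tcd and pc is a separate question that the sketch does not answer.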
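If one thread holds tcd_lock and waits for pc_lock while another thread holds pc_lock and waits for tcd_lock, neither can make progress, so it seems a thread deadlock could occur. What do you think?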
Sebastien.