[LU-1721] hit LASSERT(!cfs_list_empty(&req->rq_timed_list)) in ptlrpc_server_drop_request Created: 08/Aug/12  Updated: 22/Aug/12  Resolved: 22/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Liang Zhen (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4482

 Description   

This bug was introduced by commit 07b8db220e48782369f48d86213c5d404a628ded, which makes ptlrpc_server_drop_request() check req::rq_at_linked without holding at_lock. The unlocked check can race with ptlrpc_at_check_timed() as follows:

  1. thread-1: calls ptlrpc_at_check_timed() and removes the request from paa_reqs_array, but has not yet set req::rq_at_linked to zero.
  2. thread-2: calls ptlrpc_server_drop_request() to release the last refcount, finds req::rq_at_linked still non-zero, and enters the branch "if (req->rq_at_linked) {...}".
  3. thread-1: sets req::rq_at_linked to zero.
  4. thread-2: takes at_lock and hits LASSERT(!cfs_list_empty(&req->rq_timed_list)), because thread-1 already removed the request from paa_reqs_array in step 1.
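
The interleaving above can be closed by re-checking the flag after taking the lock. The following is only a minimal user-space sketch of that pattern, not the Lustre code and not necessarily what the linked patch does. All names here (sketch_req, sketch_at_check_timed, sketch_drop_request) are hypothetical; a pthread mutex stands in for at_lock, an atomic flag for req::rq_at_linked, and a plain boolean for "rq_timed_list is non-empty".

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AT-related state of one request. */
struct sketch_req {
    pthread_mutex_t *at_lock;  /* stands in for at_lock */
    atomic_bool at_linked;     /* stands in for req::rq_at_linked */
    bool on_timed_list;        /* stands in for !cfs_list_empty(&req->rq_timed_list) */
};

/* thread-1 side: unlink the request and clear the flag under the lock. */
static void sketch_at_check_timed(struct sketch_req *req)
{
    pthread_mutex_lock(req->at_lock);
    if (atomic_load(&req->at_linked)) {
        req->on_timed_list = false;
        atomic_store(&req->at_linked, false);
    }
    pthread_mutex_unlock(req->at_lock);
}

/*
 * thread-2 side: the unlocked read is only a hint and may be stale, so the
 * flag is read again once the lock is held. Returns true if this call
 * unlinked the request itself.
 */
static bool sketch_drop_request(struct sketch_req *req)
{
    bool unlinked = false;

    if (atomic_load(&req->at_linked)) {          /* unlocked hint */
        pthread_mutex_lock(req->at_lock);
        if (atomic_load(&req->at_linked)) {      /* re-check under the lock */
            /* Mirrors the LASSERT: holds because the flag was re-read. */
            assert(req->on_timed_list);
            req->on_timed_list = false;
            atomic_store(&req->at_linked, false);
            unlinked = true;
        }
        pthread_mutex_unlock(req->at_lock);
    }
    return unlinked;
}
```

In this sketch, if the thread-1 path wins the race, sketch_drop_request() sees the flag cleared under the lock and skips the unlink instead of tripping the assertion.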


 Comments   
Comment by Liang Zhen (Inactive) [ 08/Aug/12 ]

patch is here: http://review.whamcloud.com/#change,3564

Comment by Peter Jones [ 22/Aug/12 ]

Landed for 2.3 and 2.4

Generated at Sat Feb 10 01:19:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.