[LU-12640] per cpu env race Created: 08/Aug/19  Updated: 16/May/20  Resolved: 16/May/20

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: Wang Shilong (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The check and assignment below could be racy:

    if (lu_env_percpu[i].lep_task == current) {
            env = lu_env_percpu[i].lep_env;
    
    CPU0
      thread0: checks lu_env_percpu[0].lep_task == current (true)
         thread1: switches onto CPU0 and changes lu_env_percpu[0]
           thread0: switches back onto CPU0 and returns lu_env_percpu[0].lep_env
    

The problem is that reading and changing the per-CPU variable here
must be atomic. Because it is a per-CPU variable, we must make sure
the task is not preempted between the check and the read.
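
The interleaving described above can be reproduced deterministically in a small userspace model. This is only a sketch, not Lustre code: `struct percpu_slot`, `lookup_env()`, `preempt_hook`, and `run_scenario()` are hypothetical names, and the hook stands in for a task switch that is assumed to be possible between the check and the read. Whether such a switch can really happen in the kernel path depends on whether preemption is disabled around it.

```c
/* Userspace model of the check-then-use window; not Lustre code. */
#include <assert.h>
#include <stddef.h>

struct task { int id; };            /* stand-in for struct task_struct */
struct env  { int owner_id; };      /* stand-in for struct lu_env */

struct percpu_slot {                /* models one lu_env_percpu[] entry */
    struct task *lep_task;
    struct env  *lep_env;
};

static struct percpu_slot slot;     /* the slot of one simulated CPU */

/* Hypothetical hook that runs where a preemption is assumed possible. */
typedef void (*preempt_hook)(void);

/* Check-then-use lookup with a window between the check and the read. */
static struct env *lookup_env(struct task *current, preempt_hook hook)
{
    if (slot.lep_task == current) {
        if (hook != NULL)
            hook();                 /* another task runs on this CPU here */
        return slot.lep_env;        /* may now belong to the other task */
    }
    return NULL;
}

static struct task t0 = { 0 }, t1 = { 1 };
static struct env  e0 = { 0 }, e1 = { 1 };

/* Models thread1 being scheduled on the CPU and installing its own env. */
static void thread1_takes_slot(void)
{
    slot.lep_task = &t1;
    slot.lep_env  = &e1;
}

/* Returns the owner id of the env that thread0 ends up holding. */
static int run_scenario(preempt_hook hook)
{
    struct env *env;

    slot.lep_task = &t0;
    slot.lep_env  = &e0;
    env = lookup_env(&t0, hook);
    assert(env != NULL);            /* the ownership check passed for thread0 */
    return env->owner_id;
}
```

Calling `run_scenario(NULL)` models the undisturbed path, where thread0 gets its own env. Calling `run_scenario(thread1_takes_slot)` models the switch inside the window, where thread0 passes the check but walks away with thread1's env.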



 Comments   
Comment by Alex Zhuravlev [ 08/Aug/19 ]

The get_cpu() call on the line above is supposed to prevent scheduling.
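
In the Linux kernel, get_cpu() disables preemption and returns the current CPU id, and put_cpu() re-enables preemption. The following userspace sketch models only that pairing, to illustrate why a task switch would not be expected inside such a section. All `model_*` names and the counters are hypothetical stand-ins, not kernel or Lustre symbols.

```c
/* Userspace model of a get_cpu()/put_cpu() section; not kernel code. */
#include <assert.h>

static int preempt_count;           /* models the preemption-disable depth */
static int current_cpu;             /* simulated CPU id */
static int switches;                /* task switches that actually happened */

/* Models get_cpu(): disable preemption, then report the CPU id. */
static int model_get_cpu(void)
{
    preempt_count++;
    return current_cpu;
}

/* Models put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
    assert(preempt_count > 0);      /* must pair with model_get_cpu() */
    preempt_count--;
}

/* Models a scheduling attempt; returns 1 if a task switch happened. */
static int model_try_preempt(void)
{
    if (preempt_count > 0)
        return 0;                   /* preemption disabled: no switch */
    switches++;
    return 1;
}

/* Counts the task switches that occur inside the protected section. */
static int switches_inside_section(void)
{
    int before = switches;
    int cpu = model_get_cpu();
    int inside;

    (void)cpu;                      /* the slot index would be derived from this */
    model_try_preempt();            /* refused while the section is open */
    inside = switches - before;
    model_put_cpu();
    return inside;
}
```

In this model a scheduling attempt made between `model_get_cpu()` and `model_put_cpu()` is refused, while the same attempt made after the section closes succeeds.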

Comment by Gerrit Updater [ 08/Aug/19 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/35728
Subject: LU-12640 obdclass: percpu env race
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 74e41620a79f6ba97d02638e4e4b9b57a891a729

Comment by Andreas Dilger [ 15/May/20 ]

Shilong, how often are you seeing this? I don't see any stack dumps in this ticket, links to other tickets, links to failures in Maloo, a bug reproducer test case in the patch, etc., so it seems very unlikely to happen in real life? I'm wondering if I (or, preferably, you) need to chase reviewers, or if it should be closed with "Cannot Reproduce" and the patch abandoned until someone hits it again (which would be easier to find if there were a stack trace in this ticket).

Generated at Sat Feb 10 02:54:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.