[LU-513] cfs_wait_event_interruptible_exclusive is not exclusive Created: 19/Jul/11  Updated: 10/Jan/12  Resolved: 31/Oct/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.2.0, Lustre 2.1.1

Type: Bug Priority: Major
Reporter: Christopher Morrone Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

PPC64, RHEL6.1 (but should apply to all platforms)


Severity: 3
Rank (Obsolete): 4877

 Description   

It looks like cfs_wait_event_interruptible_exclusive() was accidentally written to use wait_event_interruptible(), when it should use wait_event_interruptible_exclusive().

This was especially apparent when trying to run lnet selftest on Sequoia IO Nodes where there is 4-way SMT with 17 cores. This makes 68 "cpus" in Linux, so Lustre is creating 68 cfs_wi_sd* scheduler threads.

Even when writing to a single peer (over o2iblnd) with concurrency 1, I saw ALL of the cfs_wi_sd threads eating all of the CPU time. Throughput numbers were also quite low (680MB/s writing to an x86_64 server, concurrency level 16). My suspicion was that all of the tasks were being woken up instead of only one for each work item. That is what eventually made me find the missing _exclusive() extension.

When I changed it to use the _exclusive() version of the function, the cfs_wi_sd* threads no longer appeared at the top of "top"'s display, and throughput jumped to 2750MB/s on the write test.

This would be a problem on all architectures, but with its 68 "processors", it is just more obviously a problem on Sequoia.
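
To illustrate why the missing _exclusive() matters, here is a minimal userspace sketch of how a wake-up walks a wait queue. It is only a model: every name in it (sim_waiter, sim_wake_up, sim_count_wakeups, SIM_WQ_FLAG_EXCLUSIVE, SIM_NR_THREADS) is hypothetical and not taken from Lustre or the kernel. It assumes the usual Linux behavior that a wake-up wakes every waiter it passes and stops only after it has woken the requested number of exclusive waiters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: these model, but are not, the kernel's wait-queue types. */
#define SIM_WQ_FLAG_EXCLUSIVE 0x01
#define SIM_NR_THREADS        68   /* one scheduler thread per "cpu", as in the report */

struct sim_waiter {
        unsigned int flags;        /* SIM_WQ_FLAG_EXCLUSIVE or 0 */
        int          woken;        /* set to 1 once the waiter has been woken */
};

/*
 * Walk the queue in order and wake each waiter. Stop only after
 * nr_exclusive waiters carrying the exclusive flag have been woken.
 * Waiters without the flag never stop the walk.
 * Returns the number of waiters woken.
 */
static int sim_wake_up(struct sim_waiter *queue, int n, int nr_exclusive)
{
        int i;
        int woken = 0;

        assert(queue != NULL && n >= 0);

        for (i = 0; i < n; i++) {
                queue[i].woken = 1;
                woken++;
                if ((queue[i].flags & SIM_WQ_FLAG_EXCLUSIVE) &&
                    --nr_exclusive == 0)
                        break;
        }
        return woken;
}

/*
 * Queue SIM_NR_THREADS idle waiters that all use the same flags, post one
 * work item (one wake-up with nr_exclusive = 1), and report how many woke.
 */
static int sim_count_wakeups(unsigned int flags)
{
        struct sim_waiter queue[SIM_NR_THREADS];
        int i;

        for (i = 0; i < SIM_NR_THREADS; i++) {
                queue[i].flags = flags;
                queue[i].woken = 0;
        }
        return sim_wake_up(queue, SIM_NR_THREADS, 1);
}
```

In this model, sim_count_wakeups(0) returns 68 (every thread wakes for a single work item), while sim_count_wakeups(SIM_WQ_FLAG_EXCLUSIVE) returns 1. That matches the suspicion above that all cfs_wi_sd* threads were being woken for each work item.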

I'll submit a one-line patch.



 Comments   
Comment by Christopher Morrone [ 19/Jul/11 ]

Patch here:

http://review.whamcloud.com/1118

Comment by Peter Jones [ 25/Jul/11 ]

Liang

Are you able to handle this landing or should I reassign it to another engineer? In any case your inspection of this (very small) patch would be appreciated

Thanks

Peter

Comment by Liang Zhen (Inactive) [ 25/Jul/11 ]

Hi Peter, I think it doesn't matter who is the owner because Chris has already posted the fix.

Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,client,el5,ofa #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,server,el5,ofa #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Build Master (Inactive) [ 25/Oct/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #310
LU-513 Make cfs_wait_event_interruptible_exclusive really exclusive

Oleg Drokin : c202086061147673dc6ad08c52befc4931229b86
Files :

  • libcfs/include/libcfs/linux/portals_compat25.h
Comment by Peter Jones [ 31/Oct/11 ]

Landed for 2.2
