[LU-1416] SMP Node thread affinity Created: 16/May/12  Updated: 07/Mar/14  Resolved: 07/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Eva Hocks (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

SMP / vSMP


Attachments: File lustre_b1_8.thread-affinity.diff    
Rank (Obsolete): 4037

 Description   

By default, Lustre starts one thread per processor in the system. On an SMP system there may be 512 processors, which results in 512 Lustre threads! The attached patch for version 1.8.7.70 reduces the number of threads to the number specified in the ksocknal.conf file and also allows those threads to be bound to specific processors:
options ksocklnd ncpus=12
options ksocklnd cpu_affinity_off=256

This reduction in threads, and the placement of those threads, is essential for acceptable Lustre performance on an SMP system! Please consider integrating this new feature / patch into the general Lustre distribution so everyone can benefit from better Lustre performance on SMP systems.
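
As a rough illustration of the two options above, the sketch below shows how the patch clamps ncpus against the default scheduler count and how cpu_affinity_off shifts the CPU a scheduler thread is bound to. The helper names clamp_ncpus and target_cpu are hypothetical, and the identity mapping from scheduler index to CPU is an assumption (the patch itself goes through ksocknal_sched2cpu()).

```c
/* Hypothetical helper mirroring the check the patch adds to
 * ksocknal_base_startup(): a request below 1 or above the default
 * scheduler count falls back to the default. */
static unsigned int clamp_ncpus(unsigned int requested, unsigned int dflt)
{
    if (requested < 1 || requested > dflt)
        return dflt;
    return requested;
}

/* Hypothetical helper: CPU that scheduler thread sched_idx is bound to.
 * Assumption: scheduler index i maps to CPU i before the offset is added. */
static unsigned int target_cpu(unsigned int sched_idx, unsigned int offset)
{
    return sched_idx + offset;
}
```

With ncpus=12 and cpu_affinity_off=256, and assuming a default scheduler count of 512, this would give 12 scheduler threads targeting CPUs 256 through 267.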

Thanks
Eva Hocks, SDSC



 Comments   
Comment by Andreas Dilger [ 16/May/12 ]

Liang is already working on SMP thread affinity.

Comment by Liang Zhen (Inactive) [ 17/May/12 ]

I think it would be fine to have such a patch for 1.8.x, because we are not going to backport the SMP patch from 2.3 to 1.8.x.

Eva, I'd like to see this patch, could you please post it?

Liang

Comment by Eva Hocks (Inactive) [ 17/May/12 ]

Liang,

I included it as an attachment to the ticket. Please let me know if you cannot
get it.

Thanks
Eva

Comment by Eva Hocks (Inactive) [ 17/May/12 ]

Here is the patch again, both as an attachment and included in this email.

-Eva

diff -rwup lustre_b1_8.orig/aclocal.m4 lustre_b1_8/aclocal.m4
--- lustre_b1_8.orig/aclocal.m4 2012-04-05 14:54:39.000000000 -0700
+++ lustre_b1_8/aclocal.m4 2012-04-28 14:22:12.000000000 -0700
@@ -3597,7 +3597,7 @@ else
#else
unsigned long m;
#endif
- set_cpus_allowed(&t, m);
+ set_cpus_allowed_ptr(&t, &m);
],[
AC_DEFINE(CPU_AFFINITY, 1, [kernel has cpu affinity support])
AC_MSG_RESULT([yes])
diff -rwup lustre_b1_8.orig/configure lustre_b1_8/configure
--- lustre_b1_8.orig/configure 2012-04-05 14:54:34.000000000 -0700
+++ lustre_b1_8/configure 2012-04-28 14:22:12.000000000 -0700
@@ -6189,7 +6189,7 @@ main (void)
#else
unsigned long m;
#endif
- set_cpus_allowed(&t, m);
+ set_cpus_allowed_ptr(&t, &m);

;
return 0;
diff -rwup lustre_b1_8.orig/lnet/autoconf/lustre-lnet.m4 lustre_b1_8/lnet/autoconf/lustre-lnet.m4
--- lustre_b1_8.orig/lnet/autoconf/lustre-lnet.m4 2012-04-05 14:54:34.000000000 -0700
+++ lustre_b1_8/lnet/autoconf/lustre-lnet.m4 2012-04-28 14:22:12.000000000 -0700
@@ -107,7 +107,7 @@ else
#else
unsigned long m;
#endif
- set_cpus_allowed(&t, m);
+ set_cpus_allowed_ptr(&t, &m);
],[
AC_DEFINE(CPU_AFFINITY, 1, [kernel has cpu affinity support])
AC_MSG_RESULT([yes])
diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd.c lustre_b1_8/lnet/klnds/socklnd/socklnd.c
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd.c 2012-04-05 14:54:35.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd.c 2012-04-28 16:33:45.000000000 -0700
@@ -2371,7 +2371,14 @@ ksocknal_base_startup (void)
ksocknal_data.ksnd_init = SOCKNAL_INIT_DATA;
PORTAL_MODULE_USE;

+#ifdef CPU_AFFINITY
+ if ((*ksocknal_tunables.ksnd_ncpus < 1) || (*ksocknal_tunables.ksnd_ncpus > ksocknal_nsched()))
+ *ksocknal_tunables.ksnd_ncpus = ksocknal_nsched();
+
+ ksocknal_data.ksnd_nschedulers = *ksocknal_tunables.ksnd_ncpus;
+#else
ksocknal_data.ksnd_nschedulers = ksocknal_nsched();
+#endif
LIBCFS_ALLOC(ksocknal_data.ksnd_schedulers,
sizeof(ksock_sched_t) * ksocknal_data.ksnd_nschedulers);
if (ksocknal_data.ksnd_schedulers == NULL)
diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_cb.c lustre_b1_8/lnet/klnds/socklnd/socklnd_cb.c
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_cb.c 2012-04-05 14:54:35.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd_cb.c 2012-04-28 16:36:54.000000000 -0700
@@ -2184,6 +2184,9 @@ ksocknal_connd (void *arg)
cfs_daemonize (name);
cfs_block_allsigs ();

+ if (ksocknal_lib_bind_thread_to_node_cpu(id))
+ CERROR ("Can't set CPU affinity for %s to node of CPU %d\n", name, (int)id % *ksocknal_tunables.ksnd_ncpus);
+
cfs_waitlink_init (&wait);

cfs_spin_lock_bh (connd_lock);
@@ -2576,6 +2579,9 @@ ksocknal_reaper (void *arg)
cfs_daemonize ("socknal_reaper");
cfs_block_allsigs ();

+ if (ksocknal_lib_bind_thread_to_node_cpu(0))
+ CERROR ("Can't set CPU affinity for %s to node of CPU %d\n", "socknal_reaper", 0);
+
CFS_INIT_LIST_HEAD(&enomem_conns);
cfs_waitlink_init (&wait);

diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd.h lustre_b1_8/lnet/klnds/socklnd/socklnd.h
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd.h 2012-04-05 14:54:35.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd.h 2012-04-28 15:55:14.000000000 -0700
@@ -127,6 +127,8 @@ typedef struct
int *ksnd_zc_recv_min_nfrags; /* minimum # of fragments to enable ZC receive */
#ifdef CPU_AFFINITY
int *ksnd_irq_affinity; /* enable IRQ affinity? */
+ unsigned int *ksnd_ncpus; /* # CPUs */
+ unsigned int *ksnd_cpu_affinity_off; /* affinity offset */
#endif
#ifdef SOCKNAL_BACKOFF
int *ksnd_backoff_init; /* initial TCP backoff */
@@ -607,3 +609,4 @@ extern void ksocknal_lib_csum_tx(ksock_t
extern int ksocknal_lib_memory_pressure(ksock_conn_t *conn);
extern __u64 ksocknal_lib_new_incarnation(void);
extern int ksocknal_lib_bind_thread_to_cpu(int id);
+extern int ksocknal_lib_bind_thread_to_node_cpu(int id);
diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_lib-linux.c lustre_b1_8/lnet/klnds/socklnd/socklnd_lib-linux.c
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_lib-linux.c 2012-04-05 14:54:35.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd_lib-linux.c 2012-04-28 15:29:01.000000000 -0700
@@ -57,6 +57,8 @@ enum {
SOCKLND_TX_BUFFER_SIZE,
SOCKLND_NAGLE,
SOCKLND_IRQ_AFFINITY,
+ SOCKLND_NCPUS,
+ SOCKLND_CPU_AFFINITY_OFF,
SOCKLND_ROUND_ROBIN,
SOCKLND_KEEPALIVE,
SOCKLND_KEEPALIVE_IDLE,
@@ -86,6 +88,8 @@ enum {
#define SOCKLND_TX_BUFFER_SIZE CTL_UNNUMBERED
#define SOCKLND_NAGLE CTL_UNNUMBERED
#define SOCKLND_IRQ_AFFINITY CTL_UNNUMBERED
+#define SOCKLND_NCPUS CTL_UNNUMBERED
+#define SOCKLND_CPU_AFFINITY_OFF CTL_UNNUMBERED
#define SOCKLND_ROUND_ROBIN CTL_UNNUMBERED
#define SOCKLND_KEEPALIVE CTL_UNNUMBERED
#define SOCKLND_KEEPALIVE_IDLE CTL_UNNUMBERED
@@ -263,6 +267,24 @@ static cfs_sysctl_table_t ksocknal_ctl_t
.proc_handler = &proc_dointvec,
.strategy = &sysctl_intvec,
},
+ {
+ .ctl_name = SOCKLND_NCPUS,
+ .procname = "ncpus",
+ .data = &ksocknal_tunables.ksnd_ncpus,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ },
+ {
+ .ctl_name = SOCKLND_CPU_AFFINITY_OFF,
+ .procname = "cpu_affinity_off",
+ .data = &ksocknal_tunables.cpu_affinity_off,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ },
#endif
{
.ctl_name = SOCKLND_ROUND_ROBIN,
@@ -1296,11 +1318,11 @@ int
ksocknal_lib_bind_thread_to_cpu(int id)
{
#if defined(CONFIG_SMP) && defined(CPU_AFFINITY)
- id = ksocknal_sched2cpu(id);
+ id = ksocknal_sched2cpu(id) + *ksocknal_tunables.ksnd_cpu_affinity_off;
if (cpu_online(id)) {
cpumask_t m = CPU_MASK_NONE;
cpu_set(id, m);
- set_cpus_allowed(current, m);
+ set_cpus_allowed_ptr(current, &m);
return 0;
}

@@ -1310,3 +1332,22 @@ ksocknal_lib_bind_thread_to_cpu(int id)
return 0;
#endif
}
+
+int
+ksocknal_lib_bind_thread_to_node_cpu(int id)
+{
+#if defined(CONFIG_SMP) && defined(CPU_AFFINITY)
+ id = ksocknal_sched2cpu(id) + *ksocknal_tunables.ksnd_cpu_affinity_off;
+ if (cpu_online(id)) {
+ cpumask_t m = node_to_cpumask(cpu_to_node(id));
+ set_cpus_allowed_ptr(current, &m);
+ return 0;
+ }
+
+ return -1;
+
+#else
+ return 0;
+#endif
+}
+
diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_modparams.c lustre_b1_8/lnet/klnds/socklnd/socklnd_modparams.c
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_modparams.c 2012-04-05 14:54:35.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd_modparams.c 2012-04-28 16:34:12.000000000 -0700
@@ -131,6 +131,14 @@ CFS_MODULE_PARM(inject_csum_error, "i",
static int enable_irq_affinity = 0;
CFS_MODULE_PARM(enable_irq_affinity, "i", int, 0644,
"enable IRQ affinity");
+
+static unsigned int ncpus = 0;
+CFS_MODULE_PARM(ncpus, "i", int, 0444,
+ "maximum number of CPUs to use");
+
+static unsigned int cpu_affinity_off = 0;
+CFS_MODULE_PARM(cpu_affinity_off, "i", int, 0444,
+ "CPU affinity offset");
#endif

static int nonblk_zcack = 1;
@@ -194,6 +202,8 @@ ksock_tunables_t ksocknal_tunables = {
.ksnd_zc_recv_min_nfrags = &zc_recv_min_nfrags,
#ifdef CPU_AFFINITY
.ksnd_irq_affinity = &enable_irq_affinity,
+ .ksnd_ncpus = &ncpus,
+ .ksnd_cpu_affinity_off = &cpu_affinity_off,
#endif
#ifdef SOCKNAL_BACKOFF
.ksnd_backoff_init = &backoff_init,
diff -rwup lustre_b1_8.orig/lustre/ptlrpc/service.c lustre_b1_8/lustre/ptlrpc/service.c
--- lustre_b1_8.orig/lustre/ptlrpc/service.c 2012-04-05 14:54:36.000000000 -0700
+++ lustre_b1_8/lustre/ptlrpc/service.c 2012-04-28 14:22:12.000000000 -0700
@@ -1687,7 +1687,8 @@ static int ptlrpc_main(void *arg)
break;
num_cpu++;
}
- set_cpus_allowed(cfs_current(), node_to_cpumask(cpu_to_node(cpu)));
+ cpumask_t m = node_to_cpumask(cpu_to_node(cpu));
+ set_cpus_allowed_ptr(cfs_current(), &m);
}
#endif

diff -rwup lustre_b1_8.orig/aclocal.m4 lustre_b1_8/aclocal.m4
--- lustre_b1_8.orig/aclocal.m4 2012-04-28 17:33:39.000000000 -0700
+++ lustre_b1_8/aclocal.m4 2012-04-28 17:24:13.000000000 -0700
@@ -1924,13 +1924,13 @@ AC_DEFINE(HAVE_FILEMAP_FDATAWRITE_RANGE,

# The actual symbol exported varies among architectures, so we need
# to check many symbols (but only in the current architecture.) No
-# matter what symbol is exported, the kernel #defines node_to_cpumask
+# matter what symbol is exported, the kernel #defines node_to_cpumask_map
# to the appropriate function and that's what we use.
AC_DEFUN([LC_EXPORT_NODE_TO_CPUMASK],
- [LB_CHECK_SYMBOL_EXPORT([node_to_cpumask],
+ [LB_CHECK_SYMBOL_EXPORT([node_to_cpumask_map],
[arch/$LINUX_ARCH/mm/numa.c],
[AC_DEFINE(HAVE_NODE_TO_CPUMASK, 1,
- [node_to_cpumask is exported by
+ [node_to_cpumask_map is exported by
the kernel])]) # x86_64
LB_CHECK_SYMBOL_EXPORT([node_to_cpu_mask],
[arch/$LINUX_ARCH/kernel/smpboot.c],
diff -rwup lustre_b1_8.orig/configure lustre_b1_8/configure
--- lustre_b1_8.orig/configure 2012-04-28 17:33:39.000000000 -0700
+++ lustre_b1_8/configure 2012-04-28 17:23:24.000000000 -0700
@@ -10276,14 +10276,14 @@ _ACEOF
fi


- echo "$as_me:$LINENO: checking if Linux was built with symbol node_to_cpumask exported" >&5
-echo $ECHO_N "checking if Linux was built with symbol node_to_cpumask exported... $ECHO_C" >&6
-grep -q -E '[[:space:]]node_to_cpumask[[:space:]]' $LINUX/$SYMVERFILE 2>/dev/null
+ echo "$as_me:$LINENO: checking if Linux was built with symbol node_to_cpumask_map exported" >&5
+echo $ECHO_N "checking if Linux was built with symbol node_to_cpumask_map exported... $ECHO_C" >&6
+grep -q -E '[[:space:]]node_to_cpumask_map[[:space:]]' $LINUX/$SYMVERFILE 2>/dev/null
rc=$?
if test $rc -ne 0; then
export=0
for file in arch/$LINUX_ARCH/mm/numa.c; do
- grep -q -E "EXPORT_SYMBOL.*(node_to_cpumask)" "$LINUX/$file" 2>/dev/null
+ grep -q -E "EXPORT_SYMBOL.*(node_to_cpumask_map)" "$LINUX/$file" 2>/dev/null
rc=$?
if test $rc -eq 0; then
export=1
diff -rwup lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_lib-linux.c lustre_b1_8/lnet/klnds/socklnd/socklnd_lib-linux.c
--- lustre_b1_8.orig/lnet/klnds/socklnd/socklnd_lib-linux.c 2012-04-28 17:33:39.000000000 -0700
+++ lustre_b1_8/lnet/klnds/socklnd/socklnd_lib-linux.c 2012-04-28 17:31:13.000000000 -0700
@@ -1339,8 +1339,8 @@ ksocknal_lib_bind_thread_to_node_cpu(int
#if defined(CONFIG_SMP) && defined(CPU_AFFINITY)
id = ksocknal_sched2cpu(id) + *ksocknal_tunables.ksnd_cpu_affinity_off;
if (cpu_online(id)) {
- cpumask_t m = node_to_cpumask(cpu_to_node(id));
- set_cpus_allowed_ptr(current, &m);
+ cpumask_t *m = node_to_cpumask_map[cpu_to_node(id)];
+ set_cpus_allowed_ptr(current, m);
return 0;
}

diff -rwup lustre_b1_8.orig/lustre/autoconf/lustre-core.m4 lustre_b1_8/lustre/autoconf/lustre-core.m4
--- lustre_b1_8.orig/lustre/autoconf/lustre-core.m4 2012-04-05 14:54:36.000000000 -0700
+++ lustre_b1_8/lustre/autoconf/lustre-core.m4 2012-04-28 17:27:34.000000000 -0700
@@ -1013,13 +1013,13 @@ AC_DEFINE(HAVE_FILEMAP_FDATAWRITE_RANGE,

# The actual symbol exported varies among architectures, so we need
# to check many symbols (but only in the current architecture.) No
-# matter what symbol is exported, the kernel #defines node_to_cpumask
+# matter what symbol is exported, the kernel #defines node_to_cpumask_map
# to the appropriate function and that's what we use.
AC_DEFUN([LC_EXPORT_NODE_TO_CPUMASK],
- [LB_CHECK_SYMBOL_EXPORT([node_to_cpumask],
+ [LB_CHECK_SYMBOL_EXPORT([node_to_cpumask_map],
[arch/$LINUX_ARCH/mm/numa.c],
[AC_DEFINE(HAVE_NODE_TO_CPUMASK, 1,
- [node_to_cpumask is exported by
+ [node_to_cpumask_map is exported by
the kernel])]) # x86_64
LB_CHECK_SYMBOL_EXPORT([node_to_cpu_mask],
[arch/$LINUX_ARCH/kernel/smpboot.c],
diff -rwup lustre_b1_8.orig/lustre/ptlrpc/service.c lustre_b1_8/lustre/ptlrpc/service.c
--- lustre_b1_8.orig/lustre/ptlrpc/service.c 2012-04-28 17:33:39.000000000 -0700
+++ lustre_b1_8/lustre/ptlrpc/service.c 2012-04-28 17:52:24.000000000 -0700
@@ -1679,6 +1679,7 @@ static int ptlrpc_main(void *arg)
* we get the per-thread allocations on local node. bug 7342 */
if (svc->srv_cpu_affinity) {
int cpu, num_cpu;
+ cpumask_t *m;

for (cpu = 0, num_cpu = 0; cpu < num_possible_cpus(); cpu++) {
if (!cpu_online(cpu))
@@ -1687,8 +1688,8 @@ static int ptlrpc_main(void *arg)
break;
num_cpu++;
}
- cpumask_t m = node_to_cpumask(cpu_to_node(cpu));
- set_cpus_allowed_ptr(cfs_current(), &m);
+ m = node_to_cpumask_map[cpu_to_node(cpu)];
+ set_cpus_allowed_ptr(cfs_current(), m);
}
#endif
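
For readers who want to experiment outside the kernel, a user-space analogue of the single-CPU binding in ksocknal_lib_bind_thread_to_cpu() can be sketched with the Linux sched_setaffinity(2) call. This is not part of the patch: the helper names first_allowed_cpu and bind_self_to_cpu are hypothetical, and the code assumes glibc on Linux.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on,
 * or -1 if the affinity mask cannot be read or is empty. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin the calling thread to a single CPU, much as the kernel code builds
 * a one-bit cpumask and passes it to set_cpus_allowed_ptr().
 * Returns 0 on success and -1 on failure. */
static int bind_self_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : -1;
}
```

Calling bind_self_to_cpu(first_allowed_cpu()) restricts the calling thread to one CPU it is already permitted to use. A node-wide binding, as in ksocknal_lib_bind_thread_to_node_cpu(), would instead set every CPU of the target NUMA node in the mask.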





Comment by Minh Diep [ 09/Sep/13 ]

fyi http://review.whamcloud.com/#/c/7489/

This is the patch that SDSC is using

Comment by John Fuchs-Chesney (Inactive) [ 07/Mar/14 ]

Patch for v 1.8 provided by customer.
