[LU-2268] SMP scalability enhancements break FMR pools Created: 02/Nov/12  Updated: 19/Apr/13  Resolved: 27/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Jeremy Filizetti Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL 6.3 with Lustre 2.3


Severity: 3
Rank (Obsolete): 5427

 Description   

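The SMP scalability patch never uses FMR because of the following code from master. The cfs_percpt_free() call at line 2299 runs even when every FMR pool set was initialized successfully, so the FMR pools are always discarded: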

   2286         }
   2287 
   2288         for (i = 0; i < ncpts; i++) {
   2289                 cpt = (cpts == NULL) ? i : cpts[i];
   2290                 rc = kiblnd_init_fmr_poolset(net->ibn_fmr_ps[cpt], cpt, net,
   2291                                              kiblnd_fmr_pool_size(ncpts),
   2292                                              kiblnd_fmr_flush_trigger(ncpts));
   2293                 if (rc == -ENOSYS && i == 0) /* no FMR */
   2294                         break; /* create PMR pool */
   2295                 if (rc != 0)
   2296                         goto failed; /* a real error */
   2297         }
   2298 
   2299         cfs_percpt_free(net->ibn_fmr_ps);
   2300         net->ibn_fmr_ps = NULL;
   2301 

I had hoped that just adding the following would be sufficient:

if (rc > 0)
         return 0; /* FMR success */

However, when running with that change I see a kernel panic. I currently have the patch for LU-1757 in the code; I will remove it and test again to make sure it's not that patch.



 Comments   
Comment by Jeremy Filizetti [ 03/Nov/12 ]

Correction on the fix for FMR:

if (i > 0)
         return 0; /* FMR success */
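
The control flow around this check can be modeled in isolation. The following is a minimal sketch, not the o2iblnd code: struct model_net, model_init_fmr_poolset() and model_create_pools() are hypothetical stand-ins, and the two flags only model whether net->ibn_fmr_ps or the PMR fallback is left in place. It assumes the loop and the free at line 2299 behave as in the listing in the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the net structure; not the real kib_net_t. */
struct model_net {
    bool fmr_ps_allocated;  /* models net->ibn_fmr_ps != NULL */
    bool pmr_ps_allocated;  /* models the PMR fallback pool existing */
};

/* Hypothetical stand-in for kiblnd_init_fmr_poolset(): returns 0 when the
 * device supports FMR and -ENOSYS when it does not. */
static int model_init_fmr_poolset(bool device_has_fmr)
{
    return device_has_fmr ? 0 : -ENOSYS;
}

/* Models the loop and the code that follows it. With apply_fix == false the
 * FMR pool sets are freed unconditionally, as at line 2299 of the listing.
 * With apply_fix == true the "i > 0" check returns before the free. */
static int model_create_pools(struct model_net *net, int ncpts,
                              bool device_has_fmr, bool apply_fix)
{
    int i;
    int rc;

    assert(net != NULL && ncpts > 0);
    net->fmr_ps_allocated = true;   /* pool sets are allocated before the loop */
    net->pmr_ps_allocated = false;

    for (i = 0; i < ncpts; i++) {
        rc = model_init_fmr_poolset(device_has_fmr);
        if (rc == -ENOSYS && i == 0)
            break;                  /* no FMR: fall through to PMR */
        if (rc != 0)
            return rc;              /* a real error */
    }

    if (apply_fix && i > 0)
        return 0;                   /* FMR success: keep the FMR pool sets */

    net->fmr_ps_allocated = false;  /* models cfs_percpt_free(net->ibn_fmr_ps) */
    net->pmr_ps_allocated = true;   /* models creating the PMR pool */
    return 0;
}
```

In this model, calling model_create_pools() with device_has_fmr set and apply_fix clear leaves only the PMR flag set, which matches the behavior reported here; with apply_fix set, the FMR flag survives. The check uses i rather than rc because rc is 0 after a successful loop, so a test for rc > 0 never fires.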

One other side effect I failed to mention: since FMR is never used, the code falls back to PMR, which calls ib_reg_phys_mr, and that call is not supported on mlx4 cards.

LNetError: 3496:0:(o2iblnd.c:1952:kiblnd_pmr_pool_map()) Failed ib_reg_phys_mr: -38
LNetError: 3496:0:(o2iblnd_cb.c:611:kiblnd_pmr_map_tx()) Failed to create MR by phybuf: -38
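
The -38 in these messages can be tied back to the -ENOSYS test in the pool setup loop. The snippet below is a small sketch that assumes a Linux userspace build, where ENOSYS is defined as 38; the helper names rc_is_enosys() and describe_kernel_rc() are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a kernel-style negative return code
 * means "function not implemented". */
static bool rc_is_enosys(int rc)
{
    return rc == -ENOSYS;
}

/* Hypothetical helper: text description of a kernel-style return code,
 * which is zero or a negated errno value. */
static const char *describe_kernel_rc(int rc)
{
    assert(rc <= 0);
    return strerror(-rc);
}
```

On Linux, rc_is_enosys(-38) is true, so the failure logged by kiblnd_pmr_pool_map() reports that the driver does not implement the call.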

After testing with master I still see a kernel panic. Here is the stacktrace:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa08dd64f>] kiblnd_map_tx+0x21f/0x550 [ko2iblnd]
PGD 1819af8067 PUD 181f82a067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
CPU 3 
Modules linked in: osp(U) ofd(U) ost(U) mgc(U) fsfilt_ldiskfs(U) exportfs osd_ldiskfs(U) lquota(U) mdd(U) fid(U) fld(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb sg microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 15459, comm: ll_ost_io01_002 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 SGI.COM C1104-2TY9/X8DTT-IBQF
RIP: 0010:[<ffffffffa08dd64f>]  [<ffffffffa08dd64f>] kiblnd_map_tx+0x21f/0x550 [ko2iblnd]
RSP: 0018:ffff88119f5ef710  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88044333c080 RCX: 0000000000000000
RDX: 0000000574cca000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88119f5ef780 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88044333c080 R11: ffff88182143e940 R12: 0000000000000001
R13: ffff88182151c7c0 R14: ffffc900230ccf48 R15: 0000000000100000
FS:  00007f451c390700(0000) GS:ffff88002c260000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000018201a0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ll_ost_io01_002 (pid: 15459, threadinfo ffff88119f5ee000, task ffff88119f5eaae0)
Stack:
 ffff88000006d4a8 0000000000000002 0000001000000100 ffffffff81a945c0
<d> ffff88182143e940 0000010000000002 ffff880000053600 0006125000000001
<d> 0000000000000246 ffff880423589060 0000000000000001 0000000000000000
Call Trace:
 [<ffffffffa08dda91>] kiblnd_setup_rd_kiov+0x111/0x2d0 [ko2iblnd]
 [<ffffffffa08e39a3>] kiblnd_send+0x5b3/0x9f0 [ko2iblnd]
 [<ffffffffa04d7edb>] lnet_ni_send+0x4b/0x110 [lnet]
 [<ffffffffa04dc486>] lnet_send+0x6e6/0xc10 [lnet]
 [<ffffffffa04dcc94>] LNetGet+0x2e4/0x830 [lnet]
 [<ffffffffa0732b21>] ptlrpc_start_bulk_transfer+0x151/0x640 [ptlrpc]
 [<ffffffffa0703dd0>] target_bulk_io+0x180/0x950 [ptlrpc]
 [<ffffffffa042bbe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa042b885>] ? cfs_waitq_init+0x15/0x20 [libcfs]
 [<ffffffffa0729926>] ? new_bulk+0x106/0x210 [ptlrpc]
 [<ffffffffa0726618>] ? __ptlrpc_prep_bulk_page+0x68/0x1a0 [ptlrpc]
 [<ffffffffa0b06627>] ost_brw_write+0x1327/0x15d0 [ost]
 [<ffffffffa0738e6c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
 [<ffffffffa0738fc8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
 [<ffffffffa0b0bec2>] ost_handle+0x32e2/0x4690 [ost]
 [<ffffffffa042bbe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa074015b>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
 [<ffffffffa07485cc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
 [<ffffffffa042b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa043d17f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
 [<ffffffffa073f999>] ? ptlrpc_wait_event+0xa9/0x2a0 [ptlrpc]
 [<ffffffff810533f3>] ? __wake_up+0x53/0x70
 [<ffffffffa0749bbc>] ptlrpc_main+0xc0c/0x19f0 [ptlrpc]
 [<ffffffffa0748fb0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa0748fb0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
 [<ffffffffa0748fb0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 8d 0c 40 31 c0 48 c1 e1 02 8b 7c 0b 08 85 ff 74 28 0f 1f 00 49 8b 55 18 48 23 54 0b 0c 4c 63 c0 49 63 fc 41 83 c4 01 49 8d 14 10 <48> 89 14 fe 41 03 45 14 3b 44 0b 08 72 db 41 83 c1 01 44 3b 4b 
RIP  [<ffffffffa08dd64f>] kiblnd_map_tx+0x21f/0x550 [ko2iblnd]
 RSP <ffff88119f5ef710>
CR2: 0000000000000000
Comment by Peter Jones [ 03/Nov/12 ]

Liang

Could you please comment?

Thanks

Peter

Comment by Liang Zhen (Inactive) [ 05/Nov/12 ]

Hi Jeremy, I think it's because kiblnd_create_tx_pool() should be called after the FMR pool is created; otherwise it will not allocate tx_pages for kib_tx_t.
I've posted a patch for this: http://review.whamcloud.com/#change,4462
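
The ordering dependency described above can be sketched as a standalone model. This is an illustration under stated assumptions, not the patch itself: struct model_net, struct model_tx, model_create_tx() and model_fmr_map() are hypothetical names, MODEL_MAX_IOV is an arbitrary size, and the model assumes only that TX creation decides whether to allocate tx_pages by looking at whether an FMR pool set exists at that moment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_IOV 256   /* arbitrary page-array length for this sketch */

/* Hypothetical stand-ins; not the real kib_net_t and kib_tx_t. */
struct model_net {
    bool has_fmr_ps;                /* models net->ibn_fmr_ps != NULL */
};

struct model_tx {
    unsigned long long *tx_pages;   /* page array needed only for FMR mapping */
};

/* Models TX creation: tx_pages is allocated only if the FMR pool set
 * already exists when this runs. */
static int model_create_tx(const struct model_net *net, struct model_tx *tx)
{
    assert(net != NULL && tx != NULL);
    tx->tx_pages = NULL;
    if (!net->has_fmr_ps)
        return 0;                   /* no FMR yet: no page array */
    tx->tx_pages = calloc(MODEL_MAX_IOV, sizeof(*tx->tx_pages));
    return tx->tx_pages != NULL ? 0 : -ENOMEM;
}

/* Models FMR mapping, which stores page addresses into tx_pages. The model
 * returns -EFAULT where an unchecked store would write through NULL, which
 * is consistent with the write fault at address 0 in the oops above. */
static int model_fmr_map(struct model_tx *tx, unsigned long long addr)
{
    assert(tx != NULL);
    if (tx->tx_pages == NULL)
        return -EFAULT;
    tx->tx_pages[0] = addr;
    return 0;
}
```

If model_create_tx() runs while has_fmr_ps is still false and the flag is set afterwards, model_fmr_map() fails on that TX; creating the TX after the flag is set makes the same call succeed. That is the reason for moving TX pool creation after FMR pool creation.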

Comment by Liang Zhen (Inactive) [ 27/Nov/12 ]

patch landed
