LU-2859: Stack overflow issues on x86_64 clients in memory reclaim during writes

Details


    Description

      I was trying to run some IOR tests using a 2.3.58-based client and am running into what appear to be stack overflow issues. The test is an IOR write using 256 client nodes, with 16 tasks per node:

      $ salloc -N256 -n$((256*16)) -pltest
      $ srun -- ior -a POSIX -Cegw -F -o /p/lstest/surya1/lu2139-ior/%d -t 1m -b 8G -i 5
      

      The clients are all x86_64 with the default 8K kernel stack size, which is important to note. Some info on an example crash I've seen is below.

      The console shows this:

      LustreError: 7408:0:(osc_dev.c:129:osc_key_init()) ASSERTION( (!(CFS_ALLOC_IO != CFS_ALLOC_ATOMIC) || (!(((current_thread_info()->preempt_count) & ((((1UL << (10))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 10))))))) ) failed: 
      BUG: unable to handle kernel paging request at 00000001811be3e0
      IP: [<ffffffff81052994>] update_curr+0x144/0x1f0
      PGD 1024273067 PUD 0 
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP 
      last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
      CPU 8 
      Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) dm_round_robin scsi_dh_rdac sg sd_mod crc_t10dif zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate acpi_cpufreq freq_table mperf ib_srp scsi_transport_srp scsi_tgt ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_multipath dm_mod vhost_net macvtap macvlan tun kvm wmi sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca mlx4_en mlx4_core [last unloaded: cpufreq_ondemand]
      
      Pid: 7408, comm: ior Tainted: P           ----------------   2.6.32-220.23.1.2chaos.ch5.x86_64 #1 appro gb812x-cn/S2600JF
      ...
      

      The stack reported by crash:

      crash> bt
      PID: 7408   TASK: ffff881028578aa0  CPU: 8   COMMAND: "ior"
       #0 [ffff88085c4039a0] machine_kexec at ffffffff8103216b
       #1 [ffff88085c403a00] crash_kexec at ffffffff810b8d12
       #2 [ffff88085c403ad0] oops_end at ffffffff814f2c00
       #3 [ffff88085c403b00] no_context at ffffffff810423fb
       #4 [ffff88085c403b50] __bad_area_nosemaphore at ffffffff81042685
       #5 [ffff88085c403ba0] bad_area_nosemaphore at ffffffff81042753
       #6 [ffff88085c403bb0] __do_page_fault at ffffffff81042e0d
       #7 [ffff88085c403cd0] do_page_fault at ffffffff814f4bde
       #8 [ffff88085c403d00] page_fault at ffffffff814f1f95
          [exception RIP: update_curr+324]
          RIP: ffffffff81052994  RSP: ffff88085c403db8  RFLAGS: 00010086
          RAX: ffff881028578aa0  RBX: 000000003feb8e00  RCX: ffff88082ff111c0
          RDX: 0000000000018b88  RSI: 0000000000000000  RDI: ffff881028578ad8
          RBP: ffff88085c403de8   R8: ffffffff8160b665   R9: 000000000000001a
          R10: 0000000000000010  R11: 000000000000001a  R12: ffff88085c416028
          R13: 0000000001a5b3e7  R14: 0000004135410580  R15: ffff881028578aa0
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff88085c403df0] task_tick_fair at ffffffff81052f4b
      #10 [ffff88085c403e20] scheduler_tick at ffffffff81056791
      #11 [ffff88085c403e60] update_process_times at ffffffff8107c7e2
      #12 [ffff88085c403e90] tick_sched_timer at ffffffff810a0976
      #13 [ffff88085c403ec0] __run_hrtimer at ffffffff8109580e
      #14 [ffff88085c403f10] hrtimer_interrupt at ffffffff81095bb6
      #15 [ffff88085c403f90] smp_apic_timer_interrupt at ffffffff814f754b
      #16 [ffff88085c403fb0] apic_timer_interrupt at ffffffff8100bc13
      --- <IRQ stack> ---
      #17 [ffff881022f56178] apic_timer_interrupt at ffffffff8100bc13
          [exception RIP: _spin_unlock_irqrestore+23]
          RIP: ffffffff814f1ad7  RSP: ffff881022f56228  RFLAGS: 00000246
          RAX: 0000000000000002  RBX: ffff881022f56228  RCX: 0000000000001b78
          RDX: ffff880809274400  RSI: 0000000000000246  RDI: 0000000000000246
          RBP: ffffffff8100bc0e   R8: 0000000000000000   R9: ffffffff8163a940
          R10: 0000000000000001  R11: 0000000000000000  R12: ffff880809274400
          R13: ffffffffa0ab8878  R14: 0000000014208bc1  R15: ffff880840021b48
          ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
      #18 [ffff881022f56230] cfs_trace_unlock_tcd at ffffffffa034336c [libcfs]
      #19 [ffff881022f56250] libcfs_debug_vmsg2 at ffffffffa0354038 [libcfs]
      #20 [ffff881022f563b0] libcfs_debug_msg at ffffffffa03545f1 [libcfs]
      #21 [ffff881022f56410] osc_key_init at ffffffffa0a90e17 [osc]
      #22 [ffff881022f56430] keys_fill at ffffffffa06d8c3f [obdclass]
      #23 [ffff881022f56480] lu_context_init at ffffffffa06dcc6b [obdclass]
      #24 [ffff881022f564b0] lu_env_init at ffffffffa06dce3e [obdclass]
      #25 [ffff881022f564d0] cl_env_new at ffffffffa06e40bd [obdclass]
      #26 [ffff881022f56520] cl_env_get at ffffffffa06e4acb [obdclass]
      #27 [ffff881022f56560] lov_sub_get at ffffffffa0b3777d [lov]
      #28 [ffff881022f565c0] lov_page_subio at ffffffffa0b37c5d [lov]
      #29 [ffff881022f565f0] lov_page_own at ffffffffa0b311af [lov]
      #30 [ffff881022f56620] cl_page_own0 at ffffffffa06e8b5b [obdclass]
      #31 [ffff881022f56670] cl_page_own_try at ffffffffa06e8db3 [obdclass]
      #32 [ffff881022f56680] discard_pagevec at ffffffffa0a926a9 [osc]
      #33 [ffff881022f566d0] osc_lru_shrink at ffffffffa0a935d9 [osc]
      #34 [ffff881022f567c0] osc_lru_del at ffffffffa0a94aa6 [osc]
      #35 [ffff881022f56840] osc_page_delete at ffffffffa0a951b4 [osc]
      #36 [ffff881022f56870] cl_page_delete0 at ffffffffa06e99e5 [obdclass]
      #37 [ffff881022f568b0] cl_page_delete at ffffffffa06e9e62 [obdclass]
      #38 [ffff881022f568d0] ll_releasepage at ffffffffa0bfe41b [lustre]
      #39 [ffff881022f56920] try_to_release_page at ffffffff81110070
      #40 [ffff881022f56930] shrink_page_list.clone.0 at ffffffff8112a501
      #41 [ffff881022f56a60] shrink_inactive_list at ffffffff8112a8cb
      #42 [ffff881022f56c10] shrink_zone at ffffffff8112b5df
      #43 [ffff881022f56cc0] zone_reclaim at ffffffff8112c384
      #44 [ffff881022f56da0] get_page_from_freelist at ffffffff81122834
      #45 [ffff881022f56ec0] __alloc_pages_nodemask at ffffffff81123ab1
      #46 [ffff881022f56fe0] kmem_getpages at ffffffff8115e2f2
      #47 [ffff881022f57010] cache_grow at ffffffff8115e95f
      #48 [ffff881022f57080] cache_alloc_refill at ffffffff8115ebb2
      #49 [ffff881022f570f0] __kmalloc at ffffffff8115f8d9
      #50 [ffff881022f57140] cfs_alloc at ffffffffa0344c40 [libcfs]
      #51 [ffff881022f57170] ptlrpc_request_alloc_internal at ffffffffa085d407 [ptlrpc]
      #52 [ffff881022f571a0] ptlrpc_request_alloc_pool at ffffffffa085d66e [ptlrpc]
      #53 [ffff881022f571b0] osc_brw_prep_request at ffffffffa0a8451b [osc]
      #54 [ffff881022f57290] osc_build_rpc at ffffffffa0a8a513 [osc]
      #55 [ffff881022f573f0] osc_io_unplug0 at ffffffffa0aa642d [osc]
      #56 [ffff881022f57500] osc_io_unplug at ffffffffa0aa7ce1 [osc]
      #57 [ffff881022f57510] osc_enter_cache at ffffffffa0aa8473 [osc]
      #58 [ffff881022f576a0] osc_queue_async_io at ffffffffa0aae916 [osc]
      #59 [ffff881022f57830] osc_page_cache_add at ffffffffa0a94fc9 [osc]
      #60 [ffff881022f57870] cl_page_cache_add at ffffffffa06e61d7 [obdclass]
      #61 [ffff881022f57910] lov_page_cache_add at ffffffffa0b31325 [lov]
      #62 [ffff881022f57940] cl_page_cache_add at ffffffffa06e61d7 [obdclass]
      #63 [ffff881022f579e0] vvp_io_commit_write at ffffffffa0c1161d [lustre]
      #64 [ffff881022f57a50] cl_io_commit_write at ffffffffa06f5b1d [obdclass]
      #65 [ffff881022f57aa0] ll_commit_write at ffffffffa0be68be [lustre]
      #66 [ffff881022f57b00] ll_write_end at ffffffffa0bfe4e0 [lustre]
      #67 [ffff881022f57b30] generic_file_buffered_write at ffffffff81111684
      #68 [ffff881022f57c00] __generic_file_aio_write at ffffffff81112f70
      #69 [ffff881022f57cc0] generic_file_aio_write at ffffffff8111320f
      #70 [ffff881022f57d10] vvp_io_write_start at ffffffffa0c11f3c [lustre]
      #71 [ffff881022f57d50] cl_io_start at ffffffffa06f244a [obdclass]
      #72 [ffff881022f57d80] cl_io_loop at ffffffffa06f6d54 [obdclass]
      #73 [ffff881022f57db0] ll_file_io_generic at ffffffffa0bbda7b [lustre]
      #74 [ffff881022f57e20] ll_file_aio_write at ffffffffa0bbdce2 [lustre]
      #75 [ffff881022f57e80] ll_file_write at ffffffffa0bbeaac [lustre]
      #76 [ffff881022f57ef0] vfs_write at ffffffff81177b98
      #77 [ffff881022f57f30] sys_write at ffffffff811785a1
      #78 [ffff881022f57f80] system_call_fastpath at ffffffff8100b0f2
          RIP: 00002aaaab24748d  RSP: 00007fffffffcb50  RFLAGS: 00010246
          RAX: 0000000000000001  RBX: ffffffff8100b0f2  RCX: 0000081f512ba39b
          RDX: 0000000000100000  RSI: 0000000000821000  RDI: 0000000000000009
          RBP: 0000000000100000   R8: 00000000007ff070   R9: 0000000080000000
          R10: 0000000000101008  R11: 0000000000000293  R12: 00000000007ff070
          R13: 0000000000821000  R14: 0000000004900000  R15: 0000000000000000
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      

      Doing the math, this stack shows about 7.5K of used space: 0xffff881022f57f80 - 0xffff881022f56230 = 7504 bytes, which is awfully close to the 8K limit.
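
      For reference, the same arithmetic can be reproduced in a shell, and comparing against the stack base shown in the thread_info dump below gives the remaining headroom. Both addresses are taken verbatim from the crash output; this is just a sanity check, not additional ticket data:

      # Bytes of stack in use between the deepest listed frame (#18) and the
      # shallowest (#78).
      echo $(( 0xffff881022f57f80 - 0xffff881022f56230 ))    # 7504

      # Headroom left between frame #18 and the stack base 0xffff881022f56000
      # (where thread_info also lives), out of the 8K total.
      echo $(( 0xffff881022f56230 - 0xffff881022f56000 ))    # 560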

      Now let's look at the thread_info structure for that thread:

      crash> p *(struct task_struct *)0xffff881028578aa0
      ...
        stack = 0xffff881022f56000,
      ...
      crash> p *(struct thread_info *)0xffff881022f56000 
      $5 = {
        task = 0xffffffff814f4cfa, 
        exec_domain = 0xffff881022f56028, 
        flags = 2167540686, 
        status = 4294967295, 
        cpu = 1072401920, 
        preempt_count = 34824, 
        addr_limit = {
          seg = 0
        }, 
        restart_block = {
          fn = 0xffff881022f56088, 
          {
            {
              arg0 = 18446744071582132352, 
              arg1 = 18446612201620201608, 
              arg2 = 52, 
              arg3 = 21474836533
            }, 
            futex = {
              uaddr = 0xffffffff81322880, 
              val = 586506376, 
              flags = 4294936592, 
              bitset = 52, 
              time = 21474836533, 
              uaddr2 = 0x100000000000006
            }, 
            nanosleep = {
              index = -2127419264, 
              rmtp = 0xffff881022f56088, 
              compat_rmtp = 0x34, 
              expires = 21474836533
            }, 
            poll = {
              ufds = 0xffffffff81322880, 
              nfds = 586506376, 
              has_timeout = -30704, 
              tv_sec = 52, 
              tv_nsec = 21474836533
            }
          }
        }, 
        sysenter_return = 0xffff881022f560f8, 
        uaccess_err = -2119203296
      }
      

      This structure is definitely trashed and contains bogus information, most likely because the thread overran the default 8K stack on x86_64; thread_info sits at the base of the stack region, so a deep enough stack walks right over it.
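
      One way to watch for this before it turns into a crash is the kernel's ftrace stack tracer, which records the deepest kernel stack seen and the call chain that produced it. This is a generic diagnostic sketch, assuming the client kernel was built with CONFIG_STACK_TRACER and debugfs is mounted at /sys/kernel/debug; it is not something attached to this ticket:

      # Turn on the stack tracer, then run the IOR workload.
      echo 1 > /proc/sys/kernel/stack_tracer_enabled

      # Deepest kernel stack observed so far, in bytes.
      cat /sys/kernel/debug/tracing/stack_max_size

      # The call chain that produced it, with per-frame sizes.
      cat /sys/kernel/debug/tracing/stack_trace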

      Attachments

        Issue Links

          Activity


            jay Jinshan Xiong (Inactive) added a comment -

            It's enough to call osc_io_unplug_async() once.

            prakash Prakash Surya (Inactive) added a comment -

            I was hesitant to pull the two calls into a single one. Jinshan, can you comment on this?

            niu Niu Yawei (Inactive) added a comment -

            It's not safe to collapse the two osc_io_unplug() calls into one, as the comment describes: the first call makes sure the current object is on the ready list, and the second makes sure a flush is always triggered even if the current object has no dirty pages. The async unplug can't guarantee the dirty pages have been flushed by the time the call returns. What about changing it to:

            while (...) {
                    osc_io_unplug_async(osc);
                    osc_io_unplug_async(NULL);
            }
            pjones Peter Jones added a comment -

            Landed for 2.4

            jay Jinshan Xiong (Inactive) added a comment -

            It should be pretty safe.
            prakash Prakash Surya (Inactive) added a comment - edited

            I pulled that in and it's passed my initial testing. The second osc_io_unplug call was introduced because of LU-2576, though, so I'm hesitant to pull this into an official release just yet. Is it safe to collapse the two osc_io_unplug calls into a single one, as the patch does?

            jay Jinshan Xiong (Inactive) added a comment -

            Hi Prakash, can you please try this patch? http://review.whamcloud.com/5526
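
            For anyone who wants to test the change locally, a typical way to pull a Gerrit change from review.whamcloud.com into a lustre-release checkout is sketched below; the fs/lustre-release project path and patchset number 1 are assumptions, not details from this ticket:

            # Fetch patchset 1 of change 5526 (Gerrit change refs take the form
            # refs/changes/<last two digits>/<change number>/<patchset>) and
            # apply it on top of the current branch.
            git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/5526/1
            git cherry-pick FETCH_HEAD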

            prakash Prakash Surya (Inactive) added a comment -

            Also, I've installed a temporary kernel with 16K stacks on the client and have not been able to reproduce the issue (same Lustre version, 2.3.58-15chaos1surya1).

            People

              jay Jinshan Xiong (Inactive)
              prakash Prakash Surya (Inactive)