LU-2859

Stack overflow issues on x86_64 clients in memory reclaim during writes



    Description

      I was trying to run some IOR tests using a 2.3.58-based client and ran into what appears to be a stack overflow. The test is an IOR write using 256 clients, with 16 tasks per client:

      $ salloc -N256 -n$((256*16)) -pltest
      $ srun -- ior -a POSIX -Cegw -F -o /p/lstest/surya1/lu2139-ior/%d -t 1m -b 8G -i 5
      

      The clients are all x86_64 with the default kernel stack size (8k), which is important to note. Info on an example crash that I've seen is below.
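
      For reference, the 8k figure is the per-task kernel stack size (THREAD_SIZE), not the userspace stack ulimit. In the 2.6.32 tree it comes from arch/x86/include/asm/page_64_types.h:

      #define THREAD_ORDER 1
      #define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)  /* 2 * 4096 = 8k */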

      The console shows this:

      LustreError: 7408:0:(osc_dev.c:129:osc_key_init()) ASSERTION( (!(CFS_ALLOC_IO != CFS_ALLOC_ATOMIC) || (!(((current_thread_info()->preempt_count) & ((((1UL << (10))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 10))))))) ) failed: 
      BUG: unable to handle kernel paging request at 00000001811be3e0
      IP: [<ffffffff81052994>] update_curr+0x144/0x1f0
      PGD 1024273067 PUD 0 
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP 
      last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
      CPU 8 
      Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) dm_round_robin scsi_dh_rdac sg sd_mod crc_t10dif zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate acpi_cpufreq freq_table mperf ib_srp scsi_transport_srp scsi_tgt ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_multipath dm_mod vhost_net macvtap macvlan tun kvm wmi sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca mlx4_en mlx4_core [last unloaded: cpufreq_ondemand]
      
      Pid: 7408, comm: ior Tainted: P           ----------------   2.6.32-220.23.1.2chaos.ch5.x86_64 #1 appro gb812x-cn/S2600JF
      ...
      
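      That LASSERT is nearly unreadable after preprocessor expansion. The mask constants match the stock 2.6.32 preempt_count layout (8 preempt bits, 8 softirq bits, 10 hardirq bits, then the NMI bit), so my hand-decoded reading of it is:

      /*
       * Decoded form of the failed assertion (hand-expanded, assuming the
       * standard 2.6.32 preempt_count bit layout): a blocking allocation
       * must not be running in interrupt context.
       */
      LASSERT(ergo(CFS_ALLOC_IO != CFS_ALLOC_ATOMIC,
                   !(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))));

      It fires here not because the task is really in interrupt context, but because preempt_count is read out of the trashed thread_info shown further down: 34824 is 0x8808, which has softirq bits set.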

      The stack reported by crash:

      crash> bt
      PID: 7408   TASK: ffff881028578aa0  CPU: 8   COMMAND: "ior"
       #0 [ffff88085c4039a0] machine_kexec at ffffffff8103216b
       #1 [ffff88085c403a00] crash_kexec at ffffffff810b8d12
       #2 [ffff88085c403ad0] oops_end at ffffffff814f2c00
       #3 [ffff88085c403b00] no_context at ffffffff810423fb
       #4 [ffff88085c403b50] __bad_area_nosemaphore at ffffffff81042685
       #5 [ffff88085c403ba0] bad_area_nosemaphore at ffffffff81042753
       #6 [ffff88085c403bb0] __do_page_fault at ffffffff81042e0d
       #7 [ffff88085c403cd0] do_page_fault at ffffffff814f4bde
       #8 [ffff88085c403d00] page_fault at ffffffff814f1f95
          [exception RIP: update_curr+324]
          RIP: ffffffff81052994  RSP: ffff88085c403db8  RFLAGS: 00010086
          RAX: ffff881028578aa0  RBX: 000000003feb8e00  RCX: ffff88082ff111c0
          RDX: 0000000000018b88  RSI: 0000000000000000  RDI: ffff881028578ad8
          RBP: ffff88085c403de8   R8: ffffffff8160b665   R9: 000000000000001a
          R10: 0000000000000010  R11: 000000000000001a  R12: ffff88085c416028
          R13: 0000000001a5b3e7  R14: 0000004135410580  R15: ffff881028578aa0
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff88085c403df0] task_tick_fair at ffffffff81052f4b
      #10 [ffff88085c403e20] scheduler_tick at ffffffff81056791
      #11 [ffff88085c403e60] update_process_times at ffffffff8107c7e2
      #12 [ffff88085c403e90] tick_sched_timer at ffffffff810a0976
      #13 [ffff88085c403ec0] __run_hrtimer at ffffffff8109580e
      #14 [ffff88085c403f10] hrtimer_interrupt at ffffffff81095bb6
      #15 [ffff88085c403f90] smp_apic_timer_interrupt at ffffffff814f754b
      #16 [ffff88085c403fb0] apic_timer_interrupt at ffffffff8100bc13
      --- <IRQ stack> ---
      #17 [ffff881022f56178] apic_timer_interrupt at ffffffff8100bc13
          [exception RIP: _spin_unlock_irqrestore+23]
          RIP: ffffffff814f1ad7  RSP: ffff881022f56228  RFLAGS: 00000246
          RAX: 0000000000000002  RBX: ffff881022f56228  RCX: 0000000000001b78
          RDX: ffff880809274400  RSI: 0000000000000246  RDI: 0000000000000246
          RBP: ffffffff8100bc0e   R8: 0000000000000000   R9: ffffffff8163a940
          R10: 0000000000000001  R11: 0000000000000000  R12: ffff880809274400
          R13: ffffffffa0ab8878  R14: 0000000014208bc1  R15: ffff880840021b48
          ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
      #18 [ffff881022f56230] cfs_trace_unlock_tcd at ffffffffa034336c [libcfs]
      #19 [ffff881022f56250] libcfs_debug_vmsg2 at ffffffffa0354038 [libcfs]
      #20 [ffff881022f563b0] libcfs_debug_msg at ffffffffa03545f1 [libcfs]
      #21 [ffff881022f56410] osc_key_init at ffffffffa0a90e17 [osc]
      #22 [ffff881022f56430] keys_fill at ffffffffa06d8c3f [obdclass]
      #23 [ffff881022f56480] lu_context_init at ffffffffa06dcc6b [obdclass]
      #24 [ffff881022f564b0] lu_env_init at ffffffffa06dce3e [obdclass]
      #25 [ffff881022f564d0] cl_env_new at ffffffffa06e40bd [obdclass]
      #26 [ffff881022f56520] cl_env_get at ffffffffa06e4acb [obdclass]
      #27 [ffff881022f56560] lov_sub_get at ffffffffa0b3777d [lov]
      #28 [ffff881022f565c0] lov_page_subio at ffffffffa0b37c5d [lov]
      #29 [ffff881022f565f0] lov_page_own at ffffffffa0b311af [lov]
      #30 [ffff881022f56620] cl_page_own0 at ffffffffa06e8b5b [obdclass]
      #31 [ffff881022f56670] cl_page_own_try at ffffffffa06e8db3 [obdclass]
      #32 [ffff881022f56680] discard_pagevec at ffffffffa0a926a9 [osc]
      #33 [ffff881022f566d0] osc_lru_shrink at ffffffffa0a935d9 [osc]
      #34 [ffff881022f567c0] osc_lru_del at ffffffffa0a94aa6 [osc]
      #35 [ffff881022f56840] osc_page_delete at ffffffffa0a951b4 [osc]
      #36 [ffff881022f56870] cl_page_delete0 at ffffffffa06e99e5 [obdclass]
      #37 [ffff881022f568b0] cl_page_delete at ffffffffa06e9e62 [obdclass]
      #38 [ffff881022f568d0] ll_releasepage at ffffffffa0bfe41b [lustre]
      #39 [ffff881022f56920] try_to_release_page at ffffffff81110070
      #40 [ffff881022f56930] shrink_page_list.clone.0 at ffffffff8112a501
      #41 [ffff881022f56a60] shrink_inactive_list at ffffffff8112a8cb
      #42 [ffff881022f56c10] shrink_zone at ffffffff8112b5df
      #43 [ffff881022f56cc0] zone_reclaim at ffffffff8112c384
      #44 [ffff881022f56da0] get_page_from_freelist at ffffffff81122834
      #45 [ffff881022f56ec0] __alloc_pages_nodemask at ffffffff81123ab1
      #46 [ffff881022f56fe0] kmem_getpages at ffffffff8115e2f2
      #47 [ffff881022f57010] cache_grow at ffffffff8115e95f
      #48 [ffff881022f57080] cache_alloc_refill at ffffffff8115ebb2
      #49 [ffff881022f570f0] __kmalloc at ffffffff8115f8d9
      #50 [ffff881022f57140] cfs_alloc at ffffffffa0344c40 [libcfs]
      #51 [ffff881022f57170] ptlrpc_request_alloc_internal at ffffffffa085d407 [ptlrpc]
      #52 [ffff881022f571a0] ptlrpc_request_alloc_pool at ffffffffa085d66e [ptlrpc]
      #53 [ffff881022f571b0] osc_brw_prep_request at ffffffffa0a8451b [osc]
      #54 [ffff881022f57290] osc_build_rpc at ffffffffa0a8a513 [osc]
      #55 [ffff881022f573f0] osc_io_unplug0 at ffffffffa0aa642d [osc]
      #56 [ffff881022f57500] osc_io_unplug at ffffffffa0aa7ce1 [osc]
      #57 [ffff881022f57510] osc_enter_cache at ffffffffa0aa8473 [osc]
      #58 [ffff881022f576a0] osc_queue_async_io at ffffffffa0aae916 [osc]
      #59 [ffff881022f57830] osc_page_cache_add at ffffffffa0a94fc9 [osc]
      #60 [ffff881022f57870] cl_page_cache_add at ffffffffa06e61d7 [obdclass]
      #61 [ffff881022f57910] lov_page_cache_add at ffffffffa0b31325 [lov]
      #62 [ffff881022f57940] cl_page_cache_add at ffffffffa06e61d7 [obdclass]
      #63 [ffff881022f579e0] vvp_io_commit_write at ffffffffa0c1161d [lustre]
      #64 [ffff881022f57a50] cl_io_commit_write at ffffffffa06f5b1d [obdclass]
      #65 [ffff881022f57aa0] ll_commit_write at ffffffffa0be68be [lustre]
      #66 [ffff881022f57b00] ll_write_end at ffffffffa0bfe4e0 [lustre]
      #67 [ffff881022f57b30] generic_file_buffered_write at ffffffff81111684
      #68 [ffff881022f57c00] __generic_file_aio_write at ffffffff81112f70
      #69 [ffff881022f57cc0] generic_file_aio_write at ffffffff8111320f
      #70 [ffff881022f57d10] vvp_io_write_start at ffffffffa0c11f3c [lustre]
      #71 [ffff881022f57d50] cl_io_start at ffffffffa06f244a [obdclass]
      #72 [ffff881022f57d80] cl_io_loop at ffffffffa06f6d54 [obdclass]
      #73 [ffff881022f57db0] ll_file_io_generic at ffffffffa0bbda7b [lustre]
      #74 [ffff881022f57e20] ll_file_aio_write at ffffffffa0bbdce2 [lustre]
      #75 [ffff881022f57e80] ll_file_write at ffffffffa0bbeaac [lustre]
      #76 [ffff881022f57ef0] vfs_write at ffffffff81177b98
      #77 [ffff881022f57f30] sys_write at ffffffff811785a1
      #78 [ffff881022f57f80] system_call_fastpath at ffffffff8100b0f2
          RIP: 00002aaaab24748d  RSP: 00007fffffffcb50  RFLAGS: 00010246
          RAX: 0000000000000001  RBX: ffffffff8100b0f2  RCX: 0000081f512ba39b
          RDX: 0000000000100000  RSI: 0000000000821000  RDI: 0000000000000009
          RBP: 0000000000100000   R8: 00000000007ff070   R9: 0000000080000000
          R10: 0000000000101008  R11: 0000000000000293  R12: 00000000007ff070
          R13: 0000000000821000  R14: 0000000004900000  R15: 0000000000000000
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
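      Reading the trace bottom-up: frames #78-#49 are an ordinary buffered write that ends up in __kmalloc for an RPC buffer, the allocation drops into direct reclaim (#45-#40), and reclaim calls back into Lustre via ll_releasepage (#38), stacking a second, nearly complete client I/O path (#37-#18) on top of the first. One common way for a filesystem to break this kind of recursion is to refuse heavy work in its ->releasepage when the reclaim context forbids filesystem re-entry. A minimal sketch of that guard, for illustration only (not the actual fix for this ticket):

      static int ll_releasepage(struct page *page, gfp_t gfp_mask)
      {
              /*
               * Direct reclaim entered from an allocation that may not
               * re-enter the filesystem (no __GFP_FS) must not recurse
               * into the CLIO stack: the caller may already be deep
               * inside a Lustre write, and the second I/O path is what
               * overflows the 8k stack here.
               */
              if (!(gfp_mask & __GFP_FS))
                      return 0;  /* page cannot be released right now */

              /* ... normal cl_page teardown would follow here ... */
              return 1;
      }

      For that check to be effective, allocations on the write-out path (e.g. the ptlrpc request buffer in frame #51) would also have to be made without __GFP_FS, so that reclaim carries a mask the filesystem can honor.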

      Doing the math, this stack shows about 7.3k of used space: 0xffff881022f57f80 - 0xffff881022f56230 = 7504 bytes, which is awfully close to the 8k limit. (Frames #0-#16 ran on the separate per-CPU IRQ stack, so they do not count against the process stack.)

      Now let's look at the thread_info structure for that thread:

      crash> p *(struct task_struct *)0xffff881028578aa0
      ...
        stack = 0xffff881022f56000,
      ...
      crash> p *(struct thread_info *)0xffff881022f56000 
      $5 = {
        task = 0xffffffff814f4cfa, 
        exec_domain = 0xffff881022f56028, 
        flags = 2167540686, 
        status = 4294967295, 
        cpu = 1072401920, 
        preempt_count = 34824, 
        addr_limit = {
          seg = 0
        }, 
        restart_block = {
          fn = 0xffff881022f56088, 
          {
            {
              arg0 = 18446744071582132352, 
              arg1 = 18446612201620201608, 
              arg2 = 52, 
              arg3 = 21474836533
            }, 
            futex = {
              uaddr = 0xffffffff81322880, 
              val = 586506376, 
              flags = 4294936592, 
              bitset = 52, 
              time = 21474836533, 
              uaddr2 = 0x100000000000006
            }, 
            nanosleep = {
              index = -2127419264, 
              rmtp = 0xffff881022f56088, 
              compat_rmtp = 0x34, 
              expires = 21474836533
            }, 
            poll = {
              ufds = 0xffffffff81322880, 
              nfds = 586506376, 
              has_timeout = -30704, 
              tv_sec = 52, 
              tv_nsec = 21474836533
            }
          }
        }, 
        sysenter_return = 0xffff881022f560f8, 
        uaccess_err = -2119203296
      }
      

      This structure is definitely trashed and contains bogus information (for one, its task field should point back to the task_struct at 0xffff881028578aa0), most likely the result of the stack overrunning the default 8k size on x86_64.
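
      The corruption pattern fits a stack overrun: on x86_64 the thread_info occupies the lowest addresses of the kernel stack, so a stack growing downward past its limit clobbers it first. From include/linux/sched.h (2.6.32):

      union thread_union {
              struct thread_info thread_info;  /* at the stack base, clobbered first */
              unsigned long stack[THREAD_SIZE/sizeof(long)];
      };

      Consistent with that, the deepest process-stack frame in the trace (#17 at 0xffff881022f56178) is only 0x178 = 376 bytes above the stack base 0xffff881022f56000 from task->stack.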


People

  Assignee: Jinshan Xiong (Inactive)
  Reporter: Prakash Surya (Inactive)
