[LU-1138] Client Panic on Lustre 1.8.6 and RHEL 6 Created: 24/Feb/12  Updated: 22/Jan/24  Resolved: 24/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Dennis Nelson Assignee: Zhenyu Xu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

RHEL 6 2.6.32-71.el6.x86_64 kernel


Attachments: File r20i1n11-20120213.console.bz2     File r20i1n11.messages.bz2    
Severity: 3
Rank (Obsolete): 6447

 Description   

Customer reports that a few compute nodes have been panicking. They have seen the behavior on 7 nodes, and each node has hit the problem numerous times. It looks like it may be similar to LU-93. I'd like Whamcloud to weigh in on whether you think it is related or whether it is a known issue. The tracebacks and console messages are attached.



 Comments   
Comment by Dennis Nelson [ 24/Feb/12 ]

The customer has asked me to bump up the priority on this one. They report that this issue has caused hundreds of nodes to become unresponsive on their system.

Comment by Andreas Dilger [ 24/Feb/12 ]

This doesn't appear to be the same as LU-93, which was causing the client to crash.

In this case, it looks like all of the threads are stuck in ll_teardown_mmaps->unmap_mapping_range() because the node is trying to free memory under memory pressure.
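
For reference, a rough way to gauge how widespread this pattern is in the attached console log might be the following (illustrative only; run from wherever the attachment is unpacked):

# count frames that reference the mmap teardown path in the attached console log
bzcat r20i1n11-20120213.console.bz2 | grep --binary-files=text -c ll_teardown_mmaps
bzcat r20i1n11-20120213.console.bz2 | grep --binary-files=text -c unmap_mapping_range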

This is a somewhat unusual workload for Lustre, because while mmap IO is functional, it is quite inefficient (single page RPCs) and rarely used.

Has this application been running in the past on Lustre? Are there any changes in the environment that might have caused the application to start failing (e.g. kernel, Lustre, or application upgrade)?
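
As a quick check of whether the application is actually doing mmap IO against Lustre, something like the following could be run on an affected node (a sketch only; /mnt/lustre and <pid> are placeholders for the real mount point and the application's PID):

# file-backed mappings of the suspect process that point at the Lustre mount
grep /mnt/lustre /proc/<pid>/maps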

Comment by Dennis Nelson [ 27/Feb/12 ]

I received the following from the customer today:

Please ask WC to stand down on it being P1. We found Lustre in a sample trace, so we went with that. Once we started looking at all of the traces, Lustre is present in SOME of the stack traces, but it is not in the most common ones. I would appreciate it if Andreas could have a look at some more stack traces to see if there is anything he's seen before, though.

ftp://shell.sgi.com/collect/jhanson/nodeswithsoftlockupconsoles.tar.bz2

What I've found by looking at these:

Once there is a "BUG: soft lockup", the next lines look like this (example chosen at random):

BUG: soft lockup - CPU#0 stuck for 61s! [global_fcst:30024]
Modules linked in: acpi_cpufreq freq_table mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ptlrpc(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib iw_cxgb3 ko2iblnd(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr obdclass(U) lnet(U) lvfs(U) libcfs(U) xpmem(U) xp gru xvma(U) numatools(U) microcode serio_raw i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support ioatdma ahci mlx4_en mlx4_core igb dca dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi qla4xxx scsi_transport_iscsi [last unloaded: ipmi_msghandler]
CPU 0:
Modules linked in: acpi_cpufreq freq_table mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ptlrpc(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib iw_cxgb3 ko2iblnd(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr obdclass(U) lnet(U) lvfs(U) libcfs(U) xpmem(U) xp gru xvma(U) numatools(U) microcode serio_raw i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support ioatdma ahci mlx4_en mlx4_core igb dca dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi qla4xxx scsi_transport_iscsi [last unloaded: ipmi_msghandler]
Pid: 30024, comm: global_fcst Tainted: G W ---------------- 2.6.32-71.el6.x86_64 #1 AltixICE8400IP105
RIP: 0010:[<ffffffff814caa3e>] [<ffffffff814caa3e>] _spin_lock+0x1e/0x30
RSP: 0018:ffff8802e9b3fc38 EFLAGS: 00000297
RAX: 000000000000e364 RBX: ffff8802e9b3fc38 RCX: ffff8804b764de80
RDX: 0000000000000000 RSI: ffff88033d53d208 RDI: ffff880637837268
RBP: ffffffff81013c8e R08: ffff8802e9b3fe10 R09: 0000000000100000
R10: 00007fffffff2dc0 R11: 0000000000000213 R12: ffff88033b712100
R13: ffffffff817300c0 R14: ffff88033b7126b8 R15: 0000000000010518
FS: 00002aaaaf3e0800(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaae8f0840 CR3: 000000033ca90000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffffa0304fb1>] ? xpmem_tg_ref_by_tgid+0x41/0xe0 [xpmem]
[<ffffffff81068598>] ? get_task_mm+0x28/0x70
[<ffffffffa030073a>] ? xpmem_make+0x9a/0x360 [xpmem]
[<ffffffff8110c037>] ? __lock_page+0x67/0x70
[<ffffffffa02ff19d>] ? xpmem_ioctl+0xdd/0x3f0 [xpmem]
[<ffffffff8110dade>] ? filemap_fault+0xbe/0x510
[<ffffffff8110c177>] ? unlock_page+0x27/0x30
[<ffffffff81135837>] ? handle_pte_fault+0xf7/0xad0
[<ffffffff811502a7>] ? alloc_pages_current+0x87/0xd0
[<ffffffff8117f182>] ? vfs_ioctl+0x22/0xa0
[<ffffffff81258ae5>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8117f324>] ? do_vfs_ioctl+0x84/0x580
[<ffffffff811363fd>] ? handle_mm_fault+0x1ed/0x2b0
[<ffffffff8117f8a1>] ? sys_ioctl+0x81/0xa0
[<ffffffff81013172>] ? system_call_fastpath+0x16/0x1b

So I went looking for commonality in the first frame after "Call Trace:" and found little, with lots of possible places to check.

guest@globe:/cores/people/jhanson/noaa/softlockup/nodeswithsoftlockupconsoles> grep --binary-files=text -h -A1 "Call Trace" r* | sort | uniq

Call Trace:
[<ffffffff810117bc>] ? __switch_to+0x1ac/0x320
[<ffffffff81013ace>] ? common_interrupt+0xe/0x13
[<ffffffff81013b76>] retint_careful+0x14/0x32
[<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff81013cee>] ? invalidate_interrupt1+0xe/0x20
[<ffffffff81013d4e>] ? invalidate_interrupt4+0xe/0x20
[<ffffffff81013d6e>] ? invalidate_interrupt5+0xe/0x20
[<ffffffff81014162>] ? kernel_thread+0x82/0xe0
[<ffffffff81014645>] ? math_state_restore+0x45/0x60
[<ffffffff8101660f>] ? dump_trace+0x1af/0x3a0
[<ffffffff8101a4f9>] ? read_tsc+0x9/0x20
[<ffffffff8104f61c>] ? enqueue_task+0x5c/0x70
[<ffffffff8104fff9>] ? __wake_up_common+0x59/0x90
[<ffffffff810507f8>] ? resched_task+0x68/0x80
[<ffffffff810508a5>] ? check_preempt_curr_idle+0x15/0x20
[<ffffffff81056303>] ? __wake_up+0x53/0x70
[<ffffffff81056630>] ? __dequeue_entity+0x30/0x50
[<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
[<ffffffff8105a808>] ? pull_task+0x58/0x70
[<ffffffff8105c490>] ? default_wake_function+0x0/0x20
[<ffffffff8105c4a2>] ? default_wake_function+0x12/0x20
[<ffffffff8105c4e5>] ? wake_up_process+0x15/0x20
[<ffffffff8105c756>] ? update_curr+0xe6/0x1e0
[<ffffffff8105fa72>] ? enqueue_entity+0x122/0x320
[<ffffffff8105fcb3>] ? enqueue_task_fair+0x43/0x90
[<ffffffff81061b71>] ? dequeue_entity+0x1a1/0x1e0
[<ffffffff81062b84>] ? find_busiest_group+0x254/0xb40
[<ffffffff8106329a>] ? find_busiest_group+0x96a/0xb40
[<ffffffff81066d6e>] ? select_task_rq_fair+0x9ee/0xab0
[<ffffffff810670c1>] ? check_preempt_wakeup+0x41/0x3c0
[<ffffffff81067244>] ? check_preempt_wakeup+0x1c4/0x3c0
[<ffffffff81067732>] migration_thread+0x1d2/0x310
[<ffffffff81069207>] ? dup_mm+0x2a7/0x520
[<ffffffff8106b857>] warn_slowpath_common+0x87/0xc0
[<ffffffff8106b9f5>] ? __call_console_drivers+0x75/0x90
[<ffffffff8106d0a1>] do_syslog+0x461/0x4c0
[<ffffffff8106f805>] do_wait+0x1c5/0x250
[<ffffffff8107064f>] do_exit+0x56f/0x820
[<ffffffff810737a5>] ksoftirqd+0xd5/0x110
[<ffffffff8107d5ac>] ? lock_timer_base+0x3c/0x70
[<ffffffff8107e616>] ? mod_timer+0x146/0x230
[<ffffffff8107e718>] ? add_timer+0x18/0x30
[<ffffffff8108ac20>] ? __call_usermodehelper+0x0/0xa0
[<ffffffff8108c4a0>] ? worker_thread+0x0/0x2a0
[<ffffffff8108cc82>] ? queue_work_on+0x42/0x60
[<ffffffff81091cb6>] ? autoremove_wake_function+0x16/0x40
[<ffffffff81091eae>] ? prepare_to_wait_exclusive+0x4e/0x80
[<ffffffff81091f8e>] ? prepare_to_wait+0x4e/0x80
[<ffffffff81095da3>] ? __hrtimer_start_range_ns+0x1a3/0x430
[<ffffffff8109638a>] ? down_read_trylock+0x1a/0x30
[<ffffffff81096bff>] ? up+0x2f/0x50
[<ffffffff81098f05>] async_manager_thread+0xc5/0x100
[<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff810a25a9>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810a666b>] ? rt_mutex_adjust_pi+0x7b/0x90
[<ffffffff810c2b01>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[<ffffffff810c2b01>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[<ffffffff810ca7b6>] ? audit_hold_skb+0x26/0x50
[<ffffffff810cab7b>] ? kauditd_send_skb+0x3b/0x90
[<ffffffff810d3d4b>] ? audit_syscall_exit+0x25b/0x290
[<ffffffff8110351b>] slow_work_thread+0x32b/0x3a0
[<ffffffff81108047>] ? perf_event_exit_task+0x37/0x160
[<ffffffff8110b832>] ? iov_iter_copy_from_user_atomic+0x92/0x130
[<ffffffff8110bb70>] ? find_get_pages_tag+0x40/0x120
[<ffffffff8110c060>] ? sync_page+0x0/0x50
[<ffffffff8110c0b0>] ? sync_page_killable+0x0/0x40
[<ffffffff8110eecb>] oom_kill_process+0xcb/0x2e0
[<ffffffff8111b3a5>] ? __rmqueue+0xc5/0x490
[<ffffffff8111bd57>] bad_page+0x107/0x160
[<ffffffff8111cf91>] ? get_page_from_freelist+0x3d1/0x820
[<ffffffff8111e1c6>] ? __alloc_pages_nodemask+0xf6/0x810
[<ffffffff8111e48d>] ? __alloc_pages_nodemask+0x3bd/0x810
[<ffffffff8111e745>] __alloc_pages_nodemask+0x675/0x810
[<ffffffff8111f78a>] ? determine_dirtyable_memory+0x1a/0x30
[<ffffffff81120951>] ? do_writepages+0x21/0x40
[<ffffffff8112bc27>] ? vma_prio_tree_next+0x47/0x70
[<ffffffff8112d14d>] ? zone_statistics+0x7d/0xa0
[<ffffffff8112d980>] ? vmstat_update+0x0/0x40
[<ffffffff8112de70>] ? bdi_sync_supers+0x0/0x60
[<ffffffff811336b5>] ? unmap_vmas+0xa85/0xc00
[<ffffffff811345a2>] ? unmap_mapping_range+0x72/0x150
[<ffffffff81135a85>] ? handle_pte_fault+0x345/0xad0
[<ffffffff81136455>] ? handle_mm_fault+0x245/0x2b0
[<ffffffff81139582>] ? unlink_file_vma+0x42/0x70
[<ffffffff8113e59d>] ? rmap_walk+0x7d/0x1c0
[<ffffffff8113f2de>] ? page_referenced+0x9e/0x2f0
[<ffffffff8113fb72>] ? try_to_unmap_file+0x42/0x750
[<ffffffff81156007>] ? cache_grow+0x217/0x320
[<ffffffff811560bf>] ? cache_grow+0x2cf/0x320
[<ffffffff81157e51>] ? drain_array+0xe1/0x100
[<ffffffff81158d38>] ? drain_freelist+0x78/0xc0
[<ffffffff81158d80>] ? cache_reap+0x0/0x260
[<ffffffff8115fe28>] ? __mem_cgroup_uncharge_common+0x78/0x260
[<ffffffff81161c89>] ? mem_cgroup_charge_common+0x99/0xc0
[<ffffffff81165218>] khugepaged+0x958/0x1190
[<ffffffff8116c65a>] ? do_sync_read+0xfa/0x140
[<ffffffff81175fdb>] pipe_wait+0x5b/0x80
[<ffffffff81258839>] ? cpumask_next_and+0x29/0x50
[<ffffffff81262a54>] ? vsnprintf+0x484/0x5f0
[<ffffffff81264025>] ? memmove+0x45/0x50
[<ffffffff812fcaa0>] ? flush_to_ldisc+0x0/0x1b0
[<ffffffff812fee81>] vt_event_wait+0xa1/0x100
[<ffffffff8137fe39>] hub_thread+0x369/0x17f0
[<ffffffff8138a164>] ? usb_suspend_both+0x1a4/0x320
[<ffffffff814277d0>] ? eth_type_trans+0x40/0x140
[<ffffffff81445e95>] ? ip_local_out+0x25/0x30
[<ffffffff8144e7e6>] ? tcp_sendmsg+0x756/0xa30
[<ffffffff8149b2d6>] ? unix_stream_sendmsg+0x3c6/0x3e0
[<ffffffff814c7b23>] panic+0x78/0x137
[<ffffffff814c8286>] ? thread_return+0x4e/0x778
[<ffffffff814c8b00>] ? _cond_resched+0x30/0x40
[<ffffffff814c8c5c>] ? wait_for_common+0x14c/0x180
[<ffffffff814c8d4d>] ? wait_for_completion+0x1d/0x20
[<ffffffff814c8f34>] schedule_timeout+0x194/0x2f0
[<ffffffff814c8f3c>] ? schedule_timeout+0x19c/0x2f0
[<ffffffff814c8fc5>] schedule_timeout+0x225/0x2f0
[<ffffffff814c96e0>] ? __mutex_lock_slowpath+0x70/0x180
[<ffffffff814c97ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814c9ad8>] schedule_hrtimeout_range+0xc8/0x160
[<ffffffff814c9b4d>] schedule_hrtimeout_range+0x13d/0x160
[<ffffffff814c9c1b>] do_nanosleep+0x8b/0xc0
[<ffffffff814ca6b5>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814cac1b>] ? _spin_unlock_bh+0x1b/0x20
[<ffffffff814cd766>] ? notifier_call_chain+0x16/0x80
[<ffffffffa00a78be>] ? __put_nfs_open_context+0x3e/0xc0 [nfs]
[<ffffffffa00a9e10>] ? fib6_clean_node+0x0/0xd0 [ipv6]
[<ffffffffa00b0540>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[<ffffffffa01407fd>] ? call_transmit_status+0x4d/0xe0 [sunrpc]
[<ffffffffa01433e9>] ? xprt_release_xprt+0x89/0x90 [sunrpc]
[<ffffffffa01435bf>] ? xprt_reserve+0x1cf/0x1f0 [sunrpc]
[<ffffffffa01444a0>] ? xprt_autoclose+0x0/0x70 [sunrpc]
[<ffffffffa0146210>] ? xs_tcp_connect_worker4+0x0/0x30 [sunrpc]
[<ffffffffa01488a0>] ? rpc_async_release+0x0/0x20 [sunrpc]
[<ffffffffa0148d00>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[<ffffffffa0149760>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa01e68be>] ? __put_nfs_open_context+0x3e/0xc0 [nfs]
[<ffffffffa01e7560>] ? nfs_wait_bit_killable+0x0/0x40 [nfs]
[<ffffffffa01ef540>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[<ffffffffa01f40cd>] ? nfs_commit_free+0x3d/0x50 [nfs]
[<ffffffffa01f4688>] ? nfs_writeback_release_full+0x128/0x1b0 [nfs]
[<ffffffffa01fe3a5>] xpmem_clear_PFNtable+0x185/0x340 [xpmem]
[<ffffffffa02467b0>] ? process_req+0x0/0x1a0 [ib_addr]
[<ffffffffa02745ae>] ? mlx4_ib_post_send+0x4be/0xf10 [mlx4_ib]
[<ffffffffa02a80cd>] ? mcast_work_handler+0xed/0x830 [ib_sa]
[<ffffffffa030073a>] xpmem_make+0x9a/0x360 [xpmem]
[<ffffffffa0304fb1>] ? xpmem_tg_ref_by_tgid+0x41/0xe0 [xpmem]
[<ffffffffa03054f1>] ? xpmem_PFNs_exist_in_range_l3+0x51/0xa0 [xpmem]
[<ffffffffa0308445>] xpmem_clear_PFNtable+0x185/0x340 [xpmem]
[<ffffffffa0309ec8>] ? xpmem_recall_PFNs_of_tg+0xf8/0x2d0 [xpmem]
[<ffffffffa030a40b>] xpmem_pgcl_thread+0x1db/0x220 [xpmem]
[<ffffffffa0320ab2>] lcw_dispatch_main+0xd2/0x400 [libcfs]
[<ffffffffa0353b8b>] ? mlx4_ib_poll_cq+0x2ab/0x780 [mlx4_ib]
[<ffffffffa0379c9d>] ? LNetMDAttach+0x35d/0x4c0 [lnet]
[<ffffffffa03dbc5a>] obd_zombie_impexp_thread+0x15a/0x2b0 [obdclass]
[<ffffffffa046a330>] ? ipoib_reap_ah+0x0/0x50 [ib_ipoib]
[<ffffffffa04e6c3a>] ? kiblnd_queue_tx+0x4a/0x60 [ko2iblnd]
[<ffffffffa04f3eb6>] ? loi_list_maint+0xa6/0x130 [osc]
[<ffffffffa050fb64>] ? cache_add_extent+0x134/0x640 [osc]
[<ffffffffa056efd0>] ? ib_mad_completion_handler+0x0/0x810 [ib_mad]
[<ffffffffa057a492>] ? cm_process_work+0x32/0x110 [ib_cm]
[<ffffffffa057bcff>] ? cm_rep_handler+0x31f/0x590 [ib_cm]
[<ffffffffa057bf70>] ? cm_work_handler+0x0/0x11d6 [ib_cm]
[<ffffffffa0584330>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
[<ffffffffa059fc81>] ? kiblnd_init_tx_msg+0x91/0x200 [ko2iblnd]
[<ffffffffa05a4465>] kiblnd_scheduler+0x325/0x760 [ko2iblnd]
[<ffffffffa05bafed>] ? ldlm_lock_put+0x19d/0x450 [ptlrpc]
[<ffffffffa05bffb1>] ? ldlm_lock_decref+0x41/0xb0 [ptlrpc]
[<ffffffffa05c0af3>] ? ldlm_resource_putref_internal+0xb3/0x4c0 [ptlrpc]
[<ffffffffa05e3397>] ? ldlm_callback_handler+0xa57/0x1e10 [ptlrpc]
[<ffffffffa05e6140>] ldlm_bl_thread_main+0x3f0/0x440 [ptlrpc]
[<ffffffffa060d1d0>] ptlrpc_wait_event+0x3b0/0x3c0 [ptlrpc]
[<ffffffffa060e6a7>] ? lov_merge_lvb+0xb7/0x240 [lov]
[<ffffffffa0684ac2>] ? ll_removepage+0x352/0x8d0 [lustre]
[<ffffffffa0695c9c>] ? ll_file_mmap+0x12c/0x180 [lustre]
[<ffffffffa06ef6a7>] ? lov_merge_lvb+0xb7/0x240 [lov]
[<ffffffffa06f20f5>] ? lov_finish_set+0x435/0x710 [lov]
[<ffffffffa07056a7>] ? lov_merge_lvb+0xb7/0x240 [lov]
[<ffffffffa073f1a4>] ll_close_thread+0x124/0x260 [lustre]
[<ffffffffa075aac2>] ? ll_removepage+0x352/0x8d0 [lustre]
[<ffffffffa09d7c9c>] ? ll_file_mmap+0x12c/0x180 [lustre]
<IRQ>
<IRQ> [<ffffffff8106b857>] warn_slowpath_common+0x87/0xc0
<IRQ> [<ffffffff810d8740>] ? handle_IRQ_event+0x60/0x170
<IRQ> [<ffffffff814c7b23>] panic+0x78/0x137

It is probably not unexpected that there are many places, because:
guest@globe:/cores/people/jhanson/noaa/softlockup/nodeswithsoftlockupconsoles> grep --binary-files=text -h -A1 "Call Trace" r* | wc -l
372554

In the history of this cluster (as reflected in the console logs) we have had "BUG: soft lockup" 119496 times.

There is a wide variety of places where the back trace starts, but the two most dominant are:

grep --binary-files=text -h -A1 "Call Trace" r* | grep -v "Call Trace" | grep -v ^- | grep unmap_mapping_range | wc -l
49558
grep --binary-files=text -h -A1 "Call Trace" r* | grep -v "Call Trace" | grep -v ^- | grep xpmem_tg_ref_by_tgid | wc -l
30446

After the first function, the dominant traces start to diverge. For unmap_mapping_range:
grep --binary-files=text -h -A1 "unmap_mapping_range" r* | sort | uniq

[<ffffffff81013cce>] ? invalidate_interrupt0+0xe/0x20
[<ffffffff810ddc95>] ? call_rcu_sched+0x15/0x20
[<ffffffff811343b4>] unmap_mapping_range_vma+0x64/0xf0
[<ffffffff811343ea>] ? unmap_mapping_range_vma+0x9a/0xf0
[<ffffffff811344d7>] ? unmap_mapping_range_tree+0x97/0xf0
[<ffffffff811344d7>] unmap_mapping_range_tree+0x97/0xf0
[<ffffffff811345a2>] ? unmap_mapping_range+0x72/0x150
[<ffffffff811345a2>] unmap_mapping_range+0x72/0x150
[<ffffffff81134661>] ? unmap_mapping_range+0x131/0x150
[<ffffffff81134661>] unmap_mapping_range+0x131/0x150
[<ffffffff814caa3e>] ? _spin_lock+0x1e/0x30
[<ffffffff814caa41>] ? _spin_lock+0x21/0x30
[<ffffffffa01fb4f1>] ? xpmem_PFNs_exist_in_range_l3+0x51/0xa0 [xpmem]
[<ffffffffa042231c>] ? ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa042231c>] ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa069631c>] ? ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa069631c>] ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa076c31c>] ? ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa076c31c>] ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa09d831c>] ? ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa09d831c>] ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa0ac631c>] ? ll_teardown_mmaps+0x6c/0x1c0 [lustre]
[<ffffffffa0ac631c>] ll_teardown_mmaps+0x6c/0x1c0 [lustre]

For xpmem_tg_ref_by_tgid, the only function that follows is get_task_mm.
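
A possible way to push the same commonality check one frame further (an untested sketch, run from the same directory of unpacked console logs):

# for each trace passing through unmap_mapping_range, rank the functions that follow it
grep --binary-files=text -h -A1 unmap_mapping_range r* | grep -v unmap_mapping_range | grep -v ^- | sort | uniq -c | sort -rn | head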

Comment by Peter Jones [ 28/Feb/12 ]

Bobi

Andreas is rather busy at the moment, so could you please review and comment on this latest information from our customer?

Thanks

Peter

Comment by Zhenyu Xu [ 29/Feb/12 ]

When did this situation start happening? Did it begin after switching to RHEL 6, after upgrading from an older Lustre release to 1.8.6, or after starting to use a specific kernel version or other software?

Comment by Dennis Nelson [ 24/Apr/12 ]

Just looking at open cases. Customer found this was not a Lustre issue after all. I believe that they upgraded the kernel to fix the issue. Please close this.

Comment by Peter Jones [ 24/Apr/12 ]

OK, thanks for the update, Dennis.
