[LU-2941] Oops in lu_site_stats_print() Created: 09/Mar/13  Updated: 15/Oct/13  Resolved: 15/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 7059

 Description   

To reproduce, run:

while true; do cat /proc/fs/lustre/llite/*/site &> /dev/null; done &
llmount.sh 
crash> bt
PID: 2221   TASK: ffff88018da2a040  CPU: 1   COMMAND: "cat"
 #0 [ffff8801671c18d0] machine_kexec at ffffffff81031f7b
 #1 [ffff8801671c1930] crash_kexec at ffffffff810b8c22
 #2 [ffff8801671c1a00] oops_end at ffffffff814eef80
 #3 [ffff8801671c1a30] no_context at ffffffff81042a0b
 #4 [ffff8801671c1a80] __bad_area_nosemaphore at ffffffff81042c95
 #5 [ffff8801671c1ad0] bad_area at ffffffff81042dbe
 #6 [ffff8801671c1b00] __do_page_fault at ffffffff81043570
 #7 [ffff8801671c1c20] do_page_fault at ffffffff814f0f5e
 #8 [ffff8801671c1c50] page_fault at ffffffff814ee315
    [exception RIP: lu_site_stats_print+66]
    RIP: ffffffffa10a4752  RSP: ffff8801671c1d08  RFLAGS: 00010282
    RAX: ffff880118714000  RBX: 0000000000000000  RCX: 0000000000001000
    RDX: 0000000000001000  RSI: ffff8801671c1d98  RDI: 0000000000000000
    RBP: ffff8801671c1dd8   R8: ffff8801671c1e64   R9: ffff8801664b6c00
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000001000
    R13: 0000000000000000  R14: ffff880166ee4000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8801671c1de0] cl_site_stats_print at ffffffffa10ac617 [obdclass]
#10 [ffff8801671c1e30] ll_rd_site_stats at ffffffffa0c5ad25 [lustre]
#11 [ffff8801671c1e40] lprocfs_fops_read at ffffffffa10649e3 [obdclass]
#12 [ffff8801671c1ea0] proc_reg_read at ffffffff811dcbde
#13 [ffff8801671c1ef0] vfs_read at ffffffff811782b5
#14 [ffff8801671c1f30] sys_read at ffffffff811783f1
#15 [ffff8801671c1f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003282eda360  RSP: 00007fff17cae0f0  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 0000000000a2f030
    RDX: 0000000000008000  RSI: 0000000000a27000  RDI: 0000000000000003
    RBP: 0000000000a27000   R8: 000000328318dee8   R9: 0000000000000001
    R10: 0000000000008fff  R11: 0000000000000246  R12: ffffffffffff8000
    R13: 0000000000000003  R14: 0000000000008000  R15: 0000000000000003
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
crash> dis -l lu_site_stats_print+66
/root/lustre-release/lustre/obdclass/lu_object.c: 2121
0xffffffffa10a4752 <lu_site_stats_print+66>:    mov    (%rdi),%rdi


 Comments   
Comment by Girish Shilamkar (Inactive) [ 02/Jul/13 ]

The oops is triggered because the proc entry can be read before the data structures used to print the stats are allocated and initialised. In client_common_fill_super(), the proc entry is created before cl_sb_init() is called, so lu_site has not been allocated yet and reading the lu_site stats crashes. We moved the creation of the proc entry to after cl_sb_init(), which fixed the problem. We will send the patch for review today.
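
The ordering problem described above can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the Lustre code or the merged patch: every type and function name below (model_sbi, model_sb_init(), model_proc_register(), model_proc_read(), setup_buggy(), setup_fixed()) is hypothetical, and the NULL dereference is reported as a return code instead of actually faulting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for the model; not kernel error numbers. */
enum { READ_NO_ENTRY = -1, READ_WOULD_OOPS = -2 };

/* Hypothetical stand-ins for lu_site and the llite superblock info. */
struct model_site { int nr_objects; };

struct model_sbi {
    struct model_site *site;   /* NULL until model_sb_init() has run */
    int proc_visible;          /* nonzero once the "proc entry" exists */
};

static struct model_site the_site;

/* Models cl_sb_init(): the step that makes the site pointer valid. */
static void model_sb_init(struct model_sbi *sbi)
{
    the_site.nr_objects = 0;
    sbi->site = &the_site;
}

/* Models creating the proc entry: from here on, readers can get in. */
static void model_proc_register(struct model_sbi *sbi)
{
    sbi->proc_visible = 1;
}

/* Models a read of the "site" proc file by a concurrent process. */
static int model_proc_read(const struct model_sbi *sbi, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (!sbi->proc_visible)
        return READ_NO_ENTRY;      /* entry not created yet: harmless */
    if (sbi->site == NULL)
        return READ_WOULD_OOPS;    /* the real handler would dereference NULL */
    return snprintf(buf, len, "%d\n", sbi->site->nr_objects);
}

/* Original order: register first, initialise later. Returns what a reader
 * that slips in between the two steps would observe. */
static int setup_buggy(struct model_sbi *sbi, char *buf, size_t len)
{
    int mid;

    model_proc_register(sbi);
    mid = model_proc_read(sbi, buf, len);
    model_sb_init(sbi);
    return mid;
}

/* Reordered: initialise first, register last. Same probe in between. */
static int setup_fixed(struct model_sbi *sbi, char *buf, size_t len)
{
    int mid;

    model_sb_init(sbi);
    mid = model_proc_read(sbi, buf, len);
    model_proc_register(sbi);
    return mid;
}
```

Calling setup_buggy() on a zero-initialised model_sbi returns READ_WOULD_OOPS, which corresponds to the window hit by the cat loop in the reproducer. setup_fixed() returns READ_NO_ENTRY for the same probe, because the entry only becomes visible once the site pointer is valid.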

Comment by Gaurav Mahajan (Inactive) [ 02/Jul/13 ]

Gerrit patch link
http://review.whamcloud.com/#/c/6852

Comment by Girish Shilamkar (Inactive) [ 19/Aug/13 ]

This issue can be closed; the patch has been merged.

Comment by Emoly Liu [ 19/Aug/13 ]

Patch landed for 2.5.

Comment by Jodi Levi (Inactive) [ 15/Oct/13 ]

Added 2.5.0 FixVersion
