[LU-2820] Crash in lmv_readpage Created: 15/Feb/13  Updated: 09/Jan/20  Resolved: 09/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6825

 Description   

Just had a dbench run crash in lmv_readpage:

#0  lu_dirent_next (ent=0xffff880025404387)
    at /home/green/git/lustre-release/lustre/include/lustre/lustre_idl.h:957
#1  lmv_readpage (exp=<optimized out>, op_data=<optimized out>, 
    pages=<optimized out>, request=<optimized out>)
    at /home/green/git/lustre-release/lustre/lmv/lmv_obd.c:1955
#2  0xffffffffa0dd4bc0 in md_readpage (request=0xffff880055041ba0, 
    pages=0xffff88008e7d87f0, opdata=0xffff88003fb9edf0, exp=0xffff8800900b6bf0)
    at /home/green/git/lustre-release/lustre/include/obd_class.h:2052
#3  ll_dir_filler (_hash=<optimized out>, page0=0xffffea0000825f90)
    at /home/green/git/lustre-release/lustre/llite/dir.c:188
#4  0xffffffff811142db in __read_cache_page (gfp=<optimized out>, 
    data=<optimized out>, filler=<optimized out>, index=<optimized out>, 
    mapping=<optimized out>) at mm/filemap.c:1771
#5  do_read_cache_page (mapping=0xffff88005461bc70, index=18446744073709551615, 
    filler=0xffffffffa0dd4920 <ll_dir_filler>, data=0xffff880055041d60, 
    gfp=<optimized out>) at mm/filemap.c:1791
#6  0xffffffff8111443c in read_cache_page_async (mapping=<optimized out>, 
    index=<optimized out>, filler=<optimized out>, data=<optimized out>)
    at mm/filemap.c:1837
#7  0xffffffff8111444e in read_cache_page (mapping=<optimized out>, 
    index=<optimized out>, filler=<optimized out>, data=<optimized out>)
    at mm/filemap.c:1894
#8  0xffffffffa0dd276d in ll_get_dir_page (dir=0xffff88005461bb08, hash=0, 
    chain=<optimized out>)
    at /home/green/git/lustre-release/lustre/llite/dir.c:417
#9  0xffffffffa0dd3387 in ll_dir_read (inode=0xffff88005461bb08, 
    _pos=0xffff880055041ea0, cookie=0xffff880055041f38, 
    filldir=0xffffffff8118f230 <filldir>)
    at /home/green/git/lustre-release/lustre/llite/dir.c:492
#10 0xffffffffa0dd3749 in ll_readdir (filp=0xffff8800824d3f08, 
    cookie=0xffff880055041f38, filldir=0xffffffff8118f230 <filldir>)
    at /home/green/git/lustre-release/lustre/llite/dir.c:616
#11 0xffffffff8118f4c0 in vfs_readdir (file=0xffff8800824d3f08, 
    filler=0xffffffff8118f230 <filldir>, buf=0xffff880055041f38)
    at fs/readdir.c:39
#12 0xffffffff8118f6b9 in sys_getdents (fd=<optimized out>, dirent=0xbcc068, 
    count=32768) at fs/readdir.c:213
#13 0xffffffff8100b0f2 in system_call_fastpath ()
    at arch/x86/kernel/entry_64.S:488
#14 0x0000000000000246 in per_cpu__irq_stack_union ()
Cannot access memory at address 0xffffffffffffffb0

It failed because the directory entry lies outside the mapped area:

(gdb) p tmp
$1 = (struct lu_dirent *) 0xffff880025404387
(gdb) p ent
$2 = (struct lu_dirent *) 0xffff880025404387
(gdb) p ent->lde_reclen
Cannot access memory at address 0xffff88002540439f <-- Not mapped
(gdb) p dp
$6 = (struct lu_dirpage *) 0xffff8800253fd000
(gdb) p *dp
$8 = {ldp_hash_start = 21410096, ldp_hash_end = 21410112, ldp_flags = 21337584, 
  ldp_pad0 = 0, ldp_entries = 0xffff8800253fd018}
(gdb) p ent
$9 = (struct lu_dirent *) 0xffff880025404387
(gdb) p 0xffff880025404387-0xffff8800253fd018
$10 = 29551
(gdb) p LDF_EMPTY
$11 = LDF_EMPTY
(gdb) p dp->ldp_flags & LDF_EMPTY
$12 = 0
(gdb) p ((struct lu_dirent *)0xffff8800253fd018)->lde_reclen
$13 = 29551

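So the first directory entry claims a record length (lde_reclen) of 29551 bytes, roughly 29 KiB, which is more than seven times the 4096-byte page holding the lu_dirpage and does not make much sense to me.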

There were a number of watchdog firings on the OSTs right before this happened, but I don't think they are related.



 Comments   
Comment by Oleg Drokin [ 15/Feb/13 ]

I also have a crash dump if needed.

Comment by Andreas Dilger [ 09/Jan/20 ]

Close old bug

Generated at Sat Feb 10 01:28:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.