[LU-3063] osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 29 changes, 26 in progress, 7 in flight: -5 Created: 29/Mar/13  Updated: 12/Sep/13  Resolved: 12/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Critical
Reporter: Jinshan Xiong (Inactive) Assignee: Keith Mannthey (Inactive)
Resolution: Fixed Votes: 0
Labels: LB

Attachments: File 3063-out-v3     File odd-hang-messages     File single-3063    
Issue Links:
Duplicate
is duplicated by LU-3091 Interop 2.1.4<->2.4 failure on test s... Closed
Severity: 2
Rank (Obsolete): 7462

 Description   

This LBUG happens quite often when running racer under memory pressure. For example, I'm using a VMware guest with 2 GB of memory, and set up Lustre with:

OSTSIZE=$((512*1024)) REFORMAT=1 sh racer.sh

It can hit LBUG as follows:

ll_ost_io01_005: page allocation failure. order:4, mode:0x50
Pid: 22344, comm: ll_ost_io01_005 Not tainted 2.6.32-279.19.1.el6.x86_64.debug #1
Call Trace:
 [<ffffffff81139b2a>] ? __alloc_pages_nodemask+0x6aa/0xa20
 [<ffffffff81175dfe>] ? kmem_getpages+0x6e/0x170
 [<ffffffff8117884b>] ? fallback_alloc+0x1cb/0x2b0
 [<ffffffff811780a9>] ? cache_grow+0x4c9/0x530
 [<ffffffff8117852b>] ? ____cache_alloc_node+0xab/0x200
 [<ffffffff81179d08>] ? __kmalloc+0x288/0x330
 [<ffffffffa048ebb0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa048ebb0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa0e35ad8>] ? ost_io_thread_init+0x48/0x300 [ost]
 [<ffffffffa0888703>] ? ptlrpc_main+0xa3/0x1810 [ptlrpc]
 [<ffffffff810aebad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffff810097dc>] ? __switch_to+0x1ac/0x320
 [<ffffffffa0888660>] ? ptlrpc_main+0x0/0x1810 [ptlrpc]
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffffa0888660>] ? ptlrpc_main+0x0/0x1810 [ptlrpc]
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  40
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:  38
active_anon:74087 inactive_anon:150981 isolated_anon:0
 active_file:14723 inactive_file:15191 isolated_file:5
 unevictable:0 dirty:1405 writeback:0 unstable:0
 free:57272 slab_reclaimable:6236 slab_unreclaimable:76201
 mapped:1475 shmem:214401 pagetables:2281 bounce:0
Node 0 DMA free:8756kB min:332kB low:412kB high:496kB active_anon:656kB inactive_anon:1672kB active_file:68kB inactive_file:1488kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:2328kB slab_reclaimable:8kB slab_unreclaimable:3028kB kernel_stack:16kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:231280kB min:44720kB low:55900kB high:67080kB active_anon:298048kB inactive_anon:602252kB active_file:58824kB inactive_file:59020kB unevictable:0kB isolated(anon):0kB isolated(file):20kB present:2052192kB mlocked:0kB dirty:5804kB writeback:0kB mapped:5900kB shmem:855276kB slab_reclaimable:24936kB slab_unreclaimable:287616kB kernel_stack:4200kB pagetables:9860kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 203*4kB 104*8kB 24*16kB 6*32kB 5*64kB 3*128kB 3*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8812kB
Node 0 DMA32: 35286*4kB 9007*8kB 1989*16kB 87*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 251840kB
244228 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
524272 pages RAM
77789 pages reserved
65618 pages shared
325369 pages non-shared
LustreError: 47322:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 17112 of inode ffff880038147728 failed -28
LustreError: 39799:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 1022 of inode ffff8800598402a8 failed -28
LustreError: 44035:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 255 of inode ffff880046224268 failed -28
LustreError: 46598:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 304 of inode ffff8800203b02a8 failed -28
Buffer I/O error on device loop0, logical block 17163
lost page write due to I/O error on loop0
LustreError: 48203:0:(file.c:2707:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000401:0x6f4d:0x0] error: rc = -116
Buffer I/O error on device loop0, logical block 29955
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 29956
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 29960
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 12699
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 12723
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 12783
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 15429
lost page write due to I/O error on loop0
Buffer I/O error on device loop0, logical block 15430
lost page write due to I/O error on loop0
LDISKFS-fs error (device loop0): ldiskfs_find_entry: reading directory #50271 offset 0
Aborting journal on device loop0-8.
LustreError: 4993:0:(file.c:158:ll_close_inode_openhandle()) inode 144115205272531443 mdc close failed: rc = -30
LustreError: 4972:0:(llite_lib.c:1292:ll_md_setattr()) md_setattr fails: rc = -30
LustreError: 3150:0:(osd_handler.c:635:osd_trans_commit_cb()) transaction @0xffff88005eef12a0 commit error: 2
LDISKFS-fs error (device loop0): ldiskfs_journal_start_sb: Detected aborted journal
LDISKFS-fs (loop0): Remounting filesystem read-only
LustreError: 3150:0:(osd_handler.c:635:osd_trans_commit_cb()) transaction @0xffff880058789818 commit error: 2
LustreError: 3498:0:(osd_io.c:997:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@172032 on ino 110: rc = -5
LustreError: 3498:0:(llog_osd.c:562:llog_osd_next_block()) lustre-MDT0000-osd: can't read llog block from log [0x1:0xc:0x0] offset 172032: rc = -5
LustreError: 3498:0:(osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 29 changes, 26 in progress, 7 in flight: -5
LustreError: 3498:0:(osp_sync.c:866:osp_sync_thread()) LBUG
Pid: 3498, comm: osp-syn-1

Call Trace:
 [<ffffffffa048d8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa048dec7>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0ea4a73>] osp_sync_thread+0x783/0x800 [osp]
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
LustreError: 3496:0:(osd_io.c:997:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@180224 on ino 108: rc = -5
LustreError: 3496:0:(llog_osd.c:562:llog_osd_next_block()) lustre-MDT0000-osd: can't read llog block from log [0x1:0xa:0x0] offset 180224: rc = -5
LustreError: 3496:0:(osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 14 changes, 30 in progress, 6 in flight: -5
LustreError: 3496:0:(osp_sync.c:866:osp_sync_thread()) LBUG
Pid: 3496, comm: osp-syn-0

Call Trace:
 [<ffffffffa048d8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa048dec7>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0ea4a73>] osp_sync_thread+0x783/0x800 [osp]
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

LustreError: 50716:0:(file.c:158:ll_close_inode_openhandle()) inode 144115205255753869 mdc close failed: rc = -30
LustreError: 50716:0:(file.c:158:ll_close_inode_openhandle()) Skipped 1 previous similar message
LDISKFS-fs (loop0): Remounting filesystem read-only
Kernel panic - not syncing: LBUG
Pid: 3496, comm: osp-syn-0 Not tainted 2.6.32-279.19.1.el6.x86_64.debug #1
Call Trace:
 [<ffffffff8151baba>] ? panic+0xa0/0x16d
 [<ffffffffa048df1b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0ea4a73>] ? osp_sync_thread+0x783/0x800 [osp]
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
 [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
 [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20


 Comments   
Comment by Oleg Drokin [ 29/Mar/13 ]

So this stems from the following structure:

struct ost_thread_local_cache {
        /*
         * pool of nio buffers used by write-path
         */
        struct niobuf_local   local[OST_THREAD_POOL_SIZE];
        unsigned int          temporary:1;
};

and OST_THREAD_POOL_SIZE is defined as PTLRPC_MAX_BRW_PAGES.
We should probably do something about reducing this allocation too?
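For scale, a rough back-of-the-envelope calculation of why this ends up as an order-4 allocation; the niobuf_local size below is an assumption for illustration, not taken from the tree:

/* Back-of-the-envelope sketch (sizes are assumptions, for illustration):
 *
 *   4 MB RPC / 4 KB page        => PTLRPC_MAX_BRW_PAGES = 1024
 *   OST_THREAD_POOL_SIZE        =  PTLRPC_MAX_BRW_PAGES = 1024
 *   sizeof(struct niobuf_local) ~= 48 bytes (assumed)
 *
 *   sizeof(struct ost_thread_local_cache) ~= 1024 * 48 ~= 48 KB,
 *   which rounds up to a 64 KB slab, i.e. an order-4 (16-page)
 *   physically contiguous allocation -- matching the
 *   "page allocation failure. order:4" in the console output above.
 */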

Comment by Keith Mannthey (Inactive) [ 29/Mar/13 ]
ll_ost_io01_005: page allocation failure. order:4, mode:0x50

It fails a page allocation at the kernel level.

Node 0 DMA: 203*4kB 104*8kB 24*16kB 6*32kB 5*64kB 3*128kB 3*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8812kB
Node 0 DMA32: 35286*4kB 9007*8kB 1989*16kB 87*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 251840kB

Higher-order pages are scarce in both zones (order 4 is 16 pages, 64 KB). The kernel may not willingly give up its last few pages for anything less critical than the memory reclaim processes.

This is memory fragmentation: the longer a system runs, the less likely higher-order allocations are to succeed. Perhaps racer needs a minimum memory configuration to run properly; I don't know much about that test.

We might be a little concerned about the LBUG after the filesystem is remounted read-only; it is not the softest landing for a memory allocation error.

Comment by Oleg Drokin [ 01/Apr/13 ]

We just need to convert the allocation to ALLOC_LARGE for the short term, I imagine; actually reducing it is probably not all that easy.

The reason for the failure is that racer puts pressure on the OST threads and ptlrpc decides to spawn a few more, but those threads need a TLS cache to handle I/O and we fail to allocate it, especially now that 4 MB RPCs increase the required allocation size.
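For illustration, the point of a *_LARGE-style allocator is to stop asking the page allocator for high-order contiguous memory once the request gets big. A minimal sketch of that pattern (hypothetical helper names, threshold chosen arbitrarily, not the actual Lustre macro) might look like:

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

/* Sketch of a "large alloc" helper: small requests go through kmalloc,
 * but anything that would need a high-order contiguous allocation falls
 * back to vmalloc, which only needs order-0 pages and so keeps working
 * under fragmentation. */
static void *alloc_large_sketch(size_t size)
{
	if (size <= 4 * PAGE_SIZE)
		return kmalloc(size, GFP_NOFS);
	return vmalloc(size);
}

static void free_large_sketch(void *ptr)
{
	if (is_vmalloc_addr(ptr))
		vfree(ptr);
	else
		kfree(ptr);
}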

Comment by Alex Zhuravlev [ 01/Apr/13 ]

Keith, you're right about the handling, of course. I think in this case the approach should be "relax for a moment, try again". We can repeat as long as we need; in the worst case the OST(s) will fill up while we wait for enough memory to handle the llog and send RPCs.
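A minimal sketch of that "relax for a moment, try again" idea in the sync thread (hypothetical, not the patch that eventually landed; the llog_cat_process() arguments are approximate) could replace the assertion with something like:

/* Hypothetical sketch only -- not the landed fix.  Treat a transient
 * error from llog processing (e.g. -EIO/-ENOMEM while reading a block)
 * as retryable instead of asserting: back off briefly and try again.
 * Worst case, the OST fills up while we wait for memory. */
static int osp_sync_process_retry(const struct lu_env *env,
				  struct llog_handle *llh,
				  llog_cb_t cb, void *data)
{
	int rc;

	do {
		rc = llog_cat_process(env, llh, cb, data, 0, 0);
		if (rc == 0 || rc == LLOG_PROC_BREAK)
			break;
		CERROR("llog processing failed: rc = %d, retrying\n", rc);
		schedule_timeout_interruptible(cfs_time_seconds(1));
	} while (1);

	return rc;
}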

Comment by Keith Mannthey (Inactive) [ 08/May/13 ]

I just wanted to note that I am digging into this issue a bit and pulled fresh code today. I am able to reproduce panics with a 2 GB VM and 4 GB of swap. I use memhog to trigger the order-4 allocations (memhog 5200M, I think) and then wait around. I just set up a serial console to capture the full panics and see whether they are the same as reported, since the panic output doesn't always make it to disk.

My goal is to allow the system to fail the order 4 allocations without causing any panics on the system.

Comment by Keith Mannthey (Inactive) [ 09/May/13 ]

Well, I hit another fun thing under memory pressure while running a bit longer than in the original issue:

It occurs during the unmount/mount cycles, normally before order-4 memory allocations start to fail. The mount/umount happens every 300 seconds or so, as per the standard racer run.

SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
LustreError: 3320:0:(lprocfs_status.c:489:lprocfs_register())  Lproc: Attempting to register llite more than once 
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
SLAB: cache with size 256 has lost its name
SLAB: cache with size 40 has lost its name
SLAB: cache with size 192 has lost its name
SLAB: cache with size 1216 has lost its name
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
BUG: unable to handle kernel paging request at ffffffffffffffde
IP: [<ffffffffa05a8498>] lprocfs_srch+0x48/0x80 [obdclass]
PGD 1a87067 PUD 1a88067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/possible
CPU 2 
Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) mdd(U) mgs(U) lquota(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) exportfs jbd ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport microcode i2c_piix4 i2c_core sg snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lvfs]

Pid: 4433, comm: mount.lustre Not tainted 2.6.32.may03master #1 innotek GmbH VirtualBox
RIP: 0010:[<ffffffffa05a8498>]  [<ffffffffa05a8498>] lprocfs_srch+0x48/0x80 [obdclass]
RSP: 0000:ffff8800425aba58  EFLAGS: 00010286
RAX: ffffffffa0630ac0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8800425abb68 RDI: ffffffffa0630ac0
RBP: ffff8800425aba78 R08: 00000000fffffffe R09: 0000000000000000
R10: 000000000000000f R11: 000000000000000f R12: ffff8800425abb68
R13: ffffffffffffff8e R14: 0000000000000000 R15: ffff8800425abb68
FS:  00007f3789752700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffde CR3: 000000001bcb7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.lustre (pid: 4433, threadinfo ffff8800425aa000, task ffff8800650c1540)
Stack:
 ffffffffffffff8e ffff8800425abb68 ffffffffffffff8e 0000000000000000
<d> ffff8800425abac8 ffffffffa05aa3e4 ffff8800425aba98 ffffffff81281356
<d> 0000000000000004 0000000affffffff ffff88000017c000 0000000000000006
Call Trace:
 [<ffffffffa05aa3e4>] lprocfs_register+0x34/0x100 [obdclass]
 [<ffffffff81281356>] ? vsnprintf+0x336/0x5e0
 [<ffffffffa188eb7e>] lprocfs_register_mountpoint+0x12e/0xb20 [lustre]
 [<ffffffffa187e3b6>] client_common_fill_super+0x1a6/0x5280 [lustre]
 [<ffffffff81281640>] ? sprintf+0x40/0x50
 [<ffffffffa1884204>] ll_fill_super+0xd74/0x1500 [lustre]
 [<ffffffffa05d557d>] lustre_fill_super+0x3dd/0x530 [obdclass]
 [<ffffffffa05d51a0>] ? lustre_fill_super+0x0/0x530 [obdclass]
 [<ffffffff811841df>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa05ccde5>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118381b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff811839c2>] do_kern_mount+0x52/0x130
 [<ffffffff811a3c72>] do_mount+0x2d2/0x8d0
 [<ffffffff811a4300>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: a0 49 89 f4 e8 3a 72 f6 e0 4d 85 ed 75 1d 48 c7 c7 c0 0a 63 a0 e8 c9 3c af e0 48 83 c4 08 48 89 d8 5b 41 5c 41 5d c9 c3 0f 1f 00 <49> 8b 5d 50 48 85 db 75 10 eb d8 0f 1f 44 00 00 48 8b 5b 40 48 
RIP  [<ffffffffa05a8498>] lprocfs_srch+0x48/0x80 [obdclass]
Comment by Keith Mannthey (Inactive) [ 15/May/13 ]

LU-3331 was opened to track the lproc crash. It looks like there is a patch for that hang.

Today testing the lproc patch I encountered:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff8116823c>] kmem_cache_alloc+0x6c/0x190
PGD 3ac6b067 PUD 608bc067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/ptlrpc/initstate
CPU 3 
Modules linked in: obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) exportfs jbd ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport microcode i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lvfs]

Pid: 10558, comm: ptlrpcd_rcv Not tainted 2.6.32.may03master #1 innotek GmbH VirtualBox
RIP: 0010:[<ffffffff8116823c>]  [<ffffffff8116823c>] kmem_cache_alloc+0x6c/0x190
RSP: 0018:ffff880000ebfd10  EFLAGS: 00010046
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000050 RDI: 0000000000000000
RBP: ffff880000ebfd40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000050
R13: 0000000000000000 R14: 0000000000000050 R15: 0000000000000246
FS:  00007ff97f825700(0000) GS:ffff880002380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 000000002ca45000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ptlrpcd_rcv (pid: 10558, threadinfo ffff880000ebe000, task ffff88003af34aa0)
Stack:
 ffff880000ebfd40 0000000000000000 ffff880000ebfea0 ffffffffa08ef840
<d> 0000000000000000 ffff880000ebfee0 ffff880000ebfd60 ffffffffa034ca72
<d> ffff880000ebfd60 0000000000000040 ffff880000ebfd80 ffffffffa08c243a
Call Trace:
 [<ffffffffa034ca72>] cfs_mem_cache_alloc+0x22/0x30 [libcfs]
 [<ffffffffa08c243a>] osc_key_init+0x3a/0x200 [osc]
 [<ffffffffa04e11ef>] keys_fill+0x6f/0x190 [obdclass]
 [<ffffffffa04e132d>] lu_context_refill+0x1d/0x30 [obdclass]
 [<ffffffffa04e1356>] lu_env_refill+0x16/0x30 [obdclass]
 [<ffffffffa06c773c>] ptlrpcd_check+0x4c/0x560 [ptlrpc]
 [<ffffffff8109703c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa06c80c0>] ptlrpcd+0x1a0/0x390 [ptlrpc]
 [<ffffffff810dc565>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffffa06c7f20>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffffa06c7f20>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffffa06c7f20>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 00 49 89 c7 fa 66 0f 1f 44 00 00 65 48 8b 04 25 c0 cb 00 00 f7 40 14 00 00 00 12 0f 85 88 00 00 00 65 8b 04 25 b8 e0 00 00 48 98 <49> 8b 44 c5 00 44 8b 00 45 85 c0 0f 84 d7 00 00 00 8b 10 c7 40 
RIP  [<ffffffff8116823c>] kmem_cache_alloc+0x6c/0x190

I will attach the full serial log shortly.

Comment by Keith Mannthey (Inactive) [ 15/May/13 ]

Racer 2 GB kmem_cache_alloc crash, v1 serial output log.

Comment by Keith Mannthey (Inactive) [ 15/May/13 ]

With the latest error it seems that osc_init is not checking the creation of the osc_caches. I am working with a test patch in this area to make osc_init fail in the case where lu_kmem_init fails. If it helps I will post it for review.
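A minimal sketch of the check being described (hypothetical; assumes the usual lu_kmem_descr module-init pattern) is simply to propagate the lu_kmem_init() result so module init fails cleanly instead of leaving a NULL slab cache behind:

/* Hypothetical sketch: abort osc module init when the slab caches
 * cannot be created (e.g. -ENOMEM under memory pressure), rather than
 * letting a later kmem_cache_alloc() trip over a NULL cache pointer. */
static int __init osc_init(void)
{
	int rc;

	rc = lu_kmem_init(osc_caches);
	if (rc != 0)
		return rc;

	/* ... rest of module setup; on any later failure, call
	 * lu_kmem_fini(osc_caches) before returning the error ... */
	return 0;
}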

Comment by Keith Mannthey (Inactive) [ 16/May/13 ]

Well, I have not hit the kmem_cache_alloc crash in the last while. I hit a second lproc issue (documented in LU-3331) yesterday, and today is a bad situation.

The system has not crashed, but my root filesystem has stopped responding. All I/O on the system is hanging.

My last terminal was fine until I interacted with the root filesystem.

Sorry, not a lot of debug data, but this is what is happening on the serial console: likely an ENOMEM mount-failure loop, then grant failures, then these endless mds_connect failures (the system is still alive and printing):

LustreError: 15631:0:(service.c:844:ptlrpc_register_service()) Skipped 1 previous similar message
LustreError: 15631:0:(ost_handler.c:2677:ost_setup()) failed to start OST I/O service: -12
LustreError: 15631:0:(ost_handler.c:2677:ost_setup()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_config.c:572:class_setup()) setup OSS failed (-12)
LustreError: 15631:0:(obd_config.c:572:class_setup()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount.c:201:lustre_start_simple()) OSS setup error -12
LustreError: 15631:0:(obd_mount.c:201:lustre_start_simple()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:1225:server_start_targets()) failed to start OSS: -12
LustreError: 15631:0:(obd_mount_server.c:1225:server_start_targets()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:1699:server_fill_super()) Unable to start targets: -12
LustreError: 15631:0:(obd_mount_server.c:1699:server_fill_super()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0000: Can't end config log lustre-client.
LustreError: 15631:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0000: failed to disconnect lwp. (rc=-2)
LustreError: 15631:0:(obd_mount_server.c:1426:server_put_super()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:1456:server_put_super()) no obd lustre-OST0000
LustreError: 15631:0:(obd_mount_server.c:1456:server_put_super()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-OST0000 not registered
LustreError: 15631:0:(obd_mount_server.c:135:server_deregister_mount()) Skipped 1 previous similar message
LustreError: 15631:0:(obd_mount.c:1267:lustre_fill_super()) Unable to mount  (-12)
LustreError: 15631:0:(obd_mount.c:1267:lustre_fill_super()) Skipped 1 previous similar message
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: Skipped 1 previous similar message
LustreError: 24257:0:(file.c:2718:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000400:0x269:0x0] error: rc = -116
LustreError: 24264:0:(file.c:2718:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000400:0x269:0x0] error: rc = -116
Out of memory: Kill process 17806 (memhog) score 890 or sacrifice child
Killed process 17806, UID 0, (memhog) total-vm:5503100kB, anon-rss:1446724kB, file-rss:16kB
LustreError: 19093:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli 77404d61-8728-23ef-e160-d009880a0b48/ffff880057434000 left 66191360 < tot_grant 68330240 unstable 0 pending 0
LustreError: 22611:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli f6b05f1d-8d84-085e-aff5-8b8b01874ede/ffff880065281400 left 66191360 < tot_grant 68330240 unstable 0 pending 98304
LustreError: 4974:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli f6b05f1d-8d84-085e-aff5-8b8b01874ede/ffff880065281400 left 60882944 < tot_grant 64422656 unstable 12288 pending 12288
LustreError: 4974:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 82 previous similar messages
LustreError: 30176:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli f6b05f1d-8d84-085e-aff5-8b8b01874ede/ffff880065281400 left 56311808 < tot_grant 58970880 unstable 0 pending 0
LustreError: 30176:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 84 previous similar messages
LustreError: 25853:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli 77404d61-8728-23ef-e160-d009880a0b48/ffff880057434000 left 43249664 < tot_grant 48173824 unstable 0 pending 0
LustreError: 25853:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 119 previous similar messages
LustreError: 25853:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli 77404d61-8728-23ef-e160-d009880a0b48/ffff880057434000 left 27381760 < tot_grant 27546368 unstable 81920 pending 0
LustreError: 25853:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 388 previous similar messages
LustreError: 20014:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 42946 of inode ffff880053550b38 failed -28
LustreError: 5617:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 35089 of inode ffff88001874dbf8 failed -28
LustreError: 19936:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0001: cli f6b05f1d-8d84-085e-aff5-8b8b01874ede/ffff880065281400 left 0 < tot_grant 537344 unstable 0 pending 0
LustreError: 19936:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 575 previous similar messages
INFO: task mdt_rdpg00_000:18958 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task dir_create.sh:19649 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25366 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25367 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25368 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25370 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25371 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task ls:25373 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
LustreError: dumping log to /tmp/lustre-log.1368691592.18956
LustreError: 11-0: lustre-MDT0000-mdc-ffff88007cd05000: Communicating with 0@lo, operation mds_connect failed with -16.
LustreError: Skipped 1 previous similar message
LustreError: 11-0: lustre-MDT0000-mdc-ffff88007cd05000: Communicating with 0@lo, operation mds_connect failed with -16.
LustreError: Skipped 3 previous similar messages

Outside of the OOM killer hitting memhog there are no Linux messages. I can't kill memhog or the racer test (they are stalled in the kernel somewhere).

I am working to at least get process stacks from the kernel.

Comment by Keith Mannthey (Inactive) [ 16/May/13 ]

Well, the VM was non-responsive. I am setting up better RAS in my VM...

I will attach the var log messages from the boot.

Comment by Keith Mannthey (Inactive) [ 16/May/13 ]

The system was up but rootfs I/O hung. No real info outside of what is in this log.

Comment by Keith Mannthey (Inactive) [ 23/May/13 ]

After a little lproc issue I have good data from this odd I/O hang. All userspace processes were hung, but some parts of the kernel were running just fine.

I am still looking at all the data (full logs and vmcore) and hope to have a better idea of what happened tomorrow.

Comment by Keith Mannthey (Inactive) [ 23/May/13 ]

Full console log from the odd hang. There is a sysrq dump of the memory state and all the processes at the end.

Comment by Keith Mannthey (Inactive) [ 23/May/13 ]

It seems general processes are all in TASK_UNINTERRUPTIBLE, waiting on a common mutex.

The mutex is held by "crond" (doing who knows what), whose stack looks like this:

PID: 1433   TASK: ffff8800375c2ae0  CPU: 2   COMMAND: "crond"
 #0 [ffff88007d4dd9b8] schedule at ffffffff8150da92
 #1 [ffff88007d4dda80] __wait_on_freeing_inode at ffffffff8119c8c8
 #2 [ffff88007d4ddaf0] find_inode_fast at ffffffff8119c948
 #3 [ffff88007d4ddb20] ifind_fast at ffffffff8119daac
 #4 [ffff88007d4ddb50] iget_locked at ffffffff8119dd49
 #5 [ffff88007d4ddb90] ext4_iget at ffffffffa00baf27 [ext4]
 #6 [ffff88007d4ddc00] ext4_lookup at ffffffffa00c1bb5 [ext4]
 #7 [ffff88007d4ddc40] do_lookup at ffffffff81190465
 #8 [ffff88007d4ddca0] __link_path_walk at ffffffff81190c24
 #9 [ffff88007d4ddd60] path_walk at ffffffff811917aa
#10 [ffff88007d4ddda0] do_path_lookup at ffffffff8119197b
#11 [ffff88007d4dddd0] user_path_at at ffffffff81192607
#12 [ffff88007d4ddea0] vfs_fstatat at ffffffff81186a1c
#13 [ffff88007d4ddee0] vfs_stat at ffffffff81186b8b
#14 [ffff88007d4ddef0] sys_newstat at ffffffff81186bb4
#15 [ffff88007d4ddf80] system_call_fastpath at ffffffff8100b072

All other processes either don't make it past do_path_lookup or have no filesystem usage.

I am now looking into why crond is stuck where it is.

Comment by Keith Mannthey (Inactive) [ 24/May/13 ]

Hmm, crash didn't want to show me the inode being waited on (I expected it to be very close to __wait_on_freeing_inode(inode)), but I did not see it there; I hope that is user error and not inode corruption. I looked around at other tasks.

An example racer task:

PID: 23088  TASK: ffff880037598aa0  CPU: 1   COMMAND: "ls"
 #0 [ffff8800179ef7d8] schedule at ffffffff8150da92
 #1 [ffff8800179ef8a0] __mutex_lock_slowpath at ffffffff8150f13e
 #2 [ffff8800179ef910] mutex_lock at ffffffff8150efdb
 #3 [ffff8800179ef930] mdc_close at ffffffffa0a7064b [mdc]
 #4 [ffff8800179ef980] lmv_close at ffffffffa0a3dcd8 [lmv]
 #5 [ffff8800179ef9d0] ll_close_inode_openhandle at ffffffffa14edd0e [lustre]
 #6 [ffff8800179efa50] ll_release_openhandle at ffffffffa14f2e11 [lustre]
 #7 [ffff8800179efa80] ll_file_open at ffffffffa14f53c7 [lustre]
 #8 [ffff8800179efb70] ll_dir_open at ffffffffa14d6e1b [lustre]
 #9 [ffff8800179efb90] __dentry_open at ffffffff8117e0ca
#10 [ffff8800179efbf0] lookup_instantiate_filp at ffffffff8117e4b9
#11 [ffff8800179efc10] ll_revalidate_nd at ffffffffa14d5a8a [lustre]
#12 [ffff8800179efc40] do_lookup at ffffffff81190326
#13 [ffff8800179efca0] __link_path_walk at ffffffff81190c24
#14 [ffff8800179efd60] path_walk at ffffffff811917aa
#15 [ffff8800179efda0] do_path_lookup at ffffffff8119197b
#16 [ffff8800179efdd0] do_filp_open at ffffffff811928bb
#17 [ffff8800179eff20] do_sys_open at ffffffff8117de79
#18 [ffff8800179eff70] sys_open at ffffffff8117df90
#19 [ffff8800179eff80] system_call_fastpath at ffffffff8100b072
    RIP: 00000035d68dac10  RSP: 00007fff4a55b8f8  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffffffff8100b072  RCX: 0000000000000020
    RDX: 0000000000000001  RSI: 0000000000090800  RDI: 0000000000f74fd0
    RBP: 00007fd577d776a0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000f74bf0  R11: 0000000000000246  R12: ffffffff8117df90
    R13: ffff8800179eff78  R14: 0000000000000000  R15: 0000000000f74fd0
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

I am going to turn on mutex debugging to speed things up a bit and retest.

Comment by Keith Mannthey (Inactive) [ 01/Jun/13 ]

Well, after syncing with master a few days ago I have not crashed.

I am running two lproc patches and http://review.whamcloud.com/6514.

I don't seem to get the same I/O errors, and I have not seen the original assertion, but I will keep testing.

Comment by Diego Moreno (Inactive) [ 01/Aug/13 ]

Just to let you know that I'm hitting this one on 2.3.63, and I guess it is also present in 2.4.0.

I see the patch has been integrated in master but not in 2.4. It would be good to see it integrated in a future 2.4.1 release.

Comment by Sebastien Buisson (Inactive) [ 20/Aug/13 ]

Hi,

I have pushed the b2_4 version of the patch here:
http://review.whamcloud.com/7388

TIA,
Sebastien.

Comment by Peter Jones [ 12/Sep/13 ]

Landed for 2.4.1 and 2.5.0
