Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.7.0
- Hyperon
- 3
- 17581
Description
While performing failover testing:
LustreError: 167-0: lustre-OST0000-osc-ffff88083b339000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:777:ldlm_resource_complain()) lustre-OST0000-osc-ffff88083b339000: namespace resource [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1374:ldlm_resource_dump()) --- Resource: [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount = 3
Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1377:ldlm_resource_dump()) Granted locks (in reverse order):
Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1380:ldlm_resource_dump()) ### ### ns: lustre-OST0000-osc-ffff88083b339000 lock: ffff88106a60d540/0xa1f402a8812988f4 lrc: 3/0,1 mode: PW/PW res: [0x2d097bb:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x126400020000 nid: local remote: 0x72a9f4c80e66e07b expref: -99 pid: 51570 timeout: 0 lvb_type: 1
Feb 23 09:22:48 iwc260 kernel: Lustre: lustre-OST0000-osc-ffff88083b339000: Connection restored to lustre-OST0000 (at 192.168.120.13@o2ib)
Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
Feb 23 09:22:49 iwc260 kernel: Pid: 90252, comm: ldlm_bl_36
Feb 23 09:22:49 iwc260 kernel:
Feb 23 09:22:49 iwc260 kernel: Call Trace:
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435e97>] lbug_with_loc+0x47/0xb0 [libcfs]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2ba56>] discard_cb+0x156/0x190 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2bdcc>] osc_page_gang_lookup+0x1ac/0x330 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2c094>] osc_lock_discard_pages+0x144/0x240 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2298b>] osc_lock_flush+0x8b/0x260 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22e08>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0761a6c>] ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077450a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0779120>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22c3b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077cb60>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077d0c1>] ldlm_bl_thread_main+0x291/0x3f0 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077ce30>] ? ldlm_bl_thread_main+0x0/0x3f0 [ptlrpc]
Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Attachments
- iwc189.lbug.log.txt.gz (0.2 kB)
- iwc260.log.gz (4.81 MB)
- l-23.LU-6271.txt.gz (0.2 kB)
- lustre-log.iwc37.txt (329 kB)
- r2i1n4.messages.gz (96 kB)
- server_messages.tar.gz (924 kB)
Issue Links
Activity
OK, I've been running the scripts with patch set 5 of 16456 for some time; the general I/O seems fine.
But when I tried the group lock, the client crashed immediately after eviction:
<0>LustreError: 1697:0:(osc_cache.c:2907:osc_cache_writeback_range()) ASSERTION( !ext->oe_hp ) failed:
<0>LustreError: 1697:0:(osc_cache.c:2907:osc_cache_writeback_range()) LBUG
<4>Pid: 1697, comm: ldlm_bl_11
<4>
<4>Call Trace:
<4> [<ffffffffa021d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa021de77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0922085>] osc_cache_writeback_range+0x1275/0x1280 [osc]
<4> [<ffffffffa090b545>] osc_lock_flush+0x175/0x260 [osc]
<4> [<ffffffffa090b8d8>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
<4> [<ffffffffa050ba57>] ldlm_cancel_callback+0x87/0x280 [ptlrpc]
<4> [<ffffffff81060530>] ? __dequeue_entity+0x30/0x50
<4> [<ffffffff8100969d>] ? __switch_to+0x7d/0x340
<4> [<ffffffffa051e84a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
<4> [<ffffffffa05234bc>] ldlm_cli_cancel+0x9c/0x3e0 [ptlrpc]
<4> [<ffffffffa090b70b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
<4> [<ffffffff810672c2>] ? default_wake_function+0x12/0x20
<4> [<ffffffffa0527400>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
<4> [<ffffffffa0527ee4>] ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0527a60>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
<4> [<ffffffff810a101e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 1697, comm: ldlm_bl_11 Not tainted 2.6.32-573.3.1.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81537c54>] ? panic+0xa7/0x16f
<4> [<ffffffffa021decb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0922085>] ? osc_cache_writeback_range+0x1275/0x1280 [osc]
<4> [<ffffffffa090b545>] ? osc_lock_flush+0x175/0x260 [osc]
<4> [<ffffffffa090b8d8>] ? osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
<4> [<ffffffffa050ba57>] ? ldlm_cancel_callback+0x87/0x280 [ptlrpc]
<4> [<ffffffff81060530>] ? __dequeue_entity+0x30/0x50
<4> [<ffffffff8100969d>] ? __switch_to+0x7d/0x340
<4> [<ffffffffa051e84a>] ? ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
<4> [<ffffffffa05234bc>] ? ldlm_cli_cancel+0x9c/0x3e0 [ptlrpc]
<4> [<ffffffffa090b70b>] ? osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
<4> [<ffffffff810672c2>] ? default_wake_function+0x12/0x20
<4> [<ffffffffa0527400>] ? ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
<4> [<ffffffffa0527ee4>] ? ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0527a60>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
<4> [<ffffffff810a101e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
Please check the new patch and see if it can fix the problem.
I did a very similar thing with iozone, but the processes wrote to different files. Let me try your reproducer.
Hi Jinshan,
Hardware shouldn't matter as I'm reproducing it on my VMs.
Here are the scripts I used.
On the client, I'm using IOR; when an IOR process dies because of eviction, the script just brings up new ones:
#!/bin/sh
while true
do
    num=$(ps aux | grep IOR | wc -l)
    while [ $num -lt 20 ]
    do
        /root/IOR/src/C/IOR -b 8g -w -e -E -t 1m -v -k -o /mnt/testfile &
        num=$(ps aux | grep IOR | wc -l)
    done
    sleep 1
done
This keeps 20 of them running at the same time.
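For reference, the same keep-N-workers-alive pattern can be sketched with a dummy `sleep` worker standing in for IOR (everything here is illustrative, not part of the original script). One caveat with the original count: `ps aux | grep IOR | wc -l` also counts the grep process itself, so this sketch tracks worker PIDs directly instead:

```shell
#!/bin/sh
# Minimal sketch of the restart-on-exit pool above; "sleep 2" is a
# hypothetical stand-in for the IOR invocation.
TARGET=3
pids=""

# Count workers that are still running by probing each saved PID.
alive_count() {
    n=0
    for p in $pids; do
        kill -0 "$p" 2>/dev/null && n=$((n + 1))
    done
    echo "$n"
}

# Top the pool up to TARGET workers, as the inner loop above does; the
# real reproducer wraps this in "while true" so workers killed by an
# eviction get re-spawned.
while [ "$(alive_count)" -lt "$TARGET" ]; do
    sleep 2 &
    pids="$pids $!"
done

echo "workers running: $(alive_count)"
wait
```

An alternative fix for the over-count is grepping for `[I]OR`, a pattern that never matches its own pipeline entry.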
On the OSS:
#!/bin/sh
while true
do
    for ost in /proc/fs/lustre/obdfilter/*
    do
        if [ -n "$(cat $ost/exports/<your client nid>/uuid)" ]
        then
            echo "$(cat $ost/exports/<your client nid>/uuid)" > $ost/evict_client
        fi
    done
    sleep 10
done
As you can see, I'm evicting the client over and over, using 10 seconds as the interval; again, you can change this.
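One side note on the eviction script: the `[ -n $(cat ...) ]` test needs quotes around the command substitution. If the expansion is empty, an unquoted `[ -n $x ]` collapses to the one-argument test `[ -n ]`, which is always true. A minimal demonstration (variable names are illustrative):

```shell
#!/bin/sh
# With x empty, the unquoted test degenerates to [ -n ], which tests
# whether the literal string "-n" is non-empty -- always true.
x=""
if [ -n $x ]; then unquoted=nonempty; else unquoted=empty; fi
if [ -n "$x" ]; then quoted=nonempty; else quoted=empty; fi
echo "unquoted sees: $unquoted, quoted sees: $quoted"
# prints: unquoted sees: nonempty, quoted sees: empty
```

So without quotes the uuid check would report a match even for an export whose uuid file is empty.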
Thanks
Dongyang
Hi Dongyang,
It certainly lasted longer on my VM node.
Please upload your reproducer, and I will restart working on this issue once I get access to real hardware.
Thanks,
Hi Jinshan,
I'm using master, which has 14989 landed already, plus patch set 3 of 16456:
The client crashed 5 minutes after I started the reproducer:
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:887:ldlm_resource_complain()) testfs-OST0000-osc-ffff8800375c4000: namespace resource [0x303:0x0:0x0].0 (ffff88003d2abcc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x303:0x0:0x0].0 (ffff88003d2abcc0) refcount = 3
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: testfs-OST0000-osc-ffff8800375c4000 lock: ffff88000b69d380/0x2104da78ff029aa9 lrc: 8/0,1 mode: PW/PW res: [0x303:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x126400000000 nid: local remote: 0x59a587f3f1d1d697 expref: -99 pid: 16588 timeout: 0 lvb_type: 1
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x303:0x0:0x0].0 (ffff88003d2abcc0) refcount = 2
Sep 18 13:29:00 client kernel: LustreError: 16824:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) page@ffff88003d5a7600[3 ffff88003a513b38 1 0 1 ffff88003a5d6b78]
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) vvp-page@ffff88003d5a7650(0:0) vm@ffffea00004439e0 2000000000087d 3:0 ffff88003d5a7600 25088 lru
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) lov-page@ffff88003d5a7690, raid0
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) osc-page@ffff88003d5a76f8 25088: 1< 0x845fed 258 0 + - > 2< 102760448 0 4096 0x0 0x520 | (null) ffff88003c479500 ffff88003a517e20 > 3< 1 0 0 > 4< 0 9 8 18446744073642434560 + | + - + - > 5< + - + - | 0 - | 5635 - ->
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) end page@ffff88003d5a7600
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:3134:discard_cb()) discard dirty page?
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:2447:osc_teardown_async_page()) extent ffff880039a18720@
trunc at 25088.
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_cache.c:2447:osc_teardown_async_page()) ### extent: ffff880039a18720
Sep 18 13:29:01 client kernel: ns: testfs-OST0000-osc-ffff8800375c4000 lock: ffff88000b69d580/0x2104da78ff029c0e lrc: 41/0,3 mode: PW/PW res: [0x303:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 102760448->103809023) flags: 0x20000000000 nid: local remote: 0x59a587f3f1d1d6ac expref: -99 pid: 16683 timeout: 0 lvb_type: 1
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) page@ffff88003d5a7600[3 ffff88003a513b38 4 0 1 (null)]
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) vvp-page@ffff88003d5a7650(0:0) vm@ffffea00004439e0 2000000000087d 3:0 ffff88003d5a7600 25088 lru
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) lov-page@ffff88003d5a7690, raid0
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) osc-page@ffff88003d5a76f8 25088: 1< 0x845fed 258 0 + - > 2< 102760448 0 4096 0x0 0x520 | (null) ffff88003c479500 ffff88003a517e20 > 3< 0 0 0 > 4< 0 9 8 18446744073642434560 + | + - + - > 5< + - + - | 0 - | 5635 - ->
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) end page@ffff88003d5a7600
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:307:osc_page_delete()) Trying to teardown failed: -16
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:308:osc_page_delete()) ASSERTION( 0 ) failed:
Message from syslogd@client at Sep 18 13:29:01 ...
kernel:LustreError: 13392:0:(osc_page.c:308:osc_page_delete()) ASSERTION( 0 ) failed:
Sep 18 13:29:01 client kernel: LustreError: 13392:0:(osc_page.c:308:osc_page_delete()) LBUG
Sep 18 13:29:01 client kernel: Pid: 13392, comm: ldlm_bl_00
Sep 18 13:29:01 client kernel:
Sep 18 13:29:01 client kernel: Call Trace:
Sep 18 13:29:01 client kernel: [<ffffffffa10ad875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Sep 18 13:29:01 client kernel: [<ffffffffa10ade77>] lbug_with_loc+0x47/0xb0 [libcfs]
Sep 18 13:29:01 client kernel: [<ffffffffa178bf9e>] osc_page_delete+0x46e/0x4e0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa12019cd>] cl_page_delete0+0x7d/0x210 [obdclass]
Sep 18 13:29:01 client kernel: [<ffffffffa1201b9d>] cl_page_delete+0x3d/0x110 [obdclass]
Sep 18 13:29:01 client kernel: [<ffffffffa16b6d2d>] ll_invalidatepage+0x8d/0x160 [lustre]
Sep 18 13:29:01 client kernel: [<ffffffffa16c5d85>] vvp_page_discard+0xc5/0x160 [lustre]
Sep 18 13:29:01 client kernel: [<ffffffffa11fffc8>] cl_page_invoid+0x68/0x160 [obdclass]
Sep 18 13:29:01 client kernel: [<ffffffffa12000d3>] cl_page_discard+0x13/0x20 [obdclass]
Sep 18 13:29:01 client kernel: [<ffffffffa1797158>] discard_cb+0x88/0x1e0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa1796f4e>] osc_page_gang_lookup+0x1ae/0x330 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa17970d0>] ? discard_cb+0x0/0x1e0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa17973f4>] osc_lock_discard_pages+0x144/0x240 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa17970d0>] ? discard_cb+0x0/0x1e0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa178e45b>] osc_lock_flush+0x8b/0x260 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa178e8d8>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa138ea57>] ldlm_cancel_callback+0x87/0x280 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffff81060530>] ? __dequeue_entity+0x30/0x50
Sep 18 13:29:01 client kernel: [<ffffffff8100969d>] ? __switch_to+0x7d/0x340
Sep 18 13:29:01 client kernel: [<ffffffffa13a184a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffffa13a6480>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffffa178e70b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
Sep 18 13:29:01 client kernel: [<ffffffffa13aa380>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffffa13aae64>] ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
Sep 18 13:29:01 client kernel: [<ffffffffa13aa9e0>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
Sep 18 13:29:01 client kernel: [<ffffffff810a101e>] kthread+0x9e/0xc0
Sep 18 13:29:01 client kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Hi Dongyang,
Please try patch 16456 and see if it can fix the problem.
Jinshan
Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/16456
Subject: LU-6271 osc: further OSC cleanup after eviction
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9dd22234a098bb2bea26c4694d91edc928e027ac
I reckon it is too soon to land http://review.whamcloud.com/14989/
With the patch, the client still crashes when I run the reproducer; for example:
<3>LustreError: 15348:0:(ldlm_resource.c:835:ldlm_resource_complain()) testfs-OST0000-osc-ffff880037fea000: namespace resource [0x302:0x0:0x0].0 (ffff88003deae500) refcount nonzero (1) after lock cleanup; forcing cleanup.
<3>LustreError: 15348:0:(ldlm_resource.c:1450:ldlm_resource_dump()) --- Resource: [0x302:0x0:0x0].0 (ffff88003deae500) refcount = 2
<3>LustreError: 15348:0:(ldlm_resource.c:1453:ldlm_resource_dump()) Granted locks (in reverse order):
<3>LustreError: 15348:0:(ldlm_resource.c:1456:ldlm_resource_dump()) ### ### ns: testfs-OST0000-osc-ffff880037fea000 lock: ffff880037e29940/0xaa5a0b4e389da92d lrc: 4/0,1 mode: PW/PW res: [0x302:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x126400000000 nid: local remote: 0x59a587f3f1d1130e expref: -99 pid: 15181 timeout: 0 lvb_type: 1
<3>LustreError: 15348:0:(ldlm_resource.c:1450:ldlm_resource_dump()) --- Resource: [0x302:0x0:0x0].0 (ffff88003deae500) refcount = 2
<3>LustreError: 15348:0:(ldlm_resource.c:1453:ldlm_resource_dump()) Granted locks (in reverse order):
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) page@ffff880038c3fa00[3 ffff88003ac29b38 1 0 1 ffff88003a78ebb8 (null)]
<3>
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) vvp-page@ffff880038c3fa68(0:0) vm@ffffea000071dfb8 2000000000087f 2:0 ffff880038c3fa00 21504 lru
<3>
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) lov-page@ffff880038c3faa8, raid0
<3>
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) osc-page@ffff880038c3fb10 21504: 1< 0x845fed 258 0 + - > 2< 88080384 0 4096 0x0 0x520 | (null) ffff88003ac58500 ffff88003cfdbe60 > 3< 1 0 0 > 4< 0 10 8 18446744073619427328 + | + - + - > 5< + - + - | 0 - | 6663 - ->
<3>
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) end page@ffff880038c3fa00
<3>
<3>LustreError: 12670:0:(osc_cache.c:3141:discard_cb()) discard dirty page?
<3>LustreError: 12670:0:(osc_cache.c:2454:osc_teardown_async_page()) extent ffff88003da0eb10@
trunc at 21504.
<3>LustreError: 12670:0:(osc_cache.c:2454:osc_teardown_async_page()) ### extent: ffff88003da0eb10
<3> ns: testfs-OST0000-osc-ffff880037fea000 lock: ffff88003c75d680/0xaa5a0b4e389daa3e lrc: 68/0,6 mode: PW/PW res: [0x302:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 70254592->71303167) flags: 0x20000000000 nid: local remote: 0x59a587f3f1d11323 expref: -99 pid: 15246 timeout: 0 lvb_type: 1
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) page@ffff880038c3fa00[3 ffff88003ac29b38 4 0 1 (null) (null)]
<3>
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) vvp-page@ffff880038c3fa68(0:0) vm@ffffea000071dfb8 2000000000087f 2:0 ffff880038c3fa00 21504 lru
<3>
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) lov-page@ffff880038c3faa8, raid0
<3>
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) osc-page@ffff880038c3fb10 21504: 1< 0x845fed 258 0 + - > 2< 88080384 0 4096 0x0 0x520 | (null) ffff88003ac58500 ffff88003cfdbe60 > 3< 0 0 0 > 4< 0 9 8 18446744073619361792 + | + - + - > 5< + - + - | 0 - | 6679 - ->
<3>
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) end page@ffff880038c3fa00
<3>
<3>LustreError: 12670:0:(osc_page.c:307:osc_page_delete()) Trying to teardown failed: -16
<0>LustreError: 12670:0:(osc_page.c:308:osc_page_delete()) ASSERTION( 0 ) failed:
<0>LustreError: 12670:0:(osc_page.c:308:osc_page_delete()) LBUG
<4>Pid: 12670, comm: ldlm_bl_00
<4>
<4>Call Trace:
<4> [<ffffffffa021d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa021de77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa090dbfe>] osc_page_delete+0x46e/0x4e0 [osc]
<4> [<ffffffffa037e9dd>] cl_page_delete0+0x7d/0x210 [obdclass]
<4> [<ffffffffa037ebad>] cl_page_delete+0x3d/0x110 [obdclass]
<4> [<ffffffffa0837d0d>] ll_invalidatepage+0x8d/0x160 [lustre]
<4> [<ffffffffa0846da5>] vvp_page_discard+0xc5/0x160 [lustre]
<4> [<ffffffffa037cfd8>] cl_page_invoid+0x68/0x160 [obdclass]
<4> [<ffffffffa037d0e3>] cl_page_discard+0x13/0x20 [obdclass]
<4> [<ffffffffa0919678>] discard_cb+0x88/0x1e0 [osc]
<4> [<ffffffffa091946e>] osc_page_gang_lookup+0x1ae/0x330 [osc]
<4> [<ffffffffa09195f0>] ? discard_cb+0x0/0x1e0 [osc]
<4> [<ffffffffa0919914>] osc_lock_discard_pages+0x144/0x240 [osc]
<4> [<ffffffffa09195f0>] ? discard_cb+0x0/0x1e0 [osc]
<4> [<ffffffffa090ff7b>] osc_lock_flush+0x8b/0x260 [osc]
<4> [<ffffffffa09103f8>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
<4> [<ffffffffa050e9dc>] ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
<4> [<ffffffffa052190a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
<4> [<ffffffffa0526540>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
<4> [<ffffffffa091022b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
<4> [<ffffffffa052a440>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
<4> [<ffffffffa052af24>] ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa052aaa0>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
<4> [<ffffffff810a101e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 12670, comm: ldlm_bl_00 Not tainted 2.6.32-573.3.1.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81537c54>] ? panic+0xa7/0x16f
<4> [<ffffffffa021decb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa090dbfe>] ? osc_page_delete+0x46e/0x4e0 [osc]
<4> [<ffffffffa037e9dd>] ? cl_page_delete0+0x7d/0x210 [obdclass]
<4> [<ffffffffa037ebad>] ? cl_page_delete+0x3d/0x110 [obdclass]
<4> [<ffffffffa0837d0d>] ? ll_invalidatepage+0x8d/0x160 [lustre]
<4> [<ffffffffa0846da5>] ? vvp_page_discard+0xc5/0x160 [lustre]
<4> [<ffffffffa037cfd8>] ? cl_page_invoid+0x68/0x160 [obdclass]
<4> [<ffffffffa037d0e3>] ? cl_page_discard+0x13/0x20 [obdclass]
<4> [<ffffffffa0919678>] ? discard_cb+0x88/0x1e0 [osc]
<4> [<ffffffffa091946e>] ? osc_page_gang_lookup+0x1ae/0x330 [osc]
<4> [<ffffffffa09195f0>] ? discard_cb+0x0/0x1e0 [osc]
<4> [<ffffffffa0919914>] ? osc_lock_discard_pages+0x144/0x240 [osc]
<4> [<ffffffffa09195f0>] ? discard_cb+0x0/0x1e0 [osc]
<4> [<ffffffffa090ff7b>] ? osc_lock_flush+0x8b/0x260 [osc]
<4> [<ffffffffa09103f8>] ? osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
<4> [<ffffffffa050e9dc>] ? ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
<4> [<ffffffffa052190a>] ? ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
<4> [<ffffffffa0526540>] ? ldlm_cli_cancel+0x60/0x360 [ptlrpc]
<4> [<ffffffffa091022b>] ? osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
<4> [<ffffffffa052a440>] ? ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
<4> [<ffffffffa052af24>] ? ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa052aaa0>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
<4> [<ffffffff810a101e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
And another one, though I don't know if it's related:
<3>LustreError: 4278:0:(ldlm_resource.c:835:ldlm_resource_complain()) testfs-OST0001-osc-ffff88003c715c00: namespace resource [0x302:0x0:0x0].0 (ffff880037015800) refcount nonzero (1) after lock cleanup; forcing cleanup.
<3>LustreError: 4278:0:(ldlm_resource.c:1450:ldlm_resource_dump()) --- Resource: [0x302:0x0:0x0].0 (ffff880037015800) refcount = 2
<3>LustreError: 4278:0:(ldlm_resource.c:1453:ldlm_resource_dump()) Granted locks (in reverse order):
<3>LustreError: 4278:0:(ldlm_resource.c:1456:ldlm_resource_dump()) ### ### ns: testfs-OST0001-osc-ffff88003c715c00 lock: ffff880037269380/0x532f2161a177e283 lrc: 19/0,1 mode: PW/PW res: [0x302:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x126400000000 nid: local remote: 0x59a587f3f1d100b5 expref: -99 pid: 4040 timeout: 0 lvb_type: 1
<3>LustreError: 790:0:(osc_io.c:1010:osc_req_attr_set()) page@ffff880022f02600[2 ffff88003be7db38 2 0 1 (null) ffff88003df849c0]
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) page@ffff88003dfc8000[2 ffff88003be7db38 2 0 1 (null) ffff88003b3982c0]
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) vvp-page@ffff88003dfc8068(0:0) vm@ffffea0000785428 2000000000282c 2:0 ffff88003dfc8000 19200 lru
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) lov-page@ffff88003dfc80a8, raid0
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) osc-page@ffff88003dfc8110 19200: 1< 0x845fed 258 0 + + > 2< 78643200 0 4096 0x5 0x520 | (null) ffff88003d970540 ffff88003bc81e20 > 3< 1 12 0 > 4< 0 7 8 18446744073678077952 - | - - - - > 5< - - - - | 0 - | 0 - ->
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) end page@ffff88003dfc8000
<3>
<3>LustreError: 789:0:(osc_io.c:1010:osc_req_attr_set()) uncovered page!
<3>LustreError: 789:0:(ldlm_resource.c:1450:ldlm_resource_dump()) --- Resource: [0x302:0x0:0x0].0 (ffff880037015800) refcount = 3
<4>Pid: 789, comm: ptlrpcd_00_00
<4>
<4>Call Trace:
<4> [<ffffffffa021d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa09133fa>] osc_req_attr_set+0x55a/0x720 [osc]
<4> [<ffffffffa0383829>] cl_req_attr_set+0xc9/0x220 [obdclass]
<4> [<ffffffffa0904082>] osc_build_rpc+0x882/0x12d0 [osc]
<4> [<ffffffffa091f623>] osc_io_unplug0+0x1133/0x1af0 [osc]
<4> [<ffffffffa0918428>] ? osc_ap_completion+0x1a8/0x550 [osc]
<4> [<ffffffffa0917a3e>] ? osc_extent_put+0xbe/0x260 [osc]
<4> [<ffffffffa0374f75>] ? lu_object_put+0x135/0x3b0 [obdclass]
<4> [<ffffffffa09224b0>] osc_io_unplug+0x10/0x20 [osc]
<4> [<ffffffffa0905593>] brw_interpret+0xac3/0x2320 [osc]
<4> [<ffffffffa0546ee2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
<4> [<ffffffffa053f4bc>] ? ptlrpc_unregister_reply+0x6c/0x810 [ptlrpc]
<4> [<ffffffffa053e2a4>] ? ptlrpc_send_new_req+0x154/0x980 [ptlrpc]
<4> [<ffffffffa0540551>] ptlrpc_check_set+0x331/0x1be0 [ptlrpc]
<4> [<ffffffffa056e443>] ptlrpcd_check+0x3d3/0x610 [ptlrpc]
<4> [<ffffffffa056e8fa>] ptlrpcd+0x27a/0x500 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa056e680>] ? ptlrpcd+0x0/0x500 [ptlrpc]
<4> [<ffffffff810a101e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>LustreError: 789:0:(osc_io.c:1020:osc_req_attr_set()) LBUG
<4>Pid: 789, comm: ptlrpcd_00_00
<4>
<4>Call Trace:
<4> [<ffffffffa021d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa021de77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0913410>] osc_req_attr_set+0x570/0x720 [osc]
<4> [<ffffffffa0383829>] cl_req_attr_set+0xc9/0x220 [obdclass]
<4> [<ffffffffa0904082>] osc_build_rpc+0x882/0x12d0 [osc]
<4> [<ffffffffa091f623>] osc_io_unplug0+0x1133/0x1af0 [osc]
<4> [<ffffffffa0918428>] ? osc_ap_completion+0x1a8/0x550 [osc]
<4> [<ffffffffa0917a3e>] ? osc_extent_put+0xbe/0x260 [osc]
<4> [<ffffffffa0374f75>] ? lu_object_put+0x135/0x3b0 [obdclass]
<4> [<ffffffffa09224b0>] osc_io_unplug+0x10/0x20 [osc]
<4> [<ffffffffa0905593>] brw_interpret+0xac3/0x2320 [osc]
<4> [<ffffffffa0546ee2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
<4> [<ffffffffa053f4bc>] ? ptlrpc_unregister_reply+0x6c/0x810 [ptlrpc]
<4> [<ffffffffa053e2a4>] ? ptlrpc_send_new_req+0x154/0x980 [ptlrpc]
<4> [<ffffffffa0540551>] ptlrpc_check_set+0x331/0x1be0 [ptlrpc]
<4> [<ffffffffa056e443>] ptlrpcd_check+0x3d3/0x610 [ptlrpc]
<4> [<ffffffffa056e8fa>] ptlrpcd+0x27a/0x500 [ptlrpc]
<4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa056e680>] ? ptlrpcd+0x0/0x500 [ptlrpc]
<4> [<ffffffff810a101e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14989/
Subject: LU-6271 osc: handle osc eviction correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8f01f8b51d114b0d2d54a5ab7db3161782e52447
Hi Dongyang, please upload your test cases.