Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.13.0
-
None
-
3
-
9223372036854775807
Description
There's a flurry of failures. In maloo it mostly fails in sanity-flr test 200, but I also saw it in racer at least once with a partially similar backtrace.
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2 Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-49vm6@tcp:/lustre /mnt/lustre2 Lustre: Mounted lustre-client Lustre: DEBUG MARKER: mkdir -p /mnt/lustre3 Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-49vm6@tcp:/lustre /mnt/lustre3 LustreError: 16418:0:(osc_cache.c:2553:osc_teardown_async_page()) extent ffff89076293af20@{[768 -> 1023/1023], [2|0|-|cache|wi|ffff890756aca8c0], [1703936|123|+|-|ffff890762821d40|256| (null)]} trunc at 848. LustreError: 16418:0:(osc_cache.c:2553:osc_teardown_async_page()) ### extent: ffff89076293af20 ns: lustre-OST0001-osc-ffff890767ec5800 lock: ffff890762821d40/0x10751cafee0649b3 lrc: 2/0,0 mode: PW/PW res: [0x175d:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 3473408->3977215) flags: 0x20000020000 nid: local remote: 0x70d1a9e0ebb537b3 expref: -99 pid: 16413 timeout: 0 lvb_type: 1 LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) page@ffff89075dd84a00[2 ffff890736741ab0 4 1 (null)] LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) vvp-page@ffff89075dd84a50(0:0) vm@ffffe55201ca5c80 1fffff0000083d 3:0 ffff89075dd84a00 1872 lru LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) lov-page@ffff89075dd84a90, comp index: 70001, gen: 8 LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) osc-page@ffff89075dd84ac8 848: 1< 0x845fed 258 0 + - > 2< 3473408 0 4096 0x0 0x520 | (null) ffff89076b15c6c0 ffff890756aca8c0 > 3< 0 0 0 > 4< 0 0 8 120979456 - | - - + - > 5< - - + - | 0 - | 123 - -> LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) end page@ffff89075dd84a00 LustreError: 16418:0:(osc_page.c:192:osc_page_delete()) Trying to teardown failed: -16 LustreError: 16418:0:(osc_page.c:193:osc_page_delete()) ASSERTION( 0 ) failed: LustreError: 16418:0:(osc_page.c:193:osc_page_delete()) LBUG Pid: 16418, comm: cat 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019 Call Trace: [<ffffffffc05b78bc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [<ffffffffc05b796c>] lbug_with_loc+0x4c/0xa0 [libcfs] [<ffffffffc0b8086f>] osc_page_delete+0x48f/0x500 [osc] [<ffffffffc073cdf0>] cl_page_delete0+0x80/0x220 [obdclass] [<ffffffffc073cfc3>] cl_page_delete+0x33/0x110 [obdclass] [<ffffffffc0c8c98f>] ll_invalidatepage+0x7f/0x170 [lustre] [<ffffffff8ffc6c8d>] do_invalidatepage_range+0x7d/0x90 [<ffffffff8ffc6d37>] truncate_inode_page+0x77/0x80 [<ffffffff8ffc6f6a>] truncate_inode_pages_range+0x1ea/0x700 [<ffffffff8ffc7495>] truncate_inode_pages+0x15/0x20 [<ffffffffc0ca06db>] vvp_prune+0x7b/0x2b0 [lustre] [<ffffffffc073a638>] cl_object_prune+0x58/0x110 [obdclass] [<ffffffffc0beecc9>] lov_conf_set+0x509/0xac0 [lov] [<ffffffffc073a753>] cl_conf_set+0x63/0x120 [obdclass] [<ffffffffc0c598f1>] ll_layout_conf+0x81/0x3f0 [lustre] [<ffffffffc0c5a1ad>] ll_layout_refresh+0x54d/0x900 [lustre] [<ffffffffc0c9fc97>] vvp_io_init+0x347/0x460 [lustre] [<ffffffffc07426ab>] cl_io_init0.isra.15+0x8b/0x160 [obdclass] [<ffffffffc0742843>] cl_io_init+0x43/0x80 [obdclass] [<ffffffffc07428f7>] cl_io_rw_init+0x77/0x210 [obdclass] [<ffffffffc0c502be>] ll_file_io_generic+0x17e/0xaf0 [lustre] [<ffffffffc0c51a52>] ll_file_aio_read+0x3a2/0x470 [lustre] [<ffffffffc0c51c20>] ll_file_read+0x100/0x1c0 [lustre] [<ffffffff9004167f>] vfs_read+0x9f/0x170 [<ffffffff9004253f>] SyS_read+0x7f/0xf0 [<ffffffff90575ddb>] system_call_fastpath+0x22/0x27 [<ffffffffffffffff>] 0xffffffffffffffff
example maloo report: https://testing.whamcloud.com/test_sets/4bc4f606-a1e3-11e9-8dbe-52540065bddc
racer:
LustreError: 12527:0:(osc_cache.c:2553:osc_teardown_async_page()) extent ffff8800a297e230@{[672 -> 1023/1023], [2|0|-|cache|wi|ffff8800a0bcee60], [1466368|352|+|-|ffff8800b81c0d80|1024| (null)]} trunc at 672. LustreError: 12527:0:(osc_cache.c:2553:osc_teardown_async_page()) ### extent: ffff8800a297e230 ns: lustre-OST0001-osc-ffff88010a173800 lock: ffff8800b81c0d80/0xfcc9cb8b08e86ca9 lrc: 2/0,0 mode: PW/PW res: [0x1e9:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 2752512->18446744073709551615) flags: 0x20000000000 nid: local remote: 0x4b2ddd8a9f2ac3e expref: -99 pid: 12139 timeout: 0 lvb_type: 1 LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) page@ffff8800a3ce2e00[2 ffff8800a231ac58 4 1 (null)] LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) vvp-page@ffff8800a3ce2e50(0:0) vm@ffffea00028cfb48 1fffff0000083d 3:0 ffff8800a3ce2e00 1696 lru LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) lov-page@ffff8800a3ce2e90, comp index: 10001, gen: 5 LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) osc-page@ffff8800a3ce2ec8 672: 1< 0x845fed 258 0 + - > 2< 2752512 0 4096 0x0 0x520 | (null) ffff8800c8118798 ffff8800a0bcee60 > 3< 0 0 0 > 4< 0 0 8 424652800 - | - - + - > 5< - - + - | 0 - | 352 - -> LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) end page@ffff8800a3ce2e00 LustreError: 12527:0:(osc_page.c:192:osc_page_delete()) Trying to teardown failed: -16 LustreError: 12527:0:(osc_page.c:193:osc_page_delete()) ASSERTION( 0 ) failed: LustreError: 12527:0:(osc_page.c:193:osc_page_delete()) LBUG Pid: 12527, comm: touch 3.10.0-7.6-debug #5 SMP Tue Jun 11 21:26:20 EDT 2019 Call Trace: [<ffffffffa017e8cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [<ffffffffa017e97c>] lbug_with_loc+0x4c/0xa0 [libcfs] [<ffffffffa08133d9>] osc_page_delete+0x489/0x4f0 [osc] [<ffffffffa032dcad>] cl_page_delete0+0x7d/0x210 [obdclass] [<ffffffffa032de73>] cl_page_delete+0x33/0x110 [obdclass] [<ffffffffa0d6551f>] ll_invalidatepage+0x7f/0x170 [lustre] [<ffffffff811bc961>] do_invalidatepage_range+0x71/0x80 [<ffffffff811bca07>] truncate_inode_page+0x77/0x80 [<ffffffff811bcc42>] truncate_inode_pages_range+0x1f2/0x770 [<ffffffff811bd1d5>] truncate_inode_pages+0x15/0x20 [<ffffffffa0d78abb>] vvp_prune+0x7b/0x2b0 [lustre] [<ffffffffa032b595>] cl_object_prune+0x55/0x110 [obdclass] [<ffffffffa08d5121>] lov_conf_set+0x501/0xa90 [lov] [<ffffffffa032b6b0>] cl_conf_set+0x60/0x120 [obdclass] [<ffffffffa0d6f0cb>] cl_file_inode_init+0x12b/0x390 [lustre] [<ffffffffa0d44f35>] ll_update_inode+0x2e5/0x610 [lustre] [<ffffffffa0d56363>] ll_iget+0x253/0x350 [lustre] [<ffffffffa0d4955d>] ll_prep_inode+0x20d/0x9a0 [lustre] [<ffffffffa0d5849b>] ll_lookup_it+0xc0b/0x1e90 [lustre] [<ffffffffa0d5ad25>] ll_lookup_nd+0x85/0x190 [lustre] [<ffffffff8124147d>] lookup_real+0x1d/0x50 [<ffffffff81241f62>] __lookup_hash+0x42/0x60 [<ffffffff817b31f4>] lookup_slow+0x42/0xa7 [<ffffffff8124677f>] link_path_walk+0x81f/0x8c0 [<ffffffff812478fe>] path_openat+0xae/0x650 [<ffffffff812492cd>] do_filp_open+0x4d/0xb0 [<ffffffff81235167>] do_sys_open+0x137/0x240 [<ffffffff8123528e>] SyS_open+0x1e/0x20 [<ffffffff817cae15>] system_call_fastpath+0x1c/0x21 [<ffffffffffffffff>] 0xffffffffffffffff
sample report: http://testing.linuxhacker.ru:3333/lustre-reports/1051/testresults/racer-ldiskfs-DNE-centos7_x86_64-centos7_x86_64 and in maloo: https://testing.whamcloud.com/test_sets/58251f94-a11a-11e9-8dbe-52540065bddc
This seems to be a very recent crash scenario too appearing just two das ago.
Coincidentally we just landed this patch n that timeframe: https://review.whamcloud.com/35111
So I guess it's the prime suspect now.
Attachments
Issue Links
- is related to
-
LU-12328 FLR mirroring on 2.12.1-1 not usable if OST is down
- Resolved