[LU-7066] osc_cache.c:2944:osc_cache_writeback_range()) ASSERTION( hp == 0 && discard == 0 ) failed Created: 31/Aug/15  Updated: 06/Sep/16  Resolved: 01/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Hyperion / SWL test


Issue Links:
Duplicate
duplicates LU-6271 (osc_cache.c:3150:discard_cb()) ASSER... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While running SWL, clients hit an LBUG. This is similar to LU-2643, but on newer code.

Aug 31 08:26:19 iwc55 kernel: LustreError: 64274:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: lustre-OST0002-osc-ffff8808427fd400 lock: ffff880687108700/0x22278a710c5f0c16 lrc: 3/0,1 mode: PW/PW res: [0x300000406:0x8dd59:0x0].0 rrc: 12 type: EXT [106954752->113246207] (req 106954752->109051903) flags: 0x126400020000 nid: local remote: 0x9307b5d51b4d3df2 expref: -99 pid: 64221 timeout: 0 lvb_type: 1
Aug 31 08:26:19 iwc55 kernel: Lustre: lustre-OST0002-osc-ffff8808427fd400: Connection restored to lustre-OST0002 (at 192.168.120.15@o2ib)
Aug 31 08:26:19 iwc55 kernel: LustreError: 39539:0:(osc_cache.c:2944:osc_cache_writeback_range()) ASSERTION( hp == 0 && discard == 0 ) failed:
Aug 31 08:26:19 iwc55 kernel: LustreError: 39539:0:(osc_cache.c:2944:osc_cache_writeback_range()) LBUG
Aug 31 08:26:19 iwc55 kernel: Pid: 39539, comm: ldlm_bl_62
Aug 31 08:26:19 iwc55 kernel:
Aug 31 08:26:19 iwc55 kernel: Call Trace:
Aug 31 08:26:19 iwc55 kernel: [<ffffffffa0474875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 31 08:26:19 iwc55 kernel: [<ffffffffa0474e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Aug 31 08:26:19 iwc55 kernel: [<ffffffffa0b3eea7>] osc_cache_writeback_range+0x1067/0x1280 [osc]
Aug 31 08:26:19 iwc55 kernel: [<ffffffffa0603fe9>] ? cl_env_hops_keycmp+0x19/0x70 [obdclass]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa0b27c05>] osc_lock_flush+0x175/0x260 [osc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa0b27f98>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa079a9dc>] ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa07ad84a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa07b2480>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa0b27dcb>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa0480bf1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa07b6380>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa07b6e64>] ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
Aug 31 08:26:20 iwc55 kernel: [<ffffffffa07b69e0>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
Aug 31 08:26:20 iwc55 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Aug 31 08:26:20 iwc55 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Aug 31 08:26:20 iwc55 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Aug 31 08:26:20 iwc55 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20


 Comments   
Comment by Jinshan Xiong (Inactive) [ 31/Aug/15 ]

This is probably the same issue as LU-6271. It also raises the question of why we have seen so many eviction cases recently.

Comment by Li Xi (Inactive) [ 05/Sep/16 ]

We hit this problem recently, even though our branch already includes the LU-6271 patches.

I noticed that osc_lock_flush() always calls osc_cache_writeback_range() with hp=1. That means the assertion LASSERT(hp == 0 && discard == 0) always fails if the extent is OES_ACTIVE. Is this the expected behavior?
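The reasoning above can be captured in a small model. This is a hypothetical sketch, not the Lustre source: the function name writeback_assert_fails() and the boolean extent_active (standing in for "the extent is in the OES_ACTIVE state") are illustrative assumptions, and the model assumes, as the comment states, that the assertion is only reached for active extents.

```c
#include <stdbool.h>

/* Hypothetical model, not Lustre code: returns true when
 * LASSERT(hp == 0 && discard == 0) would fire for one extent.
 * extent_active stands in for the extent being OES_ACTIVE. */
static bool writeback_assert_fails(bool extent_active, int hp, int discard)
{
	/* Assumption from the comment: the assertion is only on the active path. */
	if (!extent_active)
		return false;
	return !(hp == 0 && discard == 0);
}
```

With hp fixed at 1, as osc_lock_flush() passes it according to the comment, writeback_assert_fails(true, 1, 0) returns true while writeback_assert_fails(false, 1, 0) returns false, so under this model the flush is only safe when no extent in the range is active.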

Comment by Jinshan Xiong (Inactive) [ 06/Sep/16 ]

It is expected that an extent cannot be in the OES_ACTIVE state while the lock covering it is being canceled.
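The invariant stated above can be expressed as a check over the extents covered by the lock being canceled. This is a hypothetical sketch under simplifying assumptions: struct model_extent, enum model_ext_state, and count_active_in_range() are illustrative names, not Lustre types or functions, and extents are modeled as inclusive [start, end] page-index ranges in a plain array.

```c
#include <stddef.h>

/* Hypothetical, simplified extent states (not the real Lustre enum). */
enum model_ext_state { MODEL_CACHE, MODEL_ACTIVE, MODEL_RPC };

struct model_extent {
	unsigned long start;        /* first page index covered */
	unsigned long end;          /* last page index covered, inclusive */
	enum model_ext_state state;
};

/* Counts extents that overlap [start, end] and are still active.
 * Under the invariant, this should be 0 when a lock covering
 * [start, end] is being canceled. */
static size_t count_active_in_range(const struct model_extent *ext, size_t n,
				    unsigned long start, unsigned long end)
{
	size_t active = 0;

	for (size_t i = 0; i < n; i++) {
		int overlaps = ext[i].start <= end && ext[i].end >= start;

		if (overlaps && ext[i].state == MODEL_ACTIVE)
			active++;
	}
	return active;
}
```

A nonzero result for a range whose lock is being canceled would correspond to the situation in which, per the earlier comment, the hp=1 flush trips the assertion.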

Generated at Sat Feb 10 02:05:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.