[LU-16584] mdc_page_locate(): ASSERTION( *start <= *hash ) failed Created: 20/Feb/23  Updated: 21/Feb/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Lukasz Flis Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

An LBUG in mdc_page_locate() occurs intermittently on our HPC login nodes and on our ARC node.

[134359.416653] LustreError: 3619:0:(mdc_request.c:1142:mdc_page_locate()) ASSERTION( *start <= *hash ) failed: start = 0x574,end = 0x18e,hash = 0x18e
[134359.420056] LustreError: 3619:0:(mdc_request.c:1142:mdc_page_locate()) LBUG
[134359.422647] Pid: 3619, comm: cache-clean 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023
[134359.425367] Call Trace:
[134359.429821]  [<ffffffffc08567cc>] libcfs_call_trace+0x8c/0xd0 [libcfs]
[134359.432752]  [<ffffffffc085689c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[134359.434703]  [<ffffffffc0b60a80>] mdc_read_page+0x8a0/0x970 [mdc]
[134359.436624]  [<ffffffffc0dac1e6>] lmv_read_page+0x156/0x3b0 [lmv]
[134359.438808]  [<ffffffffc0dd98ec>] ll_get_dir_page+0xac/0x1b0 [lustre]
[134359.441457]  [<ffffffffc0dd9ccf>] ll_dir_read+0x20f/0x320 [lustre]
[134359.443544]  [<ffffffffc0dd9f20>] ll_readdir+0x140/0x4d0 [lustre]
[134359.445484]  [<ffffffffa7c717af>] iterate_dir+0x9f/0x140
[134359.448575]  [<ffffffffa7c71cdc>] SyS_getdents+0x9c/0x120
[134359.450480]  [<ffffffffa81c539a>] system_call_fastpath+0x25/0x2a
[134359.452632]  [<ffffffffffffffff>] 0xffffffffffffffff
[134359.458091] Kernel panic - not syncing: LBUG
[134359.459266] CPU: 1 PID: 3619 Comm: cache-clean Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.83.1.el7.x86_64 #1
[134359.461671] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[134359.462959] Call Trace:
[134359.463877]  [<ffffffffa81b1bec>] dump_stack+0x19/0x1f
[134359.465676]  [<ffffffffa81ab708>] panic+0xe8/0x21f
[134359.467002]  [<ffffffffc08568eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[134359.468329]  [<ffffffffc0b60a80>] mdc_read_page+0x8a0/0x970 [mdc]
[134359.469654]  [<ffffffffc0dac1e6>] lmv_read_page+0x156/0x3b0 [lmv]
[134359.471193]  [<ffffffffc0dd98ec>] ll_get_dir_page+0xac/0x1b0 [lustre]
[134359.472991]  [<ffffffffc0e16ef0>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[134359.474933]  [<ffffffffc0dd9ccf>] ll_dir_read+0x20f/0x320 [lustre]
[134359.476628]  [<ffffffffa7c71850>] ? iterate_dir+0x140/0x140
[134359.477767]  [<ffffffffc0dd9f20>] ll_readdir+0x140/0x4d0 [lustre]
[134359.478923]  [<ffffffffa7c71850>] ? iterate_dir+0x140/0x140
[134359.480001]  [<ffffffffa7c717af>] iterate_dir+0x9f/0x140
[134359.481022]  [<ffffffffa7c71cdc>] SyS_getdents+0x9c/0x120
[134359.482050]  [<ffffffffa7c71850>] ? iterate_dir+0x140/0x140
[134359.483325]  [<ffffffffa81c539a>] system_call_fastpath+0x25/0x2a

A vmcore is available if additional information is needed.
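
For triage, the values in the assertion message above can be pulled out and checked mechanically. The following is a minimal sketch, not Lustre code: the regular expression and the helper names `parse_hash_fields` and `invariant_holds` are hypothetical and only handle this one message format. It mirrors just the condition printed in the log (`*start <= *hash`) and makes no claim about the rest of mdc_page_locate().

```python
import re

# Assertion line copied verbatim from the kernel log in this ticket.
LOG_LINE = (
    "[134359.416653] LustreError: 3619:0:(mdc_request.c:1142:mdc_page_locate()) "
    "ASSERTION( *start <= *hash ) failed: start = 0x574,end = 0x18e,hash = 0x18e"
)

# Hypothetical pattern for the "name = 0x..." pairs; not part of Lustre.
FIELDS = re.compile(r"(start|end|hash)\s*=\s*(0x[0-9a-fA-F]+)")


def parse_hash_fields(line):
    """Return the start/end/hash values from the log line as integers."""
    return {name: int(value, 16) for name, value in FIELDS.findall(line)}


def invariant_holds(fields):
    """Check only the condition shown in the message: start <= hash."""
    return fields["start"] <= fields["hash"]


fields = parse_hash_fields(LOG_LINE)
print(fields, invariant_holds(fields))
```

With the logged values, start (0x574) is larger than both end and hash (both 0x18e), so the check fails, which matches the LBUG.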



 Comments   
Comment by Lukasz Flis [ 20/Feb/23 ]

This seems to have been patched already in LU-4472.

Comment by Peter Jones [ 21/Feb/23 ]

Could you please advise as to which version this was observed on?

Comment by Lukasz Flis [ 21/Feb/23 ]

This must have been lost when the ticket was moved (it was in the Environment section). Re-adding it for the sake of completeness:

Client versions affected:
  2.12.8_26_ga3cc651
  2.12.9_38_g20b8a89
OS: CentOS 7.9
  3.10.0-1160.71.1.el7.x86_64
  3.10.0-1160.83.1.el7.x86_64

 

Generated at Sat Feb 10 03:28:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.