[LU-13441] ASSERTION( i < 1000 ) failed Created: 09/Apr/20  Updated: 04/May/20  Resolved: 19/Apr/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5, Lustre 2.12.4
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: Aurelien Degremont (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Using a file with more than 500 stripes triggers the following assertion failure on an MDS with a ZFS backend. The filesystem had 700 OSTs.

LustreError: 8621:0:(osd_oi.c:796:osd_idc_add()) ASSERTION( i < 1000 ) failed:
LustreError: 8621:0:(osd_oi.c:796:osd_idc_add()) LBUG
Pid: 8621, comm: mdt01_000 4.14.165-131.185.amzn2.x86_64 #1 SMP Wed Jan 15 14:19:56 UTC 2020
 Call Trace:
 save_stack_trace_tsk+0x43/0x60
 libcfs_call_trace+0x86/0xc0 [libcfs]
 lbug_with_loc+0x3f/0x90 [libcfs]
 osd_idc_add+0x2ed/0x340 [osd_zfs]
 osd_idc_find_and_init+0x6a/0x80 [osd_zfs]
 osd_declare_create+0x151/0x340 [osd_zfs]
 local_object_declare_create+0x1e7/0x5c0 [obdclass]
 llog_osd_declare_create+0xd8/0x790 [obdclass]
 llog_declare_create+0xbc/0x1f0 [obdclass]
 llog_cat_declare_add_rec+0x15e/0x850 [obdclass]
 llog_declare_add+0x73/0x1a0 [obdclass]
 osp_sync_declare_add+0x13c/0x410 [osp]
 osp_declare_destroy+0xed/0x1c0 [osp]
 lod_sub_declare_destroy+0xc4/0x310 [lod]
 lod_obj_for_each_stripe+0xb6/0x220 [lod]
 lod_declare_destroy+0x445/0x4e0 [lod]
 mdd_declare_finish_unlink+0x80/0x250 [mdd]
 mdd_unlink+0x573/0xb50 [mdd]
 mdt_reint_unlink+0xd91/0x1470 [mdt]
 mdt_reint_rec+0x7f/0x250 [mdt]
 mdt_reint_internal+0x5ee/0x680 [mdt]
 mdt_reint+0x5e/0x110 [mdt]
 tgt_request_handle+0x814/0x14b0 [ptlrpc]
 ptlrpc_server_handle_request+0x2c7/0xb70 [ptlrpc]
 ptlrpc_main+0xb17/0x1ee0 [ptlrpc]
 kthread+0x11a/0x130
 ret_from_fork+0x35/0x40
 0xffffffffffffffff
 Kernel panic - not syncing: LBUG

 

This can be easily reproduced with the following commands:

cd /mnt/lustre
mkdir data
cd data
lfs setstripe -c 500 file2
rm -f file2
lfs setstripe -c 511 file2
rm -f file2
lfs setstripe -c 512 file2
rm -f file2
<MDT CRASH>
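
In the reproduction above the MDT crashes once the stripe count reaches 512, which hints at a power-of-two boundary. The sketch below is a toy Python model, not the osd-zfs code. It assumes a per-thread cache whose capacity doubles whenever it fills up, starting from a hypothetical initial size of 8, and it checks each new capacity against the `i < 1000` limit from the log. The names `grow_cache`, `INITIAL_SIZE`, and `ASSERT_LIMIT` are made up for illustration; only the limit of 1000 comes from this ticket.

```python
ASSERT_LIMIT = 1000  # taken from the failed assertion "i < 1000" in the log
INITIAL_SIZE = 8     # assumption: hypothetical starting capacity of the cache


def grow_cache(entries_needed, initial_size=INITIAL_SIZE, limit=ASSERT_LIMIT):
    """Return the capacities a doubling cache passes through.

    Raises AssertionError when the next capacity would reach `limit`,
    mimicking the assertion that fired on the MDS.
    """
    capacity = 0
    history = []
    while capacity < entries_needed:
        # Double the capacity, or use the initial size on first growth.
        new_capacity = capacity * 2 if capacity else initial_size
        if new_capacity >= limit:
            raise AssertionError(
                f"ASSERTION( i < {limit} ) failed: i = {new_capacity}"
            )
        capacity = new_capacity
        history.append(capacity)
    return history
```

Under these assumptions, `grow_cache(512)` stops at a capacity of 512, while needing even one more entry forces a growth to 1024 and trips the check. That would be consistent with a crash appearing only when a single operation touches slightly more than 512 objects.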

 



 Comments   
Comment by Gerrit Updater [ 09/Apr/20 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/38187
Subject: LU-13441 osd-zfs: remove OSD thread info cache size assertion
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 117ab46991ca9b3a300a4662e911822d6fc26af5

Comment by Aurelien Degremont (Inactive) [ 09/Apr/20 ]

The problem was seen with a 2.10 MDS, and I could not easily test a `master` filesystem with so many OSTs. From reviewing the related code, it has not really changed since then, so I think the bug is still there.

Comment by Gerrit Updater [ 19/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38187/
Subject: LU-13441 osd-zfs: remove OSD thread info cache size assertion
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 209f920fcddcf216c9a56239823a6bcb62ed367b

Comment by Peter Jones [ 19/Apr/20 ]

Landed for 2.14

Comment by Aurelien Degremont (Inactive) [ 04/May/20 ]

For the record, I reproduced the crash with Lustre 2.12.4, using official RPMs.
