[LU-16648] sanity test_27M: crashed in lod_statfs_and_check() Created: 18/Mar/23  Updated: 07/Jul/23  Resolved: 07/Jul/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Vitaliy Kuznetsov
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-16872 sanity: test_27M Error: '(5) stripe c... Resolved
Related
is related to LU-16872 sanity: test_27M Error: '(5) stripe c... Resolved
is related to LU-16623 lod_statfs_and_check() does not skip ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Serguei Smirnov <ssmirnov@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4c631a9e-c248-43ee-b43b-10a84e86d158

test_27M failed with the following error:

onyx-75vm4 crashed during sanity test_27M

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/93105 - 4.18.0-372.32.1.el8_6.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/93105 - 4.18.0-372.32.1.el8_lustre.x86_64

BUG: unable to handle kernel paging request at 0000000000080000
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 10173 Comm: mdt00_002 4.18.0-425.10.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:lod_statfs_and_check+0x87/0x5e0 [lod]
Call Trace:
  lod_qos_prep_create+0xfe0/0x1320 [lod]
  lod_prepare_create+0x231/0x320 [lod]
  lod_declare_striped_create+0x291/0x920 [lod]
  lod_declare_create+0x27c/0x530 [lod]
  mdd_declare_create_object_internal+0xcd/0x370 [mdd]
  mdd_declare_create_object.isra.37+0x49/0x880 [mdd]
  mdd_declare_create+0x72/0x490 [mdd]
  mdd_create+0x8a2/0x1a30 [mdd]
  mdt_reint_open+0x2d20/0x3180 [mdt]
  mdt_reint_rec+0x11f/0x270 [mdt]
  mdt_reint_internal+0x4d3/0x7f0 [mdt]
  mdt_intent_open+0x13b/0x420 [mdt]
  mdt_intent_opc+0x12c/0xbf0 [mdt]
  mdt_intent_policy+0x20b/0x3a0 [mdt]
  ldlm_lock_enqueue+0x47f/0xb20 [ptlrpc]
  ldlm_handle_enqueue0+0x634/0x1760 [ptlrpc]
  tgt_enqueue+0xa4/0x220 [ptlrpc]
  tgt_request_handle+0xcc3/0x1920 [ptlrpc]
  ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
  ptlrpc_main+0xc52/0x1510 [ptlrpc]
  kthread+0x10b/0x130

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_27M - onyx-75vm4 crashed during sanity test_27M



 Comments   
Comment by Minh Diep [ 19/Mar/23 ]

Is this the same as LU-16014?

Comment by Andreas Dilger [ 03/Apr/23 ]

I don't think this is exactly the same as LU-16014, since the faulting instruction (RIP) is in a different function (lod_statfs_and_check+0x87 vs. lod_qos_prep_create+0xe96).
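
The comparison above (same function and offset at the faulting RIP, or not) can be scripted when triaging many crash logs. The following is only a minimal sketch: the helper names parse_symbol() and same_fault_site() are hypothetical and not part of Lustre or Maloo, and the regular expression assumes the usual "func+0xOFF/0xSIZE [module]" symbol format seen in the oops text of this ticket.

```python
import re

# Matches a kernel symbol reference such as
#   "RIP: 0010:lod_statfs_and_check+0x87/0x5e0 [lod]"
# The "/0xSIZE" and "[module]" parts are optional.
SYMBOL_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"(?:/0x(?P<size>[0-9a-fA-F]+))?"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_symbol(text):
    """Return func/offset/size/module for the first symbol in text, or None."""
    m = SYMBOL_RE.search(text)
    if m is None:
        return None
    size = m.group("size")
    return {
        "func": m.group("func"),
        "offset": int(m.group("offset"), 16),
        "size": int(size, 16) if size else None,
        "module": m.group("module"),
    }


def same_fault_site(a, b):
    """True only if both lines parse and share function name and offset."""
    pa, pb = parse_symbol(a), parse_symbol(b)
    if pa is None or pb is None:
        return False
    return pa["func"] == pb["func"] and pa["offset"] == pb["offset"]


# The two fault sites quoted in this ticket.
this_ticket = "RIP: 0010:lod_statfs_and_check+0x87/0x5e0 [lod]"
lu_16014 = "lod_qos_prep_create+0xe96"
print(same_fault_site(this_ticket, lu_16014))
```

A matching function with a different offset may still be the same bug if the two kernels or Lustre builds differ, so a False result here is a hint rather than proof.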

Comment by Andreas Dilger [ 03/Apr/23 ]

I've also seen one failure with "lod_statfs_and_check()) ASSERTION( tgt ) failed" (master, 6 days ago):
https://testing.whamcloud.com/test_sets/4c631a9e-c248-43ee-b43b-10a84e86d158

Comment by Andreas Dilger [ 05/Apr/23 ]

Excluding the single ASSERTION(tgt) failure above, the first similar failure was in testing of patch https://review.whamcloud.com/45822 "LU-14692 osp: deprecate IDIF sequence for MDT0000" on 2023-03-15, before that patch landed on 2023-03-21. After it landed, other patches started hitting the same failure, so it is quite possible that this patch is the origin of the crash.

There was also patch https://review.whamcloud.com/50074 "LU-16501 lod: add qos_ost_weights to debugfs" that landed on 2023-03-21 and touched code in lustre/lod, but that doesn't explain 45822 crashing before that patch landed.

Comment by Andreas Dilger [ 07/Jul/23 ]

It looks like this is fixed by the patch in LU-16872.
