[LU-13796] MDT: Crash after failover: ASSERTION( attr->la_valid & LA_TYPE ) Created: 17/Jul/20  Updated: 08/Apr/21  Resolved: 08/Apr/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Robert Redl Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Centos 7, 3.10.0-1127.8.2.el7_lustre.x86_64, ZFS


Issue Links:
Duplicate
duplicates LU-14039 Set LA_TYPE while working on osp-mdt ... Resolved
Related
is related to LU-14039 Set LA_TYPE while working on osp-mdt ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Immediately after a failover from one node to the other (performed manually using pcs resource move), the new node crashed:

 

[82973.263903] Lustre: meteo0-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[82973.263907] Lustre: Skipped 1 previous similar message
[82973.273770] LustreError: 28190:0:(osp_md_object.c:167:osp_md_create()) ASSERTION( attr->la_valid & LA_TYPE ) failed: 
[82973.275269] LustreError: 28190:0:(osp_md_object.c:167:osp_md_create()) LBUG
[82973.276678] Pid: 28190, comm: lod0000_rec0001 3.10.0-1127.8.2.el7_lustre.x86_64 #1 SMP Mon Jun 8 13:48:45 UTC 2020
[82973.276679] Call Trace:
[82973.276686]  [<ffffffffc10487cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[82973.276698]  [<ffffffffc104887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[82973.276703]  [<ffffffffc1995b7a>] osp_md_create+0x42a/0x470 [osp]
[82973.276715]  [<ffffffffc1151334>] llog_osd_get_cat_list+0x8d4/0xbd0 [obdclass]
[82973.276740]  [<ffffffffc18ca359>] lod_sub_prep_llog+0xb9/0x783 [lod]
[82973.276758]  [<ffffffffc188f82b>] lod_sub_recovery_thread+0x1cb/0xc80 [lod]
[82973.276764]  [<ffffffff9f2c6691>] kthread+0xd1/0xe0
[82973.276769]  [<ffffffff9f992d1d>] ret_from_fork_nospec_begin+0x7/0x21
[82973.276774]  [<ffffffffffffffff>] 0xffffffffffffffff
[82973.276793] Kernel panic - not syncing: LBUG
[82973.278196] CPU: 29 PID: 28190 Comm: lod0000_rec0001 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1127.8.2.el7_lustre.x86_64 #1
[82973.281009] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 3.1 09/14/2018
[82973.282404] Call Trace:
[82973.282990] Lustre: meteo0-MDT0000: in recovery but waiting for the first client to connect
[82973.282992] Lustre: Skipped 1 previous similar message
[82973.286683]  [<ffffffff9f97ffa5>] dump_stack+0x19/0x1b
[82973.288065]  [<ffffffff9f979541>] panic+0xe8/0x21f
[82973.289403]  [<ffffffffc10488cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[82973.290714]  [<ffffffffc1995b7a>] osp_md_create+0x42a/0x470 [osp]
[82973.292031]  [<ffffffffc1151334>] llog_osd_get_cat_list+0x8d4/0xbd0 [obdclass]
[82973.293333]  [<ffffffffc18ca359>] lod_sub_prep_llog+0xb9/0x783 [lod]
[82973.294630]  [<ffffffffc118de6c>] ? keys_fill+0xfc/0x180 [obdclass]
[82973.295895]  [<ffffffffc188f82b>] lod_sub_recovery_thread+0x1cb/0xc80 [lod]
[82973.297134]  [<ffffffffc188f660>] ? lod_obd_get_info+0x9d0/0x9d0 [lod]
[82973.298364]  [<ffffffff9f2c6691>] kthread+0xd1/0xe0
[82973.299582]  [<ffffffff9f2c65c0>] ? insert_kthread_work+0x40/0x40
[82973.300792]  [<ffffffff9f992d1d>] ret_from_fork_nospec_begin+0x7/0x21
[82973.301984]  [<ffffffff9f2c65c0>] ? insert_kthread_work+0x40/0x40



 Comments   
Comment by Andreas Dilger [ 08/Apr/21 ]

This was fixed in 2.14 via patch https://review.whamcloud.com/40655 "LU-14039 obdclass: set LA_TYPE when update_log init".
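
For readers unfamiliar with the assertion in the trace, the following is a minimal sketch of the pattern, not the Lustre code. It assumes that la_valid is a bitmask recording which attribute fields have been filled in, and that the create path refuses to proceed unless the type bit is set. Every name prefixed demo_ or DEMO_ is hypothetical, and the bit positions are placeholders rather than values copied from the Lustre headers. Judging only by the patch title, the fix makes the update-log initialization path set LA_TYPE before the create call; the sketch models that with an init helper.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the attribute validity bits.
 * The bit positions are illustrative only, not Lustre's. */
enum demo_la_valid {
	DEMO_LA_MODE = 1u << 0,
	DEMO_LA_TYPE = 1u << 1,
};

/* Hypothetical, cut-down attribute structure. */
struct demo_attr {
	uint64_t la_valid;	/* which of the fields below are meaningful */
	uint32_t la_mode;	/* file type and permission bits */
};

#define DEMO_S_IFMT  0170000u	/* file type mask */
#define DEMO_S_IFREG 0100000u	/* regular file */

/* Same shape as the failed check: the caller must have declared the
 * object type. The real code triggers LBUG here; this sketch returns
 * -1 instead so the behavior can be observed without crashing. */
static int demo_create(const struct demo_attr *attr)
{
	if (!(attr->la_valid & DEMO_LA_TYPE))
		return -1;
	return 0;
}

/* Models the idea behind the fix: whoever prepares the attributes
 * sets the type bit together with the mode before calling create. */
static void demo_attr_init_regular(struct demo_attr *attr)
{
	attr->la_mode = DEMO_S_IFREG | 0644;
	attr->la_valid = DEMO_LA_MODE | DEMO_LA_TYPE;
}
```

With only DEMO_LA_MODE set, demo_create() returns -1, which corresponds to the point where the MDT hit the LBUG. After demo_attr_init_regular() the same call returns 0.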
