[LU-10209] conf-sanity test 41c crashes Created: 08/Nov/17  Updated: 14/Jan/18  Resolved: 14/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4134 obdfilter-suvery bugs and panics (ioc... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I am trying to add conf-sanity to my test rig and it always crashes on test 41c:

[15456.696633] LustreError: 5866:0:(fail.c:136:cfs_race()) cfs_race id 716 sleeping
[15456.697377] LustreError: 5865:0:(fail.c:141:cfs_race()) cfs_fail_race id 716 awaking
[15456.698956] LustreError: 5866:0:(fail.c:139:cfs_race()) cfs_fail_race id 716 awake, rc=0
[15456.700216] LustreError: 5866:0:(obd_mount_server.c:1829:server_fill_super()) Unable to start osd on /dev/loop1: -114
[15456.701561] LustreError: 5866:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-114)
[15456.701870] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[15456.730622] Lustre: MGS: Connection restored to MGC192.168.10.226@tcp_0 (at 0@lo)
[15456.796043] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[15457.025866] Lustre: DEBUG MARKER: centos6-16.localnet: executing lsmod
[15457.181162] LustreError: 6060:0:(genops.c:489:class_register_device()) lustre-OST0000-osd: already exists, won't add
[15457.182084] LustreError: 6060:0:(genops.c:415:class_free_dev()) ASSERTION( obd->obd_magic == OBD_DEVICE_MAGIC ) failed: ffff88007dc98300 obd_magic 6b6b6b6b != ab5cd6ef
[15457.183133] LDISKFS-fs (loop2): file extents enabled, maximum tree depth=5
[15457.183899] LustreError: 6060:0:(genops.c:415:class_free_dev()) LBUG
[15457.184445] Pid: 6060, comm: mount.lustre
[15457.184466] LDISKFS-fs (loop2): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
[15457.185769]
Call Trace:
[15457.195156]  [<ffffffffa02517fe>] libcfs_call_trace+0x4e/0x60 [libcfs]
[15457.195548]  [<ffffffffa025188c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[15457.195956]  [<ffffffffa036ee01>] class_free_dev+0x621/0x660 [obdclass]
[15457.196346]  [<ffffffffa036f060>] ? class_export_put+0x220/0x2f0 [obdclass]
[15457.196748]  [<ffffffffa0370b85>] ? class_unlink_export+0x135/0x170 [obdclass]
[15457.197452]  [<ffffffffa03890bc>] class_attach+0x46c/0x6e0 [obdclass]
[15457.197843]  [<ffffffffa038bc18>] class_process_config+0xbd8/0x28a0 [obdclass]
[15457.198536]  [<ffffffff811cd519>] ? __kmalloc+0x649/0x660
[15457.198926]  [<ffffffffa025eca7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15457.199317]  [<ffffffffa0390508>] do_lcfg+0x258/0x4b0 [obdclass]
[15457.199707]  [<ffffffffa0394541>] lustre_start_simple+0x61/0x210 [obdclass]
[15457.200119]  [<ffffffffa03c0114>] server_fill_super+0xe94/0x1661 [obdclass]
[15457.200515]  [<ffffffffa025eca7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15457.200915]  [<ffffffffa03980a0>] lustre_fill_super+0x3d0/0x8b0 [obdclass]
[15457.201309]  [<ffffffffa0397cd0>] ? lustre_fill_super+0x0/0x8b0 [obdclass]
[15457.201701]  [<ffffffff811f0edd>] mount_nodev+0x4d/0xb0
[15457.202082]  [<ffffffffa0390238>] lustre_mount+0x38/0x60 [obdclass]
[15457.202469]  [<ffffffff811f18b9>] mount_fs+0x39/0x1b0
[15457.202832]  [<ffffffff8120e863>] vfs_kern_mount+0x63/0xf0
[15457.203207]  [<ffffffff81210ede>] do_mount+0x24e/0xa40
[15457.203578]  [<ffffffff8117681e>] ? __get_free_pages+0xe/0x50
[15457.204002]  [<ffffffff81211766>] SyS_mount+0x96/0xf0
[15457.204365]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
[15457.204788]
[15457.205142] Kernel panic - not syncing: LBUG
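
A note on the failed assertion in the log above: the observed obd_magic value 6b6b6b6b is a run of the byte 0x6b, which the Linux slab allocator writes over freed objects when slab poisoning is enabled (POISON_FREE in include/linux/poison.h). That suggests class_free_dev() was handed an obd_device that had already been freed. The minimal Python sketch below only illustrates that check; it assumes the test kernel had slab poisoning enabled, and the helper name looks_freed is hypothetical, not part of Lustre.

```python
# Byte the Linux slab allocator writes over freed objects when slab
# poisoning is enabled (POISON_FREE in include/linux/poison.h).
POISON_FREE = 0x6B

# Expected magic, copied from the assertion message in the log.
OBD_DEVICE_MAGIC = 0xAB5CD6EF


def looks_freed(magic: int, width: int = 4) -> bool:
    """Return True if every byte of `magic` equals the slab free-poison byte.

    Hypothetical helper for reading crash logs; not part of Lustre.
    """
    return magic.to_bytes(width, "little") == bytes([POISON_FREE]) * width


observed = 0x6B6B6B6B  # obd_magic value reported by the failed assertion

print(looks_freed(observed))          # observed value from the log
print(looks_freed(OBD_DEVICE_MAGIC))  # a live device would carry this magic
```

A magic field consisting entirely of poison bytes is consistent with a use-after-free or double free, though it does not by itself identify which code path released the object.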


 Comments   
Comment by Bruno Faccini (Inactive) [ 08/Nov/17 ]

Could the recent landing of "LU-4134 obdclass: obd_device improvement" have introduced a regression during concurrent mounts/starts of the same target?

And could it be fixed by the more recent patch "LU-4134 obdclass: fix double free in failure path" from Yang Sheng?

Comment by Oleg Drokin [ 08/Nov/17 ]

I only started this testing yesterday so no idea, but I can try that patch, thanks.

Comment by Oleg Drokin [ 08/Nov/17 ]

OK, I just tried the patch, and it does appear to help.

Comment by Oleg Drokin [ 14/Jan/18 ]

Duplicate of LU-4134
