[LU-7262] Crash in lu_device_fini Created: 07/Oct/15  Updated: 28/Feb/20  Resolved: 28/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Just had this crash on the latest master snapshot:

<4>[58151.746360] Lustre: DEBUG MARKER: test_26 fail mds1 8 times
<3>[58152.780769] LustreError: 29492:0:(osp_sync.c:1034:osp_sync_process_committed()) @@@ imp_committed = 12884904729  req@ffff8800251f9cd8 x1514339512271556/t12884904816(12884904816) o6->lustre-OST0000-osc-MDT0000@0@lo:28/4 lens 664/400 e 0 to 0 dl 1444188185 ref 1 fl Complete:R/4/0 rc 0/0
<3>[58152.783763] LustreError: 29492:0:(osp_sync.c:1034:osp_sync_process_committed()) Skipped 116 previous similar messages
<6>[58152.980833] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
<1>[58152.982437] BUG: unable to handle kernel paging request at ffff88001d946028
<1>[58152.983076] IP: [<ffffffffa11f52d9>] lu_device_fini+0x9/0xc0 [obdclass]
<4>[58152.983717] PGD 1a2e063 PUD 1a32063 PMD 1ec067 PTE 1d946060
<4>[58152.984355] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<3>[58152.985076] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
<4>[58152.986404] last sysfs file: /sys/devices/system/cpu/possible
<4>[58152.986404] CPU 5 
<4>[58152.986404] Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_console i2c_piix4 i2c_core virtio_balloon virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[58152.986404] 
<4>[58152.986404] Pid: 10054, comm: obd_zombid Not tainted 2.6.32-rhe6.7-debug #1 Red Hat KVM
<4>[58152.986404] RIP: 0010:[<ffffffffa11f52d9>]  [<ffffffffa11f52d9>] lu_device_fini+0x9/0xc0 [obdclass]
<4>[58152.986404] RSP: 0018:ffff880069697d80  EFLAGS: 00010282
<4>[58152.986404] RAX: 0000000000000001 RBX: ffff88001d946000 RCX: 0000000000000000
<4>[58152.986404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88001d946000
<4>[58152.986404] RBP: ffff880069697d80 R08: 0000000000000b7c R09: 0000000000000000
<4>[58152.986404] R10: 0000000000000001 R11: 000000000000000f R12: ffff88001d946000
<4>[58152.986404] R13: ffffffffa09735c0 R14: ffff880069697db0 R15: ffff880090b01ad0
<4>[58152.986404] FS:  0000000000000000(0000) GS:ffff880006340000(0000) knlGS:0000000000000000
<4>[58152.986404] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[58152.986404] CR2: ffff88001d946028 CR3: 0000000001a2d000 CR4: 00000000000006e0
<4>[58152.986404] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[58152.986404] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[58152.986404] Process obd_zombid (pid: 10054, threadinfo ffff880069694000, task ffff8800afb221c0)
<4>[58152.986404] Stack:
<4>[58152.986404]  ffff880069697da0 ffffffffa090cd2f 0000000000000073 ffff8800683fa100
<4>[58152.986404] <d> ffff880069697e10 ffffffffa11debdd 0000000210000001 0000000000000000
<4>[58152.986404] <d> ffff88004448fdf0 ffff880069697dc8 ffff880069697dc8 00000000000001c6
<4>[58152.986404] Call Trace:
<4>[58152.986404]  [<ffffffffa090cd2f>] mdt_device_free+0x2f/0x290 [mdt]
<4>[58152.986404]  [<ffffffffa11debdd>] class_decref+0x3dd/0x4c0 [obdclass]
<4>[58152.986404]  [<ffffffffa11c9d4e>] obd_zombie_impexp_cull+0x5ae/0x9f0 [obdclass]
<4>[58152.986404]  [<ffffffffa11ca1f5>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
<4>[58152.986404]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
<4>[58152.986404]  [<ffffffffa11ca190>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
<4>[58152.986404]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
<4>[58152.986404]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
<4>[58152.986404]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
<4>[58152.986404]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
<4>[58152.986404] Code: a1 31 c0 c7 05 dd df 05 00 00 00 04 00 e8 a0 b8 64 ff 48 c7 c7 80 32 25 a1 e8 74 fb 63 ff 0f 1f 40 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 47 28 48 8b 57 08 48 85 c0 74 10 48 c7 40 10 00 00 00 00 
<1>[58152.986404] RIP  [<ffffffffa11f52d9>] lu_device_fini+0x9/0xc0 [obdclass]
<4>[58152.986404]  RSP <ffff880069697d80>
<4>[58152.986404] CR2: ffff88001d946028
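The register dump can be cross-checked with a little arithmetic. The sketch below uses only values copied from the oops above and architectural x86-64 facts: the page-fault error-code bits, and the PTE Present bit and frame-number field. The direct-map base address is an assumption for this 2.6.32 kernel. All macro and helper names are hypothetical and are not Lustre or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied verbatim from the oops above. */
#define OOPS_RDI  0xffff88001d946000ULL /* 1st argument of lu_device_fini(): d */
#define OOPS_CR2  0xffff88001d946028ULL /* faulting virtual address */
#define OOPS_PTE  0x1d946060ULL         /* PTE from the "PGD ... PTE" line */
#define OOPS_CODE 0x0000ULL             /* page-fault error code, "Oops: 0000" */

/* ASSUMPTION: direct-map base (PAGE_OFFSET) of this 2.6.32 x86-64 kernel. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL

/* x86 page-fault error-code bits (architectural). */
#define PFEC_PRESENT 0x1ULL /* 0 = page was not present */
#define PFEC_WRITE   0x2ULL /* 0 = read access */
#define PFEC_USER    0x4ULL /* 0 = fault taken in kernel mode */

/* x86-64 PTE: bit 0 is Present, bits 12..51 hold the page frame number. */
#define PTE_PRESENT  0x1ULL
#define PTE_PFN_MASK 0x000ffffffffff000ULL

/* Byte offset of the faulting access from the struct pointer held in RDI. */
static inline uint64_t fault_offset(uint64_t cr2, uint64_t rdi)
{
	return cr2 - rdi;
}

/* True if the PTE maps a present page. */
static inline bool pte_present(uint64_t pte)
{
	return (pte & PTE_PRESENT) != 0;
}

/* Page frame number recorded in the PTE. */
static inline uint64_t pte_pfn(uint64_t pte)
{
	return (pte & PTE_PFN_MASK) >> 12;
}

/* Page frame a direct-map address should translate to (4 KiB pages). */
static inline uint64_t direct_map_pfn(uint64_t vaddr)
{
	return (vaddr - DIRECT_MAP_BASE) >> 12;
}
```

Read together, these checks show a kernel-mode read (error code 0000) 0x28 bytes into `d` that hit a page whose PTE still names the expected frame 0x1d946 but has the Present bit cleared. With DEBUG_PAGEALLOC, freed pages are unmapped from the direct map. This pattern is therefore consistent with, though it does not prove, `lu_device_fini()` being handed a device whose memory had already been freed.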
(gdb) l *(lu_device_fini+0x9)
0x50309 is in lu_device_fini (/home/green/git/lustre-release/lustre/obdclass/lu_object.c:1217).
1212	 */
1213	void lu_device_fini(struct lu_device *d)
1214	{
1215		struct lu_device_type *t = d->ld_type;
1216	
1217		if (d->ld_obd != NULL) {
1218			d->ld_obd->obd_lu_dev = NULL;
1219			d->ld_obd = NULL;
1220		}
1221	
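gdb maps the fault to line 1217, the read of `d->ld_obd`. In the oops `Code:` line, the faulting bytes `48 8b 47 28` decode to `mov 0x28(%rdi),%rax`. The next instruction, `48 8b 57 08` (`mov 0x8(%rdi),%rdx`), is presumably the `d->ld_type` load from line 1215. The sketch below checks that an LP64 layout places these fields at those displacements. `struct lu_device_model` is a hypothetical, simplified mirror whose field order is assumed from `lu_object.h` of that period, not copied from the tree that crashed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's atomic_t (a single int). */
typedef struct { int counter; } atomic_model;

/*
 * ASSUMPTION: simplified mirror of the leading fields of struct lu_device.
 * Field order is assumed; pointee types are replaced by void * because only
 * their size and alignment affect the offsets.
 */
struct lu_device_model {
	atomic_model ld_ref;        /* 4 bytes, then 4 bytes of padding on LP64 */
	void        *ld_type;       /* read at line 1215 */
	const void  *ld_ops;
	void        *ld_site;
	void        *ld_proc_entry;
	void        *ld_obd;        /* read at line 1217 */
};

#if defined(__LP64__)
/* On LP64 targets such as x86-64 these match the displacements in the oops. */
static_assert(offsetof(struct lu_device_model, ld_type) == 0x08,
	      "ld_type expected at 0x08");
static_assert(offsetof(struct lu_device_model, ld_obd) == 0x28,
	      "ld_obd expected at 0x28");
#endif
```

If the assumed layout holds, the faulting `mov` is the first load through `d` after the function prologue. The pointer passed in by `mdt_device_free()` would then already have been invalid on entry, before `lu_device_fini()` modified anything.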

Crashdump is in /exports/crashdumps/192.168.10.222-2015-10-06-23\:22\:23/



 Comments   
Comment by nasf (Inactive) [ 11/Nov/15 ]

Similar trouble on master:
https://testing.hpdd.intel.com/test_sets/c69ac87a-8783-11e5-b56a-5254006e85c2

Comment by Andreas Dilger [ 28/Feb/20 ]

Closing old bug that hasn't been seen in a long time.

Generated at Sat Feb 10 02:07:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.