[LU-8147] sanity test_208: zfs softlockup in mount error path Created: 16/May/16  Updated: 01/Oct/16  Resolved: 02/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-8161 sanity-quota test_7a: lockup during m... Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for John Hammond <john.hammond@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/83f184e2-19d1-11e6-9e5d-5254006e85c2.

The sub-test test_208 failed with the following error:

test failed to respond and timed out

Console logs show a soft lockup in the error path from osd_device_fini():

22:02:16:BUG: soft lockup - CPU#0 stuck for 67s! [mount.lustre:16720]
22:02:16:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: llog_test]
22:02:16:CPU 0 
22:02:16:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: llog_test]
22:02:16:
22:02:16:Pid: 16720, comm: mount.lustre Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1 Red Hat KVM
22:02:16:RIP: 0010:[<ffffffff8129e8a9>]  [<ffffffff8129e8a9>] __write_lock_failed+0x9/0x20
22:02:16:RSP: 0018:ffff880079e87890  EFLAGS: 00000287
22:02:16:RAX: 0000000000000000 RBX: ffff880079e87898 RCX: ffff88005cadbe58
22:02:16:RDX: 0000000000000000 RSI: ffff88005d588000 RDI: ffff88005d5880d8
22:02:16:RBP: ffffffff8100bc0e R08: dead000000200200 R09: dead000000100100
22:02:16:R10: dead000000200200 R11: 0000000000000000 R12: ffff88005c4d1000
22:02:16:R13: ffff880079e87808 R14: ffffffffa0243e45 R15: ffff880079e87818
22:02:16:FS:  00007f17d7a347a0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
22:02:16:CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
22:02:16:CR2: 00000030fe2e90c0 CR3: 000000005b53d000 CR4: 00000000000006f0
22:02:16:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
22:02:16:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
22:02:16:Process mount.lustre (pid: 16720, threadinfo ffff880079e84000, task ffff8800796f8ab0)
22:02:16:Stack:
22:02:16: ffffffff8153d007 ffff880079e878e8 ffffffffa0f447b4 ffff880079e87958
22:02:16:<d> ffff88005d588000 ffff880079e87908 ffff88005d588000 ffff880079e87958
22:02:16:<d> 00000000fffffff0 ffff88005cadbb40 ffff880079e87958 ffff880079e87908
22:02:16:Call Trace:
22:02:16: [<ffffffff8153d007>] ? _write_lock+0x17/0x20
22:02:16: [<ffffffffa0f447b4>] ? osd_oi_fini+0x44/0x820 [osd_zfs]
22:02:16: [<ffffffffa0f34c4c>] ? osd_device_fini+0x12c/0x530 [osd_zfs]
22:02:16: [<ffffffffa0f358b0>] ? osd_device_alloc+0x2e0/0x480 [osd_zfs]
22:02:16: [<ffffffffa08a61af>] ? obd_setup+0x1bf/0x290 [obdclass]
22:02:16: [<ffffffffa08a6488>] ? class_setup+0x208/0x870 [obdclass]
22:02:16: [<ffffffffa08af54c>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
22:02:16: [<ffffffff8117904c>] ? __kmalloc+0x21c/0x230
22:02:16: [<ffffffffa08b68ad>] ? do_lcfg+0x61d/0x750 [obdclass]
22:02:16: [<ffffffffa08b6a74>] ? lustre_start_simple+0x94/0x200 [obdclass]
22:02:16: [<ffffffffa08f08d1>] ? server_fill_super+0xfd1/0x1a6a [obdclass]
22:02:16: [<ffffffffa08bb084>] ? lustre_fill_super+0xb64/0x2120 [obdclass]
22:02:16: [<ffffffffa08ba520>] ? lustre_fill_super+0x0/0x2120 [obdclass]
22:02:16: [<ffffffff81195a5f>] ? get_sb_nodev+0x5f/0xa0
22:02:16: [<ffffffffa08b2105>] ? lustre_get_sb+0x25/0x30 [obdclass]
22:02:16: [<ffffffff8119509b>] ? vfs_kern_mount+0x7b/0x1b0
22:02:16: [<ffffffff81195242>] ? do_kern_mount+0x52/0x130
22:02:16: [<ffffffff811a7f82>] ? vfs_ioctl+0x22/0xa0
22:02:16: [<ffffffff811b71db>] ? do_mount+0x2fb/0x930
22:02:16: [<ffffffff811b78a0>] ? sys_mount+0x90/0xe0
22:02:16: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
22:02:16:Code: 00 00 48 8b 5b 20 48 83 eb 07 48 39 d9 73 06 48 89 01 31 c0 c3 b8 f2 ff ff ff c3 90 90 90 90 90 90 90 f0 81 07 00 00 00 01 f3 90 <81> 3f 00 00 00 01 75 f6 f0 81 2f 00 00 00 01 0f 85 e2 ff ff ff 
22:02:16:Call Trace:
22:02:16: [<ffffffff8153d007>] ? _write_lock+0x17/0x20
22:02:16: [<ffffffffa0f447b4>] ? osd_oi_fini+0x44/0x820 [osd_zfs]
22:02:16: [<ffffffffa0f34c4c>] ? osd_device_fini+0x12c/0x530 [osd_zfs]
22:02:16: [<ffffffffa0f358b0>] ? osd_device_alloc+0x2e0/0x480 [osd_zfs]
22:02:16: [<ffffffffa08a61af>] ? obd_setup+0x1bf/0x290 [obdclass]
22:02:16: [<ffffffffa08a6488>] ? class_setup+0x208/0x870 [obdclass]
22:02:16: [<ffffffffa08af54c>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
22:02:16: [<ffffffff8117904c>] ? __kmalloc+0x21c/0x230
22:02:16: [<ffffffffa08b68ad>] ? do_lcfg+0x61d/0x750 [obdclass]
22:02:16: [<ffffffffa08b6a74>] ? lustre_start_simple+0x94/0x200 [obdclass]
22:02:16: [<ffffffffa08f08d1>] ? server_fill_super+0xfd1/0x1a6a [obdclass]
22:02:16: [<ffffffffa08bb084>] ? lustre_fill_super+0xb64/0x2120 [obdclass]
22:02:16: [<ffffffffa08ba520>] ? lustre_fill_super+0x0/0x2120 [obdclass]
22:02:16: [<ffffffff81195a5f>] ? get_sb_nodev+0x5f/0xa0
22:02:16: [<ffffffffa08b2105>] ? lustre_get_sb+0x25/0x30 [obdclass]
22:02:16: [<ffffffff8119509b>] ? vfs_kern_mount+0x7b/0x1b0
22:02:16: [<ffffffff81195242>] ? do_kern_mount+0x52/0x130
22:02:16: [<ffffffff811a7f82>] ? vfs_ioctl+0x22/0xa0
22:02:16: [<ffffffff811b71db>] ? do_mount+0x2fb/0x930
22:02:16: [<ffffffff811b78a0>] ? sys_mount+0x90/0xe0
22:02:16: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

Please provide additional information about the failure here.

Info required for matching: sanity 208



 Comments   
Comment by Yang Sheng [ 18/May/16 ]

If osd_mount() fails and returns before osd_oi_init() has run, the cleanup path still calls osd_ost_seq_fini(), which tries to grab osl_seq_list_lock. But that lock has not been initialized yet.
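The following is a minimal illustrative sketch of that failure mode, not the actual osd-zfs code or the patch under review: write_lock() on an rwlock that was never passed through rwlock_init() can spin forever, which matches the __write_lock_failed soft lockup in the console log. The struct and function names with a "_sketch" suffix, and the "initialized" guard flag, are hypothetical and only show one way an error path could avoid touching uninitialized state.

#include <linux/rwlock.h>
#include <linux/types.h>

struct osd_seq_sketch {
	rwlock_t	osl_seq_list_lock;	/* set up only once init has run */
	bool		osl_initialized;	/* hypothetical guard flag */
};

/* Analogue of the init step that the failed mount never reached. */
static int osd_oi_init_sketch(struct osd_seq_sketch *osd)
{
	rwlock_init(&osd->osl_seq_list_lock);
	osd->osl_initialized = true;
	return 0;
}

/* Analogue of the cleanup step reached via osd_device_fini(). */
static void osd_ost_seq_fini_sketch(struct osd_seq_sketch *osd)
{
	/*
	 * If setup bailed out before the init step, the lock below is
	 * uninitialized memory; write_lock() would spin and trigger the
	 * soft lockup, so the sketch returns early instead.
	 */
	if (!osd->osl_initialized)
		return;

	write_lock(&osd->osl_seq_list_lock);
	/* ... tear down the sequence list here ... */
	write_unlock(&osd->osl_seq_list_lock);
}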

Comment by Gerrit Updater [ 18/May/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/20309
Subject: LU-8147 osd-zfs: fix osd_mount error path
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 275332e1983ac64b194c26d6fa6d4e9e05cd1732

Comment by Gerrit Updater [ 02/Jun/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20309/
Subject: LU-8147 osd-zfs: fix osd_mount error path
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 60270c6488b01db756eb216548f83f2826972854

Comment by Peter Jones [ 02/Jun/16 ]

Landed for 2.9
