[LU-10615] sanity-scrub: test_10a timed out, MDS crashed
Created: 06/Feb/18  Updated: 03/Mar/18  Resolved: 03/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: dne, zfs

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Minh Diep <minh.diep@intel.com>

https://testing.hpdd.intel.com/test_sessions/37708ed7-7a79-4929-8c25-582e7eb81cce

This issue relates to the following test suite run:

https://testing.hpdd.intel.com/test_logs/d7c39a8a-0b13-11e8-a6ad-52540065bddc/show_text

[ 7812.953447] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts
[ 7813.243653] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds3
[ 7816.862431] general protection fault: 0000 [#1] SMP 
[ 7816.863110] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 joydev pcspkr virtio_balloon i2c_core nfsd parport_pc parport nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ata_generic pata_acpi ext4 mbcache jbd2 virtio_blk ata_piix crct10dif_pclmul crct10dif_common libata 8139too crc32c_intel floppy serio_raw virtio_pci virtio_ring virtio 8139cp mii
[ 7816.872645] CPU: 1 PID: 3099 Comm: OI_scrub Tainted: P           OE  ------------   3.10.0-693.11.6.el7_lustre.x86_64 #1
[ 7816.873669] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 7816.874215] task: ffff880055432f70 ti: ffff880055a80000 task.ti: ffff880055a80000
[ 7816.874926] RIP: 0010:[<ffffffffc0feefb0>]  [<ffffffffc0feefb0>] osd_fld_lookup+0x40/0xc0 [osd_zfs]
[ 7816.875806] RSP: 0018:ffff880055a83c90  EFLAGS: 00010206
[ 7816.876308] RAX: ffff8800361c9040 RBX: ffff8800543de808 RCX: ffff8800543ded40
[ 7816.876979] RDX: 0000000280000401 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880055a83e58
[ 7816.877650] RBP: ffff880055a83cc0 R08: 00000002800003f5 R09: ffff880055a83d20
[ 7816.878323] R10: ffff88007fd1b900 R11: ffffea000188f600 R12: ffff880055a83e58
[ 7816.879001] R13: ffff88005ab6e000 R14: ffff8800543de800 R15: ffff88005ab6e000
[ 7816.879667] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 7816.880427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7816.880976] CR2: 00007f7f118103e4 CR3: 00000000019fa000 CR4: 00000000000606e0
[ 7816.881651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7816.882340] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7816.883018] Call Trace:
[ 7816.883276]  [<ffffffffc0fef1f0>] ? fid_is_on_ost+0x1c0/0x340 [osd_zfs]
[ 7816.883908]  [<ffffffffc0fef820>] osd_get_name_n_idx+0x60/0x4c0 [osd_zfs]
[ 7816.884556]  [<ffffffffc0fefd24>] osd_fid_lookup+0xa4/0x4a0 [osd_zfs]
[ 7816.885174]  [<ffffffffc1000369>] osd_scrub_exec+0xc9/0x960 [osd_zfs]
[ 7816.885801]  [<ffffffffc10035da>] osd_scrub_main+0xb7a/0xf40 [osd_zfs]
[ 7816.886418]  [<ffffffff810c0d30>] ? finish_task_switch+0x50/0x160
[ 7816.887010]  [<ffffffffc1002a60>] ? osd_scrub_next.isra.11+0xb60/0xb60 [osd_zfs]
[ 7816.887709]  [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 7816.888196]  [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[ 7816.888787]  [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 7816.889300]  [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[ 7816.889884] Code: ff ff ff 49 01 d0 49 39 f0 48 8b 40 78 76 6c 4c 8d 42 f4 be f3 ff ff ff 49 39 f0 77 2e 48 85 c0 74 6d 48 8b 70 10 48 85 f6 74 64 <48> 83 7e 18 00 74 5d 55 83 49 14 03 48 8b 70 10 48 89 e5 e8 48 
[ 7816.892916] RIP  [<ffffffffc0feefb0>] osd_fld_lookup+0x40/0xc0 [osd_zfs]
[ 7816.893573]  RSP <ffff880055a83c90>
[ 7816.893975] ---[ end trace 1ba4cf6a7b189488 ]---
[ 7816.894458] Kernel panic - not syncing: Fatal exception
[ 7816.895316] Kernel Offset:
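
The trace above can be triaged mechanically. The following is a minimal sketch, not part of the original report, that pulls the faulting thread name, function, and module out of an oops like this one and flags registers holding a repeated-byte value. Here RSI holds 5a5a5a5a5a5a5a5a, which is the kind of fill pattern often left behind in freed memory; that this build poisons freed memory with 0x5a is an assumption, not something the log states. The helper names `summarize_oops` and `repeated_byte` are hypothetical.

```python
import re

# Patterns are written for the log layout shown in this ticket; other kernel
# versions may format these lines differently.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]"
)
COMM_RE = re.compile(r"Comm: (\S+)")


def repeated_byte(value_hex):
    """Return the fill byte if all 8 bytes are identical (and not 00/ff)."""
    raw = bytes.fromhex(value_hex)
    if len(set(raw)) == 1 and raw[0] not in (0x00, 0xFF):
        return raw[0]
    return None


def summarize_oops(text):
    """Collect the thread name, faulting symbol, and suspicious registers."""
    summary = {"comm": None, "function": None, "module": None, "suspect_regs": {}}
    for line in text.splitlines():
        match = COMM_RE.search(line)
        if match and summary["comm"] is None:
            summary["comm"] = match.group(1)
        match = RIP_RE.search(line)
        if match:
            summary["function"], summary["module"] = match.group(1), match.group(2)
        for name, value in REG_RE.findall(line):
            fill = repeated_byte(value)
            if fill is not None:
                summary["suspect_regs"][name] = hex(fill)
    return summary


# Three lines copied from the trace in this ticket.
SAMPLE = """\
[ 7816.872645] CPU: 1 PID: 3099 Comm: OI_scrub Tainted: P           OE  ------------   3.10.0-693.11.6.el7_lustre.x86_64 #1
[ 7816.876979] RDX: 0000000280000401 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880055a83e58
[ 7816.892916] RIP  [<ffffffffc0feefb0>] osd_fld_lookup+0x40/0xc0 [osd_zfs]
"""

result = summarize_oops(SAMPLE)
print(result)
```

On the sample lines this reports the `OI_scrub` thread faulting in `osd_fld_lookup` (module `osd_zfs`) with RSI flagged as a repeated 0x5a pattern, which is consistent with the thread dereferencing a structure that had already been released during the umount.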


 Comments   
Comment by Jian Yu [ 08/Feb/18 ]

This failure occurred ten times in one week and is affecting patch testing on the master branch:
https://testing.hpdd.intel.com/test_sets/d10ccb88-0ceb-11e8-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/502c9bb0-0ceb-11e8-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/25e9dcb4-0cdc-11e8-a10a-52540065bddc

Comment by Gerrit Updater [ 09/Feb/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31241
Subject: LU-10615 osd: stop OI scrub before FLDB closed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a673f6cd0df06a4e2eadef796b9231abcabb99dd

Comment by Gerrit Updater [ 03/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31241/
Subject: LU-10615 osd: stop OI scrub before FLDB closed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 54fc9a642d90f102bacd0d79e4b81fd534eb26b5
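
The patch subject, "stop OI scrub before FLDB closed", describes a teardown ordering. The sketch below illustrates that ordering in user-space Python under stated assumptions: it is not the Lustre code, and `FldTable`, `Scrubber`, and `shutdown` are hypothetical names invented for illustration. A background worker performs lookups against a shared table; shutdown first stops and joins the worker, and only then closes the table, so no lookup can run against a closed table.

```python
import threading


class FldTable:
    """Hypothetical stand-in for a lookup table that is invalid once closed."""

    def __init__(self):
        self._entries = {1: "mdt0"}  # arbitrary sample entry
        self.closed = False

    def lookup(self, key):
        if self.closed:
            # In the kernel this would be a use-after-free; here it is an error.
            raise RuntimeError("lookup after close")
        return self._entries.get(key)

    def close(self):
        self.closed = True


class Scrubber:
    """Hypothetical background worker that keeps calling table.lookup()."""

    def __init__(self, table):
        self.table = table
        self.stop_requested = threading.Event()
        self.started = threading.Event()
        self.errors = []
        self.thread = threading.Thread(target=self._run, name="OI_scrub")

    def _run(self):
        self.started.set()
        while not self.stop_requested.is_set():
            try:
                self.table.lookup(1)
            except RuntimeError as exc:
                self.errors.append(exc)
                return

    def start(self):
        self.thread.start()
        self.started.wait()

    def stop(self):
        self.stop_requested.set()
        self.thread.join()  # wait until the worker has really exited


def shutdown(scrubber, table):
    scrubber.stop()  # 1. stop the scrub thread and wait for it
    table.close()    # 2. only then release the table it was using


table = FldTable()
scrubber = Scrubber(table)
scrubber.start()
shutdown(scrubber, table)
print("errors:", scrubber.errors)
```

Reversing the two steps in `shutdown` would let the worker call `lookup` on a closed table, which is the user-space analogue of the crash in the trace, where the `OI_scrub` thread was still inside `osd_fld_lookup` while the target was being unmounted.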
