[LU-10297] parallel-scale-nfsv4 test_metabench: ASSERTION( nfound <= inuse->op_count ) failed
Created: 29/Nov/17  Updated: 02/Feb/18  Resolved: 11/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

Description

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/bbaab274-d4fa-11e7-9c63-52540065bddc.

The sub-test test_metabench failed with the following error:

Timeout occurred after 347 mins, last suite running was parallel-scale-nfsv4, restarting cluster to continue tests

Configuration: 2.10.2 RC1, EL7.4, ZFS backend.
The MDS crashed with the following console output:

[13488.166260] Lustre: DEBUG MARKER: dmesg
[13488.563678] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv4 test metabench: metabench ==================================================== 09:11:38 \(1511946698\)
[13488.724793] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench ==================================================== 09:11:38 (1511946698)
[13641.720773] LustreError: 1187:0:(lod_qos.c:1624:lod_alloc_qos()) ASSERTION( nfound <= inuse->op_count ) failed: nfound:3, op_count:0
[13641.721991] LustreError: 1187:0:(lod_qos.c:1624:lod_alloc_qos()) LBUG
[13641.722604] Pid: 1187, comm: mdt00_011
[13641.722980] 
[13641.722980] Call Trace:
[13641.723386]  [<ffffffffc05c27ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[13641.724029]  [<ffffffffc05c283c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[13641.724636]  [<ffffffffc125c342>] lod_alloc_qos.constprop.17+0x1582/0x1590 [lod]
[13641.725507]  [<ffffffffc125efe1>] lod_qos_prep_create+0x1291/0x17f0 [lod]
[13641.726161]  [<ffffffffc062c4b6>] ? nvlist_lookup_byte_array+0x26/0x30 [znvpair]
[13641.726921]  [<ffffffffc0fd1cc9>] ? __osd_xattr_get+0xa9/0x210 [osd_zfs]
[13641.727558]  [<ffffffffc0fd0007>] ? osd_fid_lookup+0x47/0x3a0 [osd_zfs]
[13641.728196]  [<ffffffffc125fab8>] lod_prepare_create+0x298/0x3f0 [lod]
[13641.728900]  [<ffffffffc125463e>] lod_declare_striped_create+0x1ee/0x970 [lod]
[13641.729620]  [<ffffffffc077dbc5>] ? sa_object_size+0x15/0x20 [zfs]
[13641.730220]  [<ffffffffc12583d1>] lod_declare_xattr_set+0x221/0xe40 [lod]
[13641.730953]  [<ffffffffc12b2d97>] mdd_create_data+0x487/0x720 [mdd]
[13641.731568]  [<ffffffffc1186f8a>] mdt_mfd_open+0xc5a/0xe70 [mdt]
[13641.732152]  [<ffffffffc118771b>] mdt_finish_open+0x57b/0x690 [mdt]
[13641.732832]  [<ffffffffc1188fcc>] mdt_reint_open+0x179c/0x31a0 [mdt]
[13641.733464]  [<ffffffffc0bbb717>] ? upcall_cache_get_entry+0x3f7/0x8f0 [obdclass]
[13641.734187]  [<ffffffffc0bc042e>] ? lu_ucred+0x1e/0x30 [obdclass]
[13641.734853]  [<ffffffffc116e925>] ? mdt_ucred+0x15/0x20 [mdt]
[13641.735398]  [<ffffffffc116f1f1>] ? mdt_root_squash+0x21/0x430 [mdt]
[13641.736035]  [<ffffffffc117e8a0>] mdt_reint_rec+0x80/0x210 [mdt]
[13641.736676]  [<ffffffffc116030b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[13641.737313]  [<ffffffffc1160832>] mdt_intent_reint+0x162/0x430 [mdt]
[13641.737993]  [<ffffffffc116b59e>] mdt_intent_policy+0x43e/0xc70 [mdt]
[13641.738644]  [<ffffffffc0d312b7>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
[13641.739318]  [<ffffffffc0d5ad03>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
[13641.740102]  [<ffffffffc0d82ee0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[13641.740847]  [<ffffffffc0de01d2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[13641.741488]  [<ffffffffc0de40d5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
[13641.742174]  [<ffffffffc0d8cf16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
[13641.742975]  [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
[13641.743565]  [<ffffffffc0d90652>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[13641.744168]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
[13641.744770]  [<ffffffff816a9000>] ? __schedule+0x370/0x8b0
[13641.745310]  [<ffffffffc0d8fbc0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[13641.745927]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[13641.746448]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
[13641.746936]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[13641.747498]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
[13641.747980] 
[13641.748184] Kernel panic - not syncing: LBUG
[13641.748599] CPU: 0 PID: 1187 Comm: mdt00_011 Tainted: P           OE  ------------   3.10.0-693.5.2.el7_lustre.x86_64 #1
[13641.749598] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13641.750132]  ffff88005a2f2f00 00000000d5284124 ffff88004dbbf558 ffffffff816a3e2d
[13641.750897]  ffff88004dbbf5d8 ffffffff8169dd14 ffffffff00000008 ffff88004dbbf5e8
[13641.751664]  ffff88004dbbf588 00000000d5284124 00000000d5284124 0000000000000246
[13641.752429] Call Trace:
[13641.752678]  [<ffffffff816a3e2d>] dump_stack+0x19/0x1b
[13641.753161]  [<ffffffff8169dd14>] panic+0xe8/0x20d
[13641.753619]  [<ffffffffc05c2854>] lbug_with_loc+0x64/0xb0 [libcfs]
[13641.754206]  [<ffffffffc125c342>] lod_alloc_qos.constprop.17+0x1582/0x1590 [lod]
[13641.754914]  [<ffffffffc125efe1>] lod_qos_prep_create+0x1291/0x17f0 [lod]
[13641.755559]  [<ffffffffc062c4b6>] ? nvlist_lookup_byte_array+0x26/0x30 [znvpair]
[13641.756260]  [<ffffffffc0fd1cc9>] ? __osd_xattr_get+0xa9/0x210 [osd_zfs]
[13641.756892]  [<ffffffffc0fd0007>] ? osd_fid_lookup+0x47/0x3a0 [osd_zfs]
[13641.757517]  [<ffffffffc125fab8>] lod_prepare_create+0x298/0x3f0 [lod]
[13641.758127]  [<ffffffffc125463e>] lod_declare_striped_create+0x1ee/0x970 [lod]
[13641.758822]  [<ffffffffc077dbc5>] ? sa_object_size+0x15/0x20 [zfs]
[13641.759407]  [<ffffffffc12583d1>] lod_declare_xattr_set+0x221/0xe40 [lod]
[13641.760048]  [<ffffffffc12b2d97>] mdd_create_data+0x487/0x720 [mdd]
[13641.760650]  [<ffffffffc1186f8a>] mdt_mfd_open+0xc5a/0xe70 [mdt]
[13641.761217]  [<ffffffffc118771b>] mdt_finish_open+0x57b/0x690 [mdt]
[13641.761818]  [<ffffffffc1188fcc>] mdt_reint_open+0x179c/0x31a0 [mdt]
[13641.762425]  [<ffffffffc0bbb717>] ? upcall_cache_get_entry+0x3f7/0x8f0 [obdclass]
[13641.763143]  [<ffffffffc0bc042e>] ? lu_ucred+0x1e/0x30 [obdclass]
[13641.763723]  [<ffffffffc116e925>] ? mdt_ucred+0x15/0x20 [mdt]
[13641.764266]  [<ffffffffc116f1f1>] ? mdt_root_squash+0x21/0x430 [mdt]
[13641.764879]  [<ffffffffc117e8a0>] mdt_reint_rec+0x80/0x210 [mdt]
[13641.765448]  [<ffffffffc116030b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[13641.766062]  [<ffffffffc1160832>] mdt_intent_reint+0x162/0x430 [mdt]
[13641.766662]  [<ffffffffc116b59e>] mdt_intent_policy+0x43e/0xc70 [mdt]
[13641.767285]  [<ffffffffc0d312b7>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
[13641.767938]  [<ffffffffc0d5ad03>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
[13641.768628]  [<ffffffffc0d82ee0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[13641.769359]  [<ffffffffc0de01d2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[13641.769961]  [<ffffffffc0de40d5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
[13641.770634]  [<ffffffffc0d8cf16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
[13641.771346]  [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
[13641.771925]  [<ffffffffc0d90652>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[13641.772515]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
[13641.773028]  [<ffffffff816a9000>] ? __schedule+0x370/0x8b0
[13641.773567]  [<ffffffffc0d8fbc0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[13641.774255]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[13641.774718]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[13641.775281]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[13641.775790]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-693.5.2.el7_lustre.x86_64 (jenkins@trevis-310-el7-x8664-2.trevis.hpdd.intel.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Nov 27 15:30:51 UTC 2017
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.5.2.el7_lustre.x86_64 root=UUID=d6ed6b92-d297-4df2-adc0-6d2fa1ffd5ed ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009f7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000021000000-0x0000000034f5cfff] usable
[    0.000000] BIOS-e820: [mem 0x0000000034fff800-0x0000000034ffffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fffa000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.4 present.
[    0.000000] Hypervisor detected: KVM
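The LBUG above fires because the allocator counted three already-found stripes (nfound:3) while the in-use array it checks against was empty (op_count:0). The fix title, "lod: prepare inuse array always", suggests the array was only filled on some code paths. The sketch below is a hypothetical, simplified model of that failure mode, not the code in lod_qos.c: the names `inuse_array`, `inuse_prepare`, `alloc_pass` and the `other_path` flag are invented for illustration, and only the `op_count` field name and the `nfound <= op_count` invariant come from the crash message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the LOD "in-use" array. The real
 * structure differs; only op_count and the invariant
 * nfound <= inuse->op_count are taken from the LBUG message. */
#define MAX_STRIPES 16

struct inuse_array {
	unsigned int op_count;               /* valid entries in op_array */
	unsigned int op_array[MAX_STRIPES];  /* OST indices already in use */
};

/* Hypothetical helper: record the OSTs used by existing stripes so a
 * later allocation pass can skip them and keep op_count accurate. */
static void inuse_prepare(struct inuse_array *inuse,
			  const unsigned int *osts, size_t count)
{
	assert(count <= MAX_STRIPES);
	inuse->op_count = 0;
	for (size_t i = 0; i < count; i++)
		inuse->op_array[inuse->op_count++] = osts[i];
}

/* The invariant the MDS asserted before crashing. */
static bool inuse_invariant_holds(const struct inuse_array *inuse,
				  unsigned int nfound)
{
	return nfound <= inuse->op_count;
}

/* Simulate one allocation pass that found `nfound` existing stripes.
 * With always_prepare == false, the array is filled only when the
 * hypothetical `other_path` condition is true, mimicking the bug:
 * op_count stays 0 while nfound is non-zero. */
static bool alloc_pass(bool always_prepare, bool other_path,
		       const unsigned int *existing, unsigned int nfound)
{
	struct inuse_array inuse = { .op_count = 0 };

	if (always_prepare || other_path)
		inuse_prepare(&inuse, existing, nfound);

	return inuse_invariant_holds(&inuse, nfound);
}
```

For example, with three existing stripes, `alloc_pass(false, false, existing, 3)` reproduces the "nfound:3, op_count:0" state and returns false, while `alloc_pass(true, false, existing, 3)` returns true because the array is filled unconditionally. Preparing the array on every path trades a little extra work for an invariant that no longer depends on which branch was taken.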


Comments
Comment by Peter Jones [ 30/Nov/17 ]

Bobijam

Can you please advise on this one?

Thanks

Peter

Comment by Gerrit Updater [ 01/Dec/17 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/30334
Subject: LU-10297 lod: prepare inuse array always
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bb2ac352282ff26ca13b4daa88b16aa4a8c1310b

Comment by Gerrit Updater [ 11/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30334/
Subject: LU-10297 lod: prepare inuse array always
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2cca16ba7da4070c319cc6230b7dbb3e7a91e323

Comment by Peter Jones [ 11/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 12/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30486
Subject: LU-10297 lod: prepare inuse array always
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 7585ab8889202aa1f7c81f6bb83b4d8c734b121a

Comment by Gerrit Updater [ 02/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30486/
Subject: LU-10297 lod: prepare inuse array always
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: aaa6b54a34da23ea27154aba69f8c90210ae7784

Generated at Sat Feb 10 02:33:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.