[LU-3368] Interop 2.3.0<->2.4 sanity-scrub test_3: BUG: unable to handle kernel paging request Created: 20/May/13  Updated: 14/Aug/16  Resolved: 14/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Alex Zhuravlev
Resolution: Won't Fix Votes: 0
Labels: yuc2
Environment:

server: lustre-master tag-2.4.50RC1
client: 2.3.0


Severity: 3
Rank (Obsolete): 8333

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ccfe0fd8-bf03-11e2-a1b0-52540035b04c.

The sub-test test_3 failed with the following error:

test failed to respond and timed out

11:35:20:Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
11:35:20:Lustre: Skipped 1 previous similar message
11:35:20:BUG: unable to handle kernel paging request at ffff8808660aae40
11:35:21:IP: [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
11:35:21:PGD 1a86063 PUD 0 
11:35:21:Oops: 0000 [#1] SMP 
11:35:21:last sysfs file: /sys/devices/system/cpu/possible
11:35:21:CPU 0 
11:35:21:Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
11:35:21:
11:35:21:Pid: 27919, comm: mdt00_002 Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1 Red Hat KVM
11:35:21:RIP: 0010:[<ffffffffa0a164db>]  [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
11:35:21:RSP: 0018:ffff88007c4e1510  EFLAGS: 00010293
11:35:21:RAX: ffff8800660e6e40 RBX: ffff8800648ba948 RCX: ffff88005fc64c60
11:35:21:RDX: 0000000000000005 RSI: ffff880054ccf8f8 RDI: ffff8800648ba918
11:35:21:RBP: ffff88007c4e15c0 R08: 00000000ffff8800 R09: 0000000000000001
11:35:22:R10: ffff8800648ba7f8 R11: 0000000000000005 R12: 0000000000000000
11:35:22:R13: 0000000000000005 R14: 00000000ffff8800 R15: ffff8800648ba798
11:35:22:FS:  00007fef41ad4700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
11:35:22:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
11:35:22:CR2: ffff8808660aae40 CR3: 000000007cf52000 CR4: 00000000000006f0
11:35:23:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
11:35:23:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
11:35:23:Process mdt00_002 (pid: 27919, threadinfo ffff88007c4e0000, task ffff8800513e8aa0)
11:35:23:Stack:
11:35:23: ffff88007c4e15e0 ffffffff8127a72e 0000000000000000 ffff8800648ba7f8
11:35:23:<d> ffffffffffffffff ffff88007c4e1560 0000000000000001 0000000000000246
11:35:24:<d> ffff88005fc64c60 0000000500000050 ffff8800648ba918 ffff88007c4e162c
11:35:24:Call Trace:
11:35:24: [<ffffffff8127a72e>] ? number+0x2ee/0x320
11:35:24: [<ffffffffa0a17f20>] alloc_idx_array+0x130/0xdf0 [lov]
11:35:24: [<ffffffffa0a19b14>] qos_prep_create+0xf4/0x1600 [lov]
11:35:24: [<ffffffffa0a13aba>] lov_prep_create_set+0xea/0x390 [lov]
11:35:24: [<ffffffffa09fa78c>] lov_create+0x1ac/0x1410 [lov]
11:35:24: [<ffffffffa0d02bdb>] ? osd_object_read_unlock+0x9b/0xe0 [osd_ldiskfs]
11:35:24: [<ffffffffa0c13f06>] ? mdd_read_unlock+0x26/0x30 [mdd]
11:35:24: [<ffffffffa0bf8a8c>] mdd_lov_create+0xd0c/0x21c0 [mdd]
11:35:24: [<ffffffffa0c06a0d>] mdd_create+0xdfd/0x2180 [mdd]
11:35:24: [<ffffffffa04f4952>] ? cfs_hash_bd_from_key+0x42/0xe0 [libcfs]
11:35:24: [<ffffffffa04f42f9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
11:35:24: [<ffffffffa0cffb4f>] ? osd_xattr_get+0x9f/0x350 [osd_ldiskfs]
11:35:24: [<ffffffffa0914637>] cml_create+0x97/0x250 [cmm]
11:35:24: [<ffffffffa0c70ddf>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
11:35:24: [<ffffffffa0c84b9f>] mdt_reint_open+0x108f/0x18a0 [mdt]
11:35:24: [<ffffffffa0c0d2be>] ? md_ucred+0x1e/0x60 [mdd]
11:35:24: [<ffffffffa0c52235>] ? mdt_ucred+0x15/0x20 [mdt]
11:35:24: [<ffffffffa0c6e151>] mdt_reint_rec+0x41/0xe0 [mdt]
11:35:24: [<ffffffffa0c679aa>] mdt_reint_internal+0x50a/0x810 [mdt]
11:35:24: [<ffffffffa0c67f7d>] mdt_intent_reint+0x1ed/0x500 [mdt]
11:35:24: [<ffffffffa0c64191>] mdt_intent_policy+0x371/0x6a0 [mdt]
11:35:24: [<ffffffffa0792881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
11:35:24: [<ffffffffa07ba9bf>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
11:35:24: [<ffffffffa0c64506>] mdt_enqueue+0x46/0x130 [mdt]
11:35:24: [<ffffffffa0c5b802>] mdt_handle_common+0x922/0x1740 [mdt]
11:35:24: [<ffffffffa0c5c6f5>] mdt_regular_handle+0x15/0x20 [mdt]
11:35:24: [<ffffffffa07eab3c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
11:35:24: [<ffffffffa04df65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
11:35:24: [<ffffffffa07e1f37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
11:35:24: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
11:35:24: [<ffffffffa07ec111>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
11:35:24: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
11:35:24: [<ffffffff8100c14a>] child_rip+0xa/0x20
11:35:24: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
11:35:24: [<ffffffffa07eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
11:35:24: [<ffffffff8100c140>] ? child_rip+0x0/0x20
11:35:24:Code: 00 41 8d 45 01 31 d2 f7 f1 8b 03 41 89 d5 83 c0 01 44 89 ea 89 03 48 8b 43 10 44 8b 34 90 41 83 fe ff 74 cc 49 8b 47 58 45 89 f0 <4a> 8b 04 c0 48 85 c0 74 bc f6 80 80 00 00 00 01 74 b3 48 8b 15 
11:35:24:RIP  [<ffffffffa0a164db>] alloc_qos+0x87b/0x2190 [lov]
11:35:24: RSP <ffff88007c4e1510>
11:35:24:CR2: ffff8808660aae40
11:35:24:Initializing cgroup subsys cpuset
11:35:24:Initializing cgroup subsys cpu
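
Note on the oops above: decoding the code bytes around the faulting RIP pins down the crash precisely. The trapping instruction <4a> 8b 04 c0 is mov rax,[rax+r8*8], reached right after 49 8b 47 58 (mov rax,[r15+0x58], loading a pointer table) and 45 89 f0 (mov r8d,r14d). With RAX = ffff8800660e6e40 and R8 = 00000000ffff8800, RAX + R8*8 is exactly the faulting address ffff8808660aae40 reported in CR2. Earlier in the same window, 44 8b 34 90 (mov r14d,[rax+rdx*4]) loads a 32-bit index from an array, guarded only by a != -1 check (41 83 fe ff). So alloc_qos() read an OST index slot that held 0xffff8800 -- a fragment of a kernel pointer rather than a valid index -- and the pointer-table lookup ran roughly 32 GB past the table. The following is a minimal sketch of that access pattern, assuming the field names of the 2.x lov code (lov_obd->lov_tgts, ost_pool->op_array); it is an illustration, not the actual Lustre source:

    /*
     * Hypothetical stand-ins for the 2.x lov structures; only the
     * shape of the access matters here.
     */
    #include <stdint.h>
    #include <stddef.h>

    struct lov_tgt_desc;                        /* opaque target descriptor */

    struct ost_pool {
            uint32_t     *op_array;             /* 32-bit OST indices */
            unsigned int  op_count;
    };

    struct lov_obd {
            struct lov_tgt_desc **lov_tgts;     /* table indexed by OST number */
            uint32_t              lov_tgt_size; /* valid entries in lov_tgts[] */
    };

    static struct lov_tgt_desc *
    pool_tgt(struct lov_obd *lov, struct ost_pool *osts, unsigned int i)
    {
            uint32_t ost_idx = osts->op_array[i];   /* 44 8b 34 90 */

            if (ost_idx == (uint32_t)-1)            /* 41 83 fe ff: skip holes */
                    return NULL;

            /*
             * No range check against lov->lov_tgt_size before the lookup:
             * if the op_array slot is corrupt (0xffff8800 in this oops),
             * lov_tgts + 8 * ost_idx lands ~32 GB past the table and the
             * read faults with "unable to handle kernel paging request".
             */
            return lov->lov_tgts[ost_idx];          /* faulting 4a 8b 04 c0 */
    }

A bounds check before the lookup would turn a corrupt slot into a graceful skip, though the more interesting question is how a pointer fragment got into a 32-bit index array in the first place (e.g., the pool array being freed or overwritten while the allocation was in flight).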


Comments
Comment by nasf (Inactive) [ 21/May/13 ]

This is a general Lustre 2.3 bug hit while creating files to prepare the backup environment. It is NOT related to OI scrub, which had not been triggered at that point and would only run after the backup/restore.

Comment by Jodi Levi (Inactive) [ 21/May/13 ]

Alex,
Could you please comment on this one?
Thank you!

Comment by Alex Zhuravlev [ 27/May/13 ]

Hmm, a bit hard to comment on this. This is supposed to be 2.3, right?

Comment by Jian Yu [ 03/Jul/13 ]

Lustre server build: https://build.whamcloud.com/job/lustre-chris/25/
Lustre client build: http://build.whamcloud.com/job/lustre-b2_4/13/

sanity-scrub test 7 hit the same failure on MDS:

08:15:18:Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
08:15:18:Lustre: Skipped 1 previous similar message
08:15:18:BUG: unable to handle kernel paging request at ffff880451182500
08:15:19:IP: [<ffffffffa0a098bb>] alloc_qos+0x87b/0x2190 [lov]
08:15:19:PGD 1a86063 PUD 0 
08:15:19:Oops: 0000 [#1] SMP 
08:15:19:last sysfs file: /sys/devices/system/cpu/possible
08:15:19:CPU 0 
08:15:19:Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
08:15:19:
08:15:19:Pid: 29624, comm: mdt00_002 Not tainted 2.6.32-358.11.1.el6_lustre.x86_64 #1 Red Hat KVM
08:15:19:RIP: 0010:[<ffffffffa0a098bb>]  [<ffffffffa0a098bb>] alloc_qos+0x87b/0x2190 [lov]
08:15:19:RSP: 0018:ffff88007b097510  EFLAGS: 00010213
08:15:19:RAX: ffff880079ca9500 RBX: ffff8800376b69c8 RCX: ffff88005168cc60
08:15:19:RDX: 0000000000000006 RSI: ffff88003758e4f8 RDI: ffff8800376b6998
08:15:20:RBP: ffff88007b0975c0 R08: 000000007ae9b200 R09: 0000000000000001
08:15:20:R10: ffff8800376b6878 R11: 0000000000000006 R12: 0000000000000000
08:15:20:R13: 0000000000000006 R14: 000000007ae9b200 R15: ffff8800376b6818
08:15:20:FS:  00007f23b3383700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
08:15:20:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
08:15:20:CR2: ffff880451182500 CR3: 0000000037bee000 CR4: 00000000000006f0
08:15:20:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
08:15:20:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
08:15:21:Process mdt00_002 (pid: 29624, threadinfo ffff88007b096000, task ffff880051ca7540)
08:15:21:Stack:
08:15:21: ffff88007b097670 0000000200000000 ffff880063124528 ffff8800376b6878
08:15:21:<d> 0000000000000000 0000000000001000 0000000000000001 0000000000000246
08:15:21:<d> ffff88005168cc60 0000000600000050 ffff8800376b6998 ffff88007b09762c
08:15:21:Call Trace:
08:15:21: [<ffffffffa0a0b300>] alloc_idx_array+0x130/0xdf0 [lov]
08:15:21: [<ffffffffa0a0cef4>] qos_prep_create+0xf4/0x1600 [lov]
08:15:21: [<ffffffffa0a06e9a>] lov_prep_create_set+0xea/0x390 [lov]
08:15:21: [<ffffffffa09ecdba>] lov_create+0x1aa/0x1410 [lov]
08:15:21: [<ffffffffa0c91bcb>] ? osd_object_read_unlock+0x9b/0xe0 [osd_ldiskfs]
08:15:22: [<ffffffffa0bc5f06>] ? mdd_read_unlock+0x26/0x30 [mdd]
08:15:22: [<ffffffffa0baaa8c>] mdd_lov_create+0xd0c/0x21c0 [mdd]
08:15:22: [<ffffffffa0bb8a0d>] mdd_create+0xdfd/0x2180 [mdd]
08:15:22: [<ffffffffa048f852>] ? cfs_hash_bd_from_key+0x42/0xe0 [libcfs]
08:15:22: [<ffffffffa048f1f9>] ? cfs_hash_bd_add_locked+0x29/0x90 [libcfs]
08:15:22: [<ffffffffa0c8eb4f>] ? osd_xattr_get+0x9f/0x350 [osd_ldiskfs]
08:15:22: [<ffffffffa0ce2637>] cml_create+0x97/0x250 [cmm]
08:15:22: [<ffffffffa0c217bf>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
08:15:22: [<ffffffffa0c35767>] mdt_reint_open+0x1117/0x18b0 [mdt]
08:15:22: [<ffffffffa0bbf2be>] ? md_ucred+0x1e/0x60 [mdd]
08:15:23: [<ffffffffa0c031d5>] ? mdt_ucred+0x15/0x20 [mdt]
08:15:23: [<ffffffffa0c1eb31>] mdt_reint_rec+0x41/0xe0 [mdt]
08:15:23: [<ffffffffa0c0b23a>] mdt_reint_internal+0x4ea/0x7b0 [mdt]
08:15:23: [<ffffffffa0c0b7cd>] mdt_intent_reint+0x1ed/0x500 [mdt]
08:15:23: [<ffffffffa0c0a7e9>] mdt_intent_policy+0x369/0x650 [mdt]
08:15:23: [<ffffffffa071f881>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
08:15:23: [<ffffffffa074798f>] ldlm_handle_enqueue0+0x48f/0xf70 [ptlrpc]
08:15:23: [<ffffffffa0c0a3d6>] mdt_enqueue+0x46/0xf0 [mdt]
08:15:23: [<ffffffffa0c0fda2>] mdt_handle_common+0x922/0x1740 [mdt]
08:15:23: [<ffffffffa0c10c95>] mdt_regular_handle+0x15/0x20 [mdt]
08:15:23: [<ffffffffa0777bdc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
08:15:24: [<ffffffffa047a5ee>] ? cfs_timer_arm+0xe/0x10 [libcfs]
08:15:24: [<ffffffffa076eeb7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
08:15:24: [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
08:15:24: [<ffffffffa07791c1>] ptlrpc_main+0xc01/0x19f0 [ptlrpc]
08:15:24: [<ffffffffa07785c0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
08:15:24: [<ffffffff8100c0ca>] child_rip+0xa/0x20
08:15:24: [<ffffffffa07785c0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
08:15:24: [<ffffffffa07785c0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
08:15:24: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
08:15:24:Code: 00 41 8d 45 01 31 d2 f7 f1 8b 03 41 89 d5 83 c0 01 44 89 ea 89 03 48 8b 43 10 44 8b 34 90 41 83 fe ff 74 cc 49 8b 47 58 45 89 f0 <4a> 8b 04 c0 48 85 c0 74 bc f6 80 80 00 00 00 01 74 b3 48 8b 15 
08:15:24:RIP  [<ffffffffa0a098bb>] alloc_qos+0x87b/0x2190 [lov]
08:15:24: RSP <ffff88007b097510>
08:15:24:CR2: ffff880451182500

Maloo report: https://maloo.whamcloud.com/test_sets/1126d04c-e32a-11e2-ba7f-52540035b04c
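
Note: the register dump here matches the first oops exactly. The fault is again at mov rax,[rax+r8*8], and RAX + R8*8 = ffff880079ca9500 + 0x7ae9b200 * 8 = ffff880451182500, the CR2 value. The same unvalidated 32-bit index load appears to be implicated; this time the garbage index 0x7ae9b200 looks like the low half of a kernel direct-map address (ffff88007ae9b200), again suggesting pointer-sized data overwrote the index array.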

Comment by Jian Yu [ 15/Aug/13 ]

Lustre server build: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre client build: http://build.whamcloud.com/job/lustre-b2_4/29/

sanity-scrub test 10a hit the same failure on MDS:
https://maloo.whamcloud.com/test_sets/ad037824-0541-11e3-925a-52540035b04c

Comment by Jian Yu [ 04/Sep/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
Lustre server: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)

The same failure occurred:
https://maloo.whamcloud.com/test_sets/b0ab600a-14fe-11e3-ba63-52540035b04c

Comment by nasf (Inactive) [ 07/Oct/13 ]

LU-3760 is another instance of this failure.

Comment by Jian Yu [ 26/Nov/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_4/58/
Lustre server: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)

sanity-scrub test 10b also hit this failure:
https://maloo.whamcloud.com/test_sets/0b925cc4-54ad-11e3-9029-52540035b04c

Comment by James A Simmons [ 14/Aug/16 ]

Closing: old blocker against a version that is no longer supported.
