[LU-7158] Interop 2.7.0<->master sanityn test_78: BUG: unable to handle kernel NULL pointer dereference Created: 14/Sep/15  Updated: 31/Mar/16  Resolved: 21/Mar/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

client: lustre-master build# 3166 RHEL6.6
server: 2.7.0


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2cabd07c-5157-11e5-9f68-5254006e85c2.

The sub-test test_78 failed with the following error:

test failed to respond and timed out

OST console log:

05:39:17:Lustre: DEBUG MARKER: == sanityn test 78: Enable policy and specify tunings right away == 05:39:06 (1441085946)
05:39:17:Lustre: DEBUG MARKER: lctl set_param ost.OSS.*.nrs_orr_quantum=1
05:39:17:Lustre: DEBUG MARKER: lctl set_param ost.OSS.ost_io.nrs_policies=orr
05:39:17:BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
05:39:17:IP: [<ffffffffa0857142>] nrs_orr_ctl+0x122/0x190 [ptlrpc]
05:39:17:PGD 62355067 PUD 79492067 PMD 0 
05:39:17:Oops: 0002 [#1] SMP 
05:39:17:last sysfs file: /sys/devices/system/cpu/online
05:39:17:CPU 1 
05:39:17:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
05:39:17:
05:39:17:Pid: 19691, comm: lctl Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Red Hat KVM
05:39:17:RIP: 0010:[<ffffffffa0857142>]  [<ffffffffa0857142>] nrs_orr_ctl+0x122/0x190 [ptlrpc]
05:39:17:RSP: 0018:ffff88006ef63d38  EFLAGS: 00010202
05:39:17:RAX: 0000000000000001 RBX: 00000000ffffffda RCX: 0000000000000001
05:39:17:RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88006cf2f340
05:39:17:RBP: ffff88006ef63d38 R08: 0000000000000000 R09: ffff88006ef63e30
05:39:17:R10: 0000000000000002 R11: f000000000000000 R12: ffff88007d742ce0
05:39:17:R13: 0000000000000021 R14: ffff88006cf2f340 R15: ffff88007d61d9c0
05:39:17:FS:  00007fc4f4640700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
05:39:17:CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
05:39:17:CR2: 000000000000003c CR3: 0000000072e17000 CR4: 00000000000006e0
05:39:17:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
05:39:17:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
05:39:17:Process lctl (pid: 19691, threadinfo ffff88006ef62000, task ffff88007cae4aa0)
05:39:17:Stack:
05:39:17: ffff88006ef63d88 ffffffffa084f9f2 ffff88006ef63e30 ffff88007d742ce8
05:39:17:<d> ffff88006ef63d78 ffff8800374d7ac0 0000000000000000 ffff88007d742c00
05:39:17:<d> 0000000000000001 0000000000000000 ffff88006ef63df8 ffffffffa08513f2
05:39:17:Call Trace:
05:39:17: [<ffffffffa084f9f2>] nrs_policy_ctl+0x1d2/0x280 [ptlrpc]
05:39:17: [<ffffffffa08513f2>] ptlrpc_nrs_policy_control+0xe2/0x2a0 [ptlrpc]
05:39:17: [<ffffffffa085697f>] ptlrpc_lprocfs_nrs_orr_quantum_seq_write+0x16f/0x2d0 [ptlrpc]
05:39:17: [<ffffffff811f994e>] proc_reg_write+0x7e/0xc0
05:39:17: [<ffffffff8118e498>] vfs_write+0xb8/0x1a0
05:39:17: [<ffffffff8118ee61>] sys_write+0x51/0x90
05:39:17: [<ffffffff810e5eae>] ? __audit_syscall_exit+0x25e/0x290
05:39:17: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
05:39:17:Code: 3c 66 89 02 f6 05 22 c6 c6 ff 01 0f 85 7e ff ff ff 0f 1f 84 00 00 00 00 00 31 c0 c9 c3 0f 1f 40 00 0f b7 02 48 8b 57 48 66 85 c0 <66> 89 42 3c 0f 85 4d ff ff ff 48 c7 c7 a0 45 8f a0 48 c7 c2 18 
05:39:17:RIP  [<ffffffffa0857142>] nrs_orr_ctl+0x122/0x190 [ptlrpc]
05:39:17: RSP <ffff88006ef63d38>
05:39:17:CR2: 000000000000003c
05:39:17:Initializing cgroup subsys cpuset
05:39:17:Initializing cgroup subsys cpu
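
A possible reading of the oops above, offered as a sketch rather than a confirmed diagnosis: the bytes at the marked position in the Code: line (`<66> 89 42 3c`) appear to decode as a 16-bit store to offset 0x3c from RDX, and the register dump shows RDX as 0, which would be consistent with the faulting address in CR2 being 0x3c. The snippet below only checks that arithmetic against two lines copied from the log. The helper name `reg` is hypothetical, and no claim is made about which structure or field in `nrs_orr_ctl` is involved.

```shell
#!/bin/sh
# Two lines copied verbatim from the OST console log in this ticket.
oops='05:39:17:RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88006cf2f340
05:39:17:CR2: 000000000000003c CR3: 0000000072e17000 CR4: 00000000000006e0'

# Hypothetical helper: print the hex value (without 0x) of the named register
# from the excerpt above.
reg() {
    printf '%s\n' "$oops" | sed -n "s/.*$1: \([0-9a-f]*\).*/\1/p" | head -n 1
}

rdx=$(reg RDX)
cr2=$(reg CR2)

# If the faulting store is relative to RDX, CR2 - RDX is the displacement.
offset=$(( 0x$cr2 - 0x$rdx ))
printf 'RDX=0x%s CR2=0x%s offset=0x%x\n' "$rdx" "$cr2" "$offset"
```

With RDX equal to zero the computed displacement is simply the CR2 value, which is why the fault is reported as a NULL pointer dereference at a small nonzero address.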


 Comments   
Comment by Sarah Liu [ 06/Oct/15 ]

another instance: https://testing.hpdd.intel.com/test_sets/18cf9f20-6b38-11e5-94a7-5254006e85c2
server: 2.7.0
client: lustre-master/3203

Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ]

Server: 2.5.5, b2_5_fe/62
Client: Master, Build# 3266, Tag 2.7.64
https://testing.hpdd.intel.com/test_sets/9585852e-a04a-11e5-a33d-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ]

Server: 2.5.5, b2_5_fe/62
Client: Master, Build# 3266, Tag 2.7.64 , RHEL 7
https://testing.hpdd.intel.com/test_sets/097d2928-a05f-11e5-90cc-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Dec/15 ]

Another instance found for the following config:
Server: 2.7.1 , b2_7_fe/34
Client: Master, build# 3276, RHEL 6.7
https://testing.hpdd.intel.com/test_sets/64b8628c-a602-11e5-a14c-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - 2.7.1 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/3b9722f8-d2f8-11e5-bf08-5254006e85c2
Another instance found for interop - 2.7.1 Server/EL6.7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/f371534e-d573-11e5-bc47-5254006e85c2
Another instance found for interop - 2.5.5 Server/EL6.7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/e16b8c0c-d634-11e5-82a0-5254006e85c2
Another instance found for interop - 2.5.5 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/ba9d84fe-d300-11e5-be5c-5254006e85c2

Comment by Sarah Liu [ 21/Mar/16 ]

This has the same cause as LU-7605; test_78 should be skipped when running against older servers.
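
A server-version gate of the kind suggested here might look like the following minimal sketch. Everything in it is an assumption for illustration: `version_code` and `should_skip_test_78` are hypothetical helpers written for this example (not copied from the Lustre test framework), and the 2.8.0 threshold is a placeholder, since the ticket says only "older server" and does not give a cutoff.

```shell
#!/bin/sh
# Hypothetical helper: pack a dotted "major.minor.patch" version into one
# integer so versions compare numerically. Missing fields default to 0.
# The body runs in a subshell so the IFS change does not leak to the caller.
version_code() (
    IFS=.
    set -- $1
    echo $(( (${1:-0} << 16) | (${2:-0} << 8) | ${3:-0} ))
)

# Placeholder threshold: an assumption for illustration, not a value taken
# from this ticket or from LU-7605.
min_server="2.8.0"

# Return 0 (true) when test_78 should be skipped for the given server version.
should_skip_test_78() {
    [ "$(version_code "$1")" -lt "$(version_code "$min_server")" ]
}

# Server versions reported in this ticket, plus the placeholder threshold.
for server in 2.5.5 2.7.0 2.7.1 2.8.0; do
    if should_skip_test_78 "$server"; then
        echo "server $server: skip test_78"
    else
        echo "server $server: run test_78"
    fi
done
```

Packing the version into a single integer keeps the comparison a plain numeric test, so a server such as 2.7.1 sorts below the threshold without any string comparison.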

Comment by Sarah Liu [ 21/Mar/16 ]

Duplicate of LU-7605.
