Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.8.0
Environment:
MDS and OSS: 2.8.0-RC4
client1: 2.8.0-RC4
client2: 2.7
Severity: 3
Description
The MDS hit "BUG: unable to handle kernel NULL pointer dereference at (null)" and rebooted while running sanity test_61.
The MDS and OSS were upgraded from 2.7 RHEL6.7 to 2.8.0-RC4 RHEL6.7 (ldiskfs).
client1 was upgraded from 2.7 RHEL6.7 to 2.8.0-RC4 RHEL6.7.
client2 remained on 2.7 RHEL6.7.
MDS console
Lustre: DEBUG MARKER: == sanity test 61: mmap() writes don't make sync hang ================== 16:27:47 (1457051267)
Lustre: *** cfs_fail_loc=15b, val=0***
Lustre: Skipped 1 previous similar message
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8153b933>] down_write+0x23/0x40
PGD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 1
Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sg microcode iTCO_wdt iTCO_vendor_support sb_edac edac_core joydev i2c_i801 lpc_ich mfd_core ioatdma igb dca i2c_algo_bit i2c_core shpchp ext3 jbd mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: llog_test]
Pid: 42865, comm: mdt00_000 Not tainted 2.6.32-573.12.1.el6_lustre.x86_64 #1 Intel Corporation S2600GZ/S2600GZ
RIP: 0010:[<ffffffff8153b933>] [<ffffffff8153b933>] down_write+0x23/0x40
RSP: 0018:ffff880815c93700 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffffff00000001 RSI: ffff8808345fa040 RDI: 0000000000000000
RBP: ffff880815c93710 R08: ffff880815c90000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000064 R12: ffff8804074a2ec0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88040e522b80
FS: 0000000000000000(0000) GS:ffff880038620000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt00_000 (pid: 42865, threadinfo ffff880815c90000, task ffff8808345fa040)
Stack:
ffff88040e522b80 0000000000000000 ffff880815c93770 ffffffffa055bad3
<d> ffff880815c93770 0000000000000000 ffff880434fe6b40 ffff88042c7b7440
<d> ffff88040d227024 ffff8804074a2ec0 ffff88042c7b7440 ffff880434fe6b40
Call Trace:
[<ffffffffa055bad3>] llog_cat_add_rec+0x403/0x7b0 [obdclass]
[<ffffffffa0552239>] llog_add+0x89/0x1c0 [obdclass]
[<ffffffffa0ff2e2e>] ? lod_sub_object_index_insert+0x1fe/0x340 [lod]
[<ffffffffa104a084>] mdd_changelog_store+0x154/0x320 [mdd]
[<ffffffffa104a421>] mdd_changelog_ns_store+0x1d1/0x620 [mdd]
[<ffffffffa1061826>] ? mdd_attr_set_internal+0xd6/0x2c0 [mdd]
[<ffffffffa1061a8f>] ? mdd_update_time+0x7f/0x1c0 [mdd]
[<ffffffffa10568c1>] mdd_create+0x1351/0x1770 [mdd]
[<ffffffffa0f1e4c8>] mdo_create+0x18/0x50 [mdt]
[<ffffffffa0f26d85>] mdt_reint_open+0x1f55/0x2f50 [mdt]
[<ffffffffa07f7bdd>] ? null_alloc_rs+0xcd/0x320 [ptlrpc]
[<ffffffffa05b6cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
[<ffffffffa05bbbf0>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0f0b57f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
[<ffffffffa0f0f1fd>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa0efae4b>] mdt_reint_internal+0x62b/0x9f0 [mdt]
[<ffffffffa0efb406>] mdt_intent_reint+0x1f6/0x430 [mdt]
[<ffffffffa0ef98be>] mdt_intent_policy+0x4be/0xc70 [mdt]
[<ffffffffa076f6c7>] ldlm_lock_enqueue+0x127/0x990 [ptlrpc]
[<ffffffffa079a827>] ldlm_handle_enqueue0+0x807/0x14d0 [ptlrpc]
[<ffffffffa080dfe1>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
[<ffffffffa0820171>] tgt_enqueue+0x61/0x230 [ptlrpc]
[<ffffffffa0820c2c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
[<ffffffffa07cdc61>] ptlrpc_main+0xd21/0x1800 [ptlrpc]
[<ffffffffa07ccf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[<ffffffff810a0fce>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff810a0f30>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20
Code: c3 e8 a2 ba b3 ff 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 ba e2 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 48 85 d2 74 05 e8 ce 29 d6 ff 48 83 c4 08 5b c9
RIP [<ffffffff8153b933>] down_write+0x23/0x40
RSP <ffff880815c93700>
CR2: 0000000000000000
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-573.12.1.el6_lustre.x86_64 (jenkins@onyx-7-sdf1-el6-x8664.onyx.hpdd.intel.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Thu Feb 18 11:08:53 PST 2016
Command line: ro root=UUID=f50605c1-7b71-4192-8f8b-afcd8aae7478 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=tty0 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=ttyS0,115200 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0 memmap=exactmap memmap=574K@4K memmap=133550K@49726K elfcorehdr=183276K memmap=4K$0K memmap=62K$578K memmap=128K$896K memmap=42200K$3067876K memmap=992K#3110076K memmap=488K#3111068K memmap=568K#3111556K memmap=516K#3112124K memmap=294912K$3112960K memmap=4K$4173824K memmap=4K$4174948K memmap=16K$4174960K memmap=4K$4175872K memmap=6016K$4188288K
KERNEL supported cpus:
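The "Oops: 0002" and "CR2: 0000000000000000" lines in the MDS console log can be read with the standard x86 page-fault error-code bits (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The short Python sketch below is an illustrative decoder; the helper names `decode_pf_error` and `describe_fault` are hypothetical and not part of Lustre or the kernel. Applied to this log, it reports a kernel-mode write to a non-present page at address 0. That fits the faulting bytes `<f0> 48 0f c1 10` (lock xadd %rdx,(%rax)) in down_write() with RAX = RBX = 0, which suggests down_write() was passed a NULL semaphore pointer by its caller, llog_cat_add_rec().

```python
# Illustrative decoder for the x86 page-fault error code shown after "Oops:".
# Bit meanings follow the architectural #PF error-code definition.
PF_BITS = {
    0: ("protection violation", "page not present"),  # P
    1: ("write access", "read access"),                # W/R
    2: ("user mode", "kernel mode"),                   # U/S
    3: ("reserved bit set in page table", None),       # RSVD (only reported if set)
    4: ("instruction fetch", None),                    # I/D (only reported if set)
}


def decode_pf_error(code: int) -> list:
    """Return human-readable flags for a page-fault error code (hypothetical helper)."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def describe_fault(oops_code: str, cr2: str) -> str:
    """Combine the decoded error code with CR2, the faulting linear address."""
    code = int(oops_code, 16)
    addr = int(cr2, 16)
    where = "NULL pointer" if addr == 0 else f"address {addr:#x}"
    return f"{', '.join(decode_pf_error(code))} at {where}"


# Values copied from the "Oops:" and "CR2:" lines of the MDS console log.
print(describe_fault("0002", "0000000000000000"))
```

The decoder only interprets the two values the oops already prints; identifying which llog handle was NULL still requires the crash dump or source inspection.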