[LU-5507] sanity-quota test_18: Oops: IP: lustre_msg_get_opc+0xe/0x110 [ptlrpc] Created: 20/Aug/14 Updated: 09/Jun/15 Resolved: 05/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/80/ |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15364 |
| Description |
|
While running sanity-quota test 18, one of the client nodes hit the following error:

[60756.462327] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[60756.465418] IP: [<ffffffffa088a9d1>] lustre_msg_get_opc+0x1/0x100 [ptlrpc]
[60756.466234] PGD 0
[60756.466234] Oops: 0000 [#1] SMP
[60756.466234] CPU 0
[60756.466234] Modules linked in: lustre(EN) obdecho(EN) mgc(EN) lov(EN) osc(EN) mdc(EN) lmv(EN) fid(EN) fld(EN) ptlrpc(EN) obdclass(EN) lvfs(EN) ksocklnd(EN) lnet(EN) libcfs(EN) ext2 sha512_generic sha1_generic md5 crc32c nfs lockd fscache auth_rpcgss nfs_acl sunrpc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core mperf loop dm_mod floppy 8139too ipv6 ipv6_lib rtc_cmos pcspkr virtio_balloon i2c_piix4 8139cp mii button ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd usbcore usb_common intel_agp intel_gtt scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix edd ext3 mbcache jbd fan processor ahci libahci libata scsi_mod thermal thermal_sys hwmon [last unloaded: libcfs]
[60756.466234] Supported: No, Unsupported modules are loaded
[60756.466234] Pid: 12735, comm: ptlrpcd_rcv Tainted: G EN 3.0.101-0.35-default #1 Red Hat KVM
[60756.466234] RIP: 0010:[<ffffffffa088a9d1>] [<ffffffffa088a9d1>] lustre_msg_get_opc+0x1/0x100 [ptlrpc]
[60756.466234] RSP: 0018:ffff880078f3dcb0 EFLAGS: 00010286
[60756.466234] RAX: ffff8800201efa08 RBX: 0000000000000000 RCX: 0000000000000002
[60756.466234] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffffffffffffff
[60756.466234] RBP: ffff88006a655500 R08: ffff8800201efa08 R09: 00000000000000d8
[60756.466234] R10: 000000000000000a R11: 0000000000000000 R12: ffff88006295c800
[60756.466234] R13: ffff8800201efa08 R14: ffff880079dcbee0 R15: ffff88006e9838f0
[60756.466234] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[60756.494980] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[60756.494980] CR2: 0000000000000007 CR3: 000000007ae8a000 CR4: 00000000000006f0
[60756.494980] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60756.494980] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[60756.494980] Process ptlrpcd_rcv (pid: 12735, threadinfo ffff880078f3c000, task ffff880017dde540)
[60756.494980] Stack:
[60756.494980] 0000000000000000 ffffffffa099d04b ffff880079dcbc00 000000c10002ee7e
[60756.494980] ffff880079dcbc00 000000c10002ee7e ffff880079dcbc00 ffff8800290c0a88
[60756.494980] ffff8800290c0800 ffffffffa087eb5a 00000000ebc0de01 ffff880079dcbc00
[60756.494980] Call Trace:
[60756.494980] [<ffffffffa099d04b>] mdc_replay_open+0xab/0x430 [mdc]
[60756.494980] [<ffffffffa087eb5a>] ptlrpc_replay_interpret+0x14a/0x740 [ptlrpc]
[60756.494980] [<ffffffffa0880452>] ptlrpc_check_set+0x532/0x1b30 [ptlrpc]
[60756.494980] [<ffffffffa08abdcb>] ptlrpcd_check+0x52b/0x550 [ptlrpc]
[60756.494980] [<ffffffffa08ac32b>] ptlrpcd+0x24b/0x3b0 [ptlrpc]
[60756.494980] [<ffffffff810829a6>] kthread+0x96/0xa0
[60756.494980] [<ffffffff8146b164>] kernel_thread_helper+0x4/0x10
[60756.494980] Code: 89 44 24 48 48 83 c4 58 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 45 31 ed e9 fb fe ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 53
[60756.494980] 7f 08 d3 0b d0 0b 48 89 fb 74 73 c7 05 49 0

Maloo report: https://testing.hpdd.intel.com/test_sets/4f4c437a-268b-11e4-84f2-5254006e85c2 |
| Comments |
| Comment by Jian Yu [ 20/Aug/14 ] |
|
Lustre client build: https://build.hpdd.intel.com/job/lustre-b2_5/80/
The same failure occurred: https://testing.hpdd.intel.com/test_sets/ea35137e-266f-11e4-8ee8-5254006e85c2 |
| Comment by Jian Yu [ 21/Aug/14 ] |
|
So far, the failure has not occurred in Lustre b2_5 builds #82 and #83. |
| Comment by Jian Yu [ 31/Aug/14 ] |
|
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1)
The same failure occurred: https://testing.hpdd.intel.com/test_sets/651d9592-30da-11e4-b503-5254006e85c2 |
| Comment by Peter Jones [ 04/Nov/14 ] |
|
This failure seems to occur intermittently. Any idea why? |
| Comment by Niu Yawei (Inactive) [ 11/Nov/14 ] |
|
It seems to be a race between close and open replay, introduced by the fix for |
| Comment by Niu Yawei (Inactive) [ 11/Nov/14 ] |
| Comment by Gerrit Updater [ 03/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged patch http://review.whamcloud.com/12667/ |
| Comment by Niu Yawei (Inactive) [ 05/Jan/15 ] |
|
Patch landed on master. |