LU-3160

recovery-random-scale test_fail_client_mds: RIP: cl_object_top+0xe/0x150 [obdclass]

Details


    Description

      While running recovery-random-scale (failing one random client and then failing the MDS), the dd operation on the other live client (wtm-4vm5) hung and the client crashed:

      2013-04-11 03:01:35: dd run starting
      + mkdir -p /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + /usr/bin/lfs setstripe -c -1 /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + cd /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      ++ /usr/bin/lfs df /mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com
      + FREE_SPACE=97381848
      + BLKS=21910915
      + echo 'Free disk space is 97381848, 4k blocks to dd is 21910915'
      + load_pid=2634
      + wait 2634
      + dd bs=4k count=21910915 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-wtm-4vm5.rosso.whamcloud.com/dd-file
      

      Console log on wtm-4vm5:

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      PGD 7b01e067 PUD 7b016067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/possible
      CPU 0
      Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      
      Pid: 2683, comm: flush-lustre-1 Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      RIP: 0010:[<ffffffffa05b4b4e>]  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
      RSP: 0018:ffff88004c3c5980  EFLAGS: 00010282
      RAX: ffff88007bc75800 RBX: ffff88007d1c21e8 RCX: 0000000000000098
      RDX: ffff88003e2bb200 RSI: ffffffffa0602400 RDI: 0000000000000098
      RBP: ffff88004c3c5990 R08: 0000000000000001 R09: 0000000000000000
      R10: 000000000000000f R11: 000000000000000f R12: ffff88007d1bc3d0
      R13: 0000000000000004 R14: 0000000000000098 R15: ffff88007bc75800
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 000000007b02d000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process flush-lustre-1 (pid: 2683, threadinfo ffff88004c3c4000, task ffff88007b730080)
      Stack:
       ffff88004c3c5990 ffff88007d1c21e8 ffff88004c3c59d0 ffffffffa05c575d
      <d> 0000000000000000 ffff8800430e3c00 ffff88007d1bc378 0000000000000000
      <d> ffff88007d1bf768 ffff88007b020e80 ffff88004c3c5a30 ffffffffa09d3488
      Call Trace:
       [<ffffffffa05c575d>] cl_io_sub_init+0x3d/0xc0 [obdclass]
       [<ffffffffa09d3488>] lov_sub_get+0x218/0x690 [lov]
       [<ffffffffa09d5116>] lov_io_iter_init+0xd6/0x480 [lov]
       [<ffffffffa05c279d>] cl_io_iter_init+0x5d/0x110 [obdclass]
       [<ffffffffa05c6d3c>] cl_io_loop+0x4c/0x1b0 [obdclass]
       [<ffffffffa0a5233b>] cl_sync_file_range+0x2fb/0x4e0 [lustre]
       [<ffffffffa0a7ba7f>] ll_writepages+0x6f/0x1a0 [lustre]
       [<ffffffff811255d1>] do_writepages+0x21/0x40
       [<ffffffff8119fe8d>] writeback_single_inode+0xdd/0x290
       [<ffffffff811a029e>] writeback_sb_inodes+0xce/0x180
       [<ffffffff811a03fb>] writeback_inodes_wb+0xab/0x1b0
       [<ffffffff811a079b>] wb_writeback+0x29b/0x3f0
       [<ffffffff814e9c50>] ? thread_return+0x4e/0x76e
       [<ffffffff8107d572>] ? del_timer_sync+0x22/0x30
       [<ffffffff811a0a89>] wb_do_writeback+0x199/0x240
       [<ffffffff811a0b93>] bdi_writeback_task+0x63/0x1b0
       [<ffffffff81090857>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff811341f6>] bdi_start_fn+0x86/0x100
       [<ffffffff81134170>] ? bdi_start_fn+0x0/0x100
       [<ffffffff81090626>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff81090590>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Code: 04 00 00 00 04 00 e8 52 b7 e8 ff 48 c7 c7 60 2b 60 a0 e8 16 b3 e7 ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 80 88 00 00 00 48
      RIP  [<ffffffffa05b4b4e>] cl_object_top+0xe/0x150 [obdclass]
       RSP <ffff88004c3c5980>
      

      Maloo report: https://maloo.whamcloud.com/test_sets/c1c906c6-a294-11e2-81ba-52540035b04c

      The console logs in the above Maloo report were not gathered completely due to TT-1107. Please refer to the attachment for the full console logs.

      Attachments

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.4


          jay Jinshan Xiong (Inactive) added a comment -

          This should be a problem with force umount causing the failure in the layout refresh. I have updated the patch; please take a look.
          jay Jinshan Xiong (Inactive) added a comment (edited) -

          Do you know why we didn't flush dirty on layout lock revocation in the beginning? Thanks.

          We don't want to introduce a cascading problem: if one of the OSTs is inaccessible, the client will be evicted by the MDT because it cannot cancel the layout lock in time.

          If what you said were true, we should have seen this problem during migration, because we use an orphan object to restripe. I will take a look at this.


          niu Niu Yawei (Inactive) added a comment -

          It looks like we chose to flush dirty pages when changing the layout but not on layout lock revocation. I'm not sure that was the best choice, but the problem can be worked around another way: ignore the -ENOENT error on layout refresh for I/O.
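
          A minimal sketch of that workaround idea, assuming the layout-refresh call sits in the IO init path. ll_layout_refresh() and ci_ignore_layout are names that appear elsewhere in this ticket; the CIT_FSYNC check and the exact placement are assumptions for illustration, not the landed patch:

          /* Sketch only: tolerate -ENOENT from layout refresh when the IO is
           * just flushing dirty pages, since the file may already have been
           * unlinked on the MDS. */
          static int io_init_refresh_layout(struct inode *inode, struct cl_io *io)
          {
                  __u32 gen = 0;
                  int rc = 0;

                  if (!io->ci_ignore_layout) {
                          rc = ll_layout_refresh(inode, &gen);
                          if (rc == -ENOENT && io->ci_type == CIT_FSYNC)
                                  /* File removed on the MDS: let the flush
                                   * proceed so the dirty pages can still be
                                   * cleaned instead of LBUGging on umount. */
                                  rc = 0;
                  }
                  return rc;
          }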

          niu Niu Yawei (Inactive) added a comment -

          Hi Xiong,

          Why don't we flush the dirty pages before losing the layout lock (see ll_md_blocking_ast())?

          This patch caused an LBUG when running conf-sanity test_0:

          • the client writes data to a file, then unlinks it;
          • the layout lock is revoked, but the dirty pages are not flushed;
          • on client umount, the kernel tries to flush the dirty pages via ll_writepages(), which now verifies the layout (what this patch added), but the layout fetch inevitably fails because the file has been removed on the MDS;
          • the dirty pages are never flushed, and the LBUG is triggered.

          Do you know why we didn't flush dirty on layout lock revocation in the beginning? Thanks.


          niu Niu Yawei (Inactive) added a comment -

          It looks like the patch revealed another problem: conf-sanity test_0 will always hit an LBUG (see LU-3230). I'm looking into it.

          niu Niu Yawei (Inactive) added a comment -

          http://review.whamcloud.com/6154

          jay Jinshan Xiong (Inactive) added a comment -

          Yes, this is exactly what we're doing. However, there are some cases where we need to start an IO at the OSC layer that may already hold the conf lock, which could cause a deadlock. ci_ignore_layout was introduced to indicate whether the IO needs to hold the layout lock before starting. Usually, if the IO operates on pages, we are safe, because a layout change has to clean up the pages first.

          For the cl_sync_file_range() case, I set ci_ignore_layout to 1 because I thought that if there were dirty pages, the layout could not be changed. Obviously I was wrong, because there is a race.

          For this specific problem, it can be fixed by setting ci_ignore_layout to 0. Please also check the other callers of cl_sync_file_range() to make sure the assumption holds for them.

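          A minimal sketch of the fix direction described above, assuming a ccc_env_thread_io()-style cl_io setup inside cl_sync_file_range() (that helper name is an assumption here); only the ci_ignore_layout line is the point, and the other fsync parameters are omitted. The review referenced in this ticket (http://review.whamcloud.com/6154) carries the actual change:

          io = ccc_env_thread_io(env);
          io->ci_obj = clob;
          io->ci_ignore_layout = 0;  /* was 1: honour the layout lock so the
                                      * flush cannot race with a layout change */

          rc = cl_io_init(env, io, CIT_FSYNC, io->ci_obj);
          if (rc == 0)
                  rc = cl_io_loop(env, io);
          cl_io_fini(env, io);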

          niu Niu Yawei (Inactive) added a comment -

          Hi Xiong,

          I don't see why we didn't use lo_type_guard to protect the layout in the first place; it looks simpler and safer than the current way (active_ios & waitq).

          • Take the read lock for IO: call lov_conf_freeze() in lov_io_init_raid0() and lov_conf_thaw() in lov_io_fini();
          • Take the write lock for a layout change: call lov_conf_lock() and lov_conf_unlock() in lov_layout_change() to make the "layout delete -> layout reinstall" sequence atomic.

          Did I miss anything?

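          A sketch of the two-lock scheme described in the comment above, using the function and field names it mentions (lo_type_guard, lov_conf_freeze/thaw, lov_conf_lock/unlock); the signatures and bodies are illustrative only, not the current implementation:

          /* IO path: hold lo_type_guard for read so the striping cannot be
           * swapped out while the raid0 sub-IOs are being set up. */
          static int lov_io_init_raid0_sketch(const struct lu_env *env,
                                              struct lov_object *lov,
                                              struct lov_io *lio)
          {
                  lov_conf_freeze(lov);   /* read lock on lo_type_guard */
                  /* ... build sub-IOs against the current layout ... */
                  return 0;               /* dropped later in lov_io_fini() */
          }

          static void lov_io_fini_sketch(struct lov_object *lov)
          {
                  lov_conf_thaw(lov);     /* release the read lock */
          }

          /* Layout change: hold the write lock so the "layout delete ->
           * layout reinstall" sequence is atomic w.r.t. in-flight IO. */
          static int lov_layout_change_sketch(const struct lu_env *env,
                                              struct lov_object *lov,
                                              const struct cl_object_conf *conf)
          {
                  int rc = 0;

                  lov_conf_lock(lov);     /* write lock on lo_type_guard */
                  /* ... delete the old layout, install the new one ... */
                  lov_conf_unlock(lov);
                  return rc;
          }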

          jay Jinshan Xiong (Inactive) added a comment -

          Niu, I think you're absolutely right about this. I extracted the stack trace from the vmcore; here is another process:

          PID: 11897  TASK: ffff8808365a3500  CPU: 25  COMMAND: "dd"
           #0 [ffff88044e527e90] crash_nmi_callback at ffffffff81029796
           #1 [ffff88044e527ea0] notifier_call_chain at ffffffff814ef745
           #2 [ffff88044e527ee0] atomic_notifier_call_chain at ffffffff814ef7aa
           #3 [ffff88044e527ef0] notify_die at ffffffff810969ae
           #4 [ffff88044e527f20] do_nmi at ffffffff814ed3c3
           #5 [ffff88044e527f50] nmi at ffffffff814eccd0
              [exception RIP: strrchr+23]
              RIP: ffffffff81270227  RSP: ffff8808344b3268  RFLAGS: 00000202
              RAX: ffffffffa05c21bb  RBX: ffff880418403200  RCX: 0000000000000000
              RDX: 0000000000000000  RSI: 000000000000002f  RDI: ffffffffa05c21b0
              RBP: ffff8808344b3268   R8: 0000000000000073   R9: 00000000fffffffc
              R10: 0000000000000001  R11: 000000000000000f  R12: ffff880833a92f08
              R13: ffff88041d764f38  R14: ffff880418403200  R15: ffffffffa060ed80
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          --- <NMI exception stack> ---
           #6 [ffff8808344b3268] strrchr at ffffffff81270227
           #7 [ffff8808344b3270] libcfs_debug_vmsg2 at ffffffffa045c76a [libcfs]
           #8 [ffff8808344b33e0] libcfs_debug_msg at ffffffffa045d2c1 [libcfs]
           #9 [ffff8808344b3440] cl_page_put at ffffffffa05a056d [obdclass]
          #10 [ffff8808344b34b0] cl_page_delete0 at ffffffffa05a10eb [obdclass]
          #11 [ffff8808344b34f0] cl_page_delete at ffffffffa05a1412 [obdclass]
          #12 [ffff8808344b3510] ll_invalidatepage at ffffffffa0a759bd [lustre]
          #13 [ffff8808344b3550] vvp_page_discard at ffffffffa0a87cdc [lustre]
          #14 [ffff8808344b3580] cl_page_invoid at ffffffffa059cf98 [obdclass]
          #15 [ffff8808344b35d0] cl_page_discard at ffffffffa059d0a3 [obdclass]
          #16 [ffff8808344b35e0] discard_cb at ffffffffa05a4a84 [obdclass]
          #17 [ffff8808344b3620] cl_page_gang_lookup at ffffffffa05a1f14 [obdclass]
          #18 [ffff8808344b36d0] cl_lock_discard_pages at ffffffffa05a489e [obdclass]
          #19 [ffff8808344b3720] osc_lock_flush at ffffffffa091aa6f [osc]
          #20 [ffff8808344b3780] osc_lock_cancel at ffffffffa091acd7 [osc]
          #21 [ffff8808344b37d0] cl_lock_cancel0 at ffffffffa05a2735 [obdclass]
          #22 [ffff8808344b3800] cl_lock_cancel at ffffffffa05a32db [obdclass]
          #23 [ffff8808344b3820] cl_locks_prune at ffffffffa05a6bd3 [obdclass]
          #24 [ffff8808344b38c0] lov_delete_raid0 at ffffffffa09aa7dc [lov]
          #25 [ffff8808344b3970] lov_conf_set at ffffffffa09ab1fb [lov]
          #26 [ffff8808344b39e0] cl_conf_set at ffffffffa059b298 [obdclass]
          #27 [ffff8808344b3a10] ll_layout_conf at ffffffffa0a2e1b8 [lustre]
          #28 [ffff8808344b3a50] ll_layout_lock_set at ffffffffa0a3bced [lustre]
          #29 [ffff8808344b3b40] ll_layout_refresh at ffffffffa0a3f61b [lustre]
          #30 [ffff8808344b3c90] vvp_io_init at ffffffffa0a8b61f [lustre]
          #31 [ffff8808344b3cd0] cl_io_init0 at ffffffffa05a98e8 [obdclass]
          #32 [ffff8808344b3d10] cl_io_init at ffffffffa05ac6a4 [obdclass]
          #33 [ffff8808344b3d50] cl_io_rw_init at ffffffffa05adf64 [obdclass]
          #34 [ffff8808344b3da0] ll_file_io_generic at ffffffffa0a31598 [lustre]
          #35 [ffff8808344b3e20] ll_file_aio_write at ffffffffa0a32c12 [lustre]
          #36 [ffff8808344b3e80] ll_file_write at ffffffffa0a32efc [lustre]
          #37 [ffff8808344b3ef0] vfs_write at ffffffff81176588
          #38 [ffff8808344b3f30] sys_write at ffffffff81176e81
          #39 [ffff8808344b3f80] system_call_fastpath at ffffffff8100b072
              RIP: 00000030690dae60  RSP: 00007fffd66dfbc8  RFLAGS: 00000206
              RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 00000030690dae60
              RDX: 0000000000001000  RSI: 00000000019e3000  RDI: 0000000000000001
              RBP: 00000000019e3000   R8: 000000306938eee8   R9: 0000000000000001
              R10: 0000000000003003  R11: 0000000000000246  R12: 00000000019e2fff
              R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000000
              ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
          

          It's changing the layout. However, I don't know why this happened; unfortunately, I can't extract the Lustre log.


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: yujian Jian Yu
            Votes: 0
            Watchers: 12
