Description
See the discussion at http://review.whamcloud.com/#/c/11025/2/lustre/mdt/mdt_recovery.c.
To reproduce:
MOUNT_2=y llmount.sh
mkdir /mnt/lustre/d0
stat /mnt/lustre/d0
lctl set_param fail_loc=0x119 # OBD_FAIL_MDS_REINT_NET_REP
touch /mnt/lustre/d0 &
sleep 1
lctl set_param fail_loc=0
rmdir /mnt/lustre2/d0
[ 132.938246] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
[ 132.970143] Lustre: *** cfs_fail_loc=119, val=2147483648***
[ 132.972029] LustreError: 3623:0:(ldlm_lib.c:2399:target_send_reply_msg()) @@@ dropping reply req@ffff8801f8edd908 x1473623045964728/t4294967298(0) o36->ac7dcf52-7dca-015b-3e5b-0a840c48c232@0@lo:0/0 lens 488/456 e 0 to 0 dl 1405356492 ref 1 fl Interpret:/0/0 rc 0/0
[ 139.969185] Lustre: 3855:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405356486/real 1405356486] req@ffff8801f64beb00 x1473623045964728/t0(0) o36->lustre-MDT0000-mdc-ffff880203aa4860@0@lo:12/10 lens 488/1016 e 0 to 1 dl 1405356493 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[ 139.978122] Lustre: lustre-MDT0000-mdc-ffff880203aa4860: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[ 139.983045] Lustre: lustre-MDT0000: Client ac7dcf52-7dca-015b-3e5b-0a840c48c232 (at 0@lo) reconnecting
[ 139.984979] Lustre: lustre-MDT0000-mdc-ffff880203aa4860: Connection restored to lustre-MDT0000 (at 0@lo)
[ 139.987189] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 139.988113] IP: [<ffffffffa046084e>] cl_object_top+0xe/0x150 [obdclass]
[ 139.988113] PGD 1f4083067 PUD 1fbfb9067 PMD 0
[ 139.988113] Oops: 0000 [#1] SMP
[ 139.988113] last sysfs file: /sys/devices/system/cpu/possible
[ 139.988113] CPU 0
[ 139.988113] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) nodemap(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
[ 139.988113]
[ 139.988113] Pid: 3855, comm: touch Not tainted 2.6.32-431.5.1.el6.lustre.x86_64 #1 Bochs Bochs
[ 139.988113] RIP: 0010:[<ffffffffa046084e>] [<ffffffffa046084e>] cl_object_top+0xe/0x150 [obdclass]
[ 139.988113] RSP: 0018:ffff8801f40b9c58 EFLAGS: 00010292
[ 139.988113] RAX: ffff8801f8ed2f60 RBX: ffff880219e9a690 RCX: 0000000000000000
[ 139.988113] RDX: ffff8801f8f8bce8 RSI: ffffffffa04bca20 RDI: 0000000000000000
[ 139.988113] RBP: ffff8801f40b9c68 R08: 0000000000000001 R09: 0000000000000001
[ 139.988113] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880217e903c8
[ 139.988113] R13: 0000000000000002 R14: 0000000000000000 R15: ffff8801f8ed2f60
[ 139.988113] FS: 00007f641a061700(0000) GS:ffff88002f800000(0000) knlGS:0000000000000000
[ 139.988113] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 139.988113] CR2: 0000000000000000 CR3: 00000001f4074000 CR4: 00000000000006f0
[ 139.988113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 139.988113] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 139.988113] Process touch (pid: 3855, threadinfo ffff8801f40b8000, task ffff8801fbf16380)
[ 139.988113] Stack:
[ 139.988113] ffff8801f40b9c68 ffff880219e9a690 ffff8801f40b9ca8 ffffffffa046fded
[ 139.988113] <d> ffff8801f40b9cd0 ffff880217e903a0 ffff880219e9a690 ffff8801f40b9e48
[ 139.988113] <d> ffff880217e903c8 ffff8801f40b9cd4 ffff8801f40b9d08 ffffffffa0e846e9
[ 139.988113] Call Trace:
[ 139.988113] [<ffffffffa046fded>] cl_io_init+0x3d/0xe0 [obdclass]
[ 139.988113] [<ffffffffa0e846e9>] cl_setattr_ost+0xe9/0x2f0 [lustre]
[ 139.988113] [<ffffffffa0e4e98c>] ll_setattr_raw+0xa2c/0x10d0 [lustre]
[ 139.988113] [<ffffffff8107b997>] ? current_fs_time+0x27/0x30
[ 139.988113] [<ffffffffa0e4f095>] ll_setattr+0x65/0xd0 [lustre]
[ 139.988113] [<ffffffff811c16a8>] notify_change+0x168/0x340
[ 139.988113] [<ffffffff811d601e>] utimes_common+0xde/0x1c0
[ 139.988113] [<ffffffff8119f51b>] ? put_unused_fd+0x3b/0x90
[ 139.988113] [<ffffffff811d61d0>] do_utimes+0xd0/0x170
[ 139.988113] [<ffffffff811d6372>] sys_utimensat+0x32/0x90
[ 139.988113] [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 139.988113] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[ 139.988113] Code: 48 89 df e8 85 71 d2 e0 48 c7 c3 f4 ff ff ff e9 2a ff ff ff 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 80 a0 00 00 00 48
[ 139.988113] RIP [<ffffffffa046084e>] cl_object_top+0xe/0x150 [obdclass]
[ 139.988113] RSP <ffff8801f40b9c58>
[ 139.988113] CR2: 0000000000000000