[LU-1766] dd never finishes sometimes Created: 17/Aug/12  Updated: 29/Aug/12  Resolved: 29/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Jinshan Xiong (Inactive) Assignee: Hongchao Zhang
Resolution: Duplicate Votes: 0
Labels: None

Attachments: Text File stack.txt    
Severity: 3
Rank (Obsolete): 6348

 Description   

I was writing a file with dd when I saw the following console message on the OST:

LNet: Service thread pid 3461 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3461, comm: ll_ost_io01_001

Call Trace:
 [<ffffffffa0079d85>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0ab248d>] fsfilt_ldiskfs_commit_wait+0x6d/0xf0 [fsfilt_ldiskfs]
 [<ffffffffa0bfa23d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter]
 [<ffffffffa0e0346b>] ? cfs_percpt_lock+0x5b/0x130 [libcfs]
 [<ffffffffa0e769cb>] ? lolnd_send+0x2b/0xb0 [lnet]
 [<ffffffffa0e03374>] ? cfs_percpt_unlock+0x24/0xc0 [libcfs]
 [<ffffffffa0600aab>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
 [<ffffffffa05edc44>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc]
 [<ffffffffa0bfc6e0>] filter_preprw+0x80/0xa0 [obdfilter]
 [<ffffffffa04ea81c>] obd_preprw+0x12c/0x3d0 [ost]
 [<ffffffffa04f198a>] ost_brw_write+0x87a/0x1600 [ost]
 [<ffffffff8127cea6>] ? vsnprintf+0x2b6/0x5f0
 [<ffffffffa05bf07c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
 [<ffffffffa05bf1d8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
 [<ffffffffa04f802c>] ost_handle+0x360c/0x4850 [ost]
 [<ffffffffa0df5541>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0df1344>] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [<ffffffffa05ce87d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
 [<ffffffffa0de565e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa05c5d07>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff810533f3>] ? __wake_up+0x53/0x70
 [<ffffffffa05cfe69>] ptlrpc_main+0xb59/0x1860 [ptlrpc]
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

After this was seen, the dd process never finished.

I also verified that this problem is easy to reproduce when the OST is almost full.



 Comments   
Comment by Peter Jones [ 23/Aug/12 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Peter Jones [ 25/Aug/12 ]

Hongchao

Bobijam is busy with another priority. Could you please look into this 2.3 blocker?

Thanks

Peter

Comment by Hongchao Zhang [ 27/Aug/12 ]

Hi Jay,
Could you please print the code lines around "filter_preprw_write+0xc4d/0x22f0 [obdfilter]"? Thanks!

Comment by Jinshan Xiong (Inactive) [ 27/Aug/12 ]

(gdb) l *(filter_preprw_write+0xc4d)
0x1f26d is in filter_preprw_write (/Users/jinxiong/srcs/lustre/lustre/include/linux/lustre_fsfilt.h:294).
289	static inline int fsfilt_commit(struct obd_device *obd, struct inode *inode,
290	                                void *handle, int force_sync)
291	{
292	        unsigned long now = jiffies;
293	        int rc = obd->obd_fsops->fs_commit(inode, handle, force_sync);
294	        CDEBUG(D_INFO, "committing handle %p\n", handle);
295	
296	        fsfilt_check_slow(obd, now, "journal start");
297	
298	        return rc;
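For reference, each frame in the call trace above has the form "[<address>] symbol+0xoffset/0xsize [module]", and the gdb command in this comment is built from the symbol and offset of the frame in question. The following is a minimal sketch in Python of how such a frame could be split into its parts and turned into the matching gdb "l *(...)" command. The names FRAME_RE, parse_frame and gdb_list_command are hypothetical helpers written for illustration and are not part of Lustre or gdb. The sketch also assumes that a leading "?" marks a frame the kernel unwinder could not confirm as reliable.

```python
import re

# Matches frames such as:
#   [<ffffffffa0bfa23d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter]
#   [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
# The module suffix is optional: symbols from the core kernel have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Split one call-trace line into its fields; return None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        # Assumption: a "?" prefix means the unwinder did not verify this frame.
        "reliable": m.group("unreliable") is None,
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None for core kernel symbols
    }


def gdb_list_command(frame):
    """Build the gdb 'l *(symbol+offset)' command for a parsed frame."""
    return "l *({}+{:#x})".format(frame["symbol"], frame["offset"])


if __name__ == "__main__":
    line = " [<ffffffffa0bfa23d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter]"
    print(gdb_list_command(parse_frame(line)))
```

Run against the module's object file with debug information loaded, the resulting command is the same one shown at the (gdb) prompt in this comment.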
Comment by Hongchao Zhang [ 27/Aug/12 ]

Hi Jay,
Thanks. Could you please confirm whether the patch (http://review.whamcloud.com/#change,3692) has been merged into your local branch?

Comment by Jinshan Xiong (Inactive) [ 29/Aug/12 ]

No, my local branch did not include this patch at that time. If you can no longer reproduce the problem with the patch applied, then the problem is fixed.

Comment by Peter Jones [ 29/Aug/12 ]

If I understand correctly, this is believed to be a duplicate of LU-657.

Comment by Jinshan Xiong (Inactive) [ 29/Aug/12 ]

I'm verifying that this problem no longer appears in the latest master.
