[LU-1766] dd never finishes sometimes Created: 17/Aug/12 Updated: 29/Aug/12 Resolved: 29/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6348 |
| Description |
|
I was writing a file with dd and I saw this console message on the OST:

LNet: Service thread pid 3461 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3461, comm: ll_ost_io01_001

Call Trace:
 [<ffffffffa0079d85>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0ab248d>] fsfilt_ldiskfs_commit_wait+0x6d/0xf0 [fsfilt_ldiskfs]
 [<ffffffffa0bfa23d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter]
 [<ffffffffa0e0346b>] ? cfs_percpt_lock+0x5b/0x130 [libcfs]
 [<ffffffffa0e769cb>] ? lolnd_send+0x2b/0xb0 [lnet]
 [<ffffffffa0e03374>] ? cfs_percpt_unlock+0x24/0xc0 [libcfs]
 [<ffffffffa0600aab>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
 [<ffffffffa05edc44>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc]
 [<ffffffffa0bfc6e0>] filter_preprw+0x80/0xa0 [obdfilter]
 [<ffffffffa04ea81c>] obd_preprw+0x12c/0x3d0 [ost]
 [<ffffffffa04f198a>] ost_brw_write+0x87a/0x1600 [ost]
 [<ffffffff8127cea6>] ? vsnprintf+0x2b6/0x5f0
 [<ffffffffa05bf07c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
 [<ffffffffa05bf1d8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
 [<ffffffffa04f802c>] ost_handle+0x360c/0x4850 [ost]
 [<ffffffffa0df5541>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0df1344>] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [<ffffffffa05ce87d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
 [<ffffffffa0de565e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa05c5d07>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff810533f3>] ? __wake_up+0x53/0x70
 [<ffffffffa05cfe69>] ptlrpc_main+0xb59/0x1860 [ptlrpc]
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffffa05cf310>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

After this was seen, the dd process never finished.
I have also verified that this problem is easy to reproduce when the OST is almost full. |
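Editorial note on reading the trace: frames prefixed with `?` are addresses the kernel unwinder found on the stack but could not confirm as part of the active call chain, so the likely blocking path is the sequence of unmarked frames. The following is a minimal sketch, not part of the original report, of how the unmarked frames could be filtered out of such a dump. The helper names `parse_frames` and `reliable_chain` are hypothetical, and the regular expression only assumes the frame layout visible in this ticket.

```python
import re

# Matches one frame such as "[<ffffffffa0079d85>] ? name+0xc5/0x140 [module]".
# The layout is assumed from the trace in this ticket, not from a formal spec.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return one dict per stack frame found in the trace text."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            # Frames without a "[module]" suffix belong to the core kernel.
            "module": m.group("module") or "kernel",
            # A leading "?" marks a frame the unwinder could not confirm.
            "reliable": m.group("guess") is None,
        })
    return frames


def reliable_chain(trace):
    """Return 'symbol [module]' strings for the frames not marked with '?'."""
    return [
        f"{f['symbol']} [{f['module']}]"
        for f in parse_frames(trace)
        if f["reliable"]
    ]


# First four frames of the trace above, used as sample input.
SAMPLE = (
    "[<ffffffffa0079d85>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] "
    "[<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 "
    "[<ffffffffa0ab248d>] fsfilt_ldiskfs_commit_wait+0x6d/0xf0 [fsfilt_ldiskfs] "
    "[<ffffffffa0bfa23d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter]"
)

if __name__ == "__main__":
    for line in reliable_chain(SAMPLE):
        print(line)
```

Applied to the full trace, this leaves the path from `ptlrpc_main` through `ost_brw_write` and `filter_preprw_write` down to `jbd2_log_wait_commit`, which is consistent with the thread waiting on a journal commit.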
| Comments |
| Comment by Peter Jones [ 23/Aug/12 ] |
|
Bobijam, could you please look into this one? Thanks, Peter |
| Comment by Peter Jones [ 25/Aug/12 ] |
|
Hongchao, Bobijam is busy with another priority. Could you please look into this 2.3 blocker? Thanks, Peter |
| Comment by Hongchao Zhang [ 27/Aug/12 ] |
|
Hi Jay, |
| Comment by Jinshan Xiong (Inactive) [ 27/Aug/12 ] |
(gdb) l *(filter_preprw_write+0xc4d)
0x1f26d is in filter_preprw_write (/Users/jinxiong/srcs/lustre/lustre/include/linux/lustre_fsfilt.h:294).
289 static inline int fsfilt_commit(struct obd_device *obd, struct inode *inode,
290                                 void *handle, int force_sync)
291 {
292         unsigned long now = jiffies;
293         int rc = obd->obd_fsops->fs_commit(inode, handle, force_sync);
294         CDEBUG(D_INFO, "committing handle %p\n", handle);
295
296         fsfilt_check_slow(obd, now, "journal start");
297
298         return rc; |
| Comment by Hongchao Zhang [ 27/Aug/12 ] |
|
Hi Jay, |
| Comment by Jinshan Xiong (Inactive) [ 29/Aug/12 ] |
|
No, this patch was not in my local branch at that time. If you can no longer reproduce the problem with this patch applied, then the problem is fixed. |
| Comment by Peter Jones [ 29/Aug/12 ] |
|
If I understand correctly this is believed to be a duplicate of |
| Comment by Jinshan Xiong (Inactive) [ 29/Aug/12 ] |
|
I'm verifying that I can't see this problem in the latest master.