[LU-2218] lots of small IO causes OST deadlock Created: 22/Oct/12  Updated: 09/May/14  Resolved: 09/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Kit Westneat (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

SLES kernel on Debian


Attachments: Text File debug.tar.gz    
Severity: 3
Rank (Obsolete): 5276

 Description   

Sanger have been running into an issue where one of their applications seems to deadlock OSTs. The application does lots of small IO and seems to create and delete a lot of files. It also seems to saturate the network, so there are a lot of bulk IO errors. It looks like the quota and jbd code paths are getting into some kind of deadlock. I'm uploading the full logs, but there is a lot of:

Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264411] [<ffffffff8139ba25>] rwsem_down_failed_common+0x95/0x1e0
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264418] [<ffffffff8139bb8f>] rwsem_down_write_failed+0x1f/0x30
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264425] [<ffffffff811e8db3>] call_rwsem_down_write_failed+0x13/0x20
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264431] [<ffffffff8139ad8c>] down_write+0x1c/0x20
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264438] [<ffffffff8114fd3f>] dquot_initialize+0x8f/0x1c0
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264453] [<ffffffffa098fff0>] ldiskfs_unlink+0x130/0x270 [ldiskfs]
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264484] [<ffffffffa0a18a58>] filter_vfs_unlink+0x2f8/0x500 [obdfilter]
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264499] [<ffffffffa0a2c412>] filter_destroy+0x1572/0x1b90 [obdfilter]
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264512] [<ffffffffa09e4436>] ost_handle+0x2f36/0x5ef0 [ost]
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264539] [<ffffffffa06fb040>] ptlrpc_main+0x1bc0/0x22f0 [ptlrpc]
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264574] [<ffffffff81003eba>] child_rip+0xa/0x20
Oct 21 11:29:40 lus08-oss2 kernel: [ 1456.264577]

and

Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266346] Call Trace:
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266366] [<ffffffffa0956006>] start_this_handle+0x356/0x450 [jbd2]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266388] [<ffffffffa09562e0>] jbd2_journal_start+0xa0/0xe0 [jbd2]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266398] [<ffffffffa095632e>] jbd2_journal_force_commit+0xe/0x30 [jbd2]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266415] [<ffffffffa0995ce1>] ldiskfs_force_commit+0xb1/0xe0 [ldiskfs]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266444] [<ffffffffa0a1fab0>] filter_sync+0x80/0x600 [obdfilter]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266457] [<ffffffffa09e039f>] ost_blocking_ast+0x29f/0xa30 [ost]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266485] [<ffffffffa06a36d6>] ldlm_cancel_callback+0x56/0xe0 [ptlrpc]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266504] [<ffffffffa06a37ac>] ldlm_lock_cancel+0x4c/0x190 [ptlrpc]
Oct 21 12:02:13 lus08-oss2 kernel: [ 3406.266528] [<ffffffffa06c3dcf>] ldlm_request_cancel+0x13f/0x380 [ptlrpc]

I asked them to turn down the OSS threads to try to reduce contention on the disks and network, but that didn't seem to help. Let me know if there are any other logs you need.
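
(For reference, a minimal sketch of how the OSS service-thread count is typically capped persistently, assuming the oss_num_threads option of the ost module described in the Lustre manual; the config file path and the value 128 below are illustrative assumptions, not the settings actually used at Sanger.)

# Sketch: cap OSS service threads persistently via the "ost" module option
# oss_num_threads (per the Lustre manual). The path and value are assumptions
# for illustration; a module reload or reboot is needed for it to take effect.
MODPROBE_CONF = "/etc/modprobe.d/lustre.conf"   # assumed location
OSS_THREADS = 128                               # arbitrary example value

with open(MODPROBE_CONF, "a") as conf:
    conf.write("options ost oss_num_threads=%d\n" % OSS_THREADS)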



 Comments   
Comment by Peter Jones [ 22/Oct/12 ]

Niu

Could you please help with this one?

Thanks

peter

Comment by Kit Westneat (Inactive) [ 22/Oct/12 ]

Here is additional information from Sanger:

The pipeline runs 200-1500 concurrent jobs; each job creates its own
working directory on Lustre and processes ~2000 files within the
directory. The file IO is all small; each job is typically opening a
file, writing a couple of lines, closing the file. The process is then
repeated on another file.

There does not appear to be any concurrent write access to any of the
files. At the end of the job, the working directory is deleted.

All of this IO is happening on un-striped files & directories.
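
(For illustration, a minimal Python sketch of the IO pattern described above; the mount point, directory names, and file contents are assumptions, not taken from the actual pipeline.)

#!/usr/bin/env python
# Illustrative reproducer for the workload described above: each "job"
# creates its own working directory, writes ~2000 small files (open,
# write a couple of lines, close) and finally deletes the directory.
import os
import shutil

LUSTRE_MOUNT = "/mnt/lustre"    # assumed client mount point
FILES_PER_JOB = 2000

def run_job(job_id):
    workdir = os.path.join(LUSTRE_MOUNT, "job_%06d" % job_id)
    os.makedirs(workdir)
    for i in range(FILES_PER_JOB):
        path = os.path.join(workdir, "out_%05d.txt" % i)
        with open(path, "w") as f:
            f.write("line one\n")
            f.write("line two\n")
    # at the end of the job the working directory is deleted
    shutil.rmtree(workdir)

if __name__ == "__main__":
    # the real pipeline runs 200-1500 such jobs concurrently across clients
    for job in range(10):
        run_job(job)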

One thing we also noticed: the user concerned was at 98% of their inode quota, and it is entirely possible that they went over quota at some point.

Unfortunately, their pipeline does not trap errors; it simply re-tries jobs if they fail. I have looked through the logs on all of the clients, and none of them have logged any of the Lustre -122 (EDQUOT) errors that we would expect to see in an over-quota situation, but maybe the over-quota errors never make it back from the deadlocked OSS.

We have upped the user's quota, and re-activated the pipeline, so we
should know in the next 24 hours whether this was the cause or not.
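
(As a side note, inode usage against quota can be checked from any client with lfs quota; a minimal sketch, with the user name and mount point as placeholders:)

# Sketch: report a user's block and inode usage/limits on a Lustre client.
# The user name and mount point are placeholders, not values from this ticket.
import subprocess

def show_user_quota(user, mountpoint="/mnt/lustre"):
    # "lfs quota -u <user> <mount>" prints usage and limits for blocks and files
    subprocess.check_call(["lfs", "quota", "-u", user, mountpoint])

if __name__ == "__main__":
    show_user_quota("someuser")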

Comment by Guy Coates [ 24/Oct/12 ]

Hi,

We can reproduce the deadlock even when the user is well within quota.

Comment by Kit Westneat (Inactive) [ 24/Oct/12 ]

I've looked more closely at the logs, and it looks like there is some kind of deadlock between threads stuck in three different functions:

mutex_lock
rwsem_down_failed_common
start_this_handle

In each case the call appears to originate from either filter_destroy or ldlm_cancel_callback.

Are there any debug flags that might help gather more information?
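
(One possible way to gather that data with lctl, as a sketch; the debug masks below are the standard Lustre ones and have not been verified against this 1.8.8 build:)

# Sketch: raise the Lustre debug level on the OSS and dump the kernel debug
# buffer to a file that can be attached to the ticket. The mask names and
# buffer size are suggestions, not settings confirmed for this system.
import subprocess

def enable_lustre_debug():
    for mask in ("+dlmtrace", "+rpctrace", "+quota"):
        subprocess.check_call(["lctl", "set_param", "debug=%s" % mask])
    # optionally enlarge the in-kernel debug buffer (in MB)
    subprocess.check_call(["lctl", "set_param", "debug_mb=256"])

def dump_debug_log(path="/tmp/lustre-debug.log"):
    # "lctl dk" flushes the kernel debug buffer to the given file
    subprocess.check_call(["lctl", "dk", path])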

Comment by Niu Yawei (Inactive) [ 24/Oct/12 ]

Hi, Kit

I checked the log; there are indeed many service threads waiting to start a journal handle, and some are waiting on the dqptr_sem, but that is not necessarily a deadlock. I haven't seen any lock-ordering issue in the code so far (there was a deadlock problem, but it was already fixed in LU-952).

Could you try to get the full stack trace (on the hung OST) via sysrq? I think that could be helpful. Thanks.
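
(For reference, a minimal sketch of collecting the full stack traces with the standard Linux magic-SysRq interface; run as root on the hung OSS, and the output appears in the kernel log:)

# Sketch: dump the stack trace of every task on the OSS via magic SysRq.
# This uses the standard kernel interface; output goes to dmesg/syslog.
def sysrq_dump_all_tasks():
    with open("/proc/sys/kernel/sysrq", "w") as f:
        f.write("1\n")            # make sure SysRq is enabled
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("t\n")            # 't' dumps all task stack traces

if __name__ == "__main__":
    sysrq_dump_all_tasks()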

Comment by Niu Yawei (Inactive) [ 24/Oct/12 ]

Hi, Yang Sheng

I saw that ext4-back-dquot-to-rhel54.patch was removed by ba5dd769f66194a80920cf93d6014c78729efaae (LU-674); won't that cause a deadlock? (see b23235, 530ddd7935bfbea6a9714c69d44da61f5efc098f).

Comment by Niu Yawei (Inactive) [ 24/Oct/12 ]

Btw, Kit, what's the exact kernel version on the OST?

Comment by Yang Sheng [ 24/Oct/12 ]

The SLES kernel may differ from the RHEL one, so the patches cannot be applied directly.

Comment by Kit Westneat (Inactive) [ 25/Oct/12 ]

The kernel is version 2.6.32.59-sles-lustre-1.8.8wc1. I tried to see if there were any ext4 bugs fixed in more recent 2.6.32.y kernels, but I couldn't find any. We'll try to get a full stack trace the next time this occurs. We tried this time, but as you can see, the first part was not saved for some reason.

Comment by Guy Coates [ 26/Oct/12 ]

We have been running the same workload on one of our 2.2 file systems for the past few days and have not been able to trigger the problem.

Our plan will be to upgrade the problematic file-system to 2.X. I am happy for you to reduce the priority of the ticket in the meantime.
Thanks,

Guy

Comment by Niu Yawei (Inactive) [ 26/Oct/12 ]

Thank you, Guy. The SLES kernel isn't officially supported by 1.8.8, so I'm not sure whether there is a potential deadlock between the journal lock and dqptr_sem. Anyway, glad to hear you resolved the problem by upgrading.
