
LU-4912: ldlm_cancel_locks_for_export() causes IO during umount

Details

    • Type: Question/Request
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3

    Description

      After observing substantial read I/O on our systems during an OST umount, I took a look at exactly what was causing it. The root cause turns out to be that ldlm_cancel_locks_for_export() calls ofd_lvbo_update() to update the LVB from disk for every lock as it is canceled. When there are millions of locks on the server this translates into a huge amount of I/O.

      After reading through the code it's not at all clear to me why this is done. How could the LVB be out of date? Why is this update required before the lock can be canceled?

      [<ffffffffa031c9dc>] cv_wait_common+0x8c/0x100 [spl]
      [<ffffffffa031ca68>] __cv_wait_io+0x18/0x20 [spl]
      [<ffffffffa046353b>] zio_wait+0xfb/0x1b0 [zfs]
      [<ffffffffa03d16bd>] dbuf_read+0x3fd/0x740 [zfs]
      [<ffffffffa03d1b89>] __dbuf_hold_impl+0x189/0x480 [zfs]
      [<ffffffffa03d1f06>] dbuf_hold_impl+0x86/0xc0 [zfs]
      [<ffffffffa03d2f80>] dbuf_hold+0x20/0x30 [zfs]
      [<ffffffffa03d9767>] dmu_buf_hold+0x97/0x1d0 [zfs]
      [<ffffffffa042de8f>] zap_get_leaf_byblk+0x4f/0x2a0 [zfs]
      [<ffffffffa042e14a>] zap_deref_leaf+0x6a/0x80 [zfs]
      [<ffffffffa042e510>] fzap_lookup+0x60/0x120 [zfs]
      [<ffffffffa0433f11>] zap_lookup_norm+0xe1/0x190 [zfs]
      [<ffffffffa0434053>] zap_lookup+0x33/0x40 [zfs]
      [<ffffffffa0cf0710>] osd_fid_lookup+0xb0/0x2e0 [osd_zfs]
      [<ffffffffa0cea311>] osd_object_init+0x1a1/0x6d0 [osd_zfs]
      [<ffffffffa06efc9d>] lu_object_alloc+0xcd/0x300 [obdclass]
      [<ffffffffa06f0805>] lu_object_find_at+0x205/0x360 [obdclass]
      [<ffffffffa06f0976>] lu_object_find+0x16/0x20 [obdclass]
      [<ffffffffa0d80575>] ofd_object_find+0x35/0xf0 [ofd]
      [<ffffffffa0d90486>] ofd_lvbo_update+0x366/0xdac [ofd]
      [<ffffffffa0831828>] ldlm_cancel_locks_for_export_cb+0x88/0x200 [ptlrpc]
      [<ffffffffa059178f>] cfs_hash_for_each_relax+0x17f/0x360 [libcfs]
      [<ffffffffa0592fde>] cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs]
      [<ffffffffa082c05f>] ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc]
      [<ffffffffa083b804>] server_disconnect_export+0x64/0x1a0 [ptlrpc]
      [<ffffffffa0d717fa>] ofd_obd_disconnect+0x6a/0x1f0 [ofd]
      [<ffffffffa06b5d77>] class_disconnect_export_list+0x337/0x660 [obdclass]
      [<ffffffffa06b6496>] class_disconnect_exports+0x116/0x2f0 [obdclass]
      [<ffffffffa06de9cf>] class_cleanup+0x16f/0xda0 [obdclass]
      [<ffffffffa06e06bc>] class_process_config+0x10bc/0x1c80 [obdclass]
      [<ffffffffa06e13f9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
      [<ffffffffa071615c>] server_put_super+0x5bc/0xf00 [obdclass]
      [<ffffffff8118461b>] generic_shutdown_super+0x5b/0xe0
      [<ffffffff81184706>] kill_anon_super+0x16/0x60
      [<ffffffffa06e3256>] lustre_kill_super+0x36/0x60 [obdclass]
      [<ffffffff81184ea7>] deactivate_super+0x57/0x80
      [<ffffffff811a2d2f>] mntput_no_expire+0xbf/0x110
      [<ffffffff811a379b>] sys_umount+0x7b/0x3a0
      

      Attachments

        Activity

          [LU-4912] ldlm_cancel_locks_for_export() causes IO during umount

           adilger Andreas Dilger added a comment -

           What might make sense for lock cancellation in the client eviction case is to mark the LVB stale in the resource, so that it is refreshed from disk only if the resource is used again. That would avoid the need to update the LVB repeatedly while cancelling many locks, and avoids doing any work if the resource is never used again.

          I also notice the comment in ldlm_glimpse_ast() implies that "filter_intent_policy()" is handling this, but the new ofd_intent_policy() uses ldlm_glimpse_locks() which does not appear to call ldlm_res_lvbo_update(res, NULL, 1) if the glimpse fails.

          /**
           * ->l_glimpse_ast() for DLM extent locks acquired on the server-side. See
           * comment in filter_intent_policy() on why you may need this.
           */
          int ldlm_glimpse_ast(struct ldlm_lock *lock, void *reqp)
          {
                  /*
                   * Returning -ELDLM_NO_LOCK_DATA actually works, but the reason for
                   * that is rather subtle: with OST-side locking, it may so happen that
                   * _all_ extent locks are held by the OST. If client wants to obtain
                   * current file size it calls ll{,u}_glimpse_size(), and (as locks are
                   * on the server), dummy glimpse callback fires and does
                   * nothing. Client still receives correct file size due to the
                   * following fragment in filter_intent_policy():
                   *
                   * rc = l->l_glimpse_ast(l, NULL); // this will update the LVB
                   * if (rc != 0 && res->lr_namespace->ns_lvbo &&
                   *     res->lr_namespace->ns_lvbo->lvbo_update) {
                   *         res->lr_namespace->ns_lvbo->lvbo_update(res, NULL, 0, 1);
                   * }
                   *
                   * that is, after glimpse_ast() fails, filter_lvbo_update() runs, and
                   * returns correct file size to the client.
                   */
                  return -ELDLM_NO_LOCK_DATA;
          }
          

          So it looks like there are a few improvements that could be done:

          • replace comments mentioning filter_intent_policy() with ofd_intent_policy()
          • change ldlm_res_lvbo_update() to a new ldlm_res_lvbo_invalidate() during client eviction (should mark the resource LVB stale)
           • update the LVB from disk only if it is marked stale (a rough sketch of this stale-flag approach is given below)
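
           [Editor's note] To make the stale-flag idea above concrete, here is a minimal, self-contained sketch; the names (ldlm_res_lvbo_invalidate(), lr_lvb_stale, the simplified resource structure and the read_size_from_disk callback) are invented for illustration and are not the actual Lustre data structures or API:

           /*
            * Hypothetical sketch only: ldlm_res_lvbo_invalidate(),
            * ldlm_res_lvbo_refresh_if_stale() and lr_lvb_stale are invented names,
            * and the structure below is a simplified stand-in for struct ldlm_resource.
            */
           #include <pthread.h>
           #include <stdbool.h>

           struct lvb_res_sketch {
                   pthread_mutex_t lr_lock;        /* stands in for the resource lock */
                   bool            lr_lvb_stale;   /* set on eviction, cleared on refresh */
                   long long       lr_lvb_size;    /* cached object size (the LVB payload) */
           };

           /*
            * Called once per resource during client eviction instead of re-reading
            * the LVB from disk for every cancelled lock: O(1) per resource, no I/O.
            */
           void ldlm_res_lvbo_invalidate(struct lvb_res_sketch *res)
           {
                   pthread_mutex_lock(&res->lr_lock);
                   res->lr_lvb_stale = true;
                   pthread_mutex_unlock(&res->lr_lock);
           }

           /*
            * Called on the next enqueue/glimpse that actually needs the LVB; only
            * then is the (possibly expensive) read from disk performed.
            */
           void ldlm_res_lvbo_refresh_if_stale(struct lvb_res_sketch *res,
                                               long long (*read_size_from_disk)(void *arg),
                                               void *arg)
           {
                   pthread_mutex_lock(&res->lr_lock);
                   if (res->lr_lvb_stale) {
                           res->lr_lvb_size = read_size_from_disk(arg);
                           res->lr_lvb_stale = false;
                   }
                   pthread_mutex_unlock(&res->lr_lock);
           }

           Under such a scheme eviction would touch each resource once and perform no I/O; the disk read would happen only on the first subsequent enqueue or glimpse that actually needs the LVB.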
          green Oleg Drokin added a comment -

          Hm, indeed, the (somewhat cryptic) comment was added only in ldlm_handle_ast_error when the problem was first discovered.

          As for not updating the lvb, the lvb is actually stored in the resource, not in the lock. So we are updating the lvb content in order for other locks to get correct information.

           In the case of OSTs it's possible to introduce a further optimization: not updating the lvb when a lock is non-modifying, or when the granted lock offsets are smaller than the current size in the lvb (since the size could not have been increased in that case).

           In the case of the MDT - for intent locks I guess it's safe to ignore the lvb update on cancel; not sure about the quota case.
           This means that the lvb update handler would need the lock, plus an indication that we have a cancel on our hands (possibly req being NULL as a proxy for that, but I did not check it in any detail).

           And yes, if the data is not cached, at least on the OST you can get quite an IO storm if this particular client (or group of them) had a lot of locks. Hence the patch from Vitaly I referenced (because there's also going to be quite a CPU usage spike in such a case, especially if you have many clients being evicted at the same time, all with many locks).
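
           [Editor's note] A minimal sketch of the skip-the-update check described above, i.e. only refresh the LVB when cancelling a lock that could actually have grown the object; the lock mode, extent fields and helper name are simplified stand-ins, not the real struct ldlm_lock layout:

           /*
            * Sketch only: the lock and extent fields below are simplified stand-ins,
            * not the real struct ldlm_lock / ost_lvb layout.
            */
           #include <stdbool.h>

           enum sketch_lock_mode { SKETCH_LCK_PR, SKETCH_LCK_PW };  /* read vs. write */

           struct sketch_extent {
                   unsigned long long start;
                   unsigned long long end;
           };

           struct sketch_lock {
                   enum sketch_lock_mode l_mode;    /* granted mode */
                   struct sketch_extent  l_extent;  /* granted extent */
           };

           /*
            * Decide whether cancelling this lock could have increased the object size
            * recorded in the resource LVB.  A non-modifying (read) lock, or a write
            * lock whose extent ends below the currently known size, cannot have
            * grown the file, so the expensive lvbo_update() could be skipped.
            */
           bool cancel_needs_lvb_update(const struct sketch_lock *lock,
                                        unsigned long long current_lvb_size)
           {
                   if (lock->l_mode != SKETCH_LCK_PW)
                           return false;   /* read locks never grow the file */
                   if (lock->l_extent.end < current_lvb_size)
                           return false;   /* cannot have extended past the known size */
                   return true;            /* may have grown the file: refresh the LVB */
           }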


           behlendorf Brian Behlendorf added a comment -

           Thanks Oleg, that sheds some light on why this call is here. Having that little bit of wisdom in a comment would have been tremendously helpful.

           > This should not really cause an IO because I would expect the file size to be cached somewhere?

           Is this strictly necessary for ldlm_cancel_locks_for_export()? It seems inefficient to proactively update the LVB here when there's a high likelihood that the lock is about to be destroyed, particularly because I wouldn't have any expectation that the OIs or the object itself are still cached. Proactively updating this data from disk when canceling a million locks can result in millions of I/Os being issued to the storage. It would be best to avoid this if at all possible.

          Correct me if I'm wrong, but this means that when an MDT/OST evicts a client (for whatever reason) it can easily create a huge IO load on the disk.

          green Oleg Drokin added a comment - - edited

           The size update is necessary because, for the life of the lock, we usually ask the client holding the highest-offset lock to tell us how big the underlying object is.
           When we forcefully terminate the lock (i.e. not a client doing a voluntary cancel, but an eviction or a client otherwise disconnecting), this information can no longer be trusted because it's possible that not all writes have made it to disk, so we need to refresh the actual file size from the filesystem.
           This should not really cause an IO because I would expect the file size to be cached somewhere?

           Also, on a side note, there's a patch from Vitaly under inspection now that makes this process much less synchronous: http://review.whamcloud.com/#/c/5843
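
           [Editor's note] As a hedged illustration of the voluntary-versus-forced distinction drawn above (and of the req-being-NULL proxy mentioned elsewhere in this ticket), here is a sketch of how the LVB source might be chosen on cancel; the helper name and the simplified check are invented for illustration:

           /*
            * Sketch only: the helper name and the simplified check are invented for
            * illustration; they are not part of the Lustre code base.
            */
           #include <stdbool.h>
           #include <stddef.h>

           /*
            * On a voluntary cancel the client has flushed its dirty data and the
            * cancel request carries a size the server can use.  On a forced cancel
            * (eviction or disconnect) there is no such request and the client-side
            * size can no longer be trusted, so it must come from the filesystem
            * (or the LVB be marked stale, as proposed elsewhere in this ticket).
            */
           bool lvb_must_be_read_from_disk(const void *cancel_req)
           {
                   return cancel_req == NULL;      /* no request => forced cancel */
           }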

          pjones Peter Jones added a comment -

          Oleg

          Could you please comment?

          Peter


          People

             Assignee:
             tappro Mikhail Pershin
             Reporter:
             behlendorf Brian Behlendorf
             Votes: 1
             Watchers: 9
