[LU-8849] Client hangs on 'lfs ladvise' with large input values Created: 18/Nov/16  Updated: 22/Jul/17  Resolved: 22/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Li Xi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

eagle cluster with lustre-master 2.8.60


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I was playing around with input values for 'lfs ladvise' to see what maximums I could pass as inputs. The following commands either completed with no obvious errors or returned an error:

# lfs ladvise -a willread /lustre/scratch/tfile --length 5000000000
# lfs ladvise -a willread /lustre/scratch/tfile --length 50000000009999999999999
# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099M
# lfs ladvise -a willread /lustre/scratch/space_file2 --length 50000000009999M
ladvise: bad length '50000000009999M'
# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099G
ladvise: bad length '500000000099G'
# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0
# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 1M
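
The accept/reject pattern above is consistent with a size parser that applies the k/M/G/T suffix as a binary shift and rejects anything that no longer fits in a signed 64-bit byte count. A minimal sketch of that idea (parse_size_arg() is a hypothetical helper written for illustration, not the actual lfs option parser):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a size argument with an optional k/M/G/T
 * suffix and reject anything that overflows a signed 64-bit byte count. */
static int parse_size_arg(const char *arg, int64_t *bytes)
{
        char *end = NULL;
        long long val;
        int shift = 0;

        errno = 0;
        val = strtoll(arg, &end, 10);
        if (errno != 0 || end == arg || val < 0)
                return -1;

        switch (*end) {
        case '\0': break;
        case 'k': case 'K': shift = 10; break;
        case 'm': case 'M': shift = 20; break;
        case 'g': case 'G': shift = 30; break;
        case 't': case 'T': shift = 40; break;
        default: return -1;
        }
        if (*end != '\0' && end[1] != '\0')
                return -1;

        /* e.g. 50000000009999M and 500000000099G shift past INT64_MAX and
         * are rejected, while 500000000099k and 500000000099M still fit. */
        if (shift && val > (INT64_MAX >> shift))
                return -1;

        *bytes = (int64_t)val << shift;
        return 0;
}

int main(int argc, char **argv)
{
        int64_t bytes;
        int i;

        for (i = 1; i < argc; i++) {
                if (parse_size_arg(argv[i], &bytes) == 0)
                        printf("%s -> %lld bytes\n", argv[i], (long long)bytes);
                else
                        printf("bad length '%s'\n", argv[i]);
        }
        return 0;
}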

Then I tried one command too many and hung the client with:

lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 500000000099k

The file resides on OST0:

# lfs getstripe /lustre/scratch/space_file2 
/lustre/scratch/space_file2
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
	obdidx		 objid		 objid		 group
	     0	            15	          0xf	             0

ls -l /lustre/scratch/space_file2 
-rw-r--r-- 1 root root 104857600 Nov 17 21:55 /lustre/scratch/space_file2

dmesg from the client:

INFO: task lfs:26850 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfs           D 0000000000000000     0 26850  15016 0x00000084
 ffff880079cbbaf8 0000000000000086 0000000000000000 ffff88007851a040
 ffff88007851a078 ffff880078503bc0 00001e14781a34aa 0000000000000000
 ffff88007c01f558 0000000101f41458 ffff88007c01fad8 ffff880079cbbfd8
Call Trace:
 [<ffffffff815498b5>] schedule_timeout+0x215/0x2e0
 [<ffffffffa07765d3>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
 [<ffffffff81549513>] wait_for_common+0x123/0x180
 [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
 [<ffffffffa09bb3e0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
 [<ffffffff8154964d>] wait_for_completion+0x1d/0x20
 [<ffffffffa0b336f0>] osc_io_ladvise_end+0x40/0x60 [osc]
 [<ffffffffa05e7a30>] cl_io_end+0x60/0x150 [obdclass]
 [<ffffffffa05e8410>] ? cl_io_start+0x0/0x140 [obdclass]
 [<ffffffffa09bb4d1>] lov_io_end_wrapper+0xf1/0x100 [lov]
 [<ffffffffa09bb228>] lov_io_call+0x88/0x120 [lov]
 [<ffffffffa09bccac>] lov_io_end+0x4c/0xf0 [lov]
 [<ffffffffa05e7a30>] cl_io_end+0x60/0x150 [obdclass]
 [<ffffffffa05eb362>] cl_io_loop+0xc2/0x1b0 [obdclass]
 [<ffffffffa0a33ade>] ll_file_ioctl+0x2ece/0x3eb0 [lustre]
 [<ffffffff8115a409>] ? handle_mm_fault+0x299/0x3d0
 [<ffffffff81052204>] ? __do_page_fault+0x1f4/0x500
 [<ffffffff811af522>] vfs_ioctl+0x22/0xa0
 [<ffffffff811af9ea>] do_vfs_ioctl+0x3aa/0x580
 [<ffffffff811afc41>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

On OST0 dmesg:

LNet: Service thread pid 7920 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7920, comm: ll_ost_io00_012

Call Trace:
 [<ffffffff8113e758>] ? __alloc_pages_nodemask+0x548/0x950
 [<ffffffff81177d9a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff8112eca7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff8112fa3f>] ? find_or_create_page+0x4f/0xb0
 [<ffffffffa0d3a412>] ? osd_bufs_get+0x2b2/0x4b0 [osd_ldiskfs]
 [<ffffffffa0e96fd0>] ? ofd_ladvise_hdl+0x8f0/0xdc0 [ofd]
 [<ffffffffa0840201>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
 [<ffffffffa08a449c>] ? tgt_request_handle+0x8ec/0x1440 [ptlrpc]
 [<ffffffffa0850ba1>] ? ptlrpc_main+0xd31/0x1800 [ptlrpc]
 [<ffffffffa084fe70>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
 [<ffffffff810a640e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1479431579.7920
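
For scale, --length and -s of 500000000099k are each roughly 465 TiB against a 100 MB file, i.e. on the order of 10^11 4 KiB pages. The ofd_ladvise_hdl() -> osd_bufs_get() -> find_or_create_page() frames above suggest the willread handler walks the advised range one page at a time, which would keep the service thread busy far past the RPC deadline. A rough sketch of that access pattern (willread_range() is a hypothetical stand-in, not the actual ofd/osd_ldiskfs code):

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Hypothetical sketch of the page walk implied by the stack trace above. */
static int willread_range(struct address_space *mapping, loff_t start, loff_t end)
{
        pgoff_t index = start >> PAGE_SHIFT;
        pgoff_t last = (end - 1) >> PAGE_SHIFT;

        for (; index <= last; index++) {
                /* One page-cache allocation per 4 KiB of the advised range:
                 * ~1.25e11 iterations for a 465 TiB range. */
                struct page *page = find_or_create_page(mapping, index,
                                                        GFP_NOFS);
                if (page == NULL)
                        return -ENOMEM;
                /* ... issue the read for this page ... */
                unlock_page(page);
                put_page(page);
        }
        return 0;
}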

After running some tests from a different client to check whether the file system was still responsive, dmesg on OST0 shows:

Lustre: 7912:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
  req@ffff88007ace3450 x1551266786862416/t0(0) o21->5959cf4b-6c57-f313-be15-ccf7ea98876e@10.100.4.158@tcp:-1/-1 lens 504/432 e 1 to 0 dl 1479432506 ref 2 fl Interpret:/0/0 rc 0/0
Lustre: scratch-OST0000: Client 5959cf4b-6c57-f313-be15-ccf7ea98876e (at 10.100.4.158@tcp) reconnecting
Lustre: scratch-OST0000: Connection restored to 5959cf4b-6c57-f313-be15-ccf7ea98876e (at 10.100.4.158@tcp)
LNet: Service thread pid 31614 was inactive for 762.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 31614, comm: ll_ost_io00_019

Call Trace:
 [<ffffffff8107405a>] __cond_resched+0x2a/0x40
 [<ffffffff8113e339>] ? __alloc_pages_nodemask+0x129/0x950
 [<ffffffff8117f10c>] ? transfer_objects+0x5c/0x80
 [<ffffffff8118d2a5>] ? mem_cgroup_charge_common+0xa5/0xd0
 [<ffffffff8118e0b0>] ? mem_cgroup_cache_charge+0xc0/0xd0
 [<ffffffff81177d9a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff8112eca7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff8112fa3f>] ? find_or_create_page+0x4f/0xb0
 [<ffffffffa0d3a412>] ? osd_bufs_get+0x2b2/0x4b0 [osd_ldiskfs]
 [<ffffffffa0e96fd0>] ? ofd_ladvise_hdl+0x8f0/0xdc0 [ofd]
 [<ffffffffa0840201>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
 [<ffffffffa08a449c>] ? tgt_request_handle+0x8ec/0x1440 [ptlrpc]
 [<ffffffffa0850ba1>] ? ptlrpc_main+0xd31/0x1800 [ptlrpc]
 [<ffffffffa084fe70>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
 [<ffffffff810a640e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1479432668.31614
Lustre: scratch-OST0000: Client 28158de5-48d2-2404-815b-3bb36c5b7839 (at 10.100.4.157@tcp) reconnecting
Lustre: scratch-OST0000: Connection restored to 28158de5-48d2-2404-815b-3bb36c5b7839 (at 10.100.4.157@tcp)

I do not know how repeatable this is. I tried another 'lfs ladvise' call on a different node and it hung as well. On that node I ran the following commands until the last one hung:

# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0 
[root@eagle-48vm5 ~]# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0k
[root@eagle-48vm5 ~]# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0M
[root@eagle-48vm5 ~]# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0T
[root@eagle-48vm5 ~]# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 0G
[root@eagle-48vm5 ~]# lfs ladvise -a willread /lustre/scratch/space_file2 --length 500000000099k -s 1G

dmesg on this client shows:

Lustre: 7929:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1479431906/real 1479431906]  req@ffff88007bd859c0 x1551266786862416/t0(0) o21->scratch-OST0000-osc-ffff880078ae4800@10.100.4.155@tcp:6/4 lens 504/432 e 1 to 1 dl 1479432507 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: scratch-OST0000-osc-ffff880078ae4800: Connection to scratch-OST0000 (at 10.100.4.155@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: scratch-OST0000-osc-ffff880078ae4800: Connection restored to 10.100.4.155@tcp (at 10.100.4.155@tcp)
INFO: task lfs:26783 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfs           D 0000000000000001     0 26783  15019 0x00000084
 ffff88007b5a7af8 0000000000000086 ffff88007b5a7ac0 ffff88007b5a7abc
 0000000000000020 ffff88007f824380 00001e649680e0ec ffff880002216ec0
 0000000000000400 0000000101f954bb ffff880037abe5f8 ffff88007b5a7fd8
Call Trace:
 [<ffffffff815498b5>] schedule_timeout+0x215/0x2e0
 [<ffffffffa07765d3>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
 [<ffffffffa07ac493>] ? ptlrpcd_add_req+0x1e3/0x2a0 [ptlrpc]
 [<ffffffffa078e0c0>] ? lustre_swab_ladvise+0x0/0x50 [ptlrpc]
 [<ffffffff81549513>] wait_for_common+0x123/0x180
 [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
 [<ffffffffa09bb3e0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
 [<ffffffff8154964d>] wait_for_completion+0x1d/0x20
 [<ffffffffa0b336f0>] osc_io_ladvise_end+0x40/0x60 [osc]
 [<ffffffffa05e7a30>] cl_io_end+0x60/0x150 [obdclass]
 [<ffffffffa05e8410>] ? cl_io_start+0x0/0x140 [obdclass]
 [<ffffffffa09bb4d1>] lov_io_end_wrapper+0xf1/0x100 [lov]
 [<ffffffffa09bb228>] lov_io_call+0x88/0x120 [lov]
 [<ffffffffa09bccac>] lov_io_end+0x4c/0xf0 [lov]
 [<ffffffffa05e7a30>] cl_io_end+0x60/0x150 [obdclass]
 [<ffffffffa05eb362>] cl_io_loop+0xc2/0x1b0 [obdclass]
 [<ffffffffa0a33ade>] ll_file_ioctl+0x2ece/0x3eb0 [lustre]
 [<ffffffff8115a409>] ? handle_mm_fault+0x299/0x3d0
 [<ffffffff81052204>] ? __do_page_fault+0x1f4/0x500
 [<ffffffff811af522>] vfs_ioctl+0x22/0xa0
 [<ffffffff811af9ea>] do_vfs_ioctl+0x3aa/0x580
 [<ffffffff811afc41>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by Andreas Dilger [ 18/Nov/16 ]

James, is this ldiskfs or ZFS? Not sure that it matters, but just in case.

Li Xi, can you please make a patch that limits the ladvise start and end to the object's size, so that it doesn't try filling the pages beyond the end of the object. Ideally, we would also use FIEMAP or SEEK_HOLE/SEEK_DATA to skip holes in the object.
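
A minimal sketch of that clamp, assuming a hypothetical ofd_ladvise_clamp() helper rather than the change actually merged in the patch below:

#include <linux/types.h>

/* Hypothetical helper, not the merged patch: clamp the advised range to the
 * current object size so willread never touches pages past the object's end. */
static void ofd_ladvise_clamp(__u64 *start, __u64 *end, __u64 obj_size)
{
        if (*start > obj_size)
                *start = obj_size;
        if (*end > obj_size)
                *end = obj_size;
        /* If *start == *end after clamping there is nothing to prefetch, and
         * the handler can reply immediately instead of spinning in
         * osd_bufs_get() far past the RPC deadline. */
}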

Comment by James Nunez (Inactive) [ 18/Nov/16 ]

All testing reported here is with ldiskfs.

Comment by Andreas Dilger [ 25/Feb/17 ]

Li Xi, any update on this?

Comment by Li Xi (Inactive) [ 19/May/17 ]

Sorry for the delay. We will work on this.

Comment by Gerrit Updater [ 19/May/17 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/27209
Subject: LU-8849 ofd: Client hanges on ladvise with large start values
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b7772195e13edc573b33c49ff2a7e5de2d4de0bf

Comment by Gerrit Updater [ 22/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27209/
Subject: LU-8849 ofd: Client hanges on ladvise with large start values
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 829a24f644ceb453124b72c94584404d2e97f4f9

Comment by Minh Diep [ 22/Jul/17 ]

Fix landed in Lustre 2.11.0.
