Loading...

XML

Word

Printable

Type: Bug
Resolution: Duplicate
Priority: Medium
Fix Version/s: None
Affects Version/s: Lustre 2.17.0, Lustre 2.18.0
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

I was testing out a workload that was reported in GitHub:

# python
Python 3.9.17 (main, Aug  9 2023, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> data = np.random.random((6000, 500, 40))
>>> np.save("/mnt/test.npy", data)

on a recent build (2.17.50-85-gc606081, Ubuntu 6.8.0-88-generic x86_64 kernel).

This is trying to do a single 3GiB write() system call. In the GitHub issue, strace is showing a fault returned to the caller:

write(4, "\360\313"..., 3072000000) = -1 EFAULT (Bad address)

In my testing (possibly affected by kernel or Lustre version), rather than returning an error my python3 process hung in IO (unkillable idle sleep) after having written out about 1GiB of memory:

# pgrep python
627533
# cat /proc/627533/stack
osc_extent_wait+0x44f/0x640 [osc]
osc_cache_wait_range+0x12f/0xa70 [osc]
osc_io_fsync_end+0x7c/0xd0 [osc]
cl_io_end+0x7b/0x190 [obdclass]
lov_io_end_wrapper+0xe5/0xf0 [lov]
lov_io_fsync_end+0x81/0x1e0 [lov]
cl_io_end+0x7b/0x190 [obdclass]
cl_io_loop+0xf2/0x240 [obdclass]
cl_sync_file_range+0x2b0/0x370 [lustre]
ll_writepages+0x1e3/0x2a0 [lustre]
do_writepages+0xd0/0x1f0
filemap_fdatawrite_wbc+0x75/0xb0
__filemap_fdatawrite_range+0x58/0x80
filemap_write_and_wait_range+0x59/0xc0
direct_write_fallback+0x4c/0xe0
__generic_file_write_iter+0xa4/0xd0
vvp_io_write_start+0x22d/0x10b0 [lustre]
cl_io_start+0x87/0x190 [obdclass]
cl_io_loop+0xe5/0x240 [obdclass]
ll_file_io_generic+0x666/0x1300 [lustre]
do_file_write_iter+0x7f1/0xe10 [lustre]
ll_file_write_iter+0x4b/0x110 [lustre]
vfs_write+0x2a8/0x480

GDB shows this to be at wait_event_idle() in osc_extent_wait():

# gdb lustre/osc/osc.ko
(gdb)  list *(osc_extent_wait+0x44f)
0x3019f is in osc_extent_wait (/usr/src/lustre-head/lustre/osc/osc_cache.c:994).
989		if (rc == 0) {
990			OSC_EXTENT_DUMP(D_ERROR, ext,
991				"%s: wait ext to %u timedout, recovery in progress?\n",
992				cli_name(osc_cli(obj)), state);
993	
994			wait_event_idle(ext->oe_waitq,
995					smp_load_acquire(&ext->oe_state) == state);
996		}
997		if (ext->oe_rc < 0)
998			rc = ext->oe_rc;

and indeed this error message was printed on the client console:

LustreError: 627533:0:(osc_cache.c:990:osc_extent_wait()) extent 000000000fdab39e@{[228352 -> 229375/229375],
    [3|1|-|active|wiuY|00000000d732b228], [4218880|1024|+|-|00000000ab2def22|1024|0000000000000000]}
    myth-OST0002-osc-ffff89deea9bf000: wait ext to 0 timedout, recovery in progress?
LustreError: 627533:0:(osc_cache.c:990:osc_extent_wait()) ### extent: 000000000fdab39e ns: myth-OST0002-osc-ffff89deea9bf000
    lock: 00000000ab2def22/0xd01e87e953e26bd2 lrc: 4/0,1 mode: PW/PW res: [0x2c0000400:0x3d757:0x0].0x0
    rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x800020000020000 nid: local
    remote: 0x198a66a0ec5e0d26 expref: -99 pid: 627533 timeout: 0 lvb_type: 1 lru_score: 4 lru_type: 1

It isn't clear why the process is waiting, since the client and server are both responsive to all other commands (ls -l on the file, etc).

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

patched_2-15-90client_debug.log
8.91 MB
10/Mar/26 5:33 AM
patched_2-17client_debug.log
8.91 MB
10/Mar/26 5:35 AM

duplicates

LU-19536 fault_in_iov_iter_readable() with a spinlock held

In Progress

is related to

LU-19536 fault_in_iov_iter_readable() with a spinlock held

In Progress

is related to

LU-5469 Intermittent Clients hang during IO

Resolved

DDN-6837 Loading...

Assignee:: Patrick Farrell

Reporter:: Andreas Dilger

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Created:: 26/Feb/26 8:40 PM

Updated:: 20/May/26 6:23 AM

Resolved:: 06/Mar/26 1:47 AM

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates