Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None
Environment:

Hide
[Lustre version]
https://review.whamcloud.com/#/c/fs/lustre-release/+/47730/
=>patch set 1

[Lustre client machine]
- OS: Ubuntu20.04
- kernel: 5.4.0-131-generic
- Cache: 175GB[1]
- remarks:
  - dirty_writeback_centisecs=0
  - write back chace: aging_keep mode
  - Lustre's maximum inode number is about 160,000,000.

[1] Total RAM(250GB) * dirty_background_ratio(70%) = 175GB

[2]
$ lfs df -i
UUID Inodes IUsed IFree IUse% Mounted on
exafs-MDT0000_UUID 83050496 6256 83044240 1% /mnt/lustre[MDT:0]
exafs-MDT0001_UUID 83050496 267 83050229 1% /mnt/lustre[MDT:1]
exafs-OST0000_UUID 26214400 1927 26212473 1% /mnt/lustre[OST:0]
exafs-OST0001_UUID 26214400 1834 26212566 1% /mnt/lustre[OST:1]
exafs-OST0002_UUID 26214400 1995 26212405 1% /mnt/lustre[OST:2]
exafs-OST0003_UUID 26214400 1898 26212502 1% /mnt/lustre[OST:3]

filesystem_summary: 104856469 6523 104849946 1% /mnt/lustre

Show
[Lustre version] https://review.whamcloud.com/#/c/fs/lustre-release/+/47730/ =>patch set 1 [Lustre client machine] - OS: Ubuntu20.04 - kernel: 5.4.0-131-generic - Cache: 175GB[1] - remarks:   - dirty_writeback_centisecs=0   - write back chace: aging_keep mode   - Lustre's maximum inode number is about 160,000,000. [1] Total RAM(250GB) * dirty_background_ratio(70%) = 175GB [2] $ lfs df -i UUID Inodes IUsed IFree IUse% Mounted on exafs-MDT0000_UUID 83050496 6256 83044240 1% /mnt/lustre[MDT:0] exafs-MDT0001_UUID 83050496 267 83050229 1% /mnt/lustre[MDT:1] exafs-OST0000_UUID 26214400 1927 26212473 1% /mnt/lustre[OST:0] exafs-OST0001_UUID 26214400 1834 26212566 1% /mnt/lustre[OST:1] exafs-OST0002_UUID 26214400 1995 26212405 1% /mnt/lustre[OST:2] exafs-OST0003_UUID 26214400 1898 26212502 1% /mnt/lustre[OST:3] filesystem_summary: 104856469 6523 104849946 1% /mnt/lustre

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Overview

I'm interested in WBC(Write back cache) feature[1]. So I'm testing the WBC in development.

WBC mostly works well. But, my lustre client machine crashed on some testing.

[1] https://jira.whamcloud.com/browse/LU-10938

Detail of crash

I couldn't access to the lustre mounted directory after presenting error messages on some testing. And I couldn't unmount the lustre mounted directory. And sometimes my machine crashed after that.

My machine crashed at the following timings.

Immediately after an error occurred.
A few hours after an error occurred.
The machine shutdown process was stopped after the error occurred/

There were several patterns of error messages.

Pattern1: Request sent has timed out for sent delay

Nov  9 18:49:20 anila kernel: [ 3667.591798] Lustre: 6549:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987353/real 0]     req@0000000002a4f44e x1749008903039168/t0(0) o36->exafs-MDT0000-mdc-ffff97086cd1d800@192.168.10.43@tcp:12/10 lens 520/576 e 0 to 1 dl 1667987360 ref 3 fl Rpc:Xr/0/ffffffff rc 0/-1 job:''
Nov  9 18:49:20 anila kernel: [ 3667.591810] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection to exafs-MDT0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:49:24 anila kernel: [ 3671.431810] Lustre: 4874:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987355/real 0]     req@000000003297c839 x1749008903114560/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987364 ref 3 fl Rpc:Xr/0/ffffffff rc 0/-1 job:''
Nov  9 18:49:24 anila kernel: [ 3671.431814] Lustre: 4874:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Nov  9 18:49:24 anila kernel: [ 3671.431833] Lustre: exafs-OST0000-osc-ffff97086cd1d800: Connection to exafs-OST0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:49:25 anila kernel: [ 3672.395923] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection restored to 192.168.10.43@tcp (at 192.168.10.43@tcp)
Nov  9 18:49:33 anila kernel: [ 3680.231831] Lustre: 4901:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987365/real 1667987371]    req@00000000be7eddcb x1749008903114624/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987373 ref 2 fl Rpc:XQr/2/ffffffff rc 0/-1 job:''
Nov  9 18:49:33 anila kernel: [ 3680.231835] Lustre: 4901:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Nov  9 18:49:33 anila kernel: [ 3680.231853] Lustre: exafs-OST0000-osc-ffff97086cd1d800: Connection to exafs-OST0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:49:41 anila kernel: [ 3688.839825] Lustre: 4877:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987369/real 1667987371]    req@00000000eedcb0b0 x1749008903115136/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987381 ref 2 fl Rpc:XQr/2/ffffffff rc -11/-1 job:''
Nov  9 18:49:41 anila kernel: [ 3688.839829] Lustre: 4877:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Nov  9 18:49:43 anila kernel: [ 3690.631834] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection to exafs-MDT0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:49:46 anila kernel: [ 3693.991839] Lustre: 4892:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987365/real 1667987371]    req@000000006d74b528 x1749008903117952/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987386 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:''
Nov  9 18:49:46 anila kernel: [ 3693.991843] Lustre: 4892:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Nov  9 18:49:56 anila kernel: [ 3703.943822] Lustre: 4872:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987388/real 0]     req@0000000017b47a96 x1749008903197504/t0(0) o400->exafs-OST0002-osc-ffff97086cd1d800@192.168.10.44@tcp:28/4 lens 224/224 e 0 to 1 dl 1667987396 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:''
Nov  9 18:49:56 anila kernel: [ 3703.943826] Lustre: 4872:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Nov  9 18:49:56 anila kernel: [ 3703.943840] Lustre: exafs-OST0002-osc-ffff97086cd1d800: Connection to exafs-OST0002 (at 192.168.10.44@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:49:56 anila kernel: [ 3703.943846] LustreError: 166-1: MGC192.168.10.43@tcp: Connection to MGS (at 192.168.10.43@tcp) was lost; in progress operations using this service will fail
Nov  9 18:49:58 anila kernel: [ 3705.255880] LustreError: 4849:0:(events.c:211:client_bulk_callback()) event type 1, status -5, desc 000000005fad487b
Nov  9 18:50:10 anila kernel: [ 3717.255825] Lustre: exafs-MDT0001-mdc-ffff97086cd1d800: Connection to exafs-MDT0001 (at 192.168.10.44@tcp) was lost; in progress operations using this service     will wait for recovery to complete
Nov  9 18:50:10 anila kernel: [ 3717.953515] Lustre: exafs-MDT0001-mdc-ffff97086cd1d800: Connection restored to 192.168.10.44@tcp (at 192.168.10.44@tcp)
Nov  9 18:50:16 anila kernel: [ 3723.655861] LustreError: 4841:0:(events.c:211:client_bulk_callback()) event type 1, status -113, desc 00000000eea9e18b
Nov  9 18:50:16 anila kernel: [ 3723.655955] LustreError: 4841:0:(events.c:211:client_bulk_callback()) Skipped 7 previous similar messages

Pattern2: revalidate FID error( while the machine shutdown process )

LNetError: 4850:0:(lib-move.c:2250:lnet_handle_find_routed_path()) peer 192.168.10.43@tcp has no available nets
LNetError: 4860:0:(lib-move.c:2250:lnet_handle_find_routed_path()) Skipped 27 previous similar messages
systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL ti PID 12341.
LustreError: 9158:0:(file.c:5124:ll_inode_revalidate_fini()) exafs: revalidate FID [0x200000007:0x1:0x0] error: rc = -4]

Pattern3: Bad RIP value

Nov 15 18:19:34 anila kernel: [ 2660.738158] INFO: task sync:5644 blocked for more than 120 seconds.
Nov 15 18:19:34 anila kernel: [ 2660.738223] Tainted: P OE 5.4.0-131-generic #147-Ubuntu
Nov 15 18:19:34 anila kernel: [ 2660.738274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 18:19:34 anila kernel: [ 2660.738333] sync D 0 5644 5172 0x00004000
Nov 15 18:19:34 anila kernel: [ 2660.738337] Call Trace:
Nov 15 18:19:34 anila kernel: [ 2660.738350] __schedule+0x2e3/0x740
Nov 15 18:19:34 anila kernel: [ 2660.738355] schedule+0x42/0xb0
Nov 15 18:19:34 anila kernel: [ 2660.738364] wb_wait_for_completion+0x56/0x90
Nov 15 18:19:34 anila kernel: [ 2660.738371] ? __wake_up_pollfree+0x40/0x40
Nov 15 18:19:34 anila kernel: [ 2660.738375] sync_inodes_sb+0xd8/0x2a0
Nov 15 18:19:34 anila kernel: [ 2660.738382] sync_inodes_one_sb+0x15/0x20
Nov 15 18:19:34 anila kernel: [ 2660.738389] iterate_supers+0xa3/0x100
Nov 15 18:19:34 anila kernel: [ 2660.738393] ? default_file_splice_write+0x30/0x30
Nov 15 18:19:34 anila kernel: [ 2660.738398] ksys_sync+0x42/0xb0
Nov 15 18:19:34 anila kernel: [ 2660.738402] __ia32_sys_sync+0xe/0x20
Nov 15 18:19:34 anila kernel: [ 2660.738409] do_syscall_64+0x57/0x190
Nov 15 18:19:34 anila kernel: [ 2660.738415] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 15 18:19:34 anila kernel: [ 2660.738419] RIP: 0033:0x7f34a384022b
Nov 15 18:19:34 anila kernel: [ 2660.738428] Code: Bad RIP value.
Nov 15 18:19:34 anila kernel: [ 2660.738431] RSP: 002b:00007ffd325c6158 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Nov 15 18:19:34 anila kernel: [ 2660.738434] RAX: ffffffffffffffda RBX: 00007ffd325c6298 RCX: 00007f34a384022b
Nov 15 18:19:34 anila kernel: [ 2660.738436] RDX: 00007f34a391a501 RSI: 0000000000000000 RDI: 00007f34a38e0bc0
Nov 15 18:19:34 anila kernel: [ 2660.738437] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
Nov 15 18:19:34 anila kernel: [ 2660.738439] R10: 00007f34a3914640 R11: 0000000000000246 R12: 0000000000000000
Nov 15 18:19:34 anila kernel: [ 2660.738440] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Pattern4: Out of memory

Out of memory message

Tests that caused the error

Write a 20GB file

I set the 100ms latency of the network used for Luster. I set it by tc command. The purpose of latency is to simulate the case where the client is far away from the Lustre server.
I ran the following command to write a 20GB file.

 # ./ior --dataPacketType=timestamp -C -Q 1 -g -G 1836258435 -k -e -o /mnt/lustre/io500/test/ior_file_easy -t 1m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX -b 20480m

fsync was executed because -e option was specified.
Error didn't occur when sync was not executed.
I think that syncing a 20GB file is a ordinary operation.

Execute IOR, MDTest continuously.

I didn't set latency.
I executed IOR, MDTest continuously without fsync.

 # mpiexec -np 4 ./ior --dataPacketType=timestamp -g -G 1836258167 -k -o /mnt/lustre/io500/test/ior-easy/ior_file_easy -t 1m -b 25600m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX
 # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches"
 # mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -u -L -F -P -G -1572531335 -N 0 -d /mnt/lustre/io500/test/mdtest-easy -C -W 300 -a POSIX
 # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches"
 # mpiexec -np 4 ./ior --dataPacketType=timestamp -g -G=1836268855 -k -o /mnt/lustre/io500/test/ior-hard/file -t 47008 -b 47008 -s 500000 -w -D 300 -a POSIX -O stoneWallingWearOut=1
 # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches"
 # mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -t -w 3901 -e 3901 -P -G=-1572520647 -N 0 -F -d /mnt/lustre/io500/test/mdtest-hard -C -W 300 -a POSIX

IOR-easy write: 100GB
MDTest-easy write: 4,000,000 files
IOR-hard write: 500000 segment (About 85GB)
MDTest-hard write: 4,000,000 files
Error didn't occur when I didn't execute fsync.

Sync a lot of files

I didn't set latency.
I ran MDTest without fsync. A lot of files were cached in WBC.
I ran sync.

 # mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -u -L -F -P -G -1572531335 -N 0 -d /mnt/lustre/test/mdtest-easy -C -W 300 -a POSIX
 # sync

my prediction

Releasing WBC will take a lot of time if I set the latency or sync a lot of files. As a result, request will be timed out.

No error occurred when I ran IOR/MDTest with fsync option in no-latency environment. So I think that syncing in small steps will inhibit this error.

Attachments

Issue Links

is related to

LU-10938 Metadata writeback cache support

Open

Activity

[LU-16361] Lustre client was crashed when I used WBC

There are no comments yet on this issue.

People

Assignee:: WC Triage

Reporter:: Honjo Rikimaru

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 02/Dec/22 8:58 AM

Updated:: 06/Dec/22 12:25 AM