Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
[Lustre version]
https://review.whamcloud.com/#/c/fs/lustre-release/+/47730/
=>patch set 1
[Lustre client machine]
- OS: Ubuntu20.04
- kernel: 5.4.0-131-generic
- Cache: 175GB[1]
- remarks:
- dirty_writeback_centisecs=0
- write back chace: aging_keep mode
- Lustre's maximum inode number is about 160,000,000.
[1] Total RAM(250GB) * dirty_background_ratio(70%) = 175GB
[2]
$ lfs df -i
UUID Inodes IUsed IFree IUse% Mounted on
exafs-MDT0000_UUID 83050496 6256 83044240 1% /mnt/lustre[MDT:0]
exafs-MDT0001_UUID 83050496 267 83050229 1% /mnt/lustre[MDT:1]
exafs-OST0000_UUID 26214400 1927 26212473 1% /mnt/lustre[OST:0]
exafs-OST0001_UUID 26214400 1834 26212566 1% /mnt/lustre[OST:1]
exafs-OST0002_UUID 26214400 1995 26212405 1% /mnt/lustre[OST:2]
exafs-OST0003_UUID 26214400 1898 26212502 1% /mnt/lustre[OST:3]
filesystem_summary: 104856469 6523 104849946 1% /mnt/lustre[Lustre version] https://review.whamcloud.com/#/c/fs/lustre-release/+/47730/ =>patch set 1 [Lustre client machine] - OS: Ubuntu20.04 - kernel: 5.4.0-131-generic - Cache: 175GB[1] - remarks: - dirty_writeback_centisecs=0 - write back chace: aging_keep mode - Lustre's maximum inode number is about 160,000,000. [1] Total RAM(250GB) * dirty_background_ratio(70%) = 175GB [2] $ lfs df -i UUID Inodes IUsed IFree IUse% Mounted on exafs-MDT0000_UUID 83050496 6256 83044240 1% /mnt/lustre[MDT:0] exafs-MDT0001_UUID 83050496 267 83050229 1% /mnt/lustre[MDT:1] exafs-OST0000_UUID 26214400 1927 26212473 1% /mnt/lustre[OST:0] exafs-OST0001_UUID 26214400 1834 26212566 1% /mnt/lustre[OST:1] exafs-OST0002_UUID 26214400 1995 26212405 1% /mnt/lustre[OST:2] exafs-OST0003_UUID 26214400 1898 26212502 1% /mnt/lustre[OST:3] filesystem_summary: 104856469 6523 104849946 1% /mnt/lustre
-
3
-
9223372036854775807
Description
Overview
I'm interested in WBC(Write back cache) feature[1]. So I'm testing the WBC in development.
WBC mostly works well. But, my lustre client machine crashed on some testing.
[1] https://jira.whamcloud.com/browse/LU-10938
Detail of crash
I couldn't access to the lustre mounted directory after presenting error messages on some testing. And I couldn't unmount the lustre mounted directory. And sometimes my machine crashed after that.
My machine crashed at the following timings.
- Immediately after an error occurred.
- A few hours after an error occurred.
- The machine shutdown process was stopped after the error occurred/
There were several patterns of error messages.
Pattern1: Request sent has timed out for sent delay
Nov 9 18:49:20 anila kernel: [ 3667.591798] Lustre: 6549:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987353/real 0] req@0000000002a4f44e x1749008903039168/t0(0) o36->exafs-MDT0000-mdc-ffff97086cd1d800@192.168.10.43@tcp:12/10 lens 520/576 e 0 to 1 dl 1667987360 ref 3 fl Rpc:Xr/0/ffffffff rc 0/-1 job:'' Nov 9 18:49:20 anila kernel: [ 3667.591810] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection to exafs-MDT0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:49:24 anila kernel: [ 3671.431810] Lustre: 4874:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987355/real 0] req@000000003297c839 x1749008903114560/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987364 ref 3 fl Rpc:Xr/0/ffffffff rc 0/-1 job:'' Nov 9 18:49:24 anila kernel: [ 3671.431814] Lustre: 4874:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message Nov 9 18:49:24 anila kernel: [ 3671.431833] Lustre: exafs-OST0000-osc-ffff97086cd1d800: Connection to exafs-OST0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:49:25 anila kernel: [ 3672.395923] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection restored to 192.168.10.43@tcp (at 192.168.10.43@tcp) Nov 9 18:49:33 anila kernel: [ 3680.231831] Lustre: 4901:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987365/real 1667987371] req@00000000be7eddcb x1749008903114624/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987373 ref 2 fl Rpc:XQr/2/ffffffff rc 0/-1 job:'' Nov 9 18:49:33 anila kernel: [ 3680.231835] Lustre: 4901:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message Nov 9 18:49:33 anila kernel: [ 3680.231853] Lustre: exafs-OST0000-osc-ffff97086cd1d800: Connection to exafs-OST0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:49:41 anila kernel: [ 3688.839825] Lustre: 4877:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987369/real 1667987371] req@00000000eedcb0b0 x1749008903115136/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987381 ref 2 fl Rpc:XQr/2/ffffffff rc -11/-1 job:'' Nov 9 18:49:41 anila kernel: [ 3688.839829] Lustre: 4877:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message Nov 9 18:49:43 anila kernel: [ 3690.631834] Lustre: exafs-MDT0000-mdc-ffff97086cd1d800: Connection to exafs-MDT0000 (at 192.168.10.43@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:49:46 anila kernel: [ 3693.991839] Lustre: 4892:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667987365/real 1667987371] req@000000006d74b528 x1749008903117952/t0(0) o4->exafs-OST0000-osc-ffff97086cd1d800@192.168.10.43@tcp:6/4 lens 488/448 e 0 to 1 dl 1667987386 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' Nov 9 18:49:46 anila kernel: [ 3693.991843] Lustre: 4892:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Nov 9 18:49:56 anila kernel: [ 3703.943822] Lustre: 4872:0:(client.c:2284:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1667987388/real 0] req@0000000017b47a96 x1749008903197504/t0(0) o400->exafs-OST0002-osc-ffff97086cd1d800@192.168.10.44@tcp:28/4 lens 224/224 e 0 to 1 dl 1667987396 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'' Nov 9 18:49:56 anila kernel: [ 3703.943826] Lustre: 4872:0:(client.c:2284:ptlrpc_expire_one_request()) Skipped 1 previous similar message Nov 9 18:49:56 anila kernel: [ 3703.943840] Lustre: exafs-OST0002-osc-ffff97086cd1d800: Connection to exafs-OST0002 (at 192.168.10.44@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:49:56 anila kernel: [ 3703.943846] LustreError: 166-1: MGC192.168.10.43@tcp: Connection to MGS (at 192.168.10.43@tcp) was lost; in progress operations using this service will fail Nov 9 18:49:58 anila kernel: [ 3705.255880] LustreError: 4849:0:(events.c:211:client_bulk_callback()) event type 1, status -5, desc 000000005fad487b Nov 9 18:50:10 anila kernel: [ 3717.255825] Lustre: exafs-MDT0001-mdc-ffff97086cd1d800: Connection to exafs-MDT0001 (at 192.168.10.44@tcp) was lost; in progress operations using this service will wait for recovery to complete Nov 9 18:50:10 anila kernel: [ 3717.953515] Lustre: exafs-MDT0001-mdc-ffff97086cd1d800: Connection restored to 192.168.10.44@tcp (at 192.168.10.44@tcp) Nov 9 18:50:16 anila kernel: [ 3723.655861] LustreError: 4841:0:(events.c:211:client_bulk_callback()) event type 1, status -113, desc 00000000eea9e18b Nov 9 18:50:16 anila kernel: [ 3723.655955] LustreError: 4841:0:(events.c:211:client_bulk_callback()) Skipped 7 previous similar messages
Pattern2: revalidate FID error( while the machine shutdown process )
LNetError: 4850:0:(lib-move.c:2250:lnet_handle_find_routed_path()) peer 192.168.10.43@tcp has no available nets LNetError: 4860:0:(lib-move.c:2250:lnet_handle_find_routed_path()) Skipped 27 previous similar messages systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL ti PID 12341. LustreError: 9158:0:(file.c:5124:ll_inode_revalidate_fini()) exafs: revalidate FID [0x200000007:0x1:0x0] error: rc = -4]
Pattern3: Bad RIP value
Nov 15 18:19:34 anila kernel: [ 2660.738158] INFO: task sync:5644 blocked for more than 120 seconds. Nov 15 18:19:34 anila kernel: [ 2660.738223] Tainted: P OE 5.4.0-131-generic #147-Ubuntu Nov 15 18:19:34 anila kernel: [ 2660.738274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 15 18:19:34 anila kernel: [ 2660.738333] sync D 0 5644 5172 0x00004000 Nov 15 18:19:34 anila kernel: [ 2660.738337] Call Trace: Nov 15 18:19:34 anila kernel: [ 2660.738350] __schedule+0x2e3/0x740 Nov 15 18:19:34 anila kernel: [ 2660.738355] schedule+0x42/0xb0 Nov 15 18:19:34 anila kernel: [ 2660.738364] wb_wait_for_completion+0x56/0x90 Nov 15 18:19:34 anila kernel: [ 2660.738371] ? __wake_up_pollfree+0x40/0x40 Nov 15 18:19:34 anila kernel: [ 2660.738375] sync_inodes_sb+0xd8/0x2a0 Nov 15 18:19:34 anila kernel: [ 2660.738382] sync_inodes_one_sb+0x15/0x20 Nov 15 18:19:34 anila kernel: [ 2660.738389] iterate_supers+0xa3/0x100 Nov 15 18:19:34 anila kernel: [ 2660.738393] ? default_file_splice_write+0x30/0x30 Nov 15 18:19:34 anila kernel: [ 2660.738398] ksys_sync+0x42/0xb0 Nov 15 18:19:34 anila kernel: [ 2660.738402] __ia32_sys_sync+0xe/0x20 Nov 15 18:19:34 anila kernel: [ 2660.738409] do_syscall_64+0x57/0x190 Nov 15 18:19:34 anila kernel: [ 2660.738415] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Nov 15 18:19:34 anila kernel: [ 2660.738419] RIP: 0033:0x7f34a384022b Nov 15 18:19:34 anila kernel: [ 2660.738428] Code: Bad RIP value. Nov 15 18:19:34 anila kernel: [ 2660.738431] RSP: 002b:00007ffd325c6158 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2 Nov 15 18:19:34 anila kernel: [ 2660.738434] RAX: ffffffffffffffda RBX: 00007ffd325c6298 RCX: 00007f34a384022b Nov 15 18:19:34 anila kernel: [ 2660.738436] RDX: 00007f34a391a501 RSI: 0000000000000000 RDI: 00007f34a38e0bc0 Nov 15 18:19:34 anila kernel: [ 2660.738437] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 Nov 15 18:19:34 anila kernel: [ 2660.738439] R10: 00007f34a3914640 R11: 0000000000000246 R12: 0000000000000000 Nov 15 18:19:34 anila kernel: [ 2660.738440] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Pattern4: Out of memory
Out of memory message
Tests that caused the error
Write a 20GB file
- I set the 100ms latency of the network used for Luster. I set it by tc command. The purpose of latency is to simulate the case where the client is far away from the Lustre server.
- I ran the following command to write a 20GB file.
# ./ior --dataPacketType=timestamp -C -Q 1 -g -G 1836258435 -k -e -o /mnt/lustre/io500/test/ior_file_easy -t 1m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX -b 20480m
- fsync was executed because -e option was specified.
- Error didn't occur when sync was not executed.
- I think that syncing a 20GB file is a ordinary operation.
Execute IOR, MDTest continuously.
- I didn't set latency.
- I executed IOR, MDTest continuously without fsync.
# mpiexec -np 4 ./ior --dataPacketType=timestamp -g -G 1836258167 -k -o /mnt/lustre/io500/test/ior-easy/ior_file_easy -t 1m -b 25600m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches" # mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -u -L -F -P -G -1572531335 -N 0 -d /mnt/lustre/io500/test/mdtest-easy -C -W 300 -a POSIX # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches" # mpiexec -np 4 ./ior --dataPacketType=timestamp -g -G=1836268855 -k -o /mnt/lustre/io500/test/ior-hard/file -t 47008 -b 47008 -s 500000 -w -D 300 -a POSIX -O stoneWallingWearOut=1 # sudo -n bash -c "echo 3 > /proc/sys/vm/drop_caches" # mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -t -w 3901 -e 3901 -P -G=-1572520647 -N 0 -F -d /mnt/lustre/io500/test/mdtest-hard -C -W 300 -a POSIX
- IOR-easy write: 100GB
- MDTest-easy write: 4,000,000 files
- IOR-hard write: 500000 segment (About 85GB)
- MDTest-hard write: 4,000,000 files
- Error didn't occur when I didn't execute fsync.
Sync a lot of files
- I didn't set latency.
- I ran MDTest without fsync. A lot of files were cached in WBC.
- I ran sync.
# mpiexec -np 4 ./mdtest --dataPacketType=timestamp -n 1000000 -u -L -F -P -G -1572531335 -N 0 -d /mnt/lustre/test/mdtest-easy -C -W 300 -a POSIX # sync
my prediction
Releasing WBC will take a lot of time if I set the latency or sync a lot of files. As a result, request will be timed out.
No error occurred when I ran IOR/MDTest with fsync option in no-latency environment. So I think that syncing in small steps will inhibit this error.
Attachments
Issue Links
- is related to
-
LU-10938 Metadata writeback cache support
- Open