[LU-4292] Lustre data loss when client hangs up Created: 22/Nov/13 Updated: 26/Apr/17 Resolved: 26/Apr/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexander | Assignee: | James Nunez (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | server |
| Environment: | OS RHEL 6.4, 6.3; Lustre versions 2.1.6, 2.3, 2.4.1, 2.5. The error was observed on different hardware. |
| Severity: | 3 |
| Rank (Obsolete): | 11775 |
| Description |
|
The error is observed on Lustre 2.1.6 and later, on RHEL 6.x: data loss. If a file was opened or modified within approximately 30 seconds before the server failed, all data in this file is erased. Has anybody seen a similar error? |
| Comments |
| Comment by Andreas Dilger [ 23/Nov/13 ] |
|
Could you please explain your testing process further? Does this affect existing files that are opened, or only newly created files? If the server fails, but the client is still running, and then the server is restarted, is there data loss? |
| Comment by Alexander [ 25/Nov/13 ] |
|
There is a prototype consisting of 2 servers: one server holds the metadata and object data, and the second server is used as the client. Suppose the file system already contains a file file_1.txt into which "a, b, c" was written. If the client executes a write to this file, "echo d >> /mnt/file_1.txt", then /proc/fs/lustre/osc/lustre-OST*/cur_dirty_bytes will show 4096 (bytes) instead of 0. The data stays dirty for about 30 seconds and only then is written out to disk. So, if the client machine fails before those 30 seconds expire, the file /mnt/file_1.txt loses not only the data that was just written (by the "echo d >> /mnt/file_1.txt" command), but also the data that had been written to it before; i.e. the file ends up empty. |
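A minimal sketch of the reproduction steps described above, run from the client (the mount point, file name, and echo content come from this ticket; the OST name lustre-OST0000 and the exact osc directory pattern are assumptions):

    # initial content, flushed to the OST so it is safely on disk
    echo "a, b, c" > /mnt/file_1.txt
    sync
    # append new data; it now sits in the client cache as dirty pages
    echo d >> /mnt/file_1.txt
    # reportedly shows 4096 instead of 0 for roughly 30 seconds
    cat /proc/fs/lustre/osc/lustre-OST0000*/cur_dirty_bytes
    # per the report: if the client hangs or loses power within this window,
    # the whole file ends up empty, not just the appended "d"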
| Comment by Peter Jones [ 27/Nov/13 ] |
|
James, could you please see if you can reproduce this reported behavior? Thanks, Peter |
| Comment by Alexander [ 09/Jan/14 ] |
|
I would like to know whether you have managed to reproduce this situation? |
| Comment by James Nunez (Inactive) [ 09/Jan/14 ] |
|
Alexander, I'm having trouble recreating this problem. I have a single server with MDS/MGS and OSS and a single client. I write to a file using "echo" and I can see the cur_dirty_bytes stay at 4096 for a while and then change to 0, but I can't get any data loss or corruption. To fail the client I've used reboot and shutdown immediately after the echo command was issued. So, I may be doing things differently from you. How did you fail the client? |
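For reference, a minimal way to watch the counter James mentions from the client (the osc proc path follows the pattern used elsewhere in this ticket; the exact target name depends on the file system name):

    # poll all per-OST dirty byte counters once a second
    watch -n 1 'cat /proc/fs/lustre/osc/*-OST*/cur_dirty_bytes'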
| Comment by Alexander [ 10/Jan/14 ] |
|
James, in the case of a reboot or poweroff no data loss is observed. Data loss is observed when the operating system hangs or, for example, when power is forcibly cut from the client. In other words, if the client hangs or is forcibly disconnected from power at a moment when the cur_dirty_bytes file contains the value 4096, there will be data loss. |
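Not from the ticket, but a common way to approximate "the client hangs / loses power" without physically cutting power, assuming Magic SysRq is enabled on the client (a clean reboot or shutdown flushes the cache and so does not hit this window):

    # enable all SysRq functions, then crash the kernel without syncing
    echo 1 > /proc/sys/kernel/sysrq
    # 'c' triggers an immediate kernel crash; 'b' reboots without syncing or unmounting
    echo c > /proc/sysrq-trigger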
| Comment by James Nunez (Inactive) [ 13/Jan/14 ] |
|
Alexander, I've moved my testing to using VMs so that I can shut off machines to mimic a node going down. Here is what I can reproduce, using the b2_3 branch of Lustre:

I have a single node MDS and OSS and two clients. One client writes data to a file called test_file and the write completes normally. That same client writes data to the existing file, I print cur_dirty_bytes which is 4096 and then I power off the VM, which does not flush the cache. I go to the second client and try and list the contents of the Lustre file system and it hangs. If I control-C the listing, I get something like:

[root@client2 ~]# ls -l /lustre/scratch/
^Cls: cannot access /lustre/scratch/test_file: Interrupted system call
total 0
-????????? ? ? ? ? ? test_file

If I wait for a couple of minutes, the original file/data is restored, but the data in flight never makes it to the file:

[root@client2 ~]# ls /lustre/scratch/
client_2_touch  test_file
[root@client2 ~]# more /lustre/scratch/test_file
a, b, c
d, e, f
g, h, i

Does this match your experience? If you wait a few minutes, does the file that your client was writing to, /mnt/file_1 in your example, come back and is the original data "a, b, c" there? |
| Comment by Alexander [ 14/Jan/14 ] |
|
James, the experiment you performed matches what I saw, except for one point: I ran my experiment not on VMs but on real servers, and accordingly the cache on the real servers was cleared (dropped). |
| Comment by Jinshan Xiong (Inactive) [ 15/Jan/14 ] |
|
Hi Alexander, when you saw the empty file, did you notice what the file size was? You can try to write something into the file, make sure the content has been written to the OST, and then reproduce this problem. For example: echo "hello world" > test_file. After you've reproduced the problem, please check what the file size is. Thanks |
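A minimal way to carry out the check Jinshan suggests (stat and od are standard tools; the file name comes from his example and the /mnt mount point is taken from earlier comments in this ticket):

    # before the failure: make sure the initial content has reached the OST
    echo "hello world" > /mnt/test_file
    sync
    # ... reproduce the client failure, then on a surviving node:
    stat -c 'size=%s bytes' /mnt/test_file
    od -c /mnt/test_file | head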
| Comment by Alexander [ 16/Jan/14 ] |
|
Hi Jinshan, I wrote a procedure like this:

#!/bin/bash
# Poll the per-OST dirty byte counters and force a sync whenever they are non-zero.
path=/proc/fs/lustre/osc
ost_num=test-OST000
name=cur_dirty_bytes
while true; do
    for j in 0 1; do
        c=$(cat ${path}/${ost_num}${j}*/${name})
        if [ "$c" -ne 0 ]; then
            sync
        fi
    done
    sleep 0.3
done

This script forces the dirty data (if there is any) to be dropped from the memory cache to disk at an interval of roughly one second. But it only reduces the probability of such a situation. By default, the dirty data counted in cur_dirty_bytes is kept for about 30 seconds before it is automatically written from the memory cache to disk. So the possibility of data loss remains if the client hangs at a point in time when there is still data in the memory cache and the sync command has not yet run. |
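Not part of the ticket, but one way for the writer itself to close the race window described above is to force its own data to the OST before considering the write done (standard fsync semantics, not a Lustre-specific feature). A sketch using GNU dd, with the file name taken from the earlier example:

    # append a record and fsync it before dd returns, so the data never sits dirty
    printf 'd\n' | dd of=/mnt/file_1.txt oflag=append conv=notrunc,fsync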
| Comment by James Nunez (Inactive) [ 16/Jan/14 ] |
|
Alexander, Can you reproduce the data loss on your cluster/nodes? In other words, is this a persistent problem, does it happen occasionally or did it stop happening? Thanks |
| Comment by Alexander [ 17/Jan/14 ] |
|
James, yes, this problem occurs reliably for me under the conditions described in my previous messages. I can also reproduce this error on real equipment. |
| Comment by Alexander [ 03/Mar/14 ] |
|
James, hello. Is there any news on this subject? |
| Comment by James Nunez (Inactive) [ 12/Mar/14 ] |
|
Alexander, Nothing new to report; I am not able to reproduce this error. Would you please let me know what your mount options are when you experience the data loss? Also, do you have logs from these nodes when the data loss happens and would you make those available to me? Thanks, |
| Comment by Alexandr [ 14/Apr/14 ] |
|
James, I reproduced this situation on the following prototype. The sequence of actions was as follows:

On VM2, write data to /mnt/file.
On VM1, check the file: the file at /mnt has size 79650.
Force poweroff of VM2.
On VM1, read the file again. After about four minutes of waiting the command completed and the file did not seem to be empty, but on repeated execution of the same command we get:

[root@n1 common]# date && ll /mnt/file && date

The file is empty. The VM2 server remained switched off the whole time. The output of /var/log/messages on the VM1 server, starting from the moment power was removed from VM2, is given below:

Apr 10 02:58:40 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113113/real 1397113113] req@ffff88005dc90400 x1464979110756852/t0(0) o106->lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113120 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 |
| Comment by James Nunez (Inactive) [ 10/Mar/17 ] |
|
Alexander, Thanks |
| Comment by James Nunez (Inactive) [ 26/Apr/17 ] |
|
Please open a new ticket if you are still experiencing this problem. |