Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
None
-
3
-
15577
Description
This issue was created by maloo for Minh Diep <minh.diep@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d54472f6-2fc6-11e4-957a-5254006e85c2.
The sub-test test_1 failed with the following error:
old and new files are different: rc=22
Info required for matching: runtests 1
Attachments
- after-umount-log
- 3.39 MB
- copy-end-log
- 3.41 MB
Activity
I've verified that with the recent landing of http://review.whamcloud.com/#/c/12103 in master I no longer see this failure when running the test on el7 clients. I think this ticket can be marked Fixed.
The umount will hang while invoke with CL_FSYNC_ALL in replay-single test_89.
{no format}[ 8119.181431] umount D ffff88003fc14580 0 4225 3612 0x00000080
[ 8119.181431] ffff880020cb7ae0 0000000000000086 ffff880020cb7fd8 0000000000014580
[ 8119.181431] ffff880020cb7fd8 0000000000014580 ffff880012100b60 ffff880009f50e80
[ 8119.181431] ffff880009f50e88 7fffffffffffffff ffff880012100b60 0000000000000000
[ 8119.181431] Call Trace:
[ 8119.181431] [<ffffffff815e71b9>] schedule+0x29/0x70
[ 8119.181431] [<ffffffff815e50b9>] schedule_timeout+0x209/0x2d0
[ 8119.181431] [<ffffffffa0783d60>] ? ptlrpcd_add_req+0x210/0x300 [ptlrpc]
[ 8119.181431] [<ffffffffa0764370>] ? lustre_swab_obdo+0x100/0x100 [ptlrpc]
[ 8119.181431] [<ffffffff815e76e6>] wait_for_completion+0x116/0x170
[ 8119.181431] [<ffffffff81097700>] ? wake_up_state+0x20/0x20
[ 8119.181431] [<ffffffffa096a944>] osc_io_fsync_end+0x74/0xa0 [osc]
[ 8119.181431] [<ffffffffa0572a2d>] cl_io_end+0x5d/0x150 [obdclass]
[ 8119.181431] [<ffffffffa09b61fb>] lov_io_end_wrapper+0xdb/0xe0 [lov]
[ 8119.181431] [<ffffffffa09b63a4>] lov_io_fsync_end+0x84/0x1c0 [lov]
[ 8119.181431] [<ffffffffa0572a2d>] cl_io_end+0x5d/0x150 [obdclass]
[ 8119.181431] [<ffffffffa0576733>] cl_io_loop+0xb3/0x190 [obdclass]
[ 8119.181431] [<ffffffffa0f0022b>] cl_sync_file_range+0x40b/0x610 [lustre]
[ 8119.181431] [<ffffffffa0f13fba>] ll_delete_inode+0x10a/0x230 [lustre]
[ 8119.181431] [<ffffffff811db91e>] ? inode_wait_for_writeback+0x2e/0x40
[ 8119.181431] [<ffffffff811cae27>] evict+0xa7/0x170
[ 8119.181431] [<ffffffff811caf2e>] dispose_list+0x3e/0x50
[ 8119.181431] [<ffffffff811cbb14>] evict_inodes+0x114/0x140
[ 8119.181431] [<ffffffff811b1fc8>] generic_shutdown_super+0x48/0xe0
[ 8119.181431] [<ffffffff811b2242>] kill_anon_super+0x12/0x20
[ 8119.181431] [<ffffffffa055880a>] lustre_kill_super+0x7a/0x80 [obdclass]
[ 8119.181431] [<ffffffff811b265d>] deactivate_locked_super+0x3d/0x60
[ 8119.181431] [<ffffffff811b26c6>] deactivate_super+0x46/0x60
[ 8119.181431] [<ffffffff811cf455>] mntput_no_expire+0xc5/0x120
[ 8119.181431] [<ffffffff811d058f>] SyS_umount+0x9f/0x3c0
[ 8119.181431] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b{no format}
But CL_FSYNC_LOCAL not. I have updated patch.
It looks like the inode is called with redirty_tail() and then put in the list of b_dirty with newer timestamp. This is why it was skipped in the 2nd __sync_filesystem() therefore dirty data is not flushed. I think Yang Sheng's patch is good to fix the problem.
When a filesystem is being umounted, it will sync dirty data by calling sync_filesystem(). Then it calls __sync_filesystem() twice: the first time just starts the write back and the 2nd time is to wait for the write back to finish. Both call should reach ll_writepages() but for unknown reason, the 2nd call didn't reach it.
My temporary solution is to make ll_writepages() to wait for write back to finish anyway:
diff --git a/lustre/llite/rw.c b/lustre/llite/rw.c index ff25525..778609b 100644 --- a/lustre/llite/rw.c +++ b/lustre/llite/rw.c @@ -1080,7 +1080,7 @@ int ll_writepages(struct address_space *mapping, struct writeback_control *wbc) } } - mode = CL_FSYNC_NONE; + mode = CL_FSYNC_LOCAL; if (wbc->sync_mode == WB_SYNC_ALL) mode = CL_FSYNC_LOCAL;
I can't see this issue any more after applying the above patch.
Yang Sheng, if you have any time, can you please investigate it further to figure out why the 2nd __sync_filesystem() didn't reach ll_writepages()?
Thanks,
Jinshan
Hi Yang Sheng, Thank you very much for the reproduce script and log, I'm looking at it now.
As JinShan asked in gerrit. Attached logs for investigation. The runtests can hardly to collect log since it copy too much files. So i use a small script to reproduce this issue. It can produce log more easy to collection and analyzed.
modprobe mbcache sh llmount.sh #echo "mdc osc class log llite lov mgc" >/proc/sys/lnet/subsystem_debug echo "class osc mdc rpc lov llite" >/proc/sys/lnet/subsystem_debug echo -1 >/proc/sys/lnet/debug lctl set_param osc/*/max_rpcs_in_flight=1 while : ; do rm -rf /mnt/lustre/ddd mkdir - p /mnt/lustre/ddd lctl mark "copy start" cp /aaa/* /mnt/lustre/ddd/ lctl mark "copy end" lctl dk >copy-end-log nm=`ls /proc/fs/lustre/llite/` umount /mnt/lustre lctl dk > after-umount-log mount -t lustre 192.168.122.111@tcp:/lustre /mnt/lustre ls -lR /mnt/lustre/ddd |grep " 0 " && exit echo "==========================new loop=============" done
This script just copy files from /aaa to /mnt/lustre/ddd. The file list is
-rw-r--r-- 1 root root 16 9月 21 13:57 adjtime -rw-r--r-- 1 root root 1518 9月 21 13:57 aliases -rw-r--r-- 1 root root 12288 9月 21 13:57 aliases.db -rw------- 1 root root 541 9月 21 13:57 anacrontab -rw-r--r-- 1 root root 55 9月 21 13:57 asound.conf -rw-r--r-- 1 root root 1 9月 21 13:57 at.deny -rw-r--r-- 1 root root 2835 9月 21 13:57 bashrc -rw-r--r-- 1 root root 1157 9月 21 13:57 chrony.conf -rw-r----- 1 root root 62 9月 21 13:57 chrony.keys -rw-r--r-- 1 root root 451 9月 21 13:57 crontab -rw-r--r-- 1 root root 1602 9月 21 13:57 csh.cshrc -rw-r--r-- 1 root root 841 9月 21 13:57 csh.login -rw-r--r-- 1 root root 5090 9月 21 13:57 DIR_COLORS -rw-r--r-- 1 root root 5725 9月 21 13:57 DIR_COLORS.256color -rw-r--r-- 1 root root 4669 9月 21 13:57 DIR_COLORS.lightbgcolor -rw-r--r-- 1 root root 25213 9月 21 13:57 dnsmasq.conf -rw-r--r-- 1 root root 1285 9月 21 13:57 dracut.conf -rw-r--r-- 1 root root 112 9月 21 13:57 e2fsck.conf -rw-r--r-- 1 root root 1317 9月 21 13:57 ethertypes -rw-r--r-- 1 root root 730 9月 21 13:57 favicon.png -rw-r--r-- 1 root root 64 9月 21 13:57 filesystems -rw-r--r-- 1 root root 20 9月 21 13:57 fprintd.conf -rw-r--r-- 1 root root 501 9月 21 13:57 fstab -rw-r--r-- 1 root root 265 9月 21 13:57 gdbinit -rw-r--r-- 1 root root 94 9月 21 13:57 GREP_COLORS -rw-r--r-- 1 root root 601 9月 21 13:57 group -rw-r--r-- 1 root root 573 9月 21 13:57 group- -rw-r--r-- 1 root root 4672 9月 21 13:57 grub2.cfg ---------- 1 root root 467 9月 21 13:57 gshadow ---------- 1 root root 458 9月 21 13:57 gshadow- -rw-r--r-- 1 root root 9 9月 21 13:57 host.conf -rw-r--r-- 1 root root 3 9月 21 13:57 hostname -rw-r--r-- 1 root root 194 9月 21 13:57 hosts -rw-r--r-- 1 root root 370 9月 21 13:57 hosts.allow -rw-r--r-- 1 root root 460 9月 21 13:57 hosts.deny -rw-r--r-- 1 root root 491 9月 21 13:57 inittab -rw-r--r-- 1 root root 942 9月 21 13:57 inputrc -rw-r--r-- 1 root root 23 9月 21 13:57 issue -rw-r--r-- 1 root root 22 9月 21 13:57 issue.net -rw-r--r-- 1 root root 6088 9月 21 13:57 kdump.conf -rw-r--r-- 1 root root 495 9月 21 13:57 krb5.conf -rw-r--r-- 1 root root 548 9月 21 13:57 ldev.conf -rw-r--r-- 1 root root 29632 9月 21 13:57 ld.so.cache -rw-r--r-- 1 root root 28 9月 21 13:57 ld.so.conf -rw-r----- 1 root root 191 9月 21 13:57 libaudit.conf -rw-r--r-- 1 root root 2391 9月 21 13:57 libuser.conf -rw-r--r-- 1 root root 19 9月 21 13:57 locale.conf -rw-r--r-- 1 root root 3519 9月 21 13:57 localtime -rw-r--r-- 1 root root 2028 9月 21 13:57 login.defs -rw-r--r-- 1 root root 662 9月 21 13:57 logrotate.conf -r--r--r-- 1 root root 33 9月 21 13:57 machine-id -rw-r--r-- 1 root root 111 9月 21 13:57 magic -rw-r--r-- 1 root root 1968 9月 21 13:57 mail.rc -rw-r--r-- 1 root root 5122 9月 21 13:57 makedumpfile.conf.sample -rw-r--r-- 1 root root 5171 9月 21 13:57 man_db.conf -rw-r--r-- 1 root root 956 9月 21 13:57 mke2fs.conf -r--r--r-- 1 root root 2140 9月 21 13:57 mtab -rw-r--r-- 1 root root 570 9月 21 13:57 my.cnf -rw-r--r-- 1 root root 8892 9月 21 13:57 nanorc -rw-r--r-- 1 root root 58 9月 21 13:57 networks -rw-r--r-- 1 root root 1717 9月 21 13:57 nsswitch.conf -rw-r--r-- 1 root root 1726 9月 21 13:57 nsswitch.conf.bak -rw-r--r-- 1 root root 493 9月 21 13:57 os-release -rw-r--r-- 1 root root 1163 9月 21 13:57 passwd -rw-r--r-- 1 root root 1126 9月 21 13:57 passwd- -rw-r--r-- 1 root root 2872 9月 21 13:57 pinforc -rw-r--r-- 1 root root 233 9月 21 13:57 printcap -rw-r--r-- 1 root root 1750 9月 21 13:57 profile -rw-r--r-- 1 root root 6545 9月 21 13:57 protocols -rw-r--r-- 1 root root 477 9月 21 13:57 rc.local -rw-r--r-- 1 root root 52 9月 21 13:57 redhat-release -rw-r--r-- 1 root root 55 9月 21 13:57 resolv.conf -rw-r--r-- 1 root root 1634 9月 21 13:57 rpc -rw-r--r-- 1 root root 458 9月 21 13:57 rsyncd.conf -rw-r--r-- 1 root root 3232 9月 21 13:57 rsyslog.conf -rw-r--r-- 1 root root 927 9月 21 13:57 rwtab -rw------- 1 root root 221 9月 21 13:57 securetty -rw-r--r-- 1 root root 670293 9月 21 13:57 services -rw-r--r-- 1 root root 216 9月 21 13:57 sestatus.conf ---------- 1 root root 728 9月 21 13:57 shadow ---------- 1 root root 701 9月 21 13:57 shadow- -rw-r--r-- 1 root root 95 9月 21 13:57 shells -rw-r--r-- 1 root root 256 9月 21 13:57 sos.conf -rw-r--r-- 1 root root 212 9月 21 13:57 statetab -rw-r----- 1 root root 1786 9月 21 13:57 sudo.conf -r--r----- 1 root root 4000 9月 21 13:57 sudoers -rw-r----- 1 root root 3181 9月 21 13:57 sudo-ldap.conf -rw-r--r-- 1 root root 225 9月 21 13:57 sysctl.conf -rw-r--r-- 1 root root 52 9月 21 13:57 system-release -rw-r--r-- 1 root root 45 9月 21 13:57 system-release-cpe -rw-r--r-- 1 root root 375 9月 21 13:57 trusted-key.key -rw-r--r-- 1 root root 508 9月 21 13:57 updatedb.conf -rw-r--r-- 1 root root 889 9月 21 13:57 usb_modeswitch.conf -rw-r--r-- 1 root root 37 9月 21 13:57 vconsole.conf -rw-r--r-- 1 root root 1982 9月 21 13:57 vimrc -rw-r--r-- 1 root root 1982 9月 21 13:57 virc -rw-r--r-- 1 root root 4479 9月 21 13:57 wgetrc -rw-r--r-- 1 root root 813 9月 21 13:57 yum.conf
Below is problem file list(zero size after umount & mount):
-rw-r----- 1 root root 0 10月 20 10:06 chrony.keys -rw-r--r-- 1 root root 0 10月 20 10:06 DIR_COLORS.lightbgcolor -rw-r--r-- 1 root root 0 10月 20 10:06 ethertypes -rw-r--r-- 1 root root 0 10月 20 10:06 filesystems -rw-r--r-- 1 root root 0 10月 20 10:06 GREP_COLORS -rw-r--r-- 1 root root 0 10月 20 10:06 group- ---------- 1 root root 0 10月 20 10:06 gshadow -rw-r--r-- 1 root root 0 10月 20 10:06 host.conf -rw-r--r-- 1 root root 0 10月 20 10:06 hosts -rw-r--r-- 1 root root 0 10月 20 10:06 inputrc -rw-r--r-- 1 root root 0 10月 20 10:06 issue.net -rw-r--r-- 1 root root 0 10月 20 10:06 krb5.conf -rw-r--r-- 1 root root 0 10月 20 10:06 ld.so.cache -rw-r----- 1 root root 0 10月 20 10:06 libaudit.conf -rw-r--r-- 1 root root 0 10月 20 10:06 libuser.conf -rw-r--r-- 1 root root 0 10月 20 10:06 locale.conf -r--r--r-- 1 root root 0 10月 20 10:06 machine-id -r--r--r-- 1 root root 0 10月 20 10:06 mtab -rw-r--r-- 1 root root 0 10月 20 10:06 nsswitch.conf
Landed for 2.7