[LU-4018] acceptance-small test suite: umount hang with "/lustre/scratch2 is still busy" Created: 27/Sep/13  Updated: 08/Feb/18  Resolved: 08/Feb/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

lustre-master build #1687 tag 2.4.93
OpenSFS cluster with a combined MGS/MDS, a single OSS with two OSTs, and four clients: one agent (c07), one running robinhood/db (c08), and two running only as Lustre clients (c09, c10)


Severity: 3
Rank (Obsolete): 10784

 Description   

There is an issue when running acceptance-small where the file system hangs during or right after the racer test; it sometimes recovers after several hours and sometimes does not return even after several (~6) hours. This is a very long description, so here is an outline of the testing and results:
1. Run acceptance-small with normal Lustre configuration and HSM not enabled. acceptance-small completes.
2. Run sanity-hsm.sh with HSM enabled. sanity-hsm completes.
3. Run acceptance-small with HSM enabled. acceptance-small will not complete.

The problem of acceptance-small not completing is very reproducible; every time I run acceptance-small with HSM enabled, it hangs on some unmount/cleanup call in the tests.
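For reference, these runs were driven with the standard test scripts shipped in /usr/lib64/lustre/tests; a minimal invocation looks roughly like the sketch below (the configuration name is illustrative, not my actual config file):

# On the launching client (c10); NAME selects cfg/<name>.sh and is illustrative here
cd /usr/lib64/lustre/tests
NAME=opensfs_cfg SLOW=yes sh acceptance-small.sh
# or restrict to individual suites, e.g. only racer and conf-sanity
NAME=opensfs_cfg ACC_SM_ONLY="racer conf-sanity" sh acceptance-small.sh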

Here are the details:

On a newly built Lustre file system, I ran the acceptance-small suite without starting HSM (I left hsm_control = stopped, didn't start the copytool, registered no agents, and didn't mount the archive), but had all the agent variables set in my configuration file. acceptance-small completed, and when it came to sanity-hsm, the last suite in acceptance-small, HSM ran fine; there were some test failures, but all tests completed. Some of those failures could be related to this mount/unmount problem. For example, lfsck fails with

oss: removing objects from /dev/sda3 on ost1: 
oss: mount: mount point /mnt/ost1 does not exist
pdsh@c10: oss: ssh exited with exit code 32

Test results are at https://maloo.whamcloud.com/test_sessions/eb9d697c-2658-11e3-8d26-52540035b04c.

Next, I ran sanity-hsm with HSM enabled (started the agent, started the copytool, registered the agent, enabled HSM, etc.) and the tests completed with some failures. Results are at https://maloo.whamcloud.com/test_sessions/15535878-2639-11e3-8d26-52540035b04c.
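For context, "HSM enabled" above means roughly the following steps (a sketch; the archive mount point and archive number are illustrative, and the copytool registers the node as an agent when it starts):

# On the MDS: turn on the HSM coordinator for the scratch file system
lctl set_param mdt.scratch-MDT0000.hsm_control=enabled

# On the agent (c07): mount the archive, then start the POSIX copytool,
# which registers this node as an HSM agent for archive 1
mount /archive                # illustrative archive mount point
lhsmtool_posix --daemon --hsm-root /archive --archive=1 /lustre/scratch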

Next, I started acceptance-small without changing any parameters, i.e. HSM was enabled as above. Everything looked like it was running fine until I saw the following messages on the console of the client that kicked off the tests (c10):

...
file_create: FILE=/lustre/scratch/racer/1 SIZE=66368
file_create: FILE=/lustre/scratch/racer/4 SIZE=155632
file_create: FILE=/lustre/scratch/racer/7 SIZE=36848
file_create: FILE=/lustre/scratch/racer/4 SIZE=198040
file_create: FILE=/lustre/scratch/racer/12 SIZE=70920
file_create: FILE=/lustre/scratch/racer/17 SIZE=163576
racer cleanup
racer cleanup
racer cleanup
racer cleanup
sleeping 5 sec ...
sleeping 5 sec ...
sleeping 5 sec ...
sleeping 5 sec ...
there should be NO racer processes:
root      7383  0.0  0.0 103244   880 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root      7385  0.0  0.0 103244   880 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
there should be NO racer processes:
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
there should be NO racer processes:
root      3805  0.0  0.0 103244   872 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root      3807  0.0  0.0 103244   872 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
root     31318  0.0  0.0 103244   892 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root     31320  0.0  0.0 103244   888 ?        R    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
there should be NO racer processes:
root     30362  0.0  0.0 103244   892 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
pid=16205 rc=0

there should be NO racer processes:
root      3805  0.0  0.0 103244   872 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root      3807  0.0  0.0 103244   872 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch2
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
root     31318  0.0  0.0 103244   892 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root     31320  0.0  0.0 103244   896 ?        R    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch2
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
there should be NO racer processes:
root     30362  0.0  0.0 103244   892 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
root     30364  0.0  0.0 103244   892 ?        S    15:59   0:00 grep -E file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch       374928    346576      8576  98% /lustre/scratch2
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
pid=16206 rc=0
Resetting fail_loc on all nodes...done.
PASS 1 (306s)
== racer test complete, duration 312 sec == 15:59:37 (1380149977)
Stopping clients: c07,c08,c09,c10 /lustre/scratch2 (opts:)
Stopping client c07 /lustre/scratch2 opts:
Stopping client c09 /lustre/scratch2 opts:
Stopping client c08 /lustre/scratch2 opts:
Stopping client c10 /lustre/scratch2 opts:
COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
dd      24324 root    1w   REG 3953,876890    31744 144115574404849344 /lustre/scratch2/racer/1
/lustre/scratch2 is still busy, wait one second
COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
dd      23480 root    1w   REG 3953,876890  3895296 144115574438378493 /lustre/scratch/racer/13
COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
dd       9884 root    1w   REG 3953,876890 21794816 144115574438363138 /lustre/scratch/racer/11
dd      30142 root    1w   REG 3953,876890 13414400 144115574388043790 /lustre/scratch2/racer/3
/lustre/scratch2 is still busy, wait one second
COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
dd      11466 root    1w   REG 3953,876890  3895296 144115574438378493 /lustre/scratch2/racer/13 (deleted)
dd      16589 root    1w   REG 3953,876890 21794816 144115574438363138 /lustre/scratch/racer/11
dd      24466 root    1w   REG 3953,876890    31744 144115574404849344 /lustre/scratch/racer/1
/lustre/scratch2 is still busy, wait one second
/lustre/scratch2 is still busy, wait one second
/lustre/scratch2 is still busy, wait one second

And the "still busy" messages continued for several hours. After approximately 3.5 hours, the tests started running again, so I let things run and went to bed. When I checked back in the morning, the testing was stuck in conf-sanity with the following on the c10 console:

oss: Are the lustre modules loaded?
oss: Check /etc/modprobe.conf and /proc/filesystems
pdsh@c10: oss: ssh exited with exit code 19
Start of /dev/sda3 on ost1 failed 19
pdsh@c10: mds: ssh exited with exit code 3
can't get osc.scratch-OST0000-osc-MDT0000.ost_server_uuid by list_param in 40 secs
Go with osc.scratch-OST0000-osc-MDT0000.ost_server_uuid directly
mds:  rpc : @@@@@@ FAIL: can't put import for osc.scratch-OST0000-osc-MDT0000.ost_server_uuid into FULL state after 40 sec, have  
mds:   Trace dump:
mds:   = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
mds:   = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
mds:   = /usr/lib64/lustre/tests/test-framework.sh:5374:_wait_import_state()
mds:   = /usr/lib64/lustre/tests/test-framework.sh:5393:wait_import_state()
mds:   = rpc.sh:20:main()
mds: Dumping lctl log to /tmp/test_logs/1380167996/rpc..*.1380168036.log
mds: Dumping logs only on local client.
pdsh@c10: mds: ssh exited with exit code 1
conf-sanity test_54a: @@@@@@ FAIL: import is not in FULL state 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
  = /usr/lib64/lustre/tests/test-framework.sh:5481:_wait_osc_import_state()
  = /usr/lib64/lustre/tests/test-framework.sh:5496:wait_osc_import_state()
  = /usr/lib64/lustre/tests/conf-sanity.sh:99:gen_config()
  = /usr/lib64/lustre/tests/conf-sanity.sh:109:reformat_and_config()
  = /usr/lib64/lustre/tests/conf-sanity.sh:3321:test_54a()
  = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:3323:main()
Dumping lctl log to /tmp/test_logs/2013-09-25/143504/conf-sanity.test_54a.*.1380168037.log
oss: Host key verification failed.
oss: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds: Host key verification failed.
oss: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
mds: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
pdsh@c10: oss: ssh exited with exit code 255
pdsh@c10: mds: ssh exited with exit code 255
FAIL 54a (100s)

== conf-sanity test 54b: test llverfs and partial verify of filesystem == 21:00:38 (1380168038)
start mds service on mds
Starting mds1:   /dev/sda3 /lustre/scratch/mdt0
mds: mount.lustre: according to /etc/mtab /dev/sda3 is already mounted on /lustre/scratch/mdt0
pdsh@c10: mds: ssh exited with exit code 17
Start of /dev/sda3 on mds1 failed 17
conf-sanity test_54b: @@@@@@ FAIL: MDT start failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
  = /usr/lib64/lustre/tests/conf-sanity.sh:231:setup()
  = /usr/lib64/lustre/tests/conf-sanity.sh:3331:test_54b()
  = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:3336:main()
Dumping lctl log to /tmp/test_logs/2013-09-25/143504/conf-sanity.test_54b.*.1380168039.log
oss: Host key verification failed.
oss: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
oss: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
mds: Host key verification failed.
pdsh@c10: oss: ssh exited with exit code 255
mds: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
pdsh@c10: mds: ssh exited with exit code 255
FAIL 54b (2s)

== conf-sanity test 55: check lov_objid size == 21:00:40 (1380168040)
Stopping /lustre/scratch/mdt0 (opts:-f) on mds
pdsh@c10: mds: ssh exited with exit code 1

   Permanent disk data:
Target:     scratch:MDT0000
Index:      0
Lustre FS:  scratch
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity

device size = 20481MB
formatting backing filesystem ldiskfs on /dev/sda3
	target name  scratch:MDT0000
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L scratch:MDT0000  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F /dev/sda3 50000
Writing CONFIGS/mountdata
pdsh@c10: oss: ssh exited with exit code 1
pdsh@c10: oss: ssh exited with exit code 1

   Permanent disk data:
Target:     scratch:OST03ff
Index:      1023
Lustre FS:  scratch
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.2.105@o2ib sys.timeout=20

device size = 12292MB
formatting backing filesystem ldiskfs on /dev/sda3
	target name  scratch:OST03ff
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize=4290772992,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L scratch:OST03ff  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize=4290772992,lazy_journal_init -F /dev/sda3 50000
Writing CONFIGS/mountdata
start mds service on mds
Starting mds1:   /dev/sda3 /lustre/scratch/mdt0
Started scratch-MDT0000
start ost1 service on oss
Starting ost1:   /dev/sda3 /lustre/scratch/ost1
oss: mount.lustre: mount /dev/sda3 at /lustre/scratch/ost1 failed: No such device
oss: Are the lustre modules loaded?
oss: Check /etc/modprobe.conf and /proc/filesystems
pdsh@c10: oss: ssh exited with exit code 19
Start of /dev/sda3 on ost1 failed 19
mount scratch on /lustre/scratch.....
Starting client: c10: -o user_xattr,flock mds@o2ib:/scratch /lustre/scratch

At this point, only the MDT was mounted; no OSTs were mounted and none of the nodes had the Lustre file system mounted.
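For reference, this state can be confirmed directly on the nodes with something like the following (sketch):

# On the MDS and OSS: list locally mounted Lustre targets and the device stack
mount -t lustre
lctl dl

# On each client: check whether the file system is mounted
mount -t lustre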

I started testing again on a clean system with 2.4.93, meaning I used loadjenkins to load a new OS and reformatted all storage. After everything was up and running, I enabled HSM on the MDS, registered the agent, and started the copytool, but I left the agent (c07) off of the remote client list in case it was having problems with the mounts/unmounts. Leaving the agent out didn't help anything, as you can see below. sanity, sanityn, sanity-benchmark and metadata-updates all ran, but while racer was running or had just completed, the following was on the c10 console:

....
== racer test complete, duration 312 sec == 14:47:42 (1380232062)
Stopping clients: c08,c09,c10 /lustre/scratch2 (opts:)
Stopping client c10 /lustre/scratch2 opts:
Stopping client c08 /lustre/scratch2 opts:
Stopping client c09 /lustre/scratch2 opts:
COMMAND   PID USER   FD   TYPE      DEVICE  SIZE/OFF               NODE NAME
dd       1344 root    1w   REG 3953,876890  11075584 144115473691245005 /lustre/scratch/racer/10
dd      10191 root    1w   REG 3953,876890 171792384 144115473741559103 /lustre/scratch2/racer/19
dd      13174 root    1w   REG 3953,876890 172275712 144115473691238078 /lustre/scratch/racer/19
cat     13200 root    1w   REG 3953,876890 342456320 144115473724745995 /lustre/scratch2/racer/6
cat     13200 root    3r   REG 3953,876890 172275712 144115473691238078 /lustre/scratch2/racer/19
cat     14311 root    1w   REG 3953,876890   2856960 144115473691246648 /lustre/scratch/racer/4
cat     14311 root    3r   REG 3953,876890   2857984 144115473775086376 /lustre/scratch/racer/12
dd      29283 root    1w   REG 3953,876890    373760 144115473758289951 /lustre/scratch/racer/6
dd      29288 root    1w   REG 3953,876890    373760 144115473758289951 /lustre/scratch2/racer/13
dd      32281 root    1w   REG 3953,876890   4388864 144115473758324783 /lustre/scratch2/racer/7
/lustre/scratch2 is still busy, wait one second
COMMAND   PID USER   FD      TYPE      DEVICE  SIZE/OFF               NODE NAME
cat      1744 root    1w      REG 3953,876890   1141760 144115473791881355 /lustre/scratch2/racer/12
cat      1744 root    3r      REG 3953,876890   1142784 144115473741554890 /lustre/scratch2/racer/14
dd       2731 root    1w      REG 3953,876890  40736768 144115473791888614 /lustre/scratch2/racer/4
dd       3692 root    1w      REG 3953,876890 159873024 144115473724773982 /lustre/scratch/racer/15
cat      7919 root    1w      REG 3953,876890   4327424 144115473791855853 /lustre/scratch2/racer/14
cat      7919 root    3r  unknown 3953,876890                              /lustre/scratch2/racer/8/4
dd      16792 root    1w      REG 3953,876890    424960 144115473758289949 /lustre/scratch2/racer/16
dd      16795 root    1w      REG 3953,876890    424960 144115473758289949 /lustre/scratch/racer/16
dd      16826 root    1w      REG 3953,876890    934912 144115473775067150 /lustre/scratch2/racer/14
dd      16851 root    1w      REG 3953,876890    934912 144115473775067150 /lustre/scratch/racer/14
cat     19347 root    3r      REG 3953,876890  11075584 144115473691245005 /lustre/scratch/racer/16
cat     30055 root    1w      REG 3953,876890 342456320 144115473724745995 /lustre/scratch/racer/1
cat     30055 root    3r      REG 3953,876890 172275712 144115473691238078 /lustre/scratch/racer/4
/lustre/scratch2 is still busy, wait one second
COMMAND   PID USER   FD      TYPE      DEVICE SIZE/OFF               NODE NAME
cat      9753 root    3r      REG 3953,876890  4389888 144115473758324783 /lustre/scratch/racer/7
dd      19612 root    1w      REG 3953,876890  2859008 144115473775086376 /lustre/scratch2/racer/12
dd      20439 root    1w  unknown 3953,876890                             /lustre/scratch/racer/8/1
dd      20860 root    1w      REG 3953,876890  1143808 144115473741554890 /lustre/scratch/racer/5
dd      23548 root    1w      REG 3953,876890   374784 144115473758289951 /lustre/scratch/racer/13
dd      27659 root    1w      REG 3953,876890  4328448 144115473691239547 /lustre/scratch2/racer/6
/lustre/scratch2 is still busy, wait one second
/lustre/scratch2 is still busy, wait one second
/lustre/scratch2 is still busy, wait one second
...

The MDS and OSS are both mounted, the agent has scratch mounted and the copytool running, and c08, c09 and c10 all have the Lustre file system mounted on /lustre/scratch and /lustre/scratch2. From the MDS dmesg:

LustreError: 25757:0:(mdt_hsm_cdt_agent.c:338:mdt_hsm_agent_send()) scratch-MDT0000: Cannot find agent for archive 0: rc = -11
LustreError: 25757:0:(mdt_hsm_cdt_agent.c:338:mdt_hsm_agent_send()) Skipped 179 previous similar messages

The agent did mount the Lustre file system, but it is not taking part in the testing, i.e. it is not on the remote client list.
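For reference, the coordinator state and the list of registered agents behind the "Cannot find agent" errors can be checked on the MDS with something like the following (sketch; parameter paths assume the scratch file system and may vary by release):

# On the MDS: coordinator state, registered agents, and queued HSM requests
lctl get_param mdt.scratch-MDT0000.hsm_control
lctl get_param mdt.scratch-MDT0000.hsm.agents
lctl get_param mdt.scratch-MDT0000.hsm.actions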

On the agent, I see:

LustreError: 166-1: MGC192.168.2.105@o2ib: Connection to MGS (at 192.168.2.105@o2ib) was lost; in progress operations using this service will fail
LustreError: 11-0: scratch-OST0001-osc-ffff88082d9d7000: Communicating with 192.168.2.106@o2ib, operation obd_ping failed with -107.
Lustre: Evicted from MGS (at 192.168.2.105@o2ib) after server handle changed from 0xc3f51c4989c55d76 to 0xc3f51c4989e1bc9e
Lustre: MGC192.168.2.105@o2ib: Connection restored to MGS (at 192.168.2.105@o2ib)

On all the client nodes, in dmesg I see:

LustreError: 26096:0:(file.c:3068:ll_inode_revalidate_fini()) scratch: revalidate FID [0x200004283:0x9f09:0x0] error: rc = -116

Only on c10 do I see:

Lustre: DEBUG MARKER: == sanity-benchmark test bonnie: bonnie++ == 14:14:25 (1380230065)
Lustre: DEBUG MARKER: min OST has 19147140kB available, using 28720710kB file size
INFO: task bonnie++:19773 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D 0000000000000004     0 19773  19510 0x00000080
 ffff8807b77a7d38 0000000000000082 0000000000000000 ffffffff81277679
 0000000000016700 ffff8800458d01f0 ffff8807b77a7cf8 ffffffff810573c5
 ffff8807b797dab8 ffff8807b77a7fd8 000000000000fb88 ffff8807b797dab8
Call Trace:
 [<ffffffff81277679>] ? cpumask_next_and+0x29/0x50
 [<ffffffff810573c5>] ? select_idle_sibling+0x95/0x150
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
 [<ffffffff8150ec23>] wait_for_common+0x123/0x180
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
 [<ffffffff811ac055>] writeback_inodes_sb_nr+0x75/0xa0
 [<ffffffff811ac0c3>] writeback_inodes_sb+0x43/0x50
 [<ffffffff811b1f9e>] __sync_filesystem+0x3e/0x90
 [<ffffffff811b20e8>] sync_filesystems+0xf8/0x130
 [<ffffffff811b2177>] sys_sync+0x17/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task bonnie++:19773 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D 0000000000000004     0 19773  19510 0x00000080
 ffff8807b77a7d38 0000000000000082 0000000000000000 ffffffff81277679
 0000000000016700 ffff8800458d01f0 ffff8807b77a7cf8 ffffffff810573c5
 ffff8807b797dab8 ffff8807b77a7fd8 000000000000fb88 ffff8807b797dab8
Call Trace:
 [<ffffffff81277679>] ? cpumask_next_and+0x29/0x50
 [<ffffffff810573c5>] ? select_idle_sibling+0x95/0x150
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
 [<ffffffff8150ec23>] wait_for_common+0x123/0x180
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
 [<ffffffff811ac055>] writeback_inodes_sb_nr+0x75/0xa0
 [<ffffffff811ac0c3>] writeback_inodes_sb+0x43/0x50
 [<ffffffff811b1f9e>] __sync_filesystem+0x3e/0x90
 [<ffffffff811b20e8>] sync_filesystems+0xf8/0x130
 [<ffffffff811b2177>] sys_sync+0x17/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task bonnie++:19773 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D 0000000000000004     0 19773  19510 0x00000080
 ffff8807b77a7d38 0000000000000082 0000000000000000 ffffffff81277679
 0000000000016700 ffff8800458d01f0 ffff8807b77a7cf8 ffffffff810573c5
 ffff8807b797dab8 ffff8807b77a7fd8 000000000000fb88 ffff8807b797dab8
Call Trace:
 [<ffffffff81277679>] ? cpumask_next_and+0x29/0x50
 [<ffffffff810573c5>] ? select_idle_sibling+0x95/0x150
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
 [<ffffffff8150ec23>] wait_for_common+0x123/0x180
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
 [<ffffffff811ac055>] writeback_inodes_sb_nr+0x75/0xa0
 [<ffffffff811ac0c3>] writeback_inodes_sb+0x43/0x50
 [<ffffffff811b1f9e>] __sync_filesystem+0x3e/0x90
 [<ffffffff811b20e8>] sync_filesystems+0xf8/0x130
 [<ffffffff811b2177>] sys_sync+0x17/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task bonnie++:19773 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D 0000000000000004     0 19773  19510 0x00000080
 ffff8807b77a7d38 0000000000000082 0000000000000000 ffffffff81277679
 0000000000016700 ffff8800458d01f0 ffff8807b77a7cf8 ffffffff810573c5
 ffff8807b797dab8 ffff8807b77a7fd8 000000000000fb88 ffff8807b797dab8
Call Trace:
 [<ffffffff81277679>] ? cpumask_next_and+0x29/0x50
 [<ffffffff810573c5>] ? select_idle_sibling+0x95/0x150
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
 [<ffffffff8150ec23>] wait_for_common+0x123/0x180
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
 [<ffffffff811ac055>] writeback_inodes_sb_nr+0x75/0xa0
 [<ffffffff811ac0c3>] writeback_inodes_sb+0x43/0x50
 [<ffffffff811b1f9e>] __sync_filesystem+0x3e/0x90
 [<ffffffff811b20e8>] sync_filesystems+0xf8/0x130
 [<ffffffff811b2177>] sys_sync+0x17/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task bonnie++:19773 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D 0000000000000004     0 19773  19510 0x00000080
 ffff8807b77a7d38 0000000000000082 0000000000000000 ffffffff81277679
 0000000000016700 ffff8800458d01f0 ffff8807b77a7cf8 ffffffff810573c5
 ffff8807b797dab8 ffff8807b77a7fd8 000000000000fb88 ffff8807b797dab8
Call Trace:
 [<ffffffff81277679>] ? cpumask_next_and+0x29/0x50
 [<ffffffff810573c5>] ? select_idle_sibling+0x95/0x150
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065d54>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
 [<ffffffff8150ec23>] wait_for_common+0x123/0x180
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
 [<ffffffff811ac055>] writeback_inodes_sb_nr+0x75/0xa0
 [<ffffffff811ac0c3>] writeback_inodes_sb+0x43/0x50
 [<ffffffff811b1f9e>] __sync_filesystem+0x3e/0x90
 [<ffffffff811b20e8>] sync_filesystems+0xf8/0x130
 [<ffffffff811b2177>] sys_sync+0x17/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LNet: 19792:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: DEBUG MARKER: SKIP: sanity-benchmark test_iozone skipping SLOW test iozone
...
Lustre: DEBUG MARKER: == racer test 1: racer on clients: c08,c09,c10 DURATION=300 == 14:42:36 (1380231756)
LustreError: 30894:0:(mdc_locks.c:915:mdc_enqueue()) ldlm_cli_enqueue: -2
LustreError: 29189:0:(file.c:3068:ll_inode_revalidate_fini()) scratch: revalidate FID [0x200004282:0x3fd:0x0] error: rc = -116

While /lustre/scratch2 is still busy, on the clients I see variations of:

 [root@c09 ~]# lsof /lustre/scratch2
COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
cat      9753 root    3r   REG 3953,876890 23490560 144115473758324783 /lustre/scratch/racer/7
dd      19612 root    1w   REG 3953,876890 21051392 144115473775086376 /lustre/scratch2/racer/12
dd      20860 root    1w   REG 3953,876890 18997248 144115473741554890 /lustre/scratch/racer/5
dd      23548 root    1w   REG 3953,876890  4968448 144115473758289951 /lustre/scratch/racer/13
dd      27659 root    1w   REG 3953,876890 20217856 144115473691239547 /lustre/scratch2/racer/6

and

 [root@c09 ~]# ps ax | grep racer
 2328 pts/0    S+     0:00 grep racer
 9753 ?        S      0:10 cat /lustre/scratch/racer/16
19612 ?        D      0:08 dd if=/dev/zero of=/lustre/scratch2/racer/12 bs=1k count=90648
20860 ?        S      0:08 dd if=/dev/zero of=/lustre/scratch/racer/5 bs=1k count=141568
23548 ?        S      0:02 dd if=/dev/zero of=/lustre/scratch/racer/17 bs=1k count=180680
27659 ?        S      0:08 dd if=/dev/zero of=/lustre/scratch2/racer/0 bs=1k count=62512

After five hours, the state of the nodes has not changed. I was able to upload partial logs for this last run at https://maloo.whamcloud.com/test_sessions/a4370948-2721-11e3-b741-52540035b04c
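For reference, the "/lustre/scratch2 is still busy, wait one second" lines come from the client-unmount cleanup in test-framework.sh, which behaves roughly like the loop below (a paraphrased sketch of the observed behavior, not the exact test-framework code):

# Paraphrased sketch of the umount retry behind the "still busy" messages
MOUNT2=/lustre/scratch2
while grep -q " $MOUNT2 " /proc/mounts; do
    umount $MOUNT2 2>/dev/null
    if grep -q " $MOUNT2 " /proc/mounts; then
        lsof $MOUNT2                                   # show who still holds files open
        echo "$MOUNT2 is still busy, wait one second" && sleep 1
    fi
done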



 Comments   
Comment by James Nunez (Inactive) [ 27/Sep/13 ]

acceptance-small completed overnight. Results are being uploaded now.

Comment by Jodi Levi (Inactive) [ 27/Sep/13 ]

Jinshan,
Can you please look into this one?
Thank you!

Comment by James Nunez (Inactive) [ 27/Sep/13 ]

Here are the full results that were finally produced after letting the tests run over night: https://maloo.whamcloud.com/test_sessions/277b4f38-2779-11e3-8a10-52540035b04c

sanity-hsm was expected to fail because the agent was not included in the remote client list.

Comment by Jinshan Xiong (Inactive) [ 03/Oct/13 ]

This bug demonstrates a problem with DLM lock contention. In my reproduction, I saw that the following processes were still running:

29989 pts/0    D      0:05 dd if=/dev/zero of=/lustre/scratch/racer/18 bs=1k count=49672
30128 pts/0    D      0:04 dd if=/dev/zero of=/lustre/scratch/racer/0 bs=1k count=13936
30324 pts/0    S      0:04 dd if=/dev/zero of=/lustre/scratch2/racer/0 bs=1k count=236576
30390 pts/0    S      0:04 dd if=/dev/zero of=/lustre/scratch2/racer/18 bs=1k count=258440

So files are being written by two clients with sub-page-size writes: for each write, the client has to read the page from the OST and then write it back immediately, and then the lock is canceled by the OST because it conflicts with the other writer.
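A minimal version of this contention pattern, assuming /lustre/scratch and /lustre/scratch2 are two mounts of the same file system (as they are here) so the two dd processes hit the same underlying file, is simply:

# Writer 1: 1 KiB blocks, so every write is a sub-page read-modify-write
dd if=/dev/zero of=/lustre/scratch/racer/0  bs=1k count=13936 &
# Writer 2: same file through the second mount; each write forces the extent
# lock held by the other writer to be canceled
dd if=/dev/zero of=/lustre/scratch2/racer/0 bs=1k count=236576 &
wait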

I will work out a fix soon.

Comment by Jinshan Xiong (Inactive) [ 03/Oct/13 ]

This is not a blocker and not HSM related.

Comment by Andreas Dilger [ 15/Oct/13 ]

There are a number of potential fixes for this kind of problem. One possibility is to send small writes using lockless IO from the client. This should be possible using mostly existing functionality, just changing the CLIO IO layer to use this method in cases of high lock contention and small writes. Another possibility is to use the "small IO with request" method as implemented in http://review.whamcloud.com/3690, but that patch never showed much improvement. If there were a workload that it improved significantly, it would be more interesting to land.

Comment by Jinshan Xiong (Inactive) [ 15/Oct/13 ]

From what I have seen in this issue, the two clients were not writing the exact same region of the same file. The lock contention is caused by a problem in the original LU-3027 patch, which is keen on extending locks to [0, EOF). In the new LU-3027 patch, lls_ever_canceled was introduced to solve the problem: if the top lock has ever been canceled due to contention, it will not be matched again.

There are a number of potential fixes for this kind of problem. One possibility is to send small writes using lockless IO from the client. This should be possible using mostly existing functionality, just changing the CLIO IO layer to use this method in cases of high lock contention and small writes.

In the current extent lock implementation, if too many conflicting locks exist on the server side, clients will be informed to use lockless IO. Clients don't have this kind of information available to decide on their own when to use lockless IO for small writes; any ideas?
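For reference, the server-side behaviour mentioned above is governed by per-namespace tunables on the OSS; something like the following inspects and adjusts them (a sketch; the names assume an ldiskfs OST on the scratch file system and may differ between releases):

# On the OSS: how many conflicting lock requests mark a resource "contended",
# how long the contended state lasts, and the size limit for lockless IO
lctl get_param ldlm.namespaces.filter-scratch-OST*.contended_locks
lctl get_param ldlm.namespaces.filter-scratch-OST*.contention_seconds
lctl get_param ldlm.namespaces.filter-scratch-OST*.max_nolock_bytes

# Example: lower the threshold so clients are pushed to lockless IO sooner
lctl set_param ldlm.namespaces.filter-scratch-OST*.contended_locks=8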

Comment by Andreas Dilger [ 15/Oct/13 ]

It also makes sense to use larger writes and smaller files, since I don't think the file_create.sh test is intended to exercise sub-page writes (it does no data verification at all) but rather just to do writes to files undergoing other operations. I've submitted http://review.whamcloud.com/7951 with a patch to improve the writes used by racer.
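Purely as an illustration of that direction (not a quote of the patch), the difference is between the racer-style sub-page dd writes visible in the logs above and larger, page-aligned ones:

# Racer-style write today: 1 KiB blocks, so every block is a sub-page read-modify-write
dd if=/dev/zero of=$DIR/racer/0 bs=1k count=$((RANDOM % 250000))
# Larger blocks writing a smaller file avoid the sub-page read-modify-write entirely
dd if=/dev/zero of=$DIR/racer/0 bs=64k count=$((RANDOM % 64))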

It would also be useful to add a separate file_fsx.sh test that runs fsx on a file until it fails, then restarts. The goal wouldn't be for fsx to run correctly (it will be clobbered by other test scripts), but rather to have fsx produce strange IO patterns (read, write, O_DIRECT, mmap, truncate, open/close) on the same file(s), adding new operations into the mix.
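A minimal sketch of what such a file_fsx.sh wrapper could look like (hypothetical; it assumes the same convention as the other racer scripts, which loop forever on files under the directory passed in $DIR):

#!/bin/bash
# Hypothetical file_fsx.sh: keep fsx running on one racer file; it is expected
# to fail when other racer scripts clobber the file, so just restart it
DIR=${DIR:-/tmp/racer}
MAX_FILES=${MAX_FILES:-20}
while /bin/true; do
    file=$DIR/$((RANDOM % MAX_FILES))
    fsx -q -N 1000 $file 2>/dev/null    # mixed read/write/mmap/truncate operations
    rm -f $file
done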

Comment by Peter Jones [ 01/Jul/14 ]

http://review.whamcloud.com/7951 has been merged into master. Does this ticket need to remain open still or can it be marked as resolved?

Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ]

close old tickets
