[LU-11381] sanity-flr test 201 hangs Created: 14/Sep/18  Updated: 05/Oct/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: always_except

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Test 201 was added to sanity-flr by the patch https://review.whamcloud.com/#/c/29097/, and the same patch also put test 201 on the ALWAYS_EXCEPT list. Bobijam says test 201 is “… a data mover watcher example monitoring FLR file change and resync the changed file and will not quit the loop.”

The last lines seen in the test log are

== sanity-flr test 201: FLR data mover =============================================================== 21:23:44 (1536960224)
CMD: trevis-47vm12 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
Starting client: trevis-47vm9.trevis.whamcloud.com:  -o user_xattr,flock trevis-47vm12@tcp:/lustre /mnt/lustre2
CMD: trevis-47vm9.trevis.whamcloud.com mkdir -p /mnt/lustre2
CMD: trevis-47vm9.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-47vm12@tcp:/lustre /mnt/lustre2

There's nothing obviously wrong in the console logs.

The code for test 201 is

test_201() {
        local delay=${RESYNC_DELAY:-5}

        MDT0=$($LCTL get_param -n mdc.*.mds_server_uuid |
               awk '{ gsub(/_UUID/,""); print $1 }' | head -n1)

        trap cleanup_test_201 EXIT

        CL_USER=$(do_facet $SINGLEMDS $LCTL --device $MDT0 \
                        changelog_register -n)

        mkdir -p $MOUNT2 && mount_client $MOUNT2

        local index=0
        while :; do
                local log=$($LFS changelog $MDT0 $index | grep FLRW)
                [ -z "$log" ] && { sleep 1; continue; }

                index=$(echo $log | awk '{print $1}')
                local ts=$(date -d "$(echo $log | awk '{print $3}')" "+%s" -u)
                local fid=$(echo $log | awk '{print $6}' | sed -e 's/t=//')
                local file=$($LFS fid2path $MOUNT2 $fid 2> /dev/null)

                ((++index))
                [ -z "$file" ] && continue

                local now=$(date +%s)

                echo "file: $file $fid was modified at $ts, now: $now, " \
                     "will be resynced at $((ts+delay))"

                [ $now -lt $((ts + delay)) ] && sleep $((ts + delay - now))

                mirror_io resync $file
                echo "$file resync done"
        done

        cleanup_test_201
}
run_test 201 "FLR data mover"
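
The `while :` loop in test_201 has no exit condition, so the trailing `cleanup_test_201` call is never reached and the test runs until the harness times out. One possible direction, offered only as a sketch and not as the fix that landed, is to bound the watcher with a deadline. The helper name `watch_with_deadline` and its arguments are hypothetical and do not exist in sanity-flr.sh; the polled command stands in for one iteration of the changelog check.

```shell
#!/bin/bash
# Hypothetical helper (assumption, not part of sanity-flr.sh):
# run a poll command repeatedly until a deadline passes, then return
# instead of looping forever.
#   $1   maximum run time in seconds
#   $2.. command to run on each iteration; it returns 0 when it handled
#        an event and non-zero when there was nothing to do
watch_with_deadline() {
	local max_secs=$1
	shift
	local deadline=$(( $(date +%s) + max_secs ))
	local handled=0

	while [ "$(date +%s)" -lt "$deadline" ]; do
		if "$@"; then
			handled=$((handled + 1))
		else
			# nothing to do yet, back off like the original loop does
			sleep 1
		fi
	done

	echo "handled $handled events"
}
```

With something like this, the body of the loop in test_201 could be moved into its own function and driven through the helper, so that control returns and the cleanup call after the loop becomes reachable. Whether a fixed deadline is the right exit condition for a data mover example is a design question for this ticket.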

This ticket is to track the issues and the solutions for this test.

Logs for sanity-flr test 201 hang are at
https://jira.whamcloud.com/browse/LU-11381



 Comments   
Comment by Gerrit Updater [ 14/Sep/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33171
Subject: LU-11381 tests: run sanity-flr test 201
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 33e3e1bff42461ec71cc4b53207a5c87a380f4e5

Comment by Gerrit Updater [ 05/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33171/
Subject: LU-11381 tests: sanity-flr 201 skip information
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 239290c299fb875dcd1ac652d73b875ec9d07335

Comment by James Nunez (Inactive) [ 05/Oct/18 ]

The patch that landed only adds this ticket number to the ALWAYS_EXCEPT line for test 201. sanity-flr test 201 still fails and this ticket should remain open until it is fixed.
