
sanity-flr test 201 hangs


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0
    • 3
    • 9223372036854775807

    Description

      Test 201 was added to sanity-flr by patch https://review.whamcloud.com/#/c/29097/, and the same patch also added it to the ALWAYS_EXCEPT list. Bobijam says test 201 is “… a data mover watcher example monitoring FLR file change and resync the changed file and will not quit the loop.”

      The last thing seen in the test log is

      == sanity-flr test 201: FLR data mover =============================================================== 21:23:44 (1536960224)
      CMD: trevis-47vm12 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
      Starting client: trevis-47vm9.trevis.whamcloud.com:  -o user_xattr,flock trevis-47vm12@tcp:/lustre /mnt/lustre2
      CMD: trevis-47vm9.trevis.whamcloud.com mkdir -p /mnt/lustre2
      CMD: trevis-47vm9.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-47vm12@tcp:/lustre /mnt/lustre2
      

      There's nothing obviously wrong in the console logs.

      The code for test 201 is

      test_201() {
              local delay=${RESYNC_DELAY:-5}

              MDT0=$($LCTL get_param -n mdc.*.mds_server_uuid |
                     awk '{ gsub(/_UUID/,""); print $1 }' | head -n1)

              trap cleanup_test_201 EXIT

              CL_USER=$(do_facet $SINGLEMDS $LCTL --device $MDT0 \
                              changelog_register -n)

              mkdir -p $MOUNT2 && mount_client $MOUNT2

              local index=0
              while :; do
                      local log=$($LFS changelog $MDT0 $index | grep FLRW)
                      [ -z "$log" ] && { sleep 1; continue; }

                      index=$(echo $log | awk '{print $1}')
                      local ts=$(date -d "$(echo $log | awk '{print $3}')" "+%s" -u)
                      local fid=$(echo $log | awk '{print $6}' | sed -e 's/t=//')
                      local file=$($LFS fid2path $MOUNT2 $fid 2> /dev/null)

                      ((++index))
                      [ -z "$file" ] && continue

                      local now=$(date +%s)

                      echo "file: $file $fid was modified at $ts, now: $now, " \
                           "will be resynced at $((ts+delay))"

                      [ $now -lt $((ts + delay)) ] && sleep $((ts + delay - now))

                      mirror_io resync $file
                      echo "$file resync done"
              done

              cleanup_test_201
      }
      run_test 201 "FLR data mover"
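
      The field extraction used inside the loop can be exercised without a Lustre mount. The sketch below applies the same awk/sed expressions to a made-up changelog record. The record text and the helper name parse_flrw_record are assumptions for illustration only; they are not output captured from this test run and not part of the test framework.

```shell
#!/bin/bash
# Hypothetical FLRW changelog record; every value here is invented for illustration.
sample='7 21FLRW 21:23:45.123456789 2018.09.14 0x0 t=[0x200000401:0x1:0x0]'

# parse_flrw_record is a hypothetical helper, not part of the Lustre test framework.
# It prints "<index> <time> <fid>" using the same field positions as test_201:
# field 1 is the record index, field 3 the time, field 6 the "t=<FID>" target.
parse_flrw_record() {
        local log=$1
        local index time fid

        index=$(echo "$log" | awk '{print $1}')
        time=$(echo "$log" | awk '{print $3}')
        fid=$(echo "$log" | awk '{print $6}' | sed -e 's/t=//')

        echo "$index $time $fid"
}

parse_flrw_record "$sample"
```

      If the record layout on a given system differs from this assumed sample, the field numbers would need to be adjusted accordingly.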
      

      This ticket is to track the issues and the solutions for this test.
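
      One possible direction, offered only as a sketch and not as the agreed fix, is to bound the watcher loop with a deadline so that the test returns instead of polling forever. In the example below, watch_with_deadline, its arguments, and the stand-in poll commands are hypothetical; a real test would poll $LFS changelog in place of true/false.

```shell
#!/bin/bash
# watch_with_deadline is a hypothetical helper, not part of the Lustre test framework.
# It runs a poll command once per interval until the command succeeds or the
# deadline passes. Returns 0 if the poll command succeeded, 1 on timeout.
watch_with_deadline() {
        local timeout=$1 interval=$2
        shift 2
        local deadline=$(( $(date +%s) + timeout ))

        while [ "$(date +%s)" -lt "$deadline" ]; do
                "$@" && return 0
                sleep "$interval"
        done
        return 1
}

# Stand-in poll commands: `false` never finds a record, `true` finds one at once.
watch_with_deadline 2 1 false || echo "timed out"
watch_with_deadline 2 1 true && echo "record found"
```

      Whether a timeout is appropriate depends on whether test 201 is meant to stay a long-running example or to become a regular regression test.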

      Logs for sanity-flr test 201 hang are at
      https://jira.whamcloud.com/browse/LU-11381

          People

            wc-triage WC Triage
            jamesanunez James Nunez (Inactive)