[LU-6955] Migrating directory with failed MDT Created: 04/Aug/15  Updated: 05/Aug/15  Resolved: 04/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None


 Description   

1. Set up Lustre with 4 MDTs, 4 OSTs and 1 client.
2. Create one directory (/mnt/lustre/migrate_dir) on MDT0 and 100k files within it.
3. Start migrating the directory from MDT0 to MDT1. After 30 seconds, reboot both MDT0 and MDT1.
   Note: the reboot must happen during the migration; migrating 100k files usually takes much longer than 30 seconds in current 2.6.
   lfs migrate -m 1 -v /mnt/lustre/migrate_dir # with -v you can see the progress of the migration.
4. After MDT0 and MDT1 have been restarted and remounted and recovery has finished, the client will be able to access the 100k files. Creating files under /mnt/lustre/migrate_dir should be denied.
5. Continue the migration with the same command:
   lfs migrate -m 1 /mnt/lustre/migrate_dir
6. Check that migrate_dir and the files under it are located on MDT1.
7. Run lfsck to check the result.
8. No errors should be reported.
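
The client-side part of steps 3 to 5 can be sketched as a few shell helpers. This is only a rough outline: the function names and the RUN dry-run prefix are hypothetical, the directory and target MDT index are the ones given in the steps above, and the MDT reboot in step 3 still has to be triggered by hand on the servers.

```shell
#!/bin/sh
# Sketch of the client-side commands for steps 3-5.
# DIR and TARGET_MDT come from the description; RUN and the function
# names are illustrative helpers, not part of Lustre.
DIR=${DIR:-/mnt/lustre/migrate_dir}
TARGET_MDT=${TARGET_MDT:-1}
# Set RUN=echo to print the lfs commands instead of executing them.
RUN=${RUN:-}

# Step 3: start the migration in verbose mode; the MDTs are rebooted
# manually while this is still running.
start_migration() {
    $RUN lfs migrate -m "$TARGET_MDT" -v "$DIR"
}

# Step 4: after recovery, creating a file in the directory should fail.
# Returns 0 when the create is rejected. Note that any touch failure
# counts here, not only a denial caused by the pending migration.
create_is_denied() {
    if touch "$DIR/should_fail" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Step 5: resume the interrupted migration to the same target MDT.
resume_migration() {
    $RUN lfs migrate -m "$TARGET_MDT" "$DIR"
}
```

Running with RUN=echo prints the two lfs migrate command lines without touching the filesystem, which is a cheap way to check the parameters before starting a long migration.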



 Comments   
Comment by Di Wang [ 04/Aug/15 ]

Test log

[root@c01 ~]# ls /mnt/lustre/
[root@c01 ~]# cd /usr/lib64/lustre/tests/
[root@c01 tests]# mkdir /mnt/lustre/test1
[root@c01 tests]# ./createmany -o /mnt/lustre/test1/f-%d 1048576
 - created 10000 (time 1438703652.54 total 4.84 last 4.84)
 - created 20000 (time 1438703657.59 total 9.89 last 5.05)
 - created 30000 (time 1438703662.65 total 14.95 last 5.06)
 - created 40000 (time 1438703667.79 total 20.09 last 5.14)
 - created 50000 (time 1438703672.86 total 25.16 last 5.07)
 - created 60000 (time 1438703677.93 total 30.23 last 5.07)
 - created 70000 (time 1438703683.01 total 35.31 last 5.08)
 - created 80000 (time 1438703688.14 total 40.44 last 5.14)
 - created 90000 (time 1438703693.32 total 45.62 last 5.18)
 - created 100000 (time 1438703698.48 total 50.78 last 5.16)
 - created 110000 (time 1438703703.59 total 55.89 last 5.11)
 - created 120000 (time 1438703708.75 total 61.05 last 5.15)
 - created 130000 (time 1438703713.85 total 66.15 last 5.10)
 - created 140000 (time 1438703719.00 total 71.30 last 5.15)
 - created 150000 (time 1438703724.22 total 76.52 last 5.22)
 - created 160000 (time 1438703729.41 total 81.71 last 5.19)
 - created 170000 (time 1438703734.57 total 86.87 last 5.15)
 - created 180000 (time 1438703739.63 total 91.93 last 5.07)
 - created 190000 (time 1438703744.71 total 97.01 last 5.08)
 - created 200000 (time 1438703749.78 total 102.08 last 5.07)
 - created 210000 (time 1438703754.86 total 107.16 last 5.08)
 - created 220000 (time 1438703760.02 total 112.33 last 5.16)
 - created 230000 (time 1438703765.17 total 117.47 last 5.14)
 - created 240000 (time 1438703770.24 total 122.54 last 5.08)
 - created 250000 (time 1438703775.32 total 127.62 last 5.07)
 - created 260000 (time 1438703780.33 total 132.64 last 5.02)
 - created 270000 (time 1438703785.31 total 137.61 last 4.98)
 - created 280000 (time 1438703790.41 total 142.71 last 5.10)
 - created 290000 (time 1438703795.54 total 147.84 last 5.13)
 - created 300000 (time 1438703800.63 total 152.93 last 5.09)
 - created 310000 (time 1438703805.70 total 158.00 last 5.07)
 - created 320000 (time 1438703810.85 total 163.15 last 5.14)
 - created 330000 (time 1438703816.09 total 168.39 last 5.24)
 - created 340000 (time 1438703821.35 total 173.65 last 5.26)
 - created 350000 (time 1438703826.55 total 178.85 last 5.20)
 - created 360000 (time 1438703831.62 total 183.92 last 5.07)
 - created 370000 (time 1438703836.67 total 188.97 last 5.06)
 - created 380000 (time 1438703841.83 total 194.13 last 5.15)
 - created 390000 (time 1438703847.01 total 199.31 last 5.19)
 - created 400000 (time 1438703852.11 total 204.41 last 5.10)
 - created 410000 (time 1438703857.19 total 209.49 last 5.07)
 - created 420000 (time 1438703862.38 total 214.68 last 5.19)
 - created 430000 (time 1438703867.48 total 219.78 last 5.10)
 - created 440000 (time 1438703872.90 total 225.20 last 5.41)
 - created 450000 (time 1438703878.17 total 230.47 last 5.27)
 - created 460000 (time 1438703883.32 total 235.62 last 5.15)
 - created 470000 (time 1438703888.47 total 240.77 last 5.15)
 - created 480000 (time 1438703893.75 total 246.05 last 5.28)
 - created 490000 (time 1438703899.01 total 251.31 last 5.26)
 - created 500000 (time 1438703904.15 total 256.45 last 5.14)
 - created 510000 (time 1438703909.34 total 261.64 last 5.19)
 - created 520000 (time 1438703914.59 total 266.89 last 5.25)
 - created 530000 (time 1438703919.73 total 272.03 last 5.14)
 - created 540000 (time 1438703924.85 total 277.15 last 5.12)
 - created 550000 (time 1438703929.99 total 282.29 last 5.14)
 - created 560000 (time 1438703935.10 total 287.40 last 5.11)
 - created 570000 (time 1438703940.29 total 292.59 last 5.19)
 - created 580000 (time 1438703945.41 total 297.71 last 5.12)
 - created 590000 (time 1438703950.59 total 302.89 last 5.18)
 - created 600000 (time 1438703955.72 total 308.02 last 5.13)
 - created 610000 (time 1438703960.78 total 313.08 last 5.06)
 - created 620000 (time 1438703965.98 total 318.28 last 5.21)
 - created 630000 (time 1438703971.12 total 323.42 last 5.14)
 - created 640000 (time 1438703976.24 total 328.54 last 5.12)
 - created 650000 (time 1438703981.35 total 333.65 last 5.11)
 - created 660000 (time 1438703986.48 total 338.78 last 5.13)
 - created 670000 (time 1438703991.63 total 343.93 last 5.15)
 - created 680000 (time 1438703996.77 total 349.07 last 5.14)
 - created 690000 (time 1438704001.96 total 354.26 last 5.18)
 - created 700000 (time 1438704007.10 total 359.40 last 5.14)
 - created 710000 (time 1438704012.23 total 364.53 last 5.13)
 - created 720000 (time 1438704017.36 total 369.66 last 5.14)
 - created 730000 (time 1438704022.49 total 374.79 last 5.13)
 - created 740000 (time 1438704027.64 total 379.94 last 5.15)
 - created 750000 (time 1438704032.80 total 385.10 last 5.16)
 - created 760000 (time 1438704037.99 total 390.29 last 5.19)
 - created 770000 (time 1438704043.11 total 395.41 last 5.12)
 - created 780000 (time 1438704048.31 total 400.61 last 5.20)
 - created 790000 (time 1438704053.45 total 405.75 last 5.14)
 - created 800000 (time 1438704058.56 total 410.86 last 5.11)
 - created 810000 (time 1438704063.86 total 416.16 last 5.30)
 - created 820000 (time 1438704069.38 total 421.68 last 5.52)
 - created 830000 (time 1438704074.92 total 427.22 last 5.54)
 - created 840000 (time 1438704080.29 total 432.59 last 5.37)
 - created 850000 (time 1438704085.56 total 437.86 last 5.27)
 - created 860000 (time 1438704090.73 total 443.03 last 5.17)
 - created 870000 (time 1438704095.85 total 448.15 last 5.12)
 - created 880000 (time 1438704101.01 total 453.31 last 5.15)
 - created 890000 (time 1438704106.20 total 458.50 last 5.19)
 - created 900000 (time 1438704111.27 total 463.57 last 5.07)
 - created 910000 (time 1438704116.38 total 468.68 last 5.11)
 - created 920000 (time 1438704121.54 total 473.84 last 5.16)
 - created 930000 (time 1438704126.69 total 478.99 last 5.14)
 - created 940000 (time 1438704131.80 total 484.10 last 5.11)
 - created 950000 (time 1438704137.22 total 489.52 last 5.42)
 - created 960000 (time 1438704142.63 total 494.93 last 5.40)
 - created 970000 (time 1438704148.05 total 500.35 last 5.42)
 - created 980000 (time 1438704153.58 total 505.88 last 5.53)
 - created 990000 (time 1438704158.84 total 511.14 last 5.26)
 - created 1000000 (time 1438704163.98 total 516.28 last 5.15)
 - created 1010000 (time 1438704169.13 total 521.43 last 5.15)
 - created 1020000 (time 1438704174.14 total 526.44 last 5.00)
 - created 1030000 (time 1438704179.33 total 531.63 last 5.19)
 - created 1040000 (time 1438704184.46 total 536.76 last 5.13)
total: 1048576 creates in 541.19 seconds: 1937.53 creates/second
[root@c01 tests]# lfs migrate -m 1 /mnt/lustre/test1

Broadcast message from root@mds01
	(unknown) at 9:14 ...

The system is going down for halt NOW!
Connection to mds01 closed by remote host.
Connection to mds01 closed.
[di.wang@opensfs ~]$ ssh root@mds01
root@mds01's password: 
Last login: Tue Aug  4 08:48:50 2015 from headnode.lab.opensfs.org
[root@mds01 ~]# mount -t lustre /dev/sdc1 /lustre/mdt1
warning: /lustre/mdt1: cannot resolve: No such file or directory
[root@mds01 ~]# mount -t lustre /dev/sdc1 /lustre/mds1
mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 16383
[root@mds01 ~]# mount -t lustre /dev/sdc2 /lustre/mds2
[root@mds01 ~]# mount -t lustre /dev/sdc3 /lustre/mds3
[root@mds01 ~]# mount -t lustre /dev/sdc4 /lustre/mds4
[root@mds01 ~]# ssh root@c01
root@c01's password: 
Last login: Tue Aug  4 08:49:10 2015 from mds01-ib.lab.opensfs.org
[root@c01 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             20642428   1723816  17870036   9% /
tmpfs                 16434912         0  16434912   0% /dev/shm
192.168.0.1:/scratch 416433152 289242112 106037248  74% /scratch
192.168.0.1:/home    416433152 289242112 106037248  74% /home
mds01@o2ib:/lustre    83624752   1855204  77420568   3% /mnt/lustre
[root@c01 ~]# lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 0 lmv_stripe_offset: 1
[root@c01 ~]# 
[root@c01 ~]# cat /proc/fs/lustre/version 
lustre: 2.7.57
kernel: patchless_client
build:  jenkins-arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel-3126-gf5f05e3-PRISTINE-2.6.32-431.29.2.el6.x86_64
[root@c01 ~]# lctl dl
  0 UP mgc MGC192.168.2.125@o2ib b088cc07-fabb-0aef-b555-8ed7ddd89d7a 5
  1 UP lov lustre-clilov-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 4
  2 UP lmv lustre-clilmv-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 4
  3 UP mdc lustre-MDT0000-mdc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  4 UP osc lustre-OST0000-osc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  5 UP osc lustre-OST0001-osc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  6 UP osc lustre-OST0002-osc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  7 UP osc lustre-OST0003-osc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  8 UP mdc lustre-MDT0001-mdc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
  9 UP mdc lustre-MDT0002-mdc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
 10 UP mdc lustre-MDT0003-mdc-ffff88082e573400 b3af4bde-dbc7-620b-7d05-4af03efda18c 5
[root@c01 ~]# 
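
The lfs getdirstripe output in the log above reports lmv_stripe_offset: 1, meaning the directory now lives on MDT1. A minimal sketch of automating that check follows; it assumes the two-line output format shown in the log, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the lmv_stripe_offset value found in `lfs getdirstripe` output
# read from stdin. Assumes the line format seen in the log:
#   lmv_stripe_count: 0 lmv_stripe_offset: 1
stripe_offset() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "lmv_stripe_offset:") { print $(i + 1); exit }
    }'
}

# Return 0 if the directory's MDT index equals the expected index.
# Usage: dir_on_mdt /mnt/lustre/test1 1
dir_on_mdt() {
    dir=$1
    expected=$2
    actual=$(lfs getdirstripe "$dir" | stripe_offset)
    [ "$actual" = "$expected" ]
}
```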
Comment by Di Wang [ 05/Aug/15 ]

LFSCK checking result

[root@mds01 ~]# lctl 
lctl > lfsck_start -M lustre-MDT0000 -A -t namespace
Started LFSCK on the device lustre-MDT0000: scrub namespace
lctl > lfsck_start -M lustre-MDT0001 -A -t namespace
Started LFSCK on the device lustre-MDT0001: scrub namespace
lctl > lfsck_start -M lustre-MDT0002 -A -t namespace
Started LFSCK on the device lustre-MDT0002: scrub namespace
lctl > lfsck_start -M lustre-MDT0003 -A -t namespace
Started LFSCK on the device lustre-MDT0003: scrub namespace
lctl > q

[root@mds01 ~]# cat /proc/fs/lustre/mdd/lustre-MDT0000/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets
last_completed_time: 1438728360
time_since_last_completed: 12512 seconds
latest_start_time: 1438728030
time_since_latest_start: 12842 seconds
last_checkpoint_time: 1438728360
time_since_last_checkpoint: 12512 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 26188800, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 4
checked_phase2: 1048591
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 4
run_time_phase1: 12 seconds
run_time_phase2: 123 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: 8525 objs/sec
average_speed_total: 7767 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
[root@mds01 ~]# cat /proc/fs/lustre/mdd/lustre-MDT0001/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets
last_completed_time: 1438741230
time_since_last_completed: 42 seconds
latest_start_time: 1438741026
time_since_latest_start: 246 seconds
last_checkpoint_time: 1438741230
time_since_last_checkpoint: 42 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 24617473, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 1048578
checked_phase2: 17
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 5
run_time_phase1: 203 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 5165 items/sec
average_speed_phase2: 17 objs/sec
average_speed_total: 5140 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
[root@mds01 ~]# cat /proc/fs/lustre/mdd/lustre-MDT0002/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets
last_completed_time: 1438741230
time_since_last_completed: 53 seconds
latest_start_time: 1438741026
time_since_latest_start: 257 seconds
last_checkpoint_time: 1438741230
time_since_last_checkpoint: 53 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 24093697, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 1
checked_phase2: 16
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 5
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 1 items/sec
average_speed_phase2: 16 objs/sec
average_speed_total: 8 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
[root@mds01 ~]# cat /proc/fs/lustre/mdd/lustre-MDT0003/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets
last_completed_time: 1438741230
time_since_last_completed: 59 seconds
latest_start_time: 1438741026
time_since_latest_start: 263 seconds
last_checkpoint_time: 1438741230
time_since_last_checkpoint: 59 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 19379713, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 1
checked_phase2: 16
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 5
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 1 items/sec
average_speed_phase2: 16 objs/sec
average_speed_total: 8 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
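
Reading four reports by eye is error-prone, so the "no errors" check could be scripted. The sketch below treats a report as clean when status is completed and every failed_*, *_failed, *_repaired and unknown_inconsistency counter is 0. That definition of "clean" is an assumption made for this example, and the function name is hypothetical; the field names are the ones printed in the reports above.

```shell
#!/bin/sh
# Read one lfsck_namespace report on stdin. Exit 0 only if the status is
# "completed" and all failure/repair counters are zero; print any
# offending counter lines to stdout.
lfsck_report_clean() {
    awk -F': *' '
        $1 == "status" { status = $2 }
        $1 ~ /^failed_|_repaired$|_failed$/ || $1 == "unknown_inconsistency" {
            if ($2 + 0 != 0) { bad = 1; print "nonzero: " $0 }
        }
        END {
            if (status == "completed" && !bad) exit 0
            exit 1
        }
    '
}

# Example use on the MDS, with the proc paths from the log above:
# for i in 0 1 2 3; do
#     lfsck_report_clean < /proc/fs/lustre/mdd/lustre-MDT000$i/lfsck_namespace ||
#         echo "lustre-MDT000$i: LFSCK reported problems"
# done
```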