Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.1.1
None
Lustre servers are running kernel 2.6.32-220.el6 with Lustre 2.1.1.rc4.
Lustre clients are running kernel 2.6.38.2 with a custom build created for this release, which includes http://review.whamcloud.com/#change,2170.
2
6427
Description
The customer creates a directory on one client and puts a file in it. Then, on a second client, the customer recursively removes the directory. Back on the first client, creating the directory again fails.
Here are the exact steps:
usrs400 $ mkdir /mnt/lustre/foo
usrs400 $ touch /mnt/lustre/foo/bar
usrs399 $ rm -rf /mnt/lustre/foo
usrs400 $ mkdir /mnt/lustre/foo
mkdir: cannot create directory `foo': File exists
usrs400 $ ls /mnt/lustre/foo
ls: cannot access foo: No such file or directory
usrs400 $ mkdir /mnt/lustre/foo
mkdir: cannot create directory `foo': File exists
usrs400 $ rmdir /mnt/lustre/foo
rmdir: failed to remove `foo': No such file or directory
usrs400 $ mkdir /mnt/lustre/foo
The customer waited 10 minutes for this to complete. For the /var/log/messages output below, the customer waited only a second or two. The customer also unmounted and remounted the clients first, to keep the logs simpler.
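For repeated testing, the steps above can be scripted. This is only a minimal sketch, not the customer's procedure: it assumes passwordless ssh to both clients, and the names `CLIENT_A`, `CLIENT_B`, `DIR`, `RUN`, and `repro` are illustrative.

```shell
#!/bin/sh
# Sketch of the reproduction: client A creates, client B removes, client A recreates.
# CLIENT_A, CLIENT_B, DIR, and RUN are illustrative assumptions, not from the report.
CLIENT_A=${CLIENT_A:-usrs400}
CLIENT_B=${CLIENT_B:-usrs399}
DIR=${DIR:-/mnt/lustre/foo}
RUN=${RUN:-ssh}   # command used to run one step on a given client

repro() {
    # Step 1: create the directory and a file in it on client A.
    $RUN "$CLIENT_A" "mkdir '$DIR' && touch '$DIR/bar'" || return 2
    # Step 2: recursively remove the directory from client B.
    $RUN "$CLIENT_B" "rm -rf '$DIR'" || return 2
    # Step 3: recreate the directory on client A.
    # On an affected client this mkdir fails with "File exists".
    if $RUN "$CLIENT_A" "mkdir '$DIR'"; then
        echo "OK: directory recreated"
    else
        echo "BUG: mkdir failed after remote rm -rf"
        return 1
    fi
}
```

Calling `repro` after sourcing the script runs the three steps once; a return status of 1 indicates the failure described here, and 2 indicates that the setup steps themselves failed.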
usrs400:/var/log/messages:
Mar 27 12:26:57 usrs400 kernel: [677137.616534] Lustre: Lustre: Build Version: ../lustre/scripts-20120222220600-PRISTINE-../lustre/scripts
Mar 27 12:26:57 usrs400 kernel: [677137.758165] Lustre: Added LNI 192.168.185.6@tcp [8/256/0/180]
Mar 27 12:26:57 usrs400 kernel: [677137.758237] Lustre: Accept secure, port 988
Mar 27 12:27:01 usrs400 kernel: [677141.529389] Lustre: MGC192.168.185.35@tcp: Reactivating import
Mar 27 12:27:01 usrs400 kernel: [677141.568728] LustreError: 31684:0:(obd_config.c:1147:class_process_proc_param()) writing proc entry checksum_pages err -11
Mar 27 12:27:01 usrs400 kernel: [677141.774591] Lustre: Client xxxxxx-client has started
Mar 27 12:27:32 usrs400 kernel: [677173.081643] LustreError: 31698:0:(file.c:2228:ll_inode_revalidate_fini()) failure -2 inode 144115239078592513
Mar 27 12:27:32 usrs400 kernel: [677173.208438] LustreError: 31698:0:(file.c:2228:ll_inode_revalidate_fini()) failure -2 inode 144115239078592513
usrs399:/var/log/messages:
Mar 27 12:26:20 usrs399 kernel: [677221.853364] Lustre: Lustre: Build Version: ../lustre/scripts-20120222220600-PRISTINE-../lustre/scripts
Mar 27 12:26:20 usrs399 kernel: [677221.993703] Lustre: Added LNI 192.168.185.7@tcp [8/256/0/180]
Mar 27 12:26:20 usrs399 kernel: [677221.993777] Lustre: Accept secure, port 988
Mar 27 12:26:39 usrs399 kernel: [677240.559658] Lustre: MGC192.168.185.35@tcp: Reactivating import
Mar 27 12:26:39 usrs399 kernel: [677240.592294] LustreError: 621:0:(obd_config.c:1147:class_process_proc_param()) writing proc entry checksum_pages err -11
Mar 27 12:26:39 usrs399 kernel: [677240.825955] Lustre: Client xxxxxx-client has started
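The negative numbers in the client logs above (`failure -2`, `err -11`) are negated errno values. As a reading aid, here is a small sketch that maps them to names, assuming the standard Linux errno numbering; `errno_name` and `decode_line` are hypothetical helpers, not Lustre tools, and only the codes relevant to this report are listed.

```shell
#!/bin/sh
# Map a (possibly negative) return code to its Linux errno name.
# Only the codes that appear in this report are covered.
errno_name() {
    case "${1#-}" in
        2)  echo "ENOENT (No such file or directory)" ;;
        11) echo "EAGAIN (Resource temporarily unavailable)" ;;
        17) echo "EEXIST (File exists)" ;;
        *)  echo "unknown" ;;
    esac
}

# Pull the number that follows "failure" or "err" out of a log line and decode it.
decode_line() {
    code=$(printf '%s\n' "$1" | sed -n -E 's/.*(failure|err) (-[0-9]+).*/\2/p')
    [ -n "$code" ] && echo "$code $(errno_name "$code")"
}
```

For example, `decode_line` applied to the `ll_inode_revalidate_fini()` line prints `-2 ENOENT (No such file or directory)`, which is consistent with the `ls` and `rmdir` errors in the transcript; the `mkdir` error corresponds to EEXIST (17).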
On the MDS (I think that there is some clock skew):
Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(ldlm_lib.c:877:target_handle_connect()) MGS: connection from b92afcf0-1504-ed4b-819e-d31039236758@192.168.185.7@tcp t0 exp (null) cur 1332851199 last 0
Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS->NET_0x20000c0a8b907_UUID netid 20000: select flavor null
Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 10205:0:(ldlm_lib.c:877:target_handle_connect()) xxxxxx-MDT0000: connection from 11909088-d4a2-77fa-030f-1c9e2a493436@192.168.185.7@tcp t0 exp (null) cur 1332851199 last 0
Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 10205:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import xxxxxx-MDT0000->NET_0x20000c0a8b907_UUID netid 20000: select flavor null
Mar 27 12:27:01 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(ldlm_lib.c:877:target_handle_connect()) MGS: connection from 8927fc00-6820-4bea-63ad-35146f43bb3e@192.168.185.6@tcp t0 exp (null) cur 1332851221 last 0
Mar 27 12:27:01 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS->NET_0x20000c0a8b906_UUID netid 20000: select flavor null
Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) header@ffff88061fac1ec0[0x0, 1, [0x200000be0:0x1:0x0] hash lru]
header@ffff88061fac1ec0
Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) Object doesn't exist!
Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) header@ffff88061fac1ec0[0x0, 1, [0x200000be0:0x1:0x0] hash lru]
header@ffff88061fac1ec0
Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) Object doesn't exist!
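One way to check the suspected clock skew is to convert the `cur` values in the MDS connection messages to wall-clock time and compare them with the syslog timestamps on each host. The sketch below assumes that `cur` is the server's time in seconds since the Unix epoch (not stated in this report) and that GNU `date` is available; `epoch_to_utc` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert an epoch value (assumed meaning of the "cur" field) to UTC.
# Requires GNU date for the -d @SECONDS form.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1332851199   # "cur" from the 192.168.185.7 connection
epoch_to_utc 1332851221   # "cur" from the 192.168.185.6 connection
```

Under those assumptions the two calls print `2012-03-27 12:26:39` and `2012-03-27 12:27:01`, which match the syslog prefixes on the corresponding MDS lines.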