
racer test_1: oops at __d_lookup+0x8c

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Lustre 2.7.0
    • Lustre 2.6.0
    • Environment: client and server: lustre-master build # 1911 RHEL6 ldiskfs DNE
    • Severity: 3
    • 12954

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/57b54ffa-a02d-11e3-947c-52540035b04c.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out

      client console

      00:22:43:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-32vm5,client-32vm6.lab.whamcloud.com DURATION=900 == 00:20:32 (1393489232)
      00:22:44:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=2 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer1 
      00:22:44:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=2 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer1 
      00:22:44:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=2 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
      00:22:44:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=2 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      00:22:44:LustreError: 14649:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff880037cd8400: nlink 0 < 2 corrupt stripe 0 [0x3c0000402:0xf4:0x0]:[0x3c0000402:0xf4:0x0]
      00:22:44:LustreError: 14651:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff880037cd8400: nlink 0 < 2 corrupt stripe 0 [0x3c0000402:0xf4:0x0]:[0x3c0000402:0xf4:0x0]
      00:22:44:LustreError: 17543:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x3c0000401:0x2b6:0x0]:[0x3c0000401:0x2b6:0x0]
      00:22:44:LustreError: 19056:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:22:44:LustreError: 17285:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:22:44:LustreError: 22091:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -2
      00:22:44:LustreError: 22091:0:(dir.c:467:ll_dir_setstripe()) Skipped 3 previous similar messages
      00:22:44:LustreError: 24110:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:22:44:LustreError: 4266:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x3c0000403:0x61b:0x0]:[0x3c0000403:0x61b:0x0]
      00:22:44:LustreError: 6322:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x400000402:0x8d1:0x0]:[0x400000402:0x8d1:0x0]
      00:22:44:LustreError: 11-0: lustre-OST0006-osc-ffff880037cd8400: Communicating with 10.10.4.199@tcp, operation ldlm_enqueue failed with -107.
      00:22:45:Lustre: lustre-OST0006-osc-ffff880037cd8400: Connection to lustre-OST0006 (at 10.10.4.199@tcp) was lost; in progress operations using this service will wait for recovery to complete
      00:22:45:LustreError: 167-0: lustre-OST0006-osc-ffff880037cd8400: This client was evicted by lustre-OST0006; in progress operations using this service will fail.
      00:22:45:LustreError: 11-0: lustre-OST0006-osc-ffff880037cd8400: Communicating with 10.10.4.199@tcp, operation ldlm_enqueue failed with -107.
      00:22:45:Lustre: 2344:0:(llite_lib.c:2697:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.198@tcp:/lustre/fid: [0x3c0000401:0x949:0x0]/ may get corrupted (rc -108)
      00:23:07:Lustre: 2344:0:(llite_lib.c:2697:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.198@tcp:/lustre/fid: [0x400000401:0xcd7:0x0]/ may get corrupted (rc -108)
      00:23:07:LustreError: 7998:0:(osc_lock.c:830:osc_ldlm_completion_ast()) lock@ffff880071ad8738[2 3 0 1 1 00000000] W(2):[0, 18446744073709551615]@[0x380000400:0x45:0x0] {
      00:23:07:LustreError: 7998:0:(osc_lock.c:830:osc_ldlm_completion_ast())     lovsub@ffff88006e14a8a0: [0 ffff88006f61fe30 W(2):[0, 18446744073709551615]@[0x400000401:0x2b2:0x0]] 
      00:23:07:LustreError: 7998:0:(osc_lock.c:830:osc_ldlm_completion_ast())     osc@ffff88006f6227b8: ffff880072f2c100    0x20080020002 0xb20e2ac1892f0c38 3 ffff88007d6978c8 size: 0 mtime: 1393489253 atime: 0 ctime: 1393489253 blocks: 0
      00:23:07:LustreError: 7998:0:(osc_lock.c:830:osc_ldlm_completion_ast()) } lock@ffff880071ad8738
      00:23:08:LustreError: 7998:0:(osc_lock.c:830:osc_ldlm_completion_ast()) dlmlock returned -5
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel()) lock@ffff88006e240660[3 1 0 0 0 00000005] W(2):[0, 18446744073709551615]@[0x400000401:0x2b2:0x0] {
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     vvp@ffff88006f61ef60: 
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     lov@ffff88006f61fe30: 4
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     0 0: ---
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     1 0: lock@ffff88006c625ed0[2 5 0 0 0 00000001] W(2):[0, 18446744073709551615]@[0x200000400:0x43:0x0] {
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     lovsub@ffff88006e14ac60: [1 ffff88006f61fe30 W(2):[0, 18446744073709551615]@[0x400000401:0x2b2:0x0]] 
      00:23:08:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     osc@ffff88006d1facc0: ffff88006d321c80    0x20080020002 0xb20e2ac1892f0dff 5 (null) size: 0 mtime: 1393489253 atime: 1393489251 ctime: 1393489253 blocks: 0
      00:23:09:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel()) } lock@ffff88006c625ed0
      00:23:09:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     2 0: ---
      00:23:09:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel())     3 0: ---
      00:23:09:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel()) 
      00:23:10:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel()) } lock@ffff88006e240660
      00:23:10:LustreError: 10363:0:(lov_lock.c:798:lov_lock_cancel()) lov_lock_cancel fails with -5.
      00:23:10:Lustre: lustre-OST0006-osc-ffff880037cd8400: Connection restored to lustre-OST0006 (at 10.10.4.199@tcp)
      00:23:10:LustreError: 11-0: lustre-OST0004-osc-ffff88007aba6800: Communicating with 10.10.4.199@tcp, operation ldlm_enqueue failed with -107.
      00:23:10:Lustre: lustre-OST0004-osc-ffff88007aba6800: Connection to lustre-OST0004 (at 10.10.4.199@tcp) was lost; in progress operations using this service will wait for recovery to complete
      00:23:11:LustreError: 167-0: lustre-OST0004-osc-ffff88007aba6800: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
      00:23:11:Lustre: 2343:0:(llite_lib.c:2697:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.198@tcp:/lustre/fid: [0x400000401:0xd55:0x0]/ may get corrupted (rc -108)
      00:23:11:Lustre: 2343:0:(llite_lib.c:2697:ll_dirty_page_discard_warn()) Skipped 5 previous similar messages
      00:23:11:LustreError: 15161:0:(osc_lock.c:830:osc_ldlm_completion_ast()) lock@ffff880070acaa98[2 3 0 1 1 00000000] W(2):[0, 18446744073709551615]@[0x100040000:0x64:0x0] {
      00:23:11:LustreError: 15161:0:(osc_lock.c:830:osc_ldlm_completion_ast())     lovsub@ffff880063ef3120: [0 ffff88006d391610 W(2):[0, 18446744073709551615]@[0x3c0000402:0x365:0x0]] 
      00:23:11:LustreError: 15161:0:(osc_lock.c:830:osc_ldlm_completion_ast())     osc@ffff8800674e5420: ffff88007081f740    0x20080020002 0xb20e2ac18931b4ea 3 ffff8800708b6058 size: 5 mtime: 1393489283 atime: 0 ctime: 1393489283 blocks: 8
      00:23:12:LustreError: 15161:0:(osc_lock.c:830:osc_ldlm_completion_ast()) } lock@ffff880070acaa98
      00:23:12:LustreError: 15161:0:(osc_lock.c:830:osc_ldlm_completion_ast()) dlmlock returned -5
      00:23:12:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel()) lock@ffff88006db3d228[2 1 0 0 0 00000005] W(2):[0, 18446744073709551615]@[0x3c0000402:0x365:0x0] {
      00:23:13:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     vvp@ffff880070b9e9c0: 
      00:23:13:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     lov@ffff88006d391610: 3
      00:28:36:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     0 0: ---
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     1 0: lock@ffff880070aca588[0 5 0 0 0 00000000] W(2):[0, 18446744073709551615]@[0x100050000:0x64:0x0] {
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     lovsub@ffff88006c4c3460: [1 ffff88006d391610 W(2):[0, 18446744073709551615]@[0x3c0000402:0x365:0x0]] 
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     osc@ffff88006ed489e0: ffff88006a958140    0x20080020002 0xb20e2ac18931bcdf 4 (null) size: 0 mtime: 1393489283 atime: 0 ctime: 1393489283 blocks: 0
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel()) } lock@ffff880070aca588
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel())     2 0: ---
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel()) 
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel()) } lock@ffff88006db3d228
      00:28:37:LustreError: 10650:0:(lov_lock.c:798:lov_lock_cancel()) lov_lock_cancel fails with -5.
      00:28:37:Lustre: lustre-OST0004-osc-ffff88007aba6800: Connection restored to lustre-OST0004 (at 10.10.4.199@tcp)
      00:28:37:LustreError: 15216:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:28:37:LustreError: 15216:0:(dir.c:467:ll_dir_setstripe()) Skipped 1 previous similar message
      00:28:37:LustreError: 16899:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x400000404:0xe1c:0x0]:[0x400000404:0xe1c:0x0]
      00:28:38:LustreError: 17215:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:28:38:LustreError: 17215:0:(dir.c:467:ll_dir_setstripe()) Skipped 1 previous similar message
      00:28:38:LustreError: 20240:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:28:38:LustreError: 20240:0:(dir.c:467:ll_dir_setstripe()) Skipped 3 previous similar messages
      00:28:38:LustreError: 25893:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x400000402:0x1339:0x0]:[0x400000402:0x1339:0x0]
      00:28:38:LustreError: 25893:0:(lmv_intent.c:251:lmv_revalidate_slaves()) Skipped 4 previous similar messages
      00:28:38:LustreError: 31191:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff880037cd8400: nlink 0 < 2 corrupt stripe 0 [0x400000402:0x13b0:0x0]:[0x400000402:0x13b0:0x0]
      00:28:38:LustreError: 31191:0:(lmv_intent.c:251:lmv_revalidate_slaves()) Skipped 1 previous similar message
      00:28:38:LustreError: 4926:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:28:38:LustreError: 4926:0:(dir.c:467:ll_dir_setstripe()) Skipped 3 previous similar messages
      00:28:38:LustreError: 11-0: lustre-MDT0001-mdc-ffff88007aba6800: Communicating with 10.10.4.202@tcp, operation ldlm_enqueue failed with -107.
      00:28:38:Lustre: lustre-MDT0001-mdc-ffff88007aba6800: Connection to lustre-MDT0001 (at 10.10.4.202@tcp) was lost; in progress operations using this service will wait for recovery to complete
      00:28:38:LustreError: 167-0: lustre-MDT0001-mdc-ffff88007aba6800: This client was evicted by lustre-MDT0001; in progress operations using this service will fail.
      00:28:38:LustreError: 31718:0:(mdc_locks.c:920:mdc_enqueue()) ldlm_cli_enqueue: -5
      00:28:38:LustreError: 31718:0:(mdc_locks.c:920:mdc_enqueue()) Skipped 4 previous similar messages
      00:28:38:LustreError: 31718:0:(file.c:3088:ll_inode_revalidate_fini()) lustre: revalidate FID [0x400000401:0x1ad8:0x0] error: rc = -5
      00:28:38:LustreError: 7822:0:(mdc_locks.c:920:mdc_enqueue()) ldlm_cli_enqueue: -108
      00:28:38:LustreError: 7822:0:(mdc_locks.c:920:mdc_enqueue()) Skipped 2 previous similar messages
      00:28:38:LustreError: 7822:0:(mdc_request.c:1537:mdc_read_page()) lustre-MDT0001-mdc-ffff88007aba6800: [0x400000401:0x1ad8:0x0] lock enqueue fails: rc = -108
      00:28:38:LustreError: 7822:0:(file.c:174:ll_close_inode_openhandle()) lustre-clilmv-ffff88007aba6800: inode [0x400000401:0x1ad8:0x0] mdc close failed: rc = -108
      00:28:39:LustreError: 31718:0:(file.c:3088:ll_inode_revalidate_fini()) Skipped 1 previous similar message
      00:28:39:LustreError: 6258:0:(file.c:3088:ll_inode_revalidate_fini()) lustre: revalidate FID [0x400000401:0x1a30:0x0] error: rc = -108
      00:28:39:LustreError: 31735:0:(file.c:174:ll_close_inode_openhandle()) lustre-clilmv-ffff88007aba6800: inode [0x400000400:0x1:0x0] mdc close failed: rc = -108
      00:31:21:LustreError: 31735:0:(file.c:174:ll_close_inode_openhandle()) Skipped 2 previous similar messages
      00:31:21:LustreError: 31742:0:(lmv_obd.c:1424:lmv_fid_alloc()) Can't alloc new fid, rc -19
      00:31:21:LustreError: 7645:0:(vvp_io.c:1215:vvp_io_init()) lustre: refresh file layout [0x400000400:0x1c3a:0x0] error -108.
      00:31:21:LustreError: 7645:0:(vvp_io.c:1215:vvp_io_init()) lustre: refresh file layout [0x400000400:0x1c3a:0x0] error -108.
      00:31:21:LustreError: 31750:0:(lmv_obd.c:1424:lmv_fid_alloc()) Can't alloc new fid, rc -19
      00:31:21:Lustre: lustre-MDT0001-mdc-ffff88007aba6800: Connection restored to lustre-MDT0001 (at 10.10.4.202@tcp)
      00:31:21:LustreError: 341:0:(dir.c:467:ll_dir_setstripe()) mdc_setattr fails: rc = -22
      00:31:21:LustreError: 341:0:(dir.c:467:ll_dir_setstripe()) Skipped 5 previous similar messages
      00:31:21:LustreError: 32200:0:(lmv_intent.c:251:lmv_revalidate_slaves()) lustre-clilmv-ffff88007aba6800: nlink 0 < 2 corrupt stripe 0 [0x400000401:0x1ad8:0x0]:[0x400000401:0x1ad8:0x0]
      00:31:21:LustreError: 11-0: lustre-MDT0001-mdc-ffff880037cd8400: Communicating with 10.10.4.202@tcp, operation ldlm_enqueue failed with -107.
      00:31:21:LustreError: Skipped 1 previous similar message
      00:31:21:Lustre: lustre-MDT0001-mdc-ffff880037cd8400: Connection to lustre-MDT0001 (at 10.10.4.202@tcp) was lost; in progress operations using this service will wait for recovery to complete
      00:31:21:LustreError: 167-0: lustre-MDT0001-mdc-ffff880037cd8400: This client was evicted by lustre-MDT0001; in progress operations using this service will fail.
      00:31:21:LustreError: 13398:0:(mdc_locks.c:920:mdc_enqueue()) ldlm_cli_enqueue: -5
      00:31:21:LustreError: 13398:0:(mdc_locks.c:920:mdc_enqueue()) Skipped 224 previous similar messages
      00:31:21:LustreError: 15879:0:(file.c:174:ll_close_inode_openhandle()) lustre-clilmv-ffff880037cd8400: inode [0x400000401:0x239f:0x0] mdc close failed: rc = -108
      00:31:21:LustreError: 15879:0:(file.c:174:ll_close_inode_openhandle()) Skipped 53 previous similar messages
      00:31:21:LustreError: 15671:0:(mdc_request.c:1537:mdc_read_page()) lustre-MDT0001-mdc-ffff880037cd8400: [0x400000400:0x1:0x0] lock enqueue fails: rc = -108
      00:31:21:LustreError: 15671:0:(mdc_request.c:1537:mdc_read_page()) Skipped 1 previous similar message
      00:31:21:LustreError: 15730:0:(lmv_obd.c:1424:lmv_fid_alloc()) Can't alloc new fid, rc -19
      00:31:21:LustreError: 15730:0:(lmv_obd.c:1424:lmv_fid_alloc()) Skipped 22 previous similar messages
      00:31:21:LustreError: 4415:0:(vvp_io.c:1215:vvp_io_init()) lustre: refresh file layout [0x400000400:0x1ad0:0x0] error -108.
      00:31:21:LustreError: 4415:0:(vvp_io.c:1215:vvp_io_init()) Skipped 4 previous similar messages
      00:31:21:Lustre: lustre-MDT0001-mdc-ffff880037cd8400: Connection restored to lustre-MDT0001 (at 10.10.4.202@tcp)
      00:31:21:BUG: unable to handle kernel paging request at fffffffd00000018
      00:31:21:IP: [<ffffffff811a374c>] __d_lookup+0x8c/0x150
      00:31:21:PGD 1a87067 PUD 0 
      00:31:21:Oops: 0000 [#1] SMP 
      00:31:21:last sysfs file: /sys/devices/system/cpu/online
      00:31:22:CPU 1 
      00:31:22:Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      00:31:22:
      00:31:22:Pid: 10651, comm: file_rm.sh Not tainted 2.6.32-431.5.1.el6.x86_64 #1 Red Hat KVM
      00:31:22:RIP: 0010:[<ffffffff811a374c>]  [<ffffffff811a374c>] __d_lookup+0x8c/0x150
      00:31:22:RSP: 0018:ffff880071a31c88  EFLAGS: 00010286
      00:31:22:RAX: 0000000000000005 RBX: fffffffd00000000 RCX: 0000000000000012
      00:31:22:RDX: 018721e00667721f RSI: ffff880071a31d68 RDI: ffff88007e801980
      00:31:22:RBP: ffff880071a31cd8 R08: ffff880071a31d7d R09: 00000000fffffffa
      00:31:22:R10: 0000000000000004 R11: 0000000000000000 R12: fffffffcffffffe8
      00:31:22:R13: ffff88007e801980 R14: 00000000086181b9 R15: 0000000000003f40
      00:31:22:FS:  00007fbad989b700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
      00:31:22:CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      00:31:22:CR2: fffffffd00000018 CR3: 000000006e172000 CR4: 00000000000006e0
      00:31:22:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      00:31:22:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      00:31:22:Process file_rm.sh (pid: 10651, threadinfo ffff880071a30000, task ffff880072d3e080)
      00:31:22:Stack:
      00:31:22: ffff880071a31d78 0000000500000001 0000000000000005 ffff880071a31d68
      00:31:22:<d> 0000000000000000 0000000000010870 ffff880071a31d68 ffff88007e801980
      00:31:22:<d> ffff880071a31d68 0000000000003f40 ffff880071a31d08 ffffffff811a3fc5
      00:31:22:Call Trace:
      00:31:22: [<ffffffff811a3fc5>] d_lookup+0x35/0x60
      00:31:22: [<ffffffff811a4073>] d_hash_and_lookup+0x83/0xb0
      00:31:22: [<ffffffff811f8930>] proc_flush_task+0xa0/0x290
      00:31:22: [<ffffffff810751b8>] release_task+0x48/0x4b0
      00:31:22: [<ffffffff81075fb6>] wait_consider_task+0x7e6/0xb20
      00:31:22: [<ffffffff810763e6>] do_wait+0xf6/0x240
      00:31:22: [<ffffffff810765d3>] sys_wait4+0xa3/0x100
      00:31:22: [<ffffffff81074b70>] ? child_wait_callback+0x0/0x70
      00:31:22: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
      00:31:22: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      00:31:22:Code: 48 03 05 48 6d a6 00 48 8b 18 8b 45 bc 48 85 db 48 89 45 c0 75 11 eb 74 0f 1f 80 00 00 00 00 48 8b 1b 48 85 db 74 65 4c 8d 63 e8 <45> 39 74 24 30 75 ed 4d 39 6c 24 28 75 e6 4d 8d 7c 24 08 4c 89 
      00:31:22:RIP  [<ffffffff811a374c>] __d_lookup+0x8c/0x150
      00:31:22: RSP <ffff880071a31c88>
      00:31:22:CR2: fffffffd00000018
      

    Attachments

    Issue Links

    Activity

            pjones Peter Jones added a comment -

            As per Di this is no longer happening on current master

            pjones Peter Jones added a comment -

            ok

            laisiyao Lai Siyao added a comment -

            Peter, http://review.whamcloud.com/9689/ doesn't fully fix this issue, as Andreas noted in his comment. So this issue shouldn't be marked resolved.

            pjones Peter Jones added a comment -

            Landed for 2.7


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/9689/
            Subject: LU-4712 llite: lock the inode to be migrated
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0fab7dc89f4756538f8b67e7736abd6f225abae8
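            For context, a minimal sketch of the idea named in the patch subject, assuming the change serializes the migration against racing lookups and unlinks by holding the lock of the inode being migrated. This is illustrative only, not the code from change 9689, and ll_send_migrate_rpc() is a hypothetical stand-in for the real md_* call chain.

            /*
             * Illustrative sketch only -- not the landed patch from change 9689.
             * "Lock the inode to be migrated": hold the child inode's mutex for
             * the duration of the migrate request so concurrent lookups and
             * unlinks on the client cannot race with the entry while it moves
             * between MDTs.
             */
            static int ll_migrate_entry_locked(struct inode *parent,
                                               struct dentry *dchild,
                                               int mdt_index)
            {
                    struct inode *child = dchild->d_inode;
                    int rc;

                    mutex_lock(&child->i_mutex);    /* 2.6.32-era inode lock */
                    rc = ll_send_migrate_rpc(parent, dchild, mdt_index);
                    mutex_unlock(&child->i_mutex);

                    return rc;
            }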


            adilger Andreas Dilger added a comment -

            Note in advance that the landing of http://review.whamcloud.com/9689 is not necessarily expected to fix this bug, so the bug should not be closed when it lands.

            jhammond John Hammond added a comment -

            I see this and similar oopses when running racer with MDSCOUNT=4, a 5% RPC drop, and migration disabled (commented out of racer/racer.sh), using the changes and commands below.

            diff --git a/lustre/tests/racer/file_create.sh b/lustre/tests/racer/file_create.sh
            index 828e69c..8f4830f 100755
            --- a/lustre/tests/racer/file_create.sh
            +++ b/lustre/tests/racer/file_create.sh
            @@ -8,8 +8,8 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
             
             while /bin/true ; do 
                    file=$((RANDOM % MAX))
            -       SIZE=$((RANDOM * MAX_MB / 32))
            -       echo "file_create: FILE=$DIR/$file SIZE=$SIZE"
            +       SIZE=$((RANDOM % 4))
            +
                    [ $OSTCOUNT -gt 0 ] &&
                            lfs setstripe -c $((RANDOM % OSTCOUNT)) $DIR/$file 2> /dev/null
                    dd if=/dev/zero of=$DIR/$file bs=1k count=$SIZE 2> /dev/null
            diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
            index 6ba8b7c..65528cb 100755
            --- a/lustre/tests/racer/racer.sh
            +++ b/lustre/tests/racer/racer.sh
            @@ -16,7 +16,7 @@ RACER_PROGS="file_create dir_create file_rm file_rename file_link file_symlink \
             file_list file_concat file_exec"
             
             if [ $MDSCOUNT -gt 1 ]; then
            -       RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
            +    RACER_PROGS="${RACER_PROGS} dir_remote" # dir_migrate
             fi
             
             racer_cleanup()
            
            --
            # export MDSCOUNT=4
            # export MOUNT_2=y
            # llmount.sh
            ...
            # lctl set_param fail_loc=0x08000505
            # lctl set_param fail_val=20
            # sh lustre/tests/racer.sh
            
            di.wang Di Wang added a comment -

            http://review.whamcloud.com/9689
            di.wang Di Wang added a comment -

            I suspect the panic has been fixed by a later landing; in any case, I did not see the panic in my local run. But these console error messages need to be turned off. I will provide a patch.


            adilger Andreas Dilger added a comment -

            Looks like a bug with racer and striped directories. There are quite a number of scary-looking errors that are not fatal, but imply some problems with DNE2 striped directories:

            lmv_revalidate_slaves()) lustre-clilmv-ffff880037cd8400: nlink 0 < 2 corrupt stripe 0 [0x3c0000402:0xf4:0x0]:[0x3c0000402:0xf4:0x0]
            

            It also looks like the client was evicted from the MDT, which shouldn't happen even during racer. That would imply some kind of deadlock or bug in the code.

            Some error messages could just be turned off, I think, since the errors will be returned to the application anyway, and printing them on the console just hides more important messages:

            ll_dir_setstripe()) mdc_setattr fails: rc = -22
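
            As an illustration of this suggestion, a sketch only, assuming the message in ll_dir_setstripe() (lustre/llite/dir.c) can simply be downgraded; the surrounding code is omitted. The CERROR() would become a CDEBUG() so the failure is still visible with debug logging but no longer spams the console:

            /* Sketch only: in ll_dir_setstripe(), downgrade the console error.
             * The rc is still returned to the caller, so the application sees
             * the failure either way; only the console noise goes away. */
            if (rc)
                    CDEBUG(D_INODE, "mdc_setattr fails: rc = %d\n", rc);   /* was CERROR() */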
            

    People

      Assignee: di.wang Di Wang
      Reporter: maloo Maloo
      Votes: 0
      Watchers: 7