Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.3.0
-
None
-
3
-
5218
Description
This issue was created by maloo for yujian <yujian@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/44cf5b04-16a1-11e2-afe1-52540035b04c.
Lustre Tag: v2_3_0_RC3
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/36
Distro/Arch: RHEL6.3/x86_64(server), FC15/x86_64(client)
Network: TCP
ENABLE_QUOTA=yes
After running parallel-scale-nfsv4 test_iorfpp, unmounting OST hung, and the following error showed up in the OSS console log:
Lustre: DEBUG MARKER: == parallel-scale-nfsv4 parallel-scale-nfs.sh test complete, duration 2732 sec ======================= 00:45:35 (1350287135) LustreError: 138-a: lustre-OST0001: A client on nid 10.10.4.133@tcp was evicted due to a lock blocking callback time out: rc -107 LustreError: 138-a: lustre-OST0004: A client on nid 10.10.4.133@tcp was evicted due to a lock blocking callback time out: rc -107 LustreError: 12155:0:(client.c:1110:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88041c891000 x1415872606374127/t0(0) o104->lustre-OST0005@10.10.4.133@tcp:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:N/0/ffffffff rc 0/-1 LustreError: 12155:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.10.4.133@tcp) returned 0 from blocking AST ns: filter-lustre-OST0005_UUID lock: ffff88011b8e9280/0x952f6899ef78e799 lrc: 4/0,0 mode: PW/PW res: 18768/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x10020 nid: 10.10.4.133@tcp remote: 0xaee7ce80b9fe38c0 expref: 5 pid: 8553 timeout 4301494110 LustreError: Skipped 12 previous similar messages BUG: scheduling while atomic: ldlm_elt/8548/0x00000200 Modules linked in: nfs fscache obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2 lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb dca microcode serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core sg shpchp ext3 jbd mbcache sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 8548, comm: ldlm_elt Not tainted 2.6.32-279.5.1.el6_lustre.g8932d49.x86_64 #1 Call Trace: [<ffffffff81057df6>] ? __schedule_bug+0x66/0x70 [<ffffffff814fe298>] ? thread_return+0x646/0x76e [<ffffffff8127a43c>] ? put_dec+0x10c/0x110 [<ffffffff811ad70e>] ? bh_lru_install+0x16e/0x1a0 [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 [<ffffffffa0b721cd>] ? do_get_write_access+0x29d/0x520 [jbd2] [<ffffffff81092110>] ? wake_bit_function+0x0/0x50 [<ffffffffa0b725a1>] ? jbd2_journal_get_write_access+0x31/0x50 [jbd2] [<ffffffffa0b98478>] ? __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs] [<ffffffffa0ba4473>] ? ldiskfs_reserve_inode_write+0x73/0xa0 [ldiskfs] [<ffffffffa0baa590>] ? ldiskfs_delete_inode+0x0/0x250 [ldiskfs] [<ffffffffa0ba44ec>] ? ldiskfs_mark_inode_dirty+0x4c/0x1f0 [ldiskfs] [<ffffffffa0b714f0>] ? jbd2_journal_start+0xd0/0x110 [jbd2] [<ffffffffa0baa590>] ? ldiskfs_delete_inode+0x0/0x250 [ldiskfs] [<ffffffffa0baa64c>] ? ldiskfs_delete_inode+0xbc/0x250 [ldiskfs] [<ffffffff8119631e>] ? generic_delete_inode+0xde/0x1d0 [<ffffffff81196475>] ? generic_drop_inode+0x65/0x80 [<ffffffff811952c2>] ? iput+0x62/0x70 [<ffffffffa0cb1bb2>] ? filter_lvbo_free+0x22/0x100 [obdfilter] [<ffffffffa0621a38>] ? ldlm_resource_putref+0x128/0x2a0 [ptlrpc] [<ffffffffa061a168>] ? ldlm_lock_put+0xc8/0x530 [ptlrpc] [<ffffffffa0640200>] ? expired_lock_main+0x660/0x840 [ptlrpc] [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffffa063fba0>] ? expired_lock_main+0x0/0x840 [ptlrpc] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffffa063fba0>] ? expired_lock_main+0x0/0x840 [ptlrpc] [<ffffffffa063fba0>] ? expired_lock_main+0x0/0x840 [ptlrpc] [<ffffffff8100c140>] ? child_rip+0x0/0x20 Lustre: 2532:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1350287142/real 1350287142] req@ffff88021cff4000 x1415872606374144/t0(0) o400->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 224/224 e 0 to 1 dl 1350287149 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 LustreError: 166-1: MGC10.10.4.133@tcp: Connection to MGS (at 10.10.4.133@tcp) was lost; in progress operations using this service will fail LustreError: 8550:0:(client.c:1125:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff880419f6a400 x1415872606374147/t0(0) o101->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1 LustreError: 8550:0:(client.c:1125:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff880419f6a400 x1415872606374148/t0(0) o101->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1 Lustre: OST lustre-OST0000 has stopped. Lustre: Skipped 1 previous similar message Lustre: server umount lustre-OST0000 complete Lustre: 2521:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1350287149/real 1350287149] req@ffff88021ba68400 x1415872606374146/t0(0) o250->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 400/544 e 0 to 1 dl 1350287155 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 LustreError: 8550:0:(client.c:1125:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff88041d1b7000 x1415872606374153/t0(0) o101->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1 LustreError: 8550:0:(client.c:1125:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Lustre: OST lustre-OST0001 has stopped. Lustre: server umount lustre-OST0001 complete Lustre: OST lustre-OST0002 has stopped. Lustre: server umount lustre-OST0002 complete Lustre: OST lustre-OST0003 has stopped. Lustre: 2521:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1350287159/real 1350287159] req@ffff88021c977800 x1415872606374159/t0(0) o250->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 400/544 e 0 to 1 dl 1350287170 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: server umount lustre-OST0004 complete Lustre: Skipped 1 previous similar message LustreError: 2484:0:(obd_mount.c:257:server_put_mount()) lustre-OST0005: mount busy, vfscount=15! Lustre: OST lustre-OST0005 has stopped. Lustre: Skipped 1 previous similar message Lustre: 2521:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1350287174/real 1350287174] req@ffff88021cfef800 x1415872606374160/t0(0) o250->MGC10.10.4.133@tcp@10.10.4.133@tcp:26/25 lens 400/544 e 0 to 1 dl 1350287190 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: Mount still busy with 15 refs after 30 secs.