[LU-768] Hyperion - recovery-double-scale fails Created: 17/Oct/11  Updated: 11/Mar/15  Resolved: 11/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Cliff White (Inactive) Assignee: Minh Diep
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Hyperion, RHEL5/x86_64


Severity: 3
Rank (Obsolete): 10377

 Description   

recovery-double-scale fails; detailed results are in Maloo.
Errors reported:
MDS

Lustre: DEBUG MARKER: Failing type2=clients item2=hyperion321,hyperion421 ...
Lustre: 2861:0:(quota_master.c:1718:mds_quota_recovery()) Only 4/8 OSTs are active, abort quota recovery
Lustre: lustre-MDT0000: Recovery period over after 0:24, of 126 clients 126 recovered and 0 were evicted.
Lustre: lustre-MDT0000: sending delayed replies to recovered clients
Lustre: MDS lustre-MDT0000: lustre-OST0005_UUID now active, resetting orphans
LustreError: 2910:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 122683904: cookie 0x3b989baac2130b2c req@ffff810f55c08c50 x1382661482849280/t0 o35->34523172-a256-a14a-a765-717695be2aa1@NET_0x50000c0a8723c_UUID:0/0 lens 408/864 e 0 to 0 dl 1318881574 ref 1 fl Interpret:/2/0 rc 0/0
LustreError: 2910:0:(mds_open.c:1645:mds_close()) Skipped 9 previous similar messages
Lustre: DEBUG MARKER: Mon Oct 17 12:59:31 2011
Client

Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
LustreError: 25042:0:(ldlm_resource.c:519:ldlm_namespace_cleanup()) Namespace lustre-OST0000-osc-ffff81022efde800 resource refcount nonzero (2) after lock cleanup; forcing cleanup.
LustreError: 25042:0:(ldlm_resource.c:524:ldlm_namespace_cleanup()) Resource: ffff8101f237ae40 (162106/0/0/0) (rc: 2)
Lustre: Mount still busy with 5 refs! You may try to umount it a bit later
Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
Lustre: Skipped 7 previous similar messages
LustreError: 25042:0:(ldlm_resource.c:519:ldlm_namespace_cleanup()) Namespace lustre-OST0000-osc-ffff81022efde800 resource refcount nonzero (2) after lock cleanup; forcing cleanup.
LustreError: 25042:0:(ldlm_resource.c:524:ldlm_namespace_cleanup()) Resource: ffff8101f237ae40 (162106/0/0/0) (rc: 1)
LustreError: 25042:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 25042:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff81022efde800 umount complete
Lustre: DEBUG MARKER: Mon Oct 17 12:59:31 2011



 Comments   
Comment by Peter Jones [ 18/Oct/11 ]

Oleg

Could you please look into this one?

Thanks

Peter

Comment by Peter Jones [ 18/Oct/11 ]

Oleg tells me that this is suspected to be a test script issue, and Minh is looking into it.

Comment by Shuichi Ihara (Inactive) [ 26/Oct/11 ]

We are seeing a similar issue at a customer site. Can you advise how to avoid this issue, please?

Comment by Shuichi Ihara (Inactive) [ 23/Nov/11 ]

Attached is the MDS's /var/log/messages from the customer site.

Comment by Minh Diep [ 12/Dec/11 ]

I ran this test without issue: https://maloo.whamcloud.com/test_sessions/b00e4a06-22d7-11e1-aee6-5254004bbbd3

Please show me the setup, config, and the test error.

Comment by Minh Diep [ 24/Jan/12 ]

Reduced the priority since we haven't reproduced this. Please send me the complete logs when you do reproduce it.
