|
I believe that this is being worked under LU-661
|
|
I can confirm that the patch from LU-661 fixes the issue. You can mark this bug as a duplicate of LU-661
|
|
Thanks James
|
|
Sorry, I was mistaken. The test still fails even with the fix from LU-661. The error I'm getting is:
Lustre: DEBUG MARKER: == replay-dual test 0b: lost client during waiting for next transno ================================== 09:33:44 (1323959624)
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
LustreError: 24437:0:(ldlm_request.c:1173:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 24437:0:(ldlm_request.c:1800:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff810161506c00 umount complete
Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
Lustre: Skipped 25 previous similar messages
LustreError: 24560:0:(ldlm_request.c:1173:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 24560:0:(ldlm_request.c:1800:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff8101ad6f0000 umount complete
LustreError: 24570:0:(genops.c:311:class_newdev()) Device MGC10.37.248.56@o2ib1 already exists at 0, won't add
LustreError: 24570:0:(obd_config.c:327:class_attach()) Cannot create device MGC10.37.248.56@o2ib1 of type mgc : -17
LustreError: 24570:0:(obd_mount.c:512:lustre_start_simple()) MGC10.37.248.56@o2ib1 attach error -17
LustreError: 24570:0:(obd_mount.c:2306:lustre_fill_super()) Unable to mount (-17)
Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: mount1 fais
|
|
Reopening
|
|
Bobijam
Could you please look at this test failure from ORNL?
Thanks
Peter
|
|
James,
Could you please upload the debug logs from the nodes, especially from the client?
|
|
Turned on full debug info. The logs are at /uploads/LU-639/replay-dual-1324327350.tar.bz2
|
|
Found the cause of the failure: obd_zombie_exports finishes cleaning up the last mgc export only after the next test's mount has already started.
00000020:01000000:3.0:1324327346.171837:0:30269:0:(obd_config.c:1517:class_manual_cleanup()) Manual cleanup of MGC10.37.248.56@o2ib1 (flags='') ====> start to cleanup mgc obd
...
00000020:00000001:3.0:1324327346.172024:0:30269:0:(obd_config.c:531:class_detach()) Process entered
00000020:00000080:3.0:1324327346.172025:0:30269:0:(obd_config.c:548:class_detach()) detach on obd MGC10.37.248.56@o2ib1 (uuid 7ee6c547-9a41-e9ad-d0f7-addf6fefe81a)
00000020:00000040:3.0:1324327346.172027:0:30269:0:(obd_config.c:670:class_decref()) Decref MGC10.37.248.56@o2ib1 (ffff810170f32038) now 1
====> will unlink self export and add to zombie export list
...
00000020:00000001:5.0:1324327346.172051:0:22196:0:(obd_class.h:1173:obd_destroy_export()) Process leaving (rc=0 : 0 : 0)
00000020:00000040:5.0:1324327346.172053:0:22196:0:(obd_config.c:670:class_decref()) Decref MGC10.37.248.56@o2ib1 (ffff810170f32038) now 0 ====> zombie start to destroy mgc obd
...
00000020:01000000:5.0:1324327346.172055:0:22196:0:(obd_config.c:687:class_decref()) finishing cleanup of obd MGC10.37.248.56@o2ib1 (7ee6c547-9a41-e9ad-d0f7-addf6fefe81a)
...
10000000:00000001:5.0:1324327346.172059:0:22196:0:(mgc_request.c:737:mgc_cleanup()) Process entered ====> part of destroy (obd_cleanup)
...
00000020:01000004:7.0:1324327346.188201:0:30279:0:(obd_mount.c:508:lustre_start_simple()) Starting obd MGC10.37.248.56@o2ib1 (typ=mgc) ====> next test mount start
...
00000020:00000080:7.0:1324327346.188231:0:30279:0:(obd_config.c:319:class_attach()) attach type mgc name: MGC10.37.248.56@o2ib1 uuid: 3a7a7701-cac0-3abf-0fde-bf7368296a73
00000020:00000001:7.0:1324327346.188233:0:30279:0:(genops.c:284:class_newdev()) Process entered ===> try to create mgc obd device
...
00000020:00020000:7.0:1324327346.188242:0:30279:0:(genops.c:311:class_newdev()) Device MGC10.37.248.56@o2ib1 already exists at 0, won't add ===> old mgc obd device still there
...
00000020:00020000:7.0:1324327346.188257:0:30279:0:(obd_config.c:327:class_attach()) Cannot create device MGC10.37.248.56@o2ib1 of type mgc : -17 ===> error EEXIST
...
10000000:00000001:5.0:1324327346.190745:0:22196:0:(mgc_request.c:751:mgc_cleanup()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:5.0:1324327346.190747:0:22196:0:(obd_class.h:641:obd_cleanup()) Process leaving (rc=0 : 0 : 0)
00000020:00000040:5.0:1324327346.190750:0:22196:0:(genops.c:365:class_release_dev()) Release obd device MGC10.37.248.56@o2ib1 at 0 obd_type name =mgc ===> finished cleanup, remove mgc obd device
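In the trace above, mgc_cleanup() is entered at 1324327346.172059 and the device is released at 1324327346.190750, while the new mount's class_newdev() fails at 1324327346.188242, roughly 2.5 ms before the old device goes away. The following is a minimal user-space sketch of that ordering problem and of the "wait for pending cleanup before attaching" idea. It is not Lustre code: the single-slot table and the names model_attach, model_detach_async and model_zombie_barrier are all hypothetical, and the 20 ms delay only stands in for a slow asynchronous cleanup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical one-slot device table standing in for the obd device list. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t zombies_done = PTHREAD_COND_INITIALIZER;
static char dev_name[64];  /* empty string means the slot is free */
static int zombie_count;   /* cleanups queued but not yet finished */

/* Register a device by name. Returns -EEXIST while a device with the
 * same name still occupies the slot, -ENOSPC if another name holds it. */
static int model_attach(const char *name)
{
    int rc = 0;

    pthread_mutex_lock(&lock);
    if (dev_name[0] != '\0')
        rc = strcmp(dev_name, name) == 0 ? -EEXIST : -ENOSPC;
    else
        snprintf(dev_name, sizeof(dev_name), "%s", name);
    pthread_mutex_unlock(&lock);
    return rc;
}

/* Deferred cleanup: frees the slot some time after detach returned. */
static void *zombie_thread(void *arg)
{
    struct timespec delay = { 0, 20 * 1000 * 1000 };  /* 20 ms */

    (void)arg;
    nanosleep(&delay, NULL);

    pthread_mutex_lock(&lock);
    assert(zombie_count > 0);
    dev_name[0] = '\0';
    zombie_count--;
    pthread_cond_broadcast(&zombies_done);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Queue the device for asynchronous destruction and return at once. */
static int model_detach_async(pthread_t *tid)
{
    int rc;

    pthread_mutex_lock(&lock);
    zombie_count++;
    pthread_mutex_unlock(&lock);

    rc = pthread_create(tid, NULL, zombie_thread, NULL);
    if (rc != 0) {
        /* No cleanup thread was started, so undo the accounting. */
        pthread_mutex_lock(&lock);
        zombie_count--;
        pthread_cond_broadcast(&zombies_done);
        pthread_mutex_unlock(&lock);
    }
    return rc;
}

/* Block until every queued cleanup has finished. */
static void model_zombie_barrier(void)
{
    pthread_mutex_lock(&lock);
    while (zombie_count > 0)
        pthread_cond_wait(&zombies_done, &lock);
    pthread_mutex_unlock(&lock);
}
```

In this model, calling model_attach() with the same name right after model_detach_async() will usually hit -EEXIST, which mirrors the failure in the log. Calling model_zombie_barrier() first makes the second attach succeed regardless of how long the cleanup takes.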
|
|
patch tracking at http://review.whamcloud.com/1896
|
|
It passed the test. Thank you.
|
|
Integrated in lustre-master » x86_64,server,el5,ofa #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,client,el6,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,client,el5,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,client,sles11,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,server,el6,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,client,el5,ofa #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » x86_64,server,el5,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Landed for 2.2
|
|
Integrated in lustre-master » x86_64,server,el6,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,client,el6,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,server,el5,ofa #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,server,el5,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,client,el5,inkernel #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
Integrated in lustre-master » i686,client,el5,ofa #404
LU-639 obdclass: wait obd cleanup before mount (Revision 11d258632de7255ae8282b0615a19bd5c9cf707a)
Result = SUCCESS
Oleg Drokin : 11d258632de7255ae8282b0615a19bd5c9cf707a
Files :
- lustre/obdclass/obd_mount.c
|
|
In fact, this fix is invalid.
It only makes mount wait until the export cleanup has finished, but that does not fix the case where obdfilter-survey fails to start because of the same race, i.e. class setup vs. class release.
That could be solved by adding a zombie barrier to the export cleanup phase, but it looks like we need a more generic fix instead.
|