[LU-2270] failure in replay-dual.sh test_0b: class_newdev() Device lustre-MDT0000-mdc already exists at 3, won't add: rc = -17
Created: 03/Nov/12  Updated: 27/Nov/12  Resolved: 27/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Alex Zhuravlev
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

lustre master build #1011 SLES11 SP2 client


Issue Links:
Duplicate: duplicates LU-2275 "Open request leak" (Resolved)
Severity: 3
Rank (Obsolete): 5429

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/47f07908-253d-11e2-9e7c-52540035b04c.

The sub-test test_0b failed with the following error:

mount2 fails

Client 2 dmesg shows:

[ 9000.375340] Lustre: Mounted lustre-client
[ 9000.393643] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
[ 9000.399956] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,acl,flock client-26vm7@tcp:/lustre /mnt/lustre2
[ 9000.414481] LustreError: 18958:0:(genops.c:316:class_newdev()) Device lustre-MDT0000-mdc-ffff880079c03400 already exists at 3, won't add
[ 9000.414489] LustreError: 18958:0:(obd_config.c:374:class_attach()) Cannot create device lustre-MDT0000-mdc-ffff880079c03400 of type mdc : -17
[ 9000.414500] LustreError: 18958:0:(obd_config.c:1546:class_config_llog_handler()) MGC10.10.4.154@tcp: cfg command failed: rc = -17
[ 9000.414511] Lustre:    cmd=cf001 0:lustre-MDT0000-mdc  1:mdc  2:lustre-clilmv_UUID  
[ 9000.414581] LustreError: 15c-8: MGC10.10.4.154@tcp: The configuration from log 'lustre-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 9000.414588] LustreError: 18955:0:(llite_lib.c:1012:ll_fill_super()) Unable to process log: -17
[ 9000.415048] LustreError: 18955:0:(obd_mount.c:2987:lustre_fill_super()) Unable to mount  (-17)
[ 9000.822327] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_0b: @@@@@@ FAIL: mount2 fais 
[ 9000.877904] Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: mount2 fais
[ 9001.019498] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2012-11-01/lustre-master-el6-x86_64-sles11sp2-x86_64__1011__-69983126486880-163552/replay-dual.test_0b.debug_log.$(hostname -s).1351848424.log;
[ 9001.019502]          dmesg > /logdir/test_logs/2012-11-01/lustre-master-el6-x86_64-sl
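
The `rc = -17` in the log above is a negated errno value. As a minimal sketch (the helper name `describe_rc` is hypothetical and not part of Lustre or the test framework), the code can be decoded with the Python standard library on a Linux host:

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into an errno name and message.

    Hypothetical helper for reading logs; it is not part of Lustre.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# -17 is the code reported by class_newdev() and class_attach() above.
print(describe_rc(-17))
```

On Linux, errno 17 is EEXIST, which is consistent with the "already exists" text in the class_newdev() message.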


 Comments   
Comment by Sarah Liu [ 05/Nov/12 ]

Another failure:
https://maloo.whamcloud.com/test_sets/36d646d4-2704-11e2-b04c-52540035b04c

[ 3940.919180] Lustre: Unmounted lustre-client
[ 3958.686052] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[ 3958.693235] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,acl,flock fat-intel-3vm7@tcp:/lustre /mnt/lustre
[ 3958.702476] LustreError: 152-6: Ignoring deprecated mount option 'acl'.
[ 3958.703435] Lustre: MGC10.10.4.92@tcp: Reactivating import
[ 3958.704833] LustreError: 28998:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
[ 3958.708656] LustreError: 29002:0:(genops.c:316:class_newdev()) Device lustre-MDT0000-mdc-ffff8800699d0000 already exists at 13, won't add
[ 3958.708661] LustreError: 29002:0:(obd_config.c:374:class_attach()) Cannot create device lustre-MDT0000-mdc-ffff8800699d0000 of type mdc : -17
[ 3958.708664] LustreError: 29002:0:(obd_config.c:1546:class_config_llog_handler()) MGC10.10.4.92@tcp: cfg command failed: rc = -17
[ 3958.708669] Lustre:    cmd=cf001 0:lustre-MDT0000-mdc  1:mdc  2:lustre-clilmv_UUID  
[ 3958.708692] LustreError: 15c-8: MGC10.10.4.92@tcp: The configuration from log 'lustre-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 3958.708696] LustreError: 28998:0:(llite_lib.c:1012:ll_fill_super()) Unable to process log: -17
[ 3958.708803] Lustre: Unmounted lustre-client
Comment by Jodi Levi (Inactive) [ 06/Nov/12 ]

Alex,
Can you please have a look at this one and determine if it should be a blocker?
Thank you!

Comment by Sarah Liu [ 07/Nov/12 ]

The same failure was found in replay-vbr. Both server and client are running RHEL6.

https://maloo.whamcloud.com/test_sets/421702b0-2838-11e2-9c12-52540035b04c

Client 1 dmesg shows:

Lustre: DEBUG MARKER: == replay-vbr test 7c: create, {lost}, rename == 15:57:46 (1352159866)
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,acl,flock fat-intel-3vm3@tcp:/lustre /mnt/lustre2
LustreError: 22925:0:(genops.c:316:class_newdev()) Device lustre-MDT0000-mdc-ffff88007d0fe400 already exists at 3, won't add
LustreError: 22925:0:(obd_config.c:374:class_attach()) Cannot create device lustre-MDT0000-mdc-ffff88007d0fe400 of type mdc : -17
LustreError: 22925:0:(obd_config.c:1546:class_config_llog_handler()) MGC10.10.4.88@tcp: cfg command failed: rc = -17
Lustre:    cmd=cf001 0:lustre-MDT0000-mdc  1:mdc  2:lustre-clilmv_UUID  
LustreError: 15c-8: MGC10.10.4.88@tcp: The configuration from log 'lustre-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 22922:0:(llite_lib.c:1012:ll_fill_super()) Unable to process log: -17
LustreError: 22922:0:(obd_mount.c:2987:lustre_fill_super()) Unable to mount  (-17)
Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
Lustre: DEBUG MARKER: rm /mnt/lustre2/d0.replay-vbr/d7/f.replay-vbr.7c-0; createmany -o /mnt/lustre2/d0.replay-vbr/d7/f.replay-vbr.7c- 1
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-vbr test_7c: @@@@@@ FAIL: test_7c.1: Cannot do \'lost\' operations 
Lustre: DEBUG MARKER: replay-vbr test_7c: @@@@@@ FAIL: test_7c.1: Cannot do 'lost' operations
Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2012-11-05/lustre-master-el6-x86_64__1018__-69983081643020-022304/replay-vbr.test_7c.debug_log.$(hostname -s).1352159868.log;
         dmesg > /logdir/test_logs/2012-11-05/lustre-master-el6-x86_64__1018__-699830816430
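
When comparing several such reports, it can help to pull the device name and slot index out of the class_newdev() lines. The following is a rough sketch only: the regular expression is inferred from the log lines quoted in this ticket, and the names `NEWDEV_RE` and `find_duplicate_devices` are hypothetical.

```python
import re

# Pattern inferred from the class_newdev() messages quoted in this ticket.
NEWDEV_RE = re.compile(
    r"class_newdev\(\)\) Device (?P<device>\S+) "
    r"already exists at (?P<slot>\d+), won't add"
)


def find_duplicate_devices(dmesg_text):
    """Return (device_name, slot) pairs for every duplicate-device error found."""
    return [
        (m.group("device"), int(m.group("slot")))
        for m in NEWDEV_RE.finditer(dmesg_text)
    ]


# Two lines copied from the dmesg excerpts in this ticket.
sample = """\
[ 9000.414481] LustreError: 18958:0:(genops.c:316:class_newdev()) Device lustre-MDT0000-mdc-ffff880079c03400 already exists at 3, won't add
[ 3958.708656] LustreError: 29002:0:(genops.c:316:class_newdev()) Device lustre-MDT0000-mdc-ffff8800699d0000 already exists at 13, won't add
"""

results = find_duplicate_devices(sample)
for device, slot in results:
    print(device, slot)
```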
Comment by Andreas Dilger [ 20/Nov/12 ]

Alex, can you please take a quick look at this to figure out if it should remain a 2.4.0 blocker? It is currently on the priority list to fix before we open master for landing again.

Comment by Alex Zhuravlev [ 20/Nov/12 ]

Not absolutely sure, but this is probably the mountpoint being pinned by a leaked request: http://jira.whamcloud.com/browse/LU-2275

I wouldn't make this a blocker.
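
The tentative diagnosis above can be illustrated with a toy model. This is only a sketch under assumptions: `DeviceRegistry` and its methods are hypothetical and do not reflect Lustre's actual data structures. It shows how a reference that is never released could keep a named device registered, so that a later attach of the same name fails with -EEXIST.

```python
import errno


class DeviceRegistry:
    """Hypothetical table of named devices with reference counts (not Lustre code)."""

    def __init__(self):
        self.refs = {}  # device name -> outstanding reference count

    def attach(self, name):
        # Refuse duplicate names, mirroring the "already exists" error in the logs.
        if name in self.refs:
            return -errno.EEXIST
        self.refs[name] = 1  # reference held by the mount itself
        return 0

    def get(self, name):
        self.refs[name] += 1

    def put(self, name):
        self.refs[name] -= 1
        if self.refs[name] == 0:
            del self.refs[name]  # device goes away only when the last reference drops


registry = DeviceRegistry()
name = "lustre-MDT0000-mdc"

first_rc = registry.attach(name)  # first mount succeeds
registry.get(name)                # a request takes a reference and never releases it
registry.put(name)                # unmount drops only the mount's own reference
second_rc = registry.attach(name) # remount finds the stale entry

print(first_rc, second_rc)
```

In this model the second attach fails because the leaked reference keeps the old entry alive, which matches the shape of the failure reported here but does not confirm its cause.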

Comment by Andreas Dilger [ 27/Nov/12 ]

The most recent report of this bug causing a failure in Maloo is from 2012-11-06:

https://maloo.whamcloud.com/test_sets/421702b0-2838-11e2-9c12-52540035b04c

There are only three reports of this failure in Maloo in total.

So it probably does not need to be a blocker, given that it has not been hit for some time (unless failures are no longer being triaged to this bug).

Comment by Andreas Dilger [ 27/Nov/12 ]

Linking to LU-2275 per Alex's diagnosis.

Comment by Andreas Dilger [ 27/Nov/12 ]

Marking as a duplicate of LU-2275

Generated at Sat Feb 10 01:23:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.