[LU-3948] Unable to load lustre module when the filesystem is running. Created: 13/Sep/13  Updated: 24/Jul/18  Resolved: 24/Jul/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Bob Glossman (Inactive)
Resolution: Cannot Reproduce Votes: 1
Labels: None
Environment:

CentOS 6.4


Severity: 3
Rank (Obsolete): 10476

 Description   

If a node is running as a Lustre server, with an MGS, MDT, or OSTs mounted locally, then you cannot load the lustre module. This in turn prevents the Lustre client from mounting on a server.

This appears to be a module loading order issue: if I load the lustre module before mounting the targets, then everything is fine.

# mount -t lustre
/dev/mapper/vg_mgs-mgs on /lustre/mgs type lustre (rw)
/dev/mapper/vg_mdt3_home-mdt3 on /lustre/home/mdt3 type lustre (rw)
/dev/mapper/vg_v-mdt on /lustre/v/mdt type lustre (rw)
/dev/mapper/vg_mdt4_home-mdt4 on /lustre/home/mdt4 type lustre (rw)
/dev/mapper/vg_mdt5_home-mdt5 on /lustre/home/mdt5 type lustre (rw)
/dev/mapper/vg_mdt1_v-mdt1 on /lustre/v/mdt1 type lustre (rw)
/dev/mapper/vg_mdt2_v-mdt2 on /lustre/v/mdt2 type lustre (rw)
# modprobe lustre
WARNING: Error inserting lov (/lib/modules/2.6.32-358.18.1.el6_lustre.es46.x86_64/updates/kernel/fs/lustre/lov.ko): Operation already in progress
FATAL: Error inserting lustre (/lib/modules/2.6.32-358.18.1.el6_lustre.es46.x86_64/updates/kernel/fs/lustre/lustre.ko): Operation already in progress

From dmesg we have the warning:

Sep 13 14:16:58 victrix-mds0 kernel: : LustreError: 18270:0:(lprocfs_status.c:489:lprocfs_register()) Lproc: Attempting to register osc more than once

The relevant modules loaded at this time are:

mdc 202485 0
osp 253656 253
lod 260436 115
mdt 638025 116
mgs 287077 1
mgc 82967 2 mgs
fsfilt_ldiskfs 15114 7
osd_ldiskfs 417738 230
lquota 339165 558 mdt,osd_ldiskfs
mdd 357047 117 mdt,osd_ldiskfs
fid 70842 121 mdc,osp,lod,mdt,osd_ldiskfs,mdd
fld 84982 120 lod,mdt,osd_ldiskfs,mdd,fid
ptlrpc 1556389 124 mdc,osp,lod,mdt,mgs,mgc,lquota,fid,fld
obdclass 1292111 580 mdc,osp,lod,mdt,mgs,mgc,osd_ldiskfs,lquota,mdd,fid,fld,ptlrpc
lvfs 16085 15 mdc,osp,lod,mdt,mgs,mgc,fsfilt_ldiskfs,osd_ldiskfs,lquota,mdd,fid,fld,ptlrpc,obdclass
ldiskfs 415574 9 fsfilt_ldiskfs,osd_ldiskfs
ko2iblnd 232685 0
ksocklnd 175314 1
lnet 334770 4 ptlrpc,obdclass,ko2iblnd,ksocklnd
libcfs 498217 18 mdc,osp,lod,mdt,mgs,mgc,fsfilt_ldiskfs,osd_ldiskfs,lquota,mdd,fid,fld,ptlrpc,obdclass,lvfs,ko2iblnd,ksocklnd,lnet
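
A listing like the one above can be checked mechanically to confirm that the modules named in the modprobe errors (lov, lustre) are absent. The following is a minimal sketch, not part of Lustre: the function name module_loaded and its optional file argument are hypothetical, and it assumes lsmod or /proc/modules style text in which the first field of each line is the module name.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): succeed if module $1 appears as the
# first whitespace-separated field of a /proc/modules-style listing.
# $2 is the listing to read; it defaults to /proc/modules on Linux.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use on a live node:
#   module_loaded lustre || echo "lustre is not loaded"
#   module_loaded lov    || echo "lov is not loaded"
```

Only the first field is matched, so a module that appears solely in another module's "used by" column is not counted as loaded.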



 Comments   
Comment by Peter Jones [ 13/Sep/13 ]

Bob

Could you please look into this one?

Thanks

Peter

Comment by Swapnil Pimpale (Inactive) [ 20/Sep/13 ]

In this case the following modules were loaded (in that order) before the servers/targets were mounted:

libcfs
ksocklnd
ko2iblnd
ldiskfs
ptlrpc
lnet

The workaround for this is to load the lustre module along with the above modules.
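
The ordering in this workaround can be sketched as a small wrapper that loads the lustre module before any target is mounted. This is only an illustration under stated assumptions: the function name load_then_mount and the MODPROBE/MOUNT override variables are hypothetical, and only the `modprobe lustre` and `mount -t lustre` commands come from the report.

```shell
#!/bin/sh
# Sketch of the workaround ordering: load lustre first, then mount targets.
# MODPROBE and MOUNT may be overridden (for example with "echo modprobe")
# for a dry run; this override mechanism is an illustrative assumption.
MODPROBE="${MODPROBE:-modprobe}"
MOUNT="${MOUNT:-mount}"

# Usage: load_then_mount DEVICE MOUNTPOINT [DEVICE MOUNTPOINT ...]
load_then_mount() {
    # Load the client module before any server target is mounted.
    $MODPROBE lustre || return 1
    # Mount each DEVICE MOUNTPOINT pair in the order given.
    while [ "$#" -ge 2 ]; do
        $MOUNT -t lustre "$1" "$2" || return 1
        shift 2
    done
}
```

For example, `load_then_mount /dev/mapper/vg_mgs-mgs /lustre/mgs` would run `modprobe lustre` and then mount the MGS target from the report. The function stops at the first failing command.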

Comment by Marek Magrys [ 18/Dec/13 ]

We are also hitting this bug in 2.4.1. It would be nice to have this one solved.

Comment by Marek Magrys [ 28/Mar/14 ]

Same in 2.5.1.

Comment by Peter Jones [ 24/Jul/18 ]

This has not been reported for many years.

Generated at Sat Feb 10 01:38:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.