[LU-1428] MDT service threads spinning in cfs_hash_for_each_relax() Created: 21/May/12  Updated: 23/Jul/12  Resolved: 23/Jul/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: Lustre 2.3.0, Lustre 2.1.3

Type: Bug Priority: Major
Reporter: Ned Bass Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: libcfs, server
Environment:

https://github.com/chaos/lustre/commits/2.1.1-4chaos


Issue Links:
Duplicate
is duplicated by LU-1087 mdt thread spinning out of control Resolved
Severity: 3
Rank (Obsolete): 4587

 Description   

We have two MDT service threads using 100% CPU on a production MDS. I can't get a backtrace from crash because they do not yield the CPU, but based on oprofile they seem to be spinning in cfs_hash_for_each_relax(). At the same time we are seeing client hangs and high lock cancellation rates on the OSTs.

samples  %        image name               app name                 symbol name
4225020  33.0708  libcfs.ko                libcfs.ko                cfs_hash_for_each_relax
3345225  26.1843  libcfs.ko                libcfs.ko                cfs_hash_hh_hhead
532409    4.1674  ptlrpc.ko                ptlrpc.ko                ldlm_cancel_locks_for_export_cb
307199    2.4046  ptlrpc.ko                ptlrpc.ko                lock_res_and_lock
175349    1.3725  vmlinux                  vmlinux                  native_read_tsc
151989    1.1897  ptlrpc.ko                ptlrpc.ko                ldlm_del_waiting_lock
136679    1.0698  libcfs.ko                libcfs.ko                cfs_hash_rw_lock
109269    0.8553  jbd2.ko                  jbd2.ko                  journal_clean_one_cp_list


 Comments   
Comment by Prakash Surya (Inactive) [ 21/May/12 ]

Some more detailed information from the same oprofile data set is below. It's pretty clear that we're spending over 30% of the CPU time on this one line:

  00000000000001dc 118121958 119737591     97.3140  98.6450    linux-tracefile.c:195

Expanding that line number out we get:

libcfs/libcfs/linux/linux-tracefile.c:
--------------------------------------
183 void
184 cfs_set_ptldebug_header(struct ptldebug_header *header,
185                         struct libcfs_debug_msg_data *m,
186                         unsigned long stack)
187 {
188         struct timeval tv;
189 
190         do_gettimeofday(&tv);
191 
192         header->ph_subsys = m->msg_subsys;
193         header->ph_mask = m->msg_mask;
194         header->ph_cpu_id = cfs_smp_processor_id();
195         header->ph_type = cfs_trace_buf_idx_get();
196         header->ph_sec = (__u32)tv.tv_sec;
197         header->ph_usec = tv.tv_usec;
198         header->ph_stack = stack;
199         header->ph_pid = current->pid;
200         header->ph_line_num = m->msg_line;
201         header->ph_extern_pid = 0;
202         return;
203 }

The top offenders:

vma      samples  cum. samples  %        cum. %     linenr info                 image name               app name                 symbol name
0000000000000170 121382292 121382292     38.3297  38.3297    linux-tracefile.c:187       libcfs.ko                libcfs.ko                cfs_set_ptldebug_header
  0000000000000170 13319    13319          0.0110   0.0110    linux-tracefile.c:187
  0000000000000171 3821     17140          0.0031   0.0141    linux-tracefile.c:187
  0000000000000174 12019    29159          0.0099   0.0240    linux-tracefile.c:187
  0000000000000178 1581     30740          0.0013   0.0253    linux-tracefile.c:187
  000000000000017c 12110    42850          0.0100   0.0353    linux-tracefile.c:187
  0000000000000184 5846     48696          0.0048   0.0401    linux-tracefile.c:187
  0000000000000189 23130    71826          0.0191   0.0592    linux-tracefile.c:187
  0000000000000190 3268     75094          0.0027   0.0619    linux-tracefile.c:187
  0000000000000193 12462    87556          0.0103   0.0721    linux-tracefile.c:187
  0000000000000196 1        87557         8.2e-07   0.0721    linux-tracefile.c:190
  00000000000001a0 23924    111481         0.0197   0.0918    linux-tracefile.c:192
  00000000000001a3 48882    160363         0.0403   0.1321    linux-tracefile.c:193
  00000000000001a8 3        160366        2.5e-06   0.1321    linux-tracefile.c:193
  00000000000001ab 5        160371        4.1e-06   0.1321    linux-tracefile.c:194
  00000000000001b3 13156    173527         0.0108   0.1430    linux-tracefile.c:194
  00000000000001b7 11       173538        9.1e-06   0.1430    thread_info.h:216
  00000000000001c0 43       173581        3.5e-05   0.1430    linux-tracefile.c:142
  00000000000001c7 43006    216587         0.0354   0.1784    linux-tracefile.c:142
  00000000000001cc 366      216953        3.0e-04   0.1787    linux-tracefile.c:142
  00000000000001d2 52496    269449         0.0432   0.2220    linux-tracefile.c:142
  00000000000001d4 199906   469355         0.1647   0.3867    thread_info.h:216
  00000000000001d6 836      470191        6.9e-04   0.3874    thread_info.h:216
  00000000000001d9 1145442  1615633        0.9437   1.3310    thread_info.h:216
  00000000000001dc 118121958 119737591     97.3140  98.6450    linux-tracefile.c:195
  00000000000001e0 744      119738335     6.1e-04  98.6456    linux-tracefile.c:196
  00000000000001e4 1841     119740176      0.0015  98.6472    linux-tracefile.c:198
  00000000000001e8 34073    119774249      0.0281  98.6752    linux-tracefile.c:203
  00000000000001ec 90       119774339     7.4e-05  98.6753    linux-tracefile.c:196
  00000000000001ef 13000    119787339      0.0107  98.6860    linux-tracefile.c:197
  00000000000001f3 1558552  121345891      1.2840  99.9700    linux-tracefile.c:197
  00000000000001f7 14       121345905     1.2e-05  99.9700    current.h:14
  0000000000000200 105      121346010     8.7e-05  99.9701    linux-tracefile.c:199
  0000000000000206 10931    121356941      0.0090  99.9791    linux-tracefile.c:199
  0000000000000209 7        121356948     5.8e-06  99.9791    linux-tracefile.c:200
  000000000000020e 184      121357132     1.5e-04  99.9793    linux-tracefile.c:201
  0000000000000215 12991    121370123      0.0107  99.9900    linux-tracefile.c:203
  0000000000000219 1        121370124     8.2e-07  99.9900    linux-tracefile.c:200
  000000000000021c 187      121370311     1.5e-04  99.9901    linux-tracefile.c:203
  0000000000000220 11753    121382064      0.0097  99.9998    linux-tracefile.c:203
  0000000000000221 228      121382292     1.9e-04  100.000    linux-tracefile.c:203
0000000000000030 48149841 169532133     15.2046  53.5344    l_lock.c:54                 ptlrpc.ko                ptlrpc.ko                lock_res_and_lock
  0000000000000030 1844     1844           0.0038   0.0038    l_lock.c:54
  0000000000000031 2957     4801           0.0061   0.0100    l_lock.c:54
  0000000000000034 1895     6696           0.0039   0.0139    l_lock.c:54
  0000000000000035 7        6703          1.5e-05   0.0139    l_lock.c:54
  0000000000000039 728007   734710         1.5120   1.5259    l_lock.c:54
  000000000000003e 5        734715        1.0e-05   1.5259    l_lock.c:56
  0000000000000045 895      735610         0.0019   1.5278    l_lock.c:54
  0000000000000048 7838     743448         0.0163   1.5440    l_lock.c:56
  0000000000000053 54       743502        1.1e-04   1.5441    lustre_dlm.h:1228
  0000000000000057 653      744155         0.0014   1.5455    lustre_dlm.h:1228
  000000000000005b 1514     745669         0.0031   1.5486    lustre_dlm.h:1228
  0000000000000060 47396462 48142131      98.4353  99.9840    lustre_dlm.h:1228
  0000000000000064 649      48142780       0.0013  99.9853    l_lock.c:61
  0000000000000068 4919     48147699       0.0102  99.9956    l_lock.c:61
  0000000000000069 164      48147863      3.4e-04  99.9959    l_lock.c:61
  000000000000006a 1978     48149841       0.0041  100.000    l_lock.c:61
ffffffff81035630 27003722 196535855      8.5272  62.0615    hpet.c:772                  vmlinux                  vmlinux                  read_hpet
  ffffffff81035630 18056    18056          0.0669   0.0669    hpet.c:772
  ffffffff81035631 22548    40604          0.0835   0.1504    hpet.c:772
  ffffffff81035634 2        40606         7.4e-06   0.1504    hpet.c:772
  ffffffff81035640 15796    56402          0.0585   0.2089    io.h:22
  ffffffff81035646 1412635  1469037        5.2313   5.4401    io.h:22
  ffffffff81035648 25518796 26987833      94.5010  99.9412    hpet.c:774
  ffffffff81035649 15889    27003722       0.0588  100.000    hpet.c:774
000000000000e8c0 15479461 212015316      4.8881  66.9496    hash.c:1574                 libcfs.ko                libcfs.ko                cfs_hash_for_each_relax
  000000000000e8c1 3361     3361           0.0217   0.0217    hash.c:1574
  000000000000e8c8 1832     5193           0.0118   0.0335    hash.c:1574
  000000000000e8cd 1917     7110           0.0124   0.0459    hash.c:1574
  000000000000e8d6 1854     8964           0.0120   0.0579    libcfs_debug.h:199
  000000000000e8dd 1        8965          6.5e-06   0.0579    hash.c:1574
  000000000000e8e0 16       8981          1.0e-04   0.0580    hash.c:1574
  000000000000e8e4 2028     11009          0.0131   0.0711    hash.c:1574
  000000000000e8e8 12       11021         7.8e-05   0.0712    libcfs_debug.h:199
  000000000000e8f1 172      11193          0.0011   0.0723    libcfs_debug.h:199
  000000000000e8f7 613      11806          0.0040   0.0763    libcfs_hash.h:454
  000000000000e8fb 1832     13638          0.0118   0.0881    hash.c:1585
  000000000000e908 264      13902          0.0017   0.0898    hash.c:1585
  000000000000e90e 1836     15738          0.0119   0.1017    libcfs_hash.h:574
  000000000000e912 123      15861         7.9e-04   0.1025    libcfs_hash.h:574
  000000000000e917 1864     17725          0.0120   0.1145    libcfs_hash.h:574
  000000000000e919 3816     21541          0.0247   0.1392    hash.c:1589
  000000000000e91d 150      21691         9.7e-04   0.1401    hash.c:1589
  000000000000e923 1964     23655          0.0127   0.1528    hash.c:1591
  000000000000e931 1953     25608          0.0126   0.1654    hash.c:1591
  000000000000e93e 1929     27537          0.0125   0.1779    hash.c:1591
  000000000000e942 13       27550         8.4e-05   0.1780    hash.c:1591
  000000000000e949 5        27555         3.2e-05   0.1780    hash.c:1591
  000000000000e950 2283     29838          0.0147   0.1928    libcfs_hash.h:592
  000000000000e954 85590    115428         0.5529   0.7457    libcfs_hash.h:592
  000000000000e956 2        115430        1.3e-05   0.7457    libcfs_hash.h:592
  000000000000e959 190519   305949         1.2308   1.9765    libcfs_hash.h:637
  000000000000e95d 44565    350514         0.2879   2.2644    libcfs_hash.h:637
  000000000000e960 2        350516        1.3e-05   2.2644    hash.c:1597
  000000000000e967 54428    404944         0.3516   2.6160    libcfs_hash.h:637
  000000000000e96a 1257     406201         0.0081   2.6241    libcfs_hash.h:637
  000000000000e970 1658681  2064882       10.7154  13.3395    libcfs_hash.h:683
  000000000000e974 61327    2126209        0.3962  13.7357    libcfs_hash.h:683
  000000000000e977 1453     2127662        0.0094  13.7451    libcfs_hash.h:683
  000000000000e97a 1713200  3840862       11.0676  24.8126    libcfs_hash.h:683
  000000000000e97c 2480136  6320998       16.0221  40.8347    hash.c:1597
  000000000000e97f 6860     6327858        0.0443  40.8791    hash.c:1597
  000000000000e985 1736065  8063923       11.2153  52.0943    hash.c:1598
  000000000000e988 86       8064009       5.6e-04  52.0949    hash.c:1598
  000000000000e98b 8691     8072700        0.0561  52.1510    hash.c:1598
  000000000000e991 1788     8074488        0.0116  52.1626    hash.c:1598
  000000000000e995 37138    8111626        0.2399  52.4025    hash.c:1598
  000000000000e999 1        8111627       6.5e-06  52.4025    hash.c:1598
  000000000000e9a0 101      8111728       6.5e-04  52.4032    libcfs_hash.h:560
  000000000000e9a4 3117     8114845        0.0201  52.4233    libcfs_hash.h:560
  000000000000e9ab 170      8115015        0.0011  52.4244    libcfs_hash.h:560
  000000000000e9b1 9885     8124900        0.0639  52.4883    libcfs_hash.h:562
  000000000000e9b7 19       8124919       1.2e-04  52.4884    libcfs_hash.h:562
  000000000000e9ba 1643     8126562        0.0106  52.4990    libcfs_hash.h:562
  000000000000e9bc 134      8126696       8.7e-04  52.4999    hash.c:1607
  000000000000e9c1 114      8126810       7.4e-04  52.5006    libcfs_hash.h:574
  000000000000e9c5 2729     8129539        0.0176  52.5182    libcfs_hash.h:574
  000000000000e9c7 97       8129636       6.3e-04  52.5189    libcfs_hash.h:574
  000000000000e9ca 222      8129858        0.0014  52.5203    hash.c:1607
  000000000000e9ce 1475     8131333        0.0095  52.5298    libcfs_hash.h:574
  000000000000e9d0 24743    8156076        0.1598  52.6897    libcfs_hash.h:592
  000000000000e9d4 1606     8157682        0.0104  52.7000    libcfs_hash.h:592
  000000000000e9d6 3        8157685       1.9e-05  52.7001    libcfs_hash.h:592
  000000000000e9da 309      8157994        0.0020  52.7021    libcfs_hash.h:592
  000000000000e9dd 3609     8161603        0.0233  52.7254    libcfs_hash.h:637
  000000000000e9e1 2135     8163738        0.0138  52.7392    hash.c:1617
  000000000000e9ea 1606     8165344        0.0104  52.7495    hash.c:1617
  000000000000e9f6 1650     8166994        0.0107  52.7602    hash.c:1623
  000000000000ea03 1644     8168638        0.0106  52.7708    hash.c:1598
  000000000000ea0d 3468     8172106        0.0224  52.7932    libcfs_hash.h:546
  000000000000ea13 1        8172107       6.5e-06  52.7932    libcfs_hash.h:546
  000000000000ea16 1705     8173812        0.0110  52.8042    libcfs_hash.h:598
  000000000000ea1a 82       8173894       5.3e-04  52.8048    libcfs_hash.h:598
  000000000000ea20 1810     8175704        0.0117  52.8165    libcfs_hash.h:598
  000000000000ea23 63       8175767       4.1e-04  52.8169    libcfs_hash.h:579
  000000000000ea27 1751     8177518        0.0113  52.8282    libcfs_hash.h:579
  000000000000ea2c 49       8177567       3.2e-04  52.8285    libcfs_hash.h:579
  000000000000ea2f 60       8177627       3.9e-04  52.8289    hash.c:1604
  000000000000ea33 1761     8179388        0.0114  52.8403    hash.c:1604
  000000000000ea3c 1738     8181126        0.0112  52.8515    hash.c:1604
  000000000000ea3f 177      8181303        0.0011  52.8526    hash.c:1605
  000000000000ea42 28630    8209933        0.1850  53.0376    hash.c:1605
  000000000000eaa5 1723469  9933402       11.1339  64.1715    hash.c:1597
  000000000000eaa8 1718161  11651563      11.0996  75.2711    hash.c:1597
  000000000000eaac 7189     11658752       0.0464  75.3176    hash.c:1597
  000000000000eab2 2105171  13763923      13.5998  88.9173    hash.c:1597
  000000000000eab4 2200     13766123       0.0142  88.9315    hash.c:1597
  000000000000eab6 44       13766167      2.8e-04  88.9318    hash.c:1597
  000000000000eabc 53076    13819243       0.3429  89.2747    libcfs_hash.h:598
  000000000000eac0 1226196  15045439       7.9214  97.1961    libcfs_hash.h:598
  000000000000eac2 53492    15098931       0.3456  97.5417    libcfs_hash.h:598
  000000000000eac9 47528    15146459       0.3070  97.8487    hash.c:1591
  000000000000eacd 54142    15200601       0.3498  98.1985    hash.c:1591
  000000000000ead7 53856    15254457       0.3479  98.5464    hash.c:1591
  000000000000eade 53717    15308174       0.3470  98.8935    hash.c:1591
  000000000000eae0 1        15308175      6.5e-06  98.8935    hash.c:1591
  000000000000eae4 9268     15317443       0.0599  98.9533    hash.c:1591
  000000000000eae8 50483    15367926       0.3261  99.2795    hash.c:1591
  000000000000eaec 1        15367927      6.5e-06  99.2795    hash.c:1591
  000000000000eaef 320      15368247       0.0021  99.2815    hash.c:1591
  000000000000eaf3 51637    15419884       0.3336  99.6151    hash.c:1591
  000000000000eafd 49470    15469354       0.3196  99.9347    libcfs_hash.h:579
  000000000000eb00 6        15469360      3.9e-05  99.9347    libcfs_hash.h:579
  000000000000eb02 1854     15471214       0.0120  99.9467    libcfs_hash.h:579
  000000000000eb05 189      15471403       0.0012  99.9479    hash.c:1632
  000000000000eb08 1966     15473369       0.0127  99.9606    hash.c:1632
  000000000000eb0d 1874     15475243       0.0121  99.9728    hash.c:1632
  000000000000eb13 1918     15477161       0.0124  99.9851    hash.c:1632
  000000000000eb16 1944     15479105       0.0126  99.9977    hash.c:1632
  000000000000eb17 1        15479106      6.5e-06  99.9977    hash.c:1583
  000000000000eb1e 169      15479275       0.0011  99.9988    hash.c:1583
  000000000000eb25 2        15479277      1.3e-05  99.9988    hash.c:1583
  000000000000eb31 184      15479461       0.0012  100.000    hash.c:1583
Comment by Ned Bass [ 21/May/12 ]

> It's pretty clear that we're spending over 30% of the CPU time on this one line:
>
> 00000000000001dc 118121958 119737591 97.3140 98.6450 linux-tracefile.c:195

I think that's because you looked after I turned on full Lustre debugging and I forgot to stop profiling.

Comment by Ned Bass [ 21/May/12 ]

Uploading lustre debug log LU-1428.lustre.log.pigs-mds1.1337628934.gz to ftp.whamcloud.com. The spinning processes are 26095 and 29818.

Comment by Prakash Surya (Inactive) [ 21/May/12 ]

Sure. Looking only at the cfs_hash_for_each_relax function, these jump out at me:

  000000000000e970 1658681  2064882       10.7154  13.3395    libcfs_hash.h:683
...
  000000000000e97a 1713200  3840862       11.0676  24.8126    libcfs_hash.h:683
  000000000000e97c 2480136  6320998       16.0221  40.8347    hash.c:1597
...
  000000000000e985 1736065  8063923       11.2153  52.0943    hash.c:1598
...
  000000000000eaa5 1723469  9933402       11.1339  64.1715    hash.c:1597
  000000000000eaa8 1718161  11651563      11.0996  75.2711    hash.c:1597
...
  000000000000eab2 2105171  13763923      13.5998  88.9173    hash.c:1597
...
  000000000000eac0 1226196  15045439       7.9214  97.1961    libcfs_hash.h:598

Which point to these lines:

1597                 cfs_hash_bd_for_each_hlist(hs, &bd, hhead) {
...
1626                 }
1627                 cfs_hash_bd_unlock(hs, &bd, 0);

I'm not sure what to make of it, though. My first guess was the old hash function, but that may be ruled out since we have the LU-143 FID hash improvement patch applied.
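
One thing worth keeping in mind while reading the numbers above: cfs_hash_for_each_empty() keeps re-running cfs_hash_for_each_relax() until a full walk visits zero items (that is my reading of libcfs, paraphrased rather than verbatim source), so a single entry that the callback can never remove is enough to pin a thread at 100% CPU in this loop. A tiny hypothetical user-space analogue of that retry pattern, with the hash simplified to one list:

/* Hypothetical user-space analogue, NOT libcfs source: shows why an
 * entry that the callback can never unlink makes the retry loop in
 * cfs_hash_for_each_empty() spin forever. */
#include <stddef.h>

struct node {
        struct node *next;
        int removable;
};

/* analogue of cfs_hash_for_each_relax(): visit every entry, let the
 * "callback" unlink the ones it can, return how many were visited */
static int for_each_relax(struct node **head)
{
        struct node **pp = head;
        int seen = 0;

        while (*pp != NULL) {
                seen++;
                if ((*pp)->removable)
                        *pp = (*pp)->next;      /* callback unlinks it */
                else
                        pp = &(*pp)->next;      /* stale entry survives */
        }
        return seen;
}

/* analogue of cfs_hash_for_each_empty(): retry until a walk sees nothing */
static void for_each_empty(struct node **head)
{
        while (for_each_relax(head) != 0)
                ;       /* never exits while a stale entry remains */
}

int main(void)
{
        struct node stale = { NULL, 0 };        /* destroyed-but-still-hashed lock */
        struct node ok    = { &stale, 1 };
        struct node *head = &ok;

        for_each_empty(&head);                  /* spins forever because of "stale" */
        return 0;
}

The walk always sees the stale entry, so for_each_relax() never returns 0 and the outer loop never terminates, which matches the 100% CPU with no forward progress that we are seeing.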

Comment by Ned Bass [ 21/May/12 ]

We also had a similar occurrence around the same time on a separate Lustre cluster. In that case the ll_evictor thread on the MDS was spinning in the same function.

Comment by Peter Jones [ 22/May/12 ]

Oleg

Could you please comment on this one?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 23/May/12 ]

We also encountered this same situation on one of our MDSes, which was not fully operational ("df" was working, but access to some sub-trees was hanging).

In our case, there were 4 threads eating 100% CPU but regularly re-schedule()'ing:

> 9735 2 18 ffff88187c3d87d0 RU 0.0 0 0 [ldlm_elt]
> 9770 2 12 ffff88205ada0100 RU 0.0 0 0 [ll_evictor]
> 35489 2 1 ffff881854ef2790 RU 0.0 0 0 [mdt_121]
> 46002 2 15 ffff88184cca97d0 RU 0.0 0 0 [mdt_445]

Since we could not find a way to gracefully recover from the situation, we decided to reboot the MDT and take/force a crash dump, in which the thread stacks look the same as in our live Alt+SysRq+l attempts:

crash> bt 9735 9770 35489 46002
PID: 9735 TASK: ffff88187c3d87d0 CPU: 18 COMMAND: "ldlm_elt"
#0 [ffff88109c707e90] crash_nmi_callback at ffffffff8101fd06
#1 [ffff88109c707ea0] notifier_call_chain at ffffffff814837f5
#2 [ffff88109c707ee0] atomic_notifier_call_chain at ffffffff8148385a
#3 [ffff88109c707ef0] notify_die at ffffffff8108026e
#4 [ffff88109c707f20] do_nmi at ffffffff81481443
#5 [ffff88109c707f50] nmi at ffffffff81480d50
[exception RIP: cfs_hash_for_each_relax+193]
RIP: ffffffffa0410e11 RSP: ffff8818675e7b80 RFLAGS: 00000282
RAX: ffff88165f78567c RBX: ffff881020d2ccc0 RCX: 0000000000000005
RDX: 000000000000000d RSI: ffff8818675e7bd0 RDI: ffff881020d2ccc0
RBP: ffff8818675e7c10 R8: 0000000209bd7a5d R9: 6d68000000000000
R10: 6b40000000000000 R11: 0c204977264ccdad R12: 0000000000000000
R13: ffffffffa058e3c0 R14: 0000000000000001 R15: ffff881020d2cd40
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff8818675e7b80] cfs_hash_for_each_relax at ffffffffa0410e11 [libcfs]
#7 [ffff8818675e7c18] cfs_hash_for_each_empty at ffffffffa0412466 [libcfs]
#8 [ffff8818675e7c98] ldlm_cancel_locks_for_export at ffffffffa058a49f [ptlrpc]
#9 [ffff8818675e7ca8] server_disconnect_export at ffffffffa059a2b4 [ptlrpc]
#10 [ffff8818675e7d28] mdt_obd_disconnect at ffffffffa0a079eb [mdt]
#11 [ffff8818675e7e48] class_fail_export at ffffffffa04aacde [obdclass]
#12 [ffff8818675e7e98] expired_lock_main at ffffffffa05b0dd4 [ptlrpc]
#13 [ffff8818675e7f48] kernel_thread at ffffffff810041aa

PID: 9770 TASK: ffff88205ada0100 CPU: 12 COMMAND: "ll_evictor"
#0 [ffff8800450c7e90] crash_nmi_callback at ffffffff8101fd06
#1 [ffff8800450c7ea0] notifier_call_chain at ffffffff814837f5
#2 [ffff8800450c7ee0] atomic_notifier_call_chain at ffffffff8148385a
#3 [ffff8800450c7ef0] notify_die at ffffffff8108026e
#4 [ffff8800450c7f20] do_nmi at ffffffff81481443
#5 [ffff8800450c7f50] nmi at ffffffff81480d50
[exception RIP: cfs_hash_for_each_relax+168]
RIP: ffffffffa0410df8 RSP: ffff88205adc7b20 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8810254ecd80 RCX: 0000000000000005
RDX: 0000000000000003 RSI: ffff88205adc7b70 RDI: ffff8810254ecd80
RBP: ffff88205adc7bb0 R8: 0000000208590aee R9: 2e90000000000000
R10: 7480000000000000 R11: 9ddca10a56b825d2 R12: 0000000000000000
R13: ffffffffa058e3c0 R14: 0000000000000001 R15: ffff8810254ece00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88205adc7b20] cfs_hash_for_each_relax at ffffffffa0410df8 [libcfs]
#7 [ffff88205adc7bb8] cfs_hash_for_each_empty at ffffffffa0412466 [libcfs]
#8 [ffff88205adc7c38] ldlm_cancel_locks_for_export at ffffffffa058a49f [ptlrpc]
#9 [ffff88205adc7c48] server_disconnect_export at ffffffffa059a2b4 [ptlrpc]
#10 [ffff88205adc7cc8] mdt_obd_disconnect at ffffffffa0a079eb [mdt]
#11 [ffff88205adc7de8] class_fail_export at ffffffffa04aacde [obdclass]
#12 [ffff88205adc7e38] ping_evictor_main at ffffffffa05e5a5d [ptlrpc]
#13 [ffff88205adc7f48] kernel_thread at ffffffff810041aa

PID: 35489 TASK: ffff881854ef2790 CPU: 1 COMMAND: "mdt_121"
#0 [ffff88089c407e90] crash_nmi_callback at ffffffff8101fd06
#1 [ffff88089c407ea0] notifier_call_chain at ffffffff814837f5
#2 [ffff88089c407ee0] atomic_notifier_call_chain at ffffffff8148385a
#3 [ffff88089c407ef0] notify_die at ffffffff8108026e
#4 [ffff88089c407f20] do_nmi at ffffffff81481443
#5 [ffff88089c407f50] nmi at ffffffff81480d50
[exception RIP: cfs_hash_for_each_relax+193]
RIP: ffffffffa0410e11 RSP: ffff88184e85b960 RFLAGS: 00000286
RAX: ffff8817f53bf10c RBX: ffff88105d484800 RCX: 0000000000000005
RDX: 000000000000001f RSI: ffff88184e85b9b0 RDI: ffff88105d484800
RBP: ffff88184e85b9f0 R8: 0000000208590aee R9: f188000000000000
R10: 8c40000000000000 R11: 9ddca109f53c3e31 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88184e85b960] cfs_hash_for_each_relax at ffffffffa0410e11 [libcfs]
#7 [ffff88184e85b9f8] cfs_hash_for_each_empty at ffffffffa0412466 [libcfs]
#8 [ffff88184e85ba78] ldlm_cancel_locks_for_export at ffffffffa058a49f [ptlrpc]
#9 [ffff88184e85ba88] server_disconnect_export at ffffffffa059a2b4 [ptlrpc]
#10 [ffff88184e85bb08] mdt_obd_disconnect at ffffffffa0a079eb [mdt]
#11 [ffff88184e85bc28] target_handle_disconnect at ffffffffa05966a9 [ptlrpc]
#12 [ffff88184e85bc88] mdt_disconnect at ffffffffa09fe2a9 [mdt]
#13 [ffff88184e85bcd8] mdt_handle_common at ffffffffa09f9865 [mdt]
#14 [ffff88184e85bd58] mdt_regular_handle at ffffffffa09fa875 [mdt]
#15 [ffff88184e85bd68] ptlrpc_main at ffffffffa05e4829 [ptlrpc]
#16 [ffff88184e85bf48] kernel_thread at ffffffff810041aa

PID: 46002 TASK: ffff88184cca97d0 CPU: 15 COMMAND: "mdt_445"
#0 [ffff88189c4c7e90] crash_nmi_callback at ffffffff8101fd06
#1 [ffff88189c4c7ea0] notifier_call_chain at ffffffff814837f5
#2 [ffff88189c4c7ee0] atomic_notifier_call_chain at ffffffff8148385a
#3 [ffff88189c4c7ef0] notify_die at ffffffff8108026e
#4 [ffff88189c4c7f20] do_nmi at ffffffff81481443
#5 [ffff88189c4c7f50] nmi at ffffffff81480d50
[exception RIP: cfs_hash_hh_hhead]
RIP: ffffffffa040f280 RSP: ffff881849f6b958 RFLAGS: 00000246
RAX: ffffffffa0422200 RBX: ffff88106b5d5b00 RCX: 0000000000000005
RDX: 0000000000000002 RSI: ffff881849f6b9b0 RDI: ffff88106b5d5b00
RBP: ffff881849f6b9f0 R8: 00000002075042cd R9: 0858000000000000
R10: 42c0000000000000 R11: c0fc1c42c8d4010b R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff881849f6b958] cfs_hash_hh_hhead at ffffffffa040f280 [libcfs]
#7 [ffff881849f6b958] cfs_hash_for_each_relax at ffffffffa0410e05 [libcfs]
#8 [ffff881849f6b9f8] cfs_hash_for_each_empty at ffffffffa0412466 [libcfs]
#9 [ffff881849f6ba78] ldlm_cancel_locks_for_export at ffffffffa058a49f [ptlrpc]
#10 [ffff881849f6ba88] server_disconnect_export at ffffffffa059a2b4 [ptlrpc]
#11 [ffff881849f6bb08] mdt_obd_disconnect at ffffffffa0a079eb [mdt]
#12 [ffff881849f6bc28] target_handle_disconnect at ffffffffa05966a9 [ptlrpc]
#13 [ffff881849f6bc88] mdt_disconnect at ffffffffa09fe2a9 [mdt]
#14 [ffff881849f6bcd8] mdt_handle_common at ffffffffa09f9865 [mdt]
#15 [ffff881849f6bd58] mdt_regular_handle at ffffffffa09fa875 [mdt]
#16 [ffff881849f6bd68] ptlrpc_main at ffffffffa05e4829 [ptlrpc]
#17 [ffff881849f6bf48] kernel_thread at ffffffff810041aa
crash>
crash> ps | grep UN
crash> log | less
crash> bt 35683
PID: 35683 TASK: ffff8810732200c0 CPU: 21 COMMAND: "mdt_315"
#0 [ffff881025bf7650] schedule at ffffffff8147dddc
#1 [ffff881025bf7718] cfs_waitq_wait at ffffffffa040175e [libcfs]
#2 [ffff881025bf7728] ldlm_completion_ast at ffffffffa05ad372 [ptlrpc]
#3 [ffff881025bf77f8] ldlm_cli_enqueue_local at ffffffffa05aca79 [ptlrpc]
#4 [ffff881025bf78b8] mdt_object_lock at ffffffffa09f544e [mdt]
#5 [ffff881025bf7978] mdt_getattr_name_lock at ffffffffa0a0122b [mdt]
#6 [ffff881025bf7a58] mdt_intent_getattr at ffffffffa0a0247a [mdt]
#7 [ffff881025bf7af8] mdt_intent_policy at ffffffffa09ff630 [mdt]
#8 [ffff881025bf7b68] ldlm_lock_enqueue at ffffffffa058eb8a [ptlrpc]
#9 [ffff881025bf7c08] ldlm_handle_enqueue0 at ffffffffa05b5767 [ptlrpc]
#10 [ffff881025bf7ca8] mdt_enqueue at ffffffffa09ff0ca [mdt]
#11 [ffff881025bf7cd8] mdt_handle_common at ffffffffa09f9865 [mdt]
#12 [ffff881025bf7d58] mdt_regular_handle at ffffffffa09fa875 [mdt]
#13 [ffff881025bf7d68] ptlrpc_main at ffffffffa05e4829 [ptlrpc]
#14 [ffff881025bf7f48] kernel_thread at ffffffff810041aa
crash>

Thus my current opinion is that there seems to be a problem (a loop?) in the hash-list/structure that manages an export's locks.

But I need to dig more into the crash dump to conclude...

Comment by Ned Bass [ 23/May/12 ]

Bruno,

Do you have the LU-143 patch? It was not landed until 2.2. Since your threads were getting rescheduled, it makes me wonder if you are just suffering from poor hash distribution.

Comment by Bruno Faccini (Inactive) [ 04/Jun/12 ]

Thanks for pointing that out, Ned; the patch for LU-143 was indeed not integrated. R&D did the work to back-port it into our Lustre 2.1 distro, so now we just need to expose it to the customer's production workload and see how things go!

Comment by Liang Zhen (Inactive) [ 05/Jun/12 ]

Looks like this one is the same as LU-1087. Based on the log in LU-1087, I suspect that we left a lock on the export and can't remove it via this path:

ldlm_cancel_locks_for_export_cb()->
       ldlm_lock_cancel()->
             ldlm_lock_destroy_nolock()->
                   ldlm_lock_destroy_internal()->cfs_hash_del()

We actually started to see this as far back as:
https://bugzilla.lustre.org/show_bug.cgi?id=19557

We probably never really fixed this issue, although we landed a patch on BZ 19557.

I suspect it's because:

ldlm_lock_destroy_internal:

        if (lock->l_destroyed) {
                LASSERT(cfs_list_empty(&lock->l_lru));
                EXIT;
                return 0;
        }
        lock->l_destroyed = 1;

        if (lock->l_export && lock->l_export->exp_lock_hash &&
            !cfs_hlist_unhashed(&lock->l_exp_hash))
                cfs_hash_del(lock->l_export->exp_lock_hash,
                             &lock->l_remote_handle, &lock->l_exp_hash);

lock->l_exp_hash should be protected by the internal lock of cfs_hash, but we call cfs_hlist_unhashed(&lock->l_exp_hash) without holding the cfs_hash lock. This means that if someone wants to cancel this lock while export->exp_lock_hash is in the middle of rehashing (in the thread context of cfs_workitem), there is a tiny window between deleting the lock from bucket[A] and re-adding it to bucket[B] of l_exp_hash during which cfs_hlist_unhashed(&lock->l_exp_hash) returns 1. In that window we destroy the lock but leave it on l_exp_hash forever, because we set lock::l_destroyed to 1 and ldlm_lock_destroy_internal() will not attempt the hash deletion again even if it is called multiple times.

Making a simple change to cfs_hash_del() and removing the above check could fix this issue; I can post a patch for this.
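
For concreteness, a minimal sketch of the idea (my paraphrase, not the actual patch; see the review link in the next comment). The caller simply stops testing cfs_hlist_unhashed() without a lock, and cfs_hash_del() answers the "is it still hashed?" question under its own hash/bucket locks, where a concurrent rehash cannot move the node:

ldlm_lock_destroy_internal:

        lock->l_destroyed = 1;

        /* sketch: no unlocked cfs_hlist_unhashed() check any more;
         * cfs_hash_del() takes the hash/bucket locks internally, so it
         * either finds the node in its bucket and unlinks it, or sees
         * it as genuinely unhashed, never as a node that is merely in
         * transit between buckets during a rehash */
        if (lock->l_export && lock->l_export->exp_lock_hash)
                cfs_hash_del(lock->l_export->exp_lock_hash,
                             &lock->l_remote_handle, &lock->l_exp_hash);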

Comment by Liang Zhen (Inactive) [ 05/Jun/12 ]

I've posted a patch for review: http://review.whamcloud.com/#change,3028

Comment by Ned Bass [ 12/Jun/12 ]

Thanks, we also need this patch for 2.1.

Comment by Bruno Faccini (Inactive) [ 19/Jun/12 ]

Liang,

I wanted to provide more details on this problem, and particularly to indicate that I feel pretty sure the looping situation will never clear, due to "orphaned" locks still being on the hash-lists. But it seems that you finally found the bug/window!

So, maybe it's too late, but just in case: digging in the server/client logs, it seems to me that at the start of the problem the affected client/export reported the following errors/messages during MDT umount:

1339507111 2012 Jun 12 15:18:31 lascaux3332 kern err kernel LustreError: 57793:0:(lmv_obd.c:665:lmv_disconnect_mdc()) Target scratch2-MDT0000_UUID disconnect error -110
1339507111 2012 Jun 12 15:18:31 lascaux3332 kern warning kernel Lustre: client ffff88087d539400 umount complete
1339507130 2012 Jun 12 15:18:50 lascaux3332 kern err kernel LustreError: 57863:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
1339507130 2012 Jun 12 15:18:50 lascaux3332 kern err kernel LustreError: 57863:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Skipped 52 previous similar messages

Also interesting is that this situation may have a "low" impact when it concerns a single client/export that has already unmounted, holding locks on a very specific part/directory of the filesystem. This explains why we found multiple mdt_<id> threads that had been spinning in this situation for days with no problem reported nor found when accessing/using the concerned filesystem!

Comment by Andreas Dilger [ 19/Jun/12 ]

Patch landed to master for Lustre 2.3.0, still needs to be landed to b2_1 for 2.1.3.

Comment by Jay Lan (Inactive) [ 29/Jun/12 ]

NASA Ames hit this this morning on a 2.1.1 MDS.

Comment by Peter Jones [ 23/Jul/12 ]

Landed for 2.1.3 and 2.3
