<br><font size=2 face="sans-serif">Hi, did you have a good holiday?</font>
<br><font size=2 face="sans-serif">I have applied the patch below.</font>
<br><font size=2 face="sans-serif">It works now:</font>
<br>
<br><font size=2 face="sans-serif">padbr345P -O rmgr=slurm --proc-summary -a</font>
<br><font size=2 face="sans-serif">Warning, remote process state differs across ranks</font>
<br><font size=2 face="sans-serif">state : ranks</font>
<br><font size=2 face="sans-serif">R (running) : [2]</font>
<br><font size=2 face="sans-serif">S (sleeping) : [0-1,3-7]</font>
<pre>
rank hostname   pid    vmsize    vmrss S uptime %cpu lcore command
   0 vb8      24595 133440 kB 47296 kB S   0.01    0     0 pp_sndrcv_spbl
   1 vb9      17406 111616 kB 25536 kB S   0.01    0     0 pp_sndrcv_spbl
   2 vb10     12521 133440 kB 47296 kB R   0.93   99     1 pp_sndrcv_spbl
   3 vb8      24588 111616 kB 25728 kB S   0.01    0     2 pp_sndrcv_spbl
   4 vb9      17411 111616 kB 25600 kB S   0.01    0     5 pp_sndrcv_spbl
   5 vb10     12522 111616 kB 25600 kB S   0.93    0     0 pp_sndrcv_spbl
   6 vb8      24589 111616 kB 25600 kB S   0.01    0     3 pp_sndrcv_spbl
   7 vb9      17407 112640 kB 25728 kB S   0.01    0     0 pp_sndrcv_spbl
</pre>
<br><font size=2 face="sans-serif">[thipa@vb0 openmpi]$ </font>
<br>
<br><font size=2 face="sans-serif">Thipadin.</font>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td>
<td><font size=1 face="sans-serif"><b>Ashley Pittman &lt;ashley@pittman.co.uk&gt;</b></font>
<p><font size=1 face="sans-serif">12/03/2009 12:08 PM</font>
<br>
<td><font size=1 face="Arial"> </font>
<br><font size=1 face="sans-serif"> To : thipadin.seng-long@bull.net</font>
<br><font size=1 face="sans-serif"> cc : florence.vallee@bull.net, francois.wellenreiter@bull.net, padb-devel@pittman.org.uk, Sylvain.JEAUGEY@bull.net</font>
<br><font size=1 face="sans-serif"> Subject : Re: Réf. : Re: [padb] Patch of support of Slurm + Openmpi Orte manager</font></table>
<br>
<br><font size=2 face="Courier New"><br>
I'm just running out of the door myself and will be away until Sunday<br>
now.<br>
<br>
On Thu, 2009-12-03 at 11:45 +0100, thipadin.seng-long@bull.net wrote:<br>
> You have mpirun which has rank0, this shouldn't, and you miss 3,6.<br>
<br>
Ranks 3 and 6 are on the same node as rank 0. Can you try the following<br>
additional patch, which should cause it to skip over the mpirun process<br>
and look for local ones based on their environment?<br>
<br>
If this patch doesn't work, take a look at the contents<br>
of /proc/$pid/status for the process it's erroneously reporting as rank<br>
0 to see what Name is set to. In the example you sent it's pid 22210.<br>
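<br>
To do that check by hand, something like the following should work. It is<br>
only a minimal sketch assuming a Linux-style /proc; the helper name<br>
status_name is made up for illustration and is not part of padb.<br>
<pre>
```shell
# Hypothetical helper (not part of padb): print the Name field of a
# /proc/PID/status style file given as the first argument.
status_name() {
    # Strip the "Name:" label and the whitespace after it; print only that line.
    sed -n 's/^Name:[[:space:]]*//p' "$1"
}
```
</pre>
For example, status_name /proc/22210/status should print the Name field<br>
for the pid above, which is the value the patch compares against.<br>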
<br>
--- padb-slurm-open-3 2009-12-03 11:03:08.500044734 +0000<br>
+++ padb 2009-12-03 11:03:15.333036493 +0000<br>
@@ -8187,6 +8187,7 @@<br>
next unless ( $job eq $jobid );<br>
next unless ( $step == $inner_conf{slurm_job_step} );<br>
next if( find_from_status( $pid, 'Name' ) eq 'orted');<br>
+ next if( find_from_status( $pid, 'Name' ) eq 'mpirun');<br>
maybe_show_pid( $global, $pid );<br>
$found_target = 1;<br>
}<br>
<br>
<br>
-- <br>
<br>
Ashley Pittman, Bath, UK.<br>
<br>
Padb - A parallel job inspection tool for cluster computing<br>
http://padb.pittman.org.uk<br>
<br>
</font>
<br>
<br>