How is labindex assigned to the workers inside a node/machine in MDCS?

We know that in MDCS we can choose to run more than one worker on a node/machine, say 4 workers per node. So how is labindex assigned to these 4 workers? Are they always 1,2,3,4 on each node, do they increase continuously node by node (1-4, 5-8, 9-12, ...), or are they completely random, such as 1,3,9,6 on one node/machine?

Accepted Answer

Edric Ellis 2018-5-25
You don't specify which cluster type you're using with MDCS, but I'm going to assume MJS for now. (Not all of what follows will be scheduler-specific).
Within an spmd block, labindex is equal to the index of the task executing on that worker. So if you have 2 nodes each running 4 workers, and you run a single communicating job of size 8 (i.e. parpool('myMjsCluster', 8)), the task indices are 1:8, and so are the corresponding values of labindex.
MJS will endeavour to schedule things such that consecutive tasks are co-located on a single node - i.e. it will attempt to put tasks 1:4 on the first node, and 5:8 on the second. (Most other scheduler types will end up doing something similar, but by a different means).
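If that placement works out exactly, labindex maps to a node and to a position within that node by simple arithmetic. A minimal MATLAB sketch for 2 nodes with 4 workers each; the perfect block layout is an assumption (MJS only attempts it), so treat this as an illustration, not something to rely on:

```matlab
% Ideal case only: assumes every node runs exactly workersPerNode workers
% and MJS succeeded in placing consecutive tasks on the same node.
workersPerNode = 4;                              % as in the question
numWorkers     = 8;                              % e.g. parpool('myMjsCluster', 8)
labIdx   = 1:numWorkers;                         % labindex values 1..8
nodeIdx  = ceil(labIdx / workersPerNode);        % 1 1 1 1 2 2 2 2
localIdx = mod(labIdx - 1, workersPerNode) + 1;  % 1 2 3 4 1 2 3 4
disp([labIdx; nodeIdx; localIdx])
```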
Basically, you need to build a mapping from labindex to hostname to work out which labs are located on which host; you can then use that "local labindex" to pick which Java program to use. Here's one way:
spmd
    [s, hostname] = system('hostname');
    assert(s == 0, 'Failed to compute hostname');
    hostname = strtrim(hostname);
    % Get a list of all hostnames in the pool
    allHostnames = gcat({hostname}, 1);
    % Work out which labindex values are on this host
    allLabs = 1:numlabs;
    labsOnThisHost = allLabs(strcmp(hostname, allHostnames))
    % Work out this lab's position among the labs on this host
    myIndexOnThisHost = find(labindex == labsOnThisHost)
end
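To see what that block computes without a cluster, here is the same hostname-to-local-index logic applied to a simulated gcat result. The host names are hypothetical, and the placement is deliberately interleaved to show that the mapping does not depend on consecutive placement:

```matlab
% Simulated result of gcat({hostname}, 1) on a 6-worker pool whose
% placement is NOT consecutive (hypothetical host names).
allHostnames = {'nodeA'; 'nodeB'; 'nodeA'; 'nodeB'; 'nodeA'; 'nodeB'};
numWorkers = numel(allHostnames);
localIdx = zeros(1, numWorkers);
for lab = 1:numWorkers
    % labindex values that share this lab's host, in increasing order
    labsOnThisHost = find(strcmp(allHostnames{lab}, allHostnames));
    % this lab's position among them
    localIdx(lab) = find(labsOnThisHost == lab);
end
disp(localIdx)   % 1 1 2 2 3 3
```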

More Answers (1)

Walter Roberson 2018-5-25
"The value of labindex spans from 1 to n, where n is the number of workers running the current job, defined by numlabs"
"This was done by pausing for a random number of seconds and then detecting whether ###.exe is running in the tasklist of this node."
I would probably structure it more like this:
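spmd
    if labindex == 1
        % Check in case the external software is somehow already running
        % ('###.exe' is the placeholder name from the question)
        [~, taskList] = system('tasklist /FI "IMAGENAME eq ###.exe"');
        if ~contains(taskList, '###.exe')
            % Otherwise launch it without blocking MATLAB ('start' is Windows-only)
            system('start "" "###.exe"');
            % ...then do any waiting for it to be ready to go...
        end
    end
    labBarrier();
end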
1 Comment
raym 2018-5-25
Thanks, Roberson. Your code really is a better way to share the external software, but I'm not sure every machine has a worker with labindex 1. In fact, that's the crux of this question.
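One way to address this, combining both answers, is to pick one launcher per host rather than per pool: inside spmd, test myIndexOnThisHost == 1 (from the accepted answer's code) instead of labindex == 1. The sketch below shows which labindex would act as the launcher on each host for a simulated gcat result (hypothetical host names) in which nodeB has no worker with labindex 1. It uses `unique(..., 'stable')`, whose second output gives the first occurrence of each host:

```matlab
% Simulated gcat({hostname}, 1) result (hypothetical host names) for a pool
% where no worker on nodeB has labindex == 1.
allHostnames = {'nodeA'; 'nodeA'; 'nodeB'; 'nodeB'};
% For each distinct host, the smallest labindex running there; that worker
% (rather than labindex 1 of the whole pool) would start the software.
[hosts, launcherLabs] = unique(allHostnames, 'stable');
for k = 1:numel(hosts)
    fprintf('%s: launched by labindex %d\n', hosts{k}, launcherLabs(k));
end
```

The worker selected this way is exactly the one whose myIndexOnThisHost equals 1, so every host gets its own launcher; the labBarrier() call then holds all workers until each host is ready.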
