How to return 'X' number of unique subsets (combinations) of 'N' numbers taken 'K' at a time

I need to return X number of unique combinations of N numbers (i.e., vector V of length N) taken K at a time.
I can't use 'nchoosek' because I don't want ALL unique combinations. I just want X number of them and 'nchoosek' will crash if I enter the actual values for V and K because V is too large.
Here's an example, with more descriptive variable names…
origSet = rand(1,500); %the full original (example) set of numbers
desNumComb = 10000; %the number of unique combinations/subsets that I want to end up with
subsetSize = 10; %the desired size for each combination/subset
allCombos = nchoosek(1:length(origSet), subsetSize); %will return ALL possible combinations (if it ran)
subsetInds = allCombos(1:desNumComb,:); %the indices for each of the desNumComb subsets
Worth mentioning is that the size of the original set of numbers [i.e., length(origSet) ], the desired subset size [i.e., subsetSize], and the desired number of unique combinations [i.e., desNumComb] will possibly vary every time I loop through, which will be many times.
Thanks in advance to all.
Cheers, John
  2 Comments
Walter Roberson 2015-7-23
Which X subsets? The "first" X subsets under some specific ordering? X random subsets? Are you using this to iterate through all the possibilities in batches?
John Trimper 2015-7-24
Hi Walter,
It doesn't matter which X subsets out of the full range of unique possibilities. What matters is that they're all unique.
Here's what I'm doing: I need to compare two groups but they have really different numbers of samples. One group has up to several hundred, while the other group might have as few as 5. The metric I'm using is biased so I need to equate the number of samples in each group. So what I want to do is repeatedly subsample the larger group down to match the number of samples in the smaller group, up to 10,000 times (but not more) and then average over the measurements taken across those 10,000 subsamples. Since the total number of unique combinations is WAY more than I need (incomputable by nchoosek), I need to find a way to only get a reduced chosen number of unique combinations.
I hope that helps to clarify. Thank you for your time.
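To put a number on "WAY more than I need", here is a minimal sketch that estimates the count of possible subsets without enumerating them. It assumes the example sizes from the question (500 items, subsets of 10) and works in log space with gammaln so nothing overflows; the variable names are illustrative only.

```matlab
% Sketch: estimate how many k-subsets exist without building them.
n = 500;   % size of the larger group (example value from the question)
k = 10;    % subset size (example value from the question)

% log(nchoosek(n,k)) via the log-gamma function, which avoids overflow
logCount = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);

% Report the order of magnitude of the number of possible subsets
fprintf('About 10^%.1f possible subsets\n', logCount / log(10));
```

For these sizes the count is on the order of 10^20, so a matrix listing every combination could never fit in memory, which is why random sampling is the practical route.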


Accepted Answer

John Trimper 2015-7-27
Edited: Walter Roberson 2015-7-27
Answer provided by Star Strider & Walter Roberson above, worked out in comments, summarized here:
Use randperm to generate more vectors than necessary, then use unique(A, 'rows', 'stable') to select only unique combinations.
Example code for those interested:
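biggerGroup = rand(1,100);
subsetSize = 10;
numShuffles = 20000; %more shuffles than I actually need
mixer = zeros(1, length(biggerGroup));
mixer(1:subsetSize) = 1;
allCombs = zeros(numShuffles, subsetSize); %preallocate instead of growing inside the loop
for s = 1:numShuffles
    mixer = mixer(randperm(length(mixer))); %shuffle the selection mask
    allCombs(s,:) = biggerGroup(mixer==1); %values keep their original order, so repeated subsets give identical rows
end
uniqueShufs = unique(allCombs, 'rows', 'stable');
desNumUniShuf = 10000; %actual desired # of unique shuffles
myUniShufs = uniqueShufs(1:desNumUniShuf,:); %errors if fewer than desNumUniShuf unique rows were drawn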
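
A variation on the same idea, offered only as a sketch: instead of guessing how many extra shuffles to draw, keep drawing until exactly the desired number of unique subsets has been collected. It assumes a MATLAB release where randperm(n,k) is available, works on sorted indices rather than values (so repeated values in the data cannot be mistaken for repeated subsets), and uses illustrative names (numWanted, subsetInds, subsamples) that are not from the thread.

```matlab
% Sketch: collect exactly numWanted unique index subsets, drawing in batches.
n = 100;            % size of the larger group
k = 10;             % subset size
numWanted = 10000;  % number of unique subsets required

% Guard: cannot ask for more subsets than exist (compared in log space)
logTotal = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);
assert(log(numWanted) <= logTotal, 'Fewer than numWanted subsets exist.');

subsetInds = zeros(0, k);
while size(subsetInds, 1) < numWanted
    numMissing = numWanted - size(subsetInds, 1);
    batch = zeros(numMissing, k);
    for s = 1:numMissing
        batch(s, :) = sort(randperm(n, k));  % sorted so element order does not matter
    end
    % Append the new draws and drop any rows that repeat an earlier subset
    subsetInds = unique([subsetInds; batch], 'rows', 'stable');
end

biggerGroup = rand(1, n);
subsamples = biggerGroup(subsetInds);  % numWanted-by-k matrix of values
```

Each row of subsamples can then be passed to whatever metric is being averaged. When numWanted is tiny compared with the number of possible subsets, the loop almost always finishes in one pass; it slows down considerably if numWanted approaches the total.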
