Generate large nchoosek in batches?

7 次查看(过去 30 天)
I would like to generate all possible combinations of a binary vector, I’ve tried using nchoosek and it does what I want it to do, but I run out of memory if I have a k value of more than about 4.
Is there any way that I can put the nchoosek in a loop and trade speed for memory usage by generating the combinations in batches instead of all at once?
Say I need
nchoosek(1:200,5)
is there a clever way to evaluate say the first 20% of the combinations, do what I need to do with them and then throw them away before I generate the next 20% of the nchoosek(1:200,5) sequence?
Thanks.

采纳的回答

John D'Errico
John D'Errico 2016-9-22
编辑:John D'Errico 2016-9-22
No. Nchoosek is not written in any way to allow you to choose only some reduced subset. In fact, the set of 200 bit binary numbers has a HUGE number of subsets with 5 bits set. I debated showing you how to write such a code, but then you would immediately want to compute combinations of 6 or 7 or 50 of the elements from that set in batches, and you would very quickly overwhelm even a batched version.
The set of combinations that you want to generate gets HUGE, and does so very quickly.
I'll relent just a tiny bit though. The set that you want to generate is simple enough to compute.
Consider if bit 1 is set. Then you need to generate all sets of 199 bits, with exactly 4 of them chosen. You already know how to do this. If bit 1is not set, then you need to generate all possible subsets of 5 bits, choosing from 199 bits. So you can do the whole thing recursively, until the sets become small enough to generate.
  2 个评论
Peta
Peta 2016-9-22
Isn’t the total number of combinations just:
200! / (5! (200 - 5)!) which would roughly be 2 535 650 040?
Under 3 billion in other words. A pathetically small, almost negligible, number compared to say for example the U.S. national debt etc.?
Surely this is something that a computer in the year 2016 should be able to do without even breaking a sweat?
John D'Errico
John D'Errico 2016-9-22
编辑:John D'Errico 2016-9-22
Sigh. People think their computers are infinitely large and infinitely fast. TV and the movies spoil them.
It is true that
nchoosek(200,5)
ans =
2.5357e+09
So 2.5 billion combinations.
But each such combination generates one row of a 2.5 billion by 5 matrix.
So there are
nchoosek(200,5)*5
ans =
1.2678e+10
elements in the array you are trying to generate. But worse, each of those elements is a double. So 8 bytes per element.
nchoosek(200,5)*5*8
ans =
1.0143e+11
So roughly 100 gigabytes of RAM to store. Usually more that that, since every operation you do with an array sometimes forces the allocation of temporary memory to store some modification of that array. You should always try to keep several times as much memory available as your largest array to be safe, and even then, you are usually pushing your luck, because memory becomes easily fragmented, preventing huge arrays to be stored.
So do you have several hundred gigabytes of addressable (AND CONTIGUOUS) RAM on your machine? Remember that if MATLAB needs to go deeply into virtual memory here, using your hard disk, expect things to get REALLY SLOW, even if it does not just give up for lack of memory.
A = nchoosek(1:50,5);
B = nchoosek(uint8(1:50),5);
whos A B
Name Size Bytes Class Attributes
A 2118760x5 84750400 double
B 2118760x5 10593800 uint8
So you can save just a wee bit by using uint8, since a uint8 can store the integers 1:200. That might just make your current problem solvable. But tomorrow you will be back again, wailing about how to compute nchoosek(uint8(1:200),6), and why do you run out of memory.

请先登录,再进行评论。

更多回答(1 个)

Stephen23
Stephen23 2016-9-22
  1 个评论
Peta
Peta 2016-9-22
Hm, is this really the same thing as nchoosek? It seems the function you linked only calculates the combinations of 2 rows of a matrix.
I don’t see how I would call that function to achieve the same effect as nchoosek

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Loops and Conditional Statements 的更多信息

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by