Finding the first 1 in a binary set using optimization
4 次查看(过去 30 天)
显示 更早的评论
Hello everybody,
Assume that a binary set V is given and the task is to find the first 1 in V. Note that
1) V is not sorted (so, we cannot benifit from binary search)
2) Although V is binary it is not known beforehand. It would be computationally expensive to tell
whether an element of V is 0 or 1. If I would know V then I could simple use the command find(V,1)
Therefore, I need to search efficiently. It seems that optimization is one efficient way to solve this problem. But,
unfortunately, it seems that the underlying optimization problem is not that easy to tackle. For instance, In
bellow, you see an example where I used genetic algorithm (I used this algorithm since it can tackle integer optimization problems)
clc;
A=[zeros(1,10^4) 1 zeros(1,100) 1 1 zeros(1,10^3)];
cost=@(i) i+10^8*(1-A(i)); % Here, in theory instead of 10^8, we should use 'infinity'
options = optimoptions('ga','Display','iter');
[x,f] = ga(cost,1,[],[],[],[],1,length(A),[],1,options)
Unfortunately, it often always fails to find the true solution which is x = 10^4+1. Rather it finds x = 1 as the
solution which is clearly wrong.
Any idea?
Thanks for your kind help in advance!
Babak
3 个评论
Sam Chak
2023-8-27
编辑:Sam Chak
2023-8-27
I'm attempting to visualize your situation as the quest for searching the first Calculus book relative to the first call number order at the George Peabody Library. Ordinarily, you would peruse the library's catalogue, and after pinpointing the book, you would employ the call number to find it.
Let's assume that the library's homepage is inaccessible, and the digital catalogue is unavailable as well. Consequently, you engage with multiple librarians to aid you in physically locating the book. Is the metaheuristic optimization approach you're utilizing comparable to this scenario?
Image by Matthew Petroff / CC BY-SA.
采纳的回答
Walter Roberson
2023-8-27
Imagine that you had a case where location P is set, that all positive integer multiples of P were also set. Then suppose you examine location #60. If it is not set then you can rule out locations 1, 2, 3, 4, 5, 6, 10, 15, 20, 30 -- so in this hypothetical situation in which information about the value of one V entry could give you definitive information about the value of an earlier V entry, then there would be the potential for strategies that are better than linear search.
However, in your description about your matrices, it is not obvious to us that information about a particular V entry gives you definite information about any other V entry.
Let us think for the moment about the possibility that knowing (say) V(10) is 1 gives you the probabilistic result that V(1) has only (say) 1e-10 probability of being 1. Can you skip examining V(1) in that case? No! If your accumulated information so far gave V(1) only a 1 chance in 2^64 of being 1, then you would still need to examine V(1), because the task is not to find a "good approximation" or a "bounded cost best-effort" solution, but is instead to find the first entry in V with value 1. Knowing a probability distribution might help guide you in choosing upper bounds, but as long as there is any remaining probability in the first entry not yet examined, you still need to examine it.
Now let us consider the case where examining an entry cannot rule out an earlier entry. Suppose you have examined V(4) and found that it is 1, but you have not examine V(1) yet. Can you stop? There is nothing obvious to us in your description of your task that would allow you to stop without examining V(1): as best the volunteers have been able to make out, even knowing V(2) is 1 would not allow you to rule out V(1) being 1. So you always need to examine V(1) as best we can tell. There is a 100% probability that you will need to examine V(1) . If you were to examine V(2) before V(1) and found the value of V(2), would you be able to skip V(1) ? Not that we can tell: if V(2) == 1 then as far as well can see you would still need to examine V(1) . But examining V(2) first has a cost, and if V(1) turns out to be 1, you would have wasted the effort of examing V(2) first.
Likewise, suppose you examine V(3) first and determine it is 1. You examine V(1) and find it is 0. Can you skip examining V(2) and assume that V(3) is the first? It is not obvious to any of us that you could do that: as far as we can tell, you would still have to examine V(2), and if V(2) is 1 then the cost of examining V(3) first would have been wasted.
Yes, if you first examine V(10) and then examine V(5) and determine V(5) is 1 then you can skip V(6) through V(9) -- but you would have wasted the cost of evaluating V(10) first, and would have been better off examining V(5) first.
In any situation in which examining a latter entry cannot rule out an earlier entry, then by induction you can show that the most cost effective method is to examine V(1) then V(2) then V(3) and so on. A linear search, in other words. There is a 100% chance you will need to evaluate V(1). If V(1) == 0 then there is a 100% chance you need to examine V(2) . If V(2) == 0 there is a 100% chance you will need to evaluate V(3)... and so on.
更多回答(3 个)
Image Analyst
2023-8-27
Why not simply do
V=[zeros(1,10^4) 1 zeros(1,100) 1 1 zeros(1,10^3)];
index = find(V == 1, 1, 'first')
???
6 个评论
Image Analyst
2023-8-27
Not really. Is there any difference between A and V? And if you want to somehow access A(27), like assign it to a new variable, that is virtually instant. Even generating a logical vector telling if A or V is exactly equal to 1 is virtually instant (unless A or V is hundreds of millions of elements long).
But it looks like your responses to @Bruno Luong that you don't really want find() regardless because you are trying to learn/use optimization, like a genetic algorithm or something: "I want to tackle it (via optimization) " and so are not interested in using standard, traditional functions.
Bruno Luong
2023-8-27
编辑:Bruno Luong
2023-8-27
You ask the same question several time. Without any other a priori knowledge, scan your array until 1 is meet.
There is no miracle algorithm to find it since there is not other way to guess where 1 first appears.
6 个评论
John D'Errico
2023-8-27
编辑:John D'Errico
2023-8-27
If V is given, then find(V,1) is perfect, and it will not be improved upon. And you say that V is given! So what is your question?
MAYBE, MAYBE, you mean that V is given, but unknown in a sense, that you cannot know the value of V unless you interrogate that element? That seems consistent with your comments.
Now, your goal is to somehow magically use optimization to find that first element. The problem is, suppose you sample element 100, this V(100), and you find that is is a 1. Is that the FIRST element of V that is 1? You can never know. And nothing will tell you that is the case, unless you evaluate EVERY element that precedes that element. This tells me if your goal is to find the FIRST element of this unknown binary vector that is a 1, then you need to simply start at the beginning, and test every element. Optimization cannot help you, since the unit elements of V are presumed to be random as far as we know. Optimization is meaningless in this case. I'm sorry, but it is.
Ok, is there anything we can do? Suppose V has length 1024. Evaluate V at some intermediate position., say V(512). If V(512) is a 1, then you know that the first element that is a 1 must lie no further than element 512. But if V(512)==0, then you know nothing. Again, this suggests your best scheme is to simply start at the very begnning.
It feels like it might help if we know the probability that any element of V is a 1. Now you could use probability theory to know the distribution of the first non-zero element, if any element has probability P of being a 1. That could tell you where to start looking, but does it really help? NO. In the end, you will always need to test the first element of V, and if is zero, then test the second element, and so on, because knowing if a later element of V is 1 does not tell you anything.
I'm sorry, but if you absolutely need to know the first element of V, then you need to start your search at element 1, and then proceed. This is the only way to know if you have found the first 1.
4 个评论
John D'Errico
2023-8-27
编辑:John D'Errico
2023-8-27
No. It does not at all justify your claim. Not even remotely.
An optimization will evaluate the vector V at multiple points. That you don't see that happening is irrelevant. You are not a little child, where if you don't see something happen in front of your nose, it does not happen. Function evals called by GA are no less costly than function calls in a loop, but in fact more costly, since you now have the overhead of GA.
A simple loop would have been more efficient, and the ONLY possible way to know that you have found the first true element in V.
An optimizer like GA will not know that no earlier element is 1, UNLESS IT TESTS ALL PREVIOUS VALUES. GA stops sometimes early, because it got lucky. And as you have seen, often GA will get it wrong. If you are looking for a probabilistic scheme, that MIGHT be different. But even then, I see no better scheme than to start at the beginning.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Surrogate Optimization 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!