How to find similarities in different data sets?
Hello,
I have multiple data sets of values recorded over time. I want to find similarities in behavior, or some sort of pattern, between those data sets. Later I would like to apply this to a continuous data feed, in order to detect the same behavior as the data arrives. Any ideas how this can be done?
Thanks in advance
Answers (2)
Mike Caldwell
2017-8-14
I think there are some questions you might want to explore first:
- What kind of pattern am I looking for?
- Do I expect a particular pattern across all data sets?
- Will a visualization of the data help? (usually yes)
- Etc.
If you want to find patterns in data, you may have to analyze it "by hand". Start by plotting the data and observing its behavior. You could also run some statistics on it.
There is probably a better answer to what you are looking for, but I think knowing more about the data and the patterns you are analyzing will help narrow things down.
-Mike
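As a concrete starting point for the "run statistics" step, here is a minimal Python sketch (the thread is about MATLAB; Python is used only for illustration). It assumes the data sets are equal-length series sampled at the same time points; `describe` and `pearson` are hypothetical helper names, not functions from any particular library. A Pearson correlation near 1 means two series rise and fall together; near -1 means one mirrors the other.

```python
import math
from statistics import mean, pstdev


def describe(series):
    """Basic summary statistics for one data set (a list of numbers)."""
    return {
        "mean": mean(series),
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
    }


def pearson(x, y):
    """Pearson correlation of two equal-length, time-aligned series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


a = [1.0, 2.0, 3.0, 2.0, 1.0]
b = [10.0, 20.0, 30.0, 20.0, 10.0]  # same shape as a, scaled by 10
c = [3.0, 2.0, 1.0, 2.0, 3.0]       # mirror image of a (4 - a)

print(describe(a))
print(round(pearson(a, b), 6))  # 1.0
print(round(pearson(a, c), 6))  # -1.0
```

Correlation only measures linear co-movement of aligned samples, so two series that look alike in a plot but are shifted or stretched in time can still score poorly; that is one more reason to plot first.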
Walter Roberson
2017-8-14
The theoretical answer is that you need to find the Kolmogorov complexity of each possible concatenation of the power sets of the datasets, in order to find the subset of the first dataset that most accurately predicts some subset of the second dataset.
That's the theory.
In practical terms, the Kolmogorov complexity becomes infeasible to compute once you get past about 34 total bits of data, and the KC of the power sets is probably going to be useless to compute beyond roughly 16 bits of data.
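Because Kolmogorov complexity cannot be computed exactly, a common practical substitute (not something proposed in this thread) is the normalized compression distance (NCD) of Cilibrasi and Vitányi, which replaces the ideal description length with the output size of a real compressor. The Python sketch below is for illustration only; it assumes the series are first quantized into bytes, and `encode`, the 16-level quantization, and the test signals are all illustrative choices.

```python
import math
import random
import zlib


def compressed_size(data: bytes) -> int:
    # A real compressor (zlib at max level) stands in for the uncomputable
    # ideal one, so this only *approximates* Kolmogorov complexity.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 when y is mostly predictable
    from x (and vice versa), near 1 for unrelated inputs."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def encode(series, lo, hi, levels=16):
    """Quantize a numeric series into `levels` bins, one byte per sample."""
    span = (hi - lo) or 1.0
    return bytes(
        min(levels - 1, max(0, int((v - lo) / span * levels))) for v in series
    )


random.seed(0)
n = 2000
wave = encode([math.sin(i / 5) for i in range(n)], -1.0, 1.0)
# Same wave, 7 samples later: delayed[i] equals wave[i + 7].
delayed = encode([math.sin((i + 7) / 5) for i in range(n)], -1.0, 1.0)
noise = encode([random.uniform(-1, 1) for _ in range(n)], -1.0, 1.0)

print(f"wave vs delayed: {ncd(wave, delayed):.2f}")
print(f"wave vs noise:   {ncd(wave, noise):.2f}")
```

Run as is, the delayed copy should score much closer to 0 than the noise does, because almost all of `delayed` already appears in `wave` and costs very little to encode once `wave` has been seen. NCD inherits the compressor's blind spots, so treat it as a rough similarity signal rather than proof of shared structure.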
Your question, as phrased, is asking to find all possible relationships between two datasets. It suffers from a horrible combinatorial explosion of possible relationships.
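To put a rough number on that explosion, assume a "relationship" means pairing any subset of the first data set X (n points) with any subset of the second data set Y (m points), where 𝒫 denotes the power set:

```latex
\[
  |\mathcal{P}(X)| \cdot |\mathcal{P}(Y)| = 2^{n} \cdot 2^{m} = 2^{n+m},
  \qquad n = m = 20 \;\Rightarrow\; 2^{40} \approx 1.1 \times 10^{12}
\]
```

That is about a trillion candidate pairs for two 20-point series, before any complexity is computed for any of them. Restricting the search to contiguous segments instead gives n(n+1)/2 · m(m+1)/2 candidate pairs, 44,100 for the same sizes, which is one reason practical methods compare time windows rather than arbitrary subsets.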
2 comments
Walter Roberson
2017-8-14
So you have a dataset that, if you ignore a bunch of values, has a portion with a negative slope trend; and if there is somewhere in the second dataset where you could select a section that, after ignoring inconvenient values, would also have a negative slope trend, you will declare the two a match, no matter what their respective slopes are or how long each sloping section is.
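To make the looseness of that criterion concrete, here is a Python sketch of it (illustration only; `ls_slope` and `naive_match` are hypothetical names). It assumes the sections have already been chosen, so the "ignore inconvenient values" step is not modeled, and it takes "slope trend" to mean an ordinary least-squares fit against the sample index.

```python
def ls_slope(ys):
    """Ordinary least-squares slope of ys against sample index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def naive_match(section_a, section_b):
    """The loose criterion: both sections trend downward, regardless of
    how steep or how long each one is."""
    return ls_slope(section_a) < 0 and ls_slope(section_b) < 0


print(naive_match([5, 4, 4.5, 3, 2], [100, 90, 95, 20]))  # True
```

The two sections in the example match even though their fitted slopes are -0.7 and -23.5 and their lengths differ, which is exactly the objection.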
Why not just pick an arbitrary starting point in the first data set and run through it, selecting only successive points that are each lower than the previously selected point? That will give you a negative slope. Then do the same for the second data set, and declare yourself done. (There are circumstances under which this approach would more or less work, even if it sounds pretty useless. In particular, it might have applications in "envelope matching".)
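The selection procedure described above can be sketched in Python as follows (illustrative only; `greedy_descent` is a hypothetical name). Since each kept point is lower than every point kept before it, the kept points are the successive new lows of the series from the chosen start, i.e. the places where its running minimum steps down.

```python
def greedy_descent(values, start=0):
    """Walk forward from `start`, keeping only points lower than the last
    kept one. Returns the kept indices; with two or more kept points their
    values strictly decrease, so the selection trends downward by construction."""
    if not 0 <= start < len(values):
        raise IndexError("start must be a valid index into values")
    kept = [start]
    for i in range(start + 1, len(values)):
        if values[i] < values[kept[-1]]:
            kept.append(i)
    return kept


first = [7, 9, 6, 8, 5, 5, 3, 4, 1]
idx = greedy_descent(first, start=0)
print(idx)                      # [0, 2, 4, 6, 8]
print([first[i] for i in idx])  # [7, 6, 5, 3, 1]
```

Running the same function on the second data set produces its own descending selection; how the two selections should then be compared is left open in the comment.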