How to find similarities in different data sets?

Hello,
I have multiple data sets of values recorded over time. I want to find similar behavior, or some kind of pattern, across those data sets. Later I would like to apply the same method to a continuous data feed in order to detect the same behavior there. Any ideas on how this can be done?
Thanks in advance

Answers (2)

Mike Caldwell 2017-8-14
I think there are a few questions you might want to answer first:
  • What kind of pattern am I looking for?
  • Do I expect a particular pattern across all data sets?
  • Will a visualization of the data help? (usually yes)
  • Etc.
If you want to find patterns in data, you may have to analyze it "by hand". Start by plotting the data and observing the behavior. You could run statistics on it as well.
There is probably a better answer than this one, but I think more detail about the data and the patterns you are analyzing would help narrow things down.
-Mike
  1 Comment
Edgaras Kvedaravicius
Hi, thank you for your answer. I have plotted the data sets and I know exactly what kind of behavior I am looking for. The data usually has many fluctuations, but you can see with the naked eye that it has a negative slope, and that is exactly what I want to recognize. The behavior is similar in every case but not identical, so I am looking for a common solution that could recognize it.
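One way to recognize "a negative slope despite fluctuations", including on a continuous feed, is to fit a straight line to the most recent samples and check the sign and size of its slope. The following is only a minimal sketch in Python, not something proposed in this thread; the names `ls_slope` and `falling_windows`, the window length, and the slope threshold are all hypothetical and would need tuning on real data.

```python
from collections import deque


def ls_slope(values):
    """Least-squares slope of values against their sample index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def falling_windows(stream, window=20, threshold=-0.1):
    """Yield (index, slope) each time the last `window` samples trend downward.

    `window` and `threshold` are assumed tuning values, not taken from the thread.
    """
    buf = deque(maxlen=window)  # keeps only the most recent samples
    for i, y in enumerate(stream):
        buf.append(y)
        if len(buf) == window:
            slope = ls_slope(buf)
            if slope <= threshold:
                yield i, slope
```

Because `falling_windows` consumes any iterable one sample at a time, the same function can be run over a recorded data set or over a live feed. A longer window smooths out more of the fluctuation but reacts more slowly.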



Walter Roberson 2017-8-14
The theoretical answer is that you need to find the Kolmogorov Complexity of each possible concatenation of the powersets of the datasets, in order to find the subset of the first dataset that most accurately predicts some subset of the second dataset.
That's the theory.
In practical terms, the Kolmogorov Complexity becomes infeasible to compute once you get past about 34 total bits worth of data, and the KC of the powersets is probably going to be useless to compute beyond roughly 16 bits of data.
Your question, as phrased, asks for all possible relationships between two datasets. It suffers from a horrible combinatorial explosion of possible relationships.
  2 Comments
Edgaras Kvedaravicius
Hi, thank you for your answer. I have plotted the data sets and I know exactly what kind of behavior I am looking for. The data usually has many fluctuations, but you can see with the naked eye that it has a negative slope, and that is exactly what I want to recognize. The behavior is similar in every case but not identical, so I am looking for a common solution that could recognize it.
Walter Roberson 2017-8-14
So you have a dataset that, if you ignore a bunch of values, has a portion with a negative slope trend. If there is somewhere in the second dataset where you could select a section that, after ignoring inconvenient values, would also have a negative slope trend, then you will declare them to be a match no matter what their respective slopes or the respective lengths of the sloped sections are.
Why not just pick an arbitrary starting point in the first data set and run through it, selecting only successive points that are each lower than the previously selected points? That will give you a negative slope. Then do the same for the second data set. Then declare yourself done. (There are circumstances under which this approach would pretty much work, even if it sounds pretty useless. In particular, this approach might have application in "envelope matching".)
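The selection rule described above can be sketched in a few lines. This is a minimal illustration in Python, under the assumption that "lower" means strictly lower; the function name `descending_selection` and the `start` parameter are hypothetical.

```python
def descending_selection(values, start=0):
    """Greedily pick points, each strictly lower than the last one picked.

    Returns a list of (index, value) pairs beginning at `start`.
    """
    if not 0 <= start < len(values):
        raise IndexError("start is outside the data")
    picked = [(start, values[start])]
    for i in range(start + 1, len(values)):
        # Keep a point only if it undercuts everything selected so far.
        if values[i] < picked[-1][1]:
            picked.append((i, values[i]))
    return picked


print(descending_selection([5, 7, 4, 6, 3, 3, 1]))
# [(0, 5), (2, 4), (4, 3), (6, 1)]
```

The selected points are exactly the places where the running minimum from the starting point drops, so any data set with at least one later, lower value yields a "negative slope". That is why the rule by itself discriminates very little between data sets.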

