How to find similarities in different data sets?

Question

Edgaras Kvedaravicius 2017-8-13

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/352617-how-to-find-similarities-in-different-data-sets

Commented: Walter Roberson 2017-8-14

Hello,

I have multiple data sets of some values over time. I want to find behavioral similarities, or some sort of common pattern, between those data sets. Later I would like to apply this to a continuous data feed in order to detect the same behavior there. Any ideas on how this can be done?

Thanks in advance


Answer 1

Mike Caldwell 2017-8-14

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/352617-how-to-find-similarities-in-different-data-sets#answer_277860

I think there are some questions you might want to answer first:

What kind of pattern am I looking for?
Do I expect a particular pattern across all data sets?
Will a visualization of the data help? (usually yes)
Etc.

If you want to find patterns in data, you may have to analyze it "by hand". Start off by plotting the data and observing the behavior. You could run statistics on it as well.
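As one concrete example of "running statistics" on two data sets, here is a minimal sketch of the Pearson correlation between two equally long series. It is written in plain Python rather than MATLAB purely for illustration. The function name `pearson` and the sample values are made up, and correlation is only one of many statistics that could be tried; it assumes the two series are sampled at the same time points.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_a * var_b)


# Hypothetical sample data: two noisy series that both drift downwards.
set_1 = [10.0, 9.5, 9.8, 8.9, 8.1, 8.4, 7.2, 6.9]
set_2 = [20.0, 19.1, 19.4, 17.5, 16.8, 16.9, 14.9, 14.0]
print(pearson(set_1, set_2))
```

A value close to 1 suggests the two series move together, a value close to -1 that they move in opposite directions. It says nothing about patterns that are shifted in time or stretched, which would need a different approach.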

There is probably a better answer than this one, but I think knowing more detail about the data and the patterns you are analyzing will help narrow things down.

-Mike

1 Comment

Edgaras Kvedaravicius 2017-8-14

Hi, thank you for your answer. I have plotted the data sets and I know exactly what kind of behavior I am looking for. The data usually has many fluctuations, but you can see with the naked eye that it has a negative slope, and that is exactly what I want to recognize. The behavior is similar in every case but not identical, so I am looking for a general solution that can recognize it.
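One possible way to make "a negative slope despite fluctuations" measurable is to fit a straight line to the most recent samples and look at the sign and size of its slope. The sketch below does that over a sliding window, so it could in principle run on a continuous feed. It is plain Python for illustration only, and the least-squares fit is an assumption on my part rather than something proposed in this thread. The names `ls_slope` and `detect_downtrends`, and the parameters `window` and `max_slope`, are hypothetical and would need tuning for real data.

```python
from collections import deque


def ls_slope(values):
    """Least-squares slope of values against their sample index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def detect_downtrends(feed, window=20, max_slope=-0.1):
    """Yield the index of each sample at which the last `window` samples
    have a fitted slope of `max_slope` or steeper (more negative)."""
    recent = deque(maxlen=window)  # keeps only the newest `window` samples
    for i, value in enumerate(feed):
        recent.append(value)
        if len(recent) == window and ls_slope(recent) <= max_slope:
            yield i


# Hypothetical feed: rises for a while, then falls.
feed = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
print(list(detect_downtrends(feed, window=4, max_slope=-0.5)))
```

A longer window smooths out more of the fluctuations but reacts later; a shorter one reacts quickly but is easier to fool with noise.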


Answer 2

Walter Roberson 2017-8-14

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/352617-how-to-find-similarities-in-different-data-sets#answer_277862

The theoretical answer is that you need to find the Kolmogorov Complexity of each possible concatenation of the powersets of the datasets, in order to find the subset of the first dataset that most accurately predicts some subset of the second dataset.

That's the theory.

In practical terms, the Kolmogorov Complexity becomes infeasible to compute once you get past about 34 total bits worth of data, and the KC of the powersets is probably going to be useless to compute beyond roughly 16 bits of data.
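Because the exact Kolmogorov Complexity cannot be computed in general, a commonly used practical stand-in is the length of the data after running it through an ordinary compressor. The sketch below uses that idea to compute a normalized compression distance between two byte strings. To be clear, this is a swapped-in approximation and not the powerset search described above. It is plain Python using the standard `zlib` module, and the names `compressed_size` and `ncd` as well as the sample data are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of `data` after zlib compression (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller means 'more alike'."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # complexity of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical data: the same repeating pattern, with and without a change.
a = b"abcabcabd" * 50
b = b"abcabcabd" * 49 + b"abcxbcabd"
print(ncd(a, a), ncd(a, b))
```

Numeric samples would first have to be turned into bytes, for example by quantizing them, and the result depends heavily on how that is done and on the compressor used.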

Your question, as phrased, is asking to find all possible relationships between two datasets. It suffers from a horrible combinatorial explosion of possible relationships.

2 Comments

Edgaras Kvedaravicius 2017-8-14

Hi, thank you for your answer. I have plotted the data sets and I know exactly what kind of behavior I am looking for. The data usually has many fluctuations, but you can see with the naked eye that it has a negative slope, and that is exactly what I want to recognize. The behavior is similar in every case but not identical, so I am looking for a general solution that can recognize it.

Walter Roberson 2017-8-14

So you have a dataset that, if you ignore a bunch of it, has a portion with a negative slope trend. And if there is somewhere in the second dataset where you could select a section and then ignore inconvenient values so that it also has a negative slope trend, you will declare the two to be a match, no matter what their respective slopes are or how long the respective sloped sections are.

Why not just pick an arbitrary starting point in the first data set, and run through it selecting only successive points that are each lower than the previously selected point? That will give you a negative slope. Then do the same for the second data set. Then declare yourself done. (There are circumstances under which this approach would pretty much work, even if it sounds pretty useless. In particular, this approach might have application in "envelope matching".)
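The selection procedure described above can be sketched in a few lines. This is plain Python for illustration only; the function name `descending_selection`, the `start` parameter and the sample values are hypothetical.

```python
def descending_selection(values, start=0):
    """Starting at index `start`, keep each point that is strictly lower
    than the previously kept point. Returns (index, value) pairs."""
    selected = []
    last = None
    for i in range(start, len(values)):
        value = values[i]
        if last is None or value < last:
            selected.append((i, value))
            last = value
    return selected


# Hypothetical noisy data with an overall downward drift.
data = [5, 7, 4, 6, 3, 3, 8, 1]
print(descending_selection(data))
```

By construction the kept points always decrease, whatever the input looks like, which is exactly why the result on its own says little about whether two data sets really behave alike.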
