Method to forecast a categorical variable from numerous numerical predictors?

Hey!
After a few years of using MATLAB I have stumbled upon the mightiest challenge I have faced yet. Suppose you have access to the following data (measurements are taken every second, from 1.1.2017 to 20.6.2017):
- Binary data (0 or 1): 0 for the normal situation and 1 every time a failure occurs. So far, 171 failures have occurred among the millions of data points, so the vector consists almost entirely of 0s with just a few 1s.
- Process data (temperature, speed, moisture, etc.) from all the processes that I think might cause a failure during production.
The problem is to create a model or algorithm that predicts when a failure might happen and why it happens. So far I have visualized the data to look for correlations between failures and process data, removed obvious outliers, and tried some feature selection algorithms such as sequentialfs. I also tried creating some forecasting algorithms. All without success, and I think I know why:
- There are too many process parameters to analyze thoroughly by eye.
- The failure might be caused by changes in the past. For example, a temperature change at the beginning of the process might cause a failure ten seconds later at the end of the production process.
- The failure might be caused by a combination of changes in the process parameters. For example, a moisture change at the beginning of the process followed twenty seconds later by an increase in speed might cause a failure.
- Knowledge of the process does not help: it is so complicated that the failure might be caused by any of the hundreds of process parameters or their combinations.
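One common way to let a classifier see past values, and combinations of past and present values, is to add lagged copies of each process variable as extra predictors. The following is only a minimal sketch in plain Python (not MATLAB); the data, the column meanings, the function name `add_lagged_features` and the chosen lags are all made up for illustration.

```python
def add_lagged_features(rows, lags):
    """Append lagged copies of every column to each row.

    rows: list of equal-length lists, one per time step (oldest first).
    lags: positive integer lags in samples (here one sample = one second).
    Returns the rows from index max(lags) onward, each extended with the
    values observed `lag` steps earlier, for every lag in the given order.
    """
    lags = list(lags)
    start = max(lags)  # earlier rows lack a full history and are dropped
    out = []
    for t in range(start, len(rows)):
        extended = list(rows[t])
        for lag in lags:
            extended.extend(rows[t - lag])
        out.append(extended)
    return out


# Hypothetical toy data: columns are [temperature, moisture, speed].
process = [
    [20.0, 0.30, 1.0],
    [20.5, 0.31, 1.0],
    [21.0, 0.35, 1.1],
    [23.0, 0.40, 1.3],
]
failure = [0, 0, 0, 1]  # hypothetical failure flag, one per time step

lags = [1, 2]
features = add_lagged_features(process, lags)
# Drop the same leading rows from the labels so they stay aligned.
aligned_labels = failure[max(lags):]
```

Each row of `features` now holds the current measurements plus those from one and two seconds earlier, so a model trained on it can pick up delayed effects. With hundreds of parameters and lags of tens of seconds the number of columns grows quickly, so some form of feature selection would still be needed.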
What would be the best method to start solving this problem? Naïve Bayes did not work, and neural networks are, as far as I know, not suited to categorical predictions and are hard to interpret. The complexity of the process as a whole makes this a very hard puzzle.
I can't share the data.
BR
  1 Comment
Greg Heath 2017-6-22
You are wrong:
The neural network function PATTERNNET is designed for multiclass classification with output targets that are columns of ones and zeros.
LOGSIG is the output function when the classes are not mutually exclusive (a target column can contain more than one "1")
and
SOFTMAX is the output function when the classes are mutually exclusive (the target columns are columns of the unit matrix).
See the documentation
help patternnet
doc patternnet
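The difference between the two output functions can be illustrated numerically. This is a plain Python sketch of the two formulas, not the toolbox implementation, and the input values are arbitrary.

```python
import math


def logsig(values):
    """Logistic sigmoid applied to each value independently."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    """Softmax over all values; the outputs are positive and sum to 1."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 1.0, -1.0]      # arbitrary raw outputs for three classes
independent = logsig(raw)   # each class scored on its own
exclusive = softmax(raw)    # scores compete and sum to 1
```

With logsig, several outputs can be close to 1 at the same time, which matches targets that allow more than one "1" per column. With softmax, the outputs behave like probabilities over mutually exclusive classes.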
Hope this helps.
Thank you for formally accepting my answer
Greg


Answers (1)

Ankita Nargundkar 2017-6-22
This webinar might be a good place to start: link
More generally: link
