High dimensional clustering input importance

Beaver

2013-09-16

Hello,

I am venturing into new territory and thought I would seek a little guidance.

I am looking at data retroactively to try to determine input importance relative to a known output. Let's say I have X input parameters. I am trying to determine a range for filtering each input parameter individually, such that

    for i = 1:n
        % indices of the records whose i-th input lies inside its range
        Y{i} = find(x(:,i) > Xmin(i) & x(:,i) < Xmax(i));
    end

whereby Y1 through Yn maximize the number of X input parameters relative to a classification (true or false).

In terms that are perhaps simpler to communicate: I have marketing survey data for 1000 individuals, consisting of 10 questions whose answers are bounded to the range -100 to 100. Assume that 100 individuals answered 'Yes' and another 100 individuals answered 'No'. I am trying to find a range of answers to the 10 questions that is most likely to produce a Yes or a No. I then want to use this range to filter current data to target a search.
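To make the goal concrete, here is a rough sketch of the kind of result I am after. Everything in it is an assumption for illustration: the data are randomly generated stand-ins (yesX and noX are hypothetical names), the 10th to 90th percentile cut-off is arbitrary, and "separation" is just one possible way to score how useful each question's range is.

```matlab
% Hypothetical stand-in data: rows are respondents, columns are the 10 questions.
rng(0);                                               % reproducible random numbers
nQ   = 10;
yesX = max(min( 30 + 25*randn(100, nQ), 100), -100);  % assumed 'Yes' answers, clipped to [-100, 100]
noX  = max(min(-10 + 35*randn(100, nQ), 100), -100);  % assumed 'No' answers, clipped to [-100, 100]

% Candidate range per question: roughly the 10th to 90th percentile of the
% 'Yes' group, read off the sorted columns (base MATLAB only).
s    = sort(yesX, 1);
n    = size(s, 1);
Xmin = s(max(1, round(0.10*n)), :);
Xmax = s(min(n, round(0.90*n)), :);

% Fraction of each group that falls inside the range, question by question.
hitYes = mean(bsxfun(@ge, yesX, Xmin) & bsxfun(@le, yesX, Xmax), 1);
hitNo  = mean(bsxfun(@ge, noX,  Xmin) & bsxfun(@le, noX,  Xmax), 1);

% A crude importance score: how much more often 'Yes' lands in the range than 'No'.
separation = hitYes - hitNo;
[~, order] = sort(separation, 'descend');

% Columns: question number, range minimum, range maximum, separation score.
disp([order(:), Xmin(order)', Xmax(order)', separation(order)']);
```

Requiring all 10 ranges to hold at once would keep very few respondents (each range keeps only about 80% of the 'Yes' group on its own), which is part of why I am asking which inputs matter most.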

I am considering k-means clustering to find the largest clusters and then looking at the distribution of inputs within them to determine a range. Another thought was a self-organizing feature map (SOFM): build the map, look at the neurons with the most hits, and again use the distribution of inputs for those neurons to determine a range.
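The k-means idea could be sketched roughly as follows. It is only a sketch under assumptions: it needs kmeans from the Statistics Toolbox, the data are made up (yesX is a hypothetical matrix of 'Yes' respondents with two artificial subgroups), and both the number of clusters k and the percentile cut-offs are arbitrary choices that would need tuning.

```matlab
% Requires the Statistics Toolbox for kmeans.
rng(0);                                       % reproducible random numbers
yesX = [ 40 + 10*randn(70, 10);               % hypothetical 'Yes' answers: larger subgroup
        -30 + 10*randn(30, 10)];              % hypothetical 'Yes' answers: smaller subgroup

k   = 3;                                      % assumed number of clusters
idx = kmeans(yesX, k, 'Replicates', 5, ...    % cluster label for each respondent
             'EmptyAction', 'singleton');

% Find the cluster holding the most respondents.
counts       = accumarray(idx, 1, [k 1]);
[~, biggest] = max(counts);
members      = yesX(idx == biggest, :);

% Per-question range from the distribution inside that cluster
% (roughly the 10th to 90th percentile of the sorted answers).
s    = sort(members, 1);
m    = size(s, 1);
Xmin = s(max(1, round(0.10*m)), :);
Xmax = s(min(m, round(0.90*m)), :);

% Columns: range minimum and range maximum, one row per question.
disp([Xmin', Xmax']);
```

The SOFM variant would follow the same pattern, with the most-hit neurons taking the place of the largest cluster.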

Thanks very much for any feedback.

Beav