Takes a categorical data matrix and transforms the whole matrix into a binary sparse 1 of M matrix, keeping track of which original attribute and value each new column came from. Well suited to count-based probabilistic analysis.
Typically used in a chain following loadcell.m and celltonumeric.m.
datato1ofm - recast data in 1 of M format, maintaining multinomial info.
function [newdata, attrmap] = datato1ofm( data );
DATA is the complete dataset. It is presumed that every possible state of each attribute is represented in the dataset; if not, the data should be augmented with dummy data so that this is the case. Each column of DATA corresponds to a different attribute, and each row to a different data item. DATA must be numeric.
NEWDATA is a sparse real-binary 1 of M dataset. All attributes are 1 of M encoded, including previously binary attributes. The split of these previously binary attributes can be removed trivially: see below.
ATTRMAP gives the attribute mapping information. ATTRMAP(1,k) gives the original attribute number for the kth new attribute. ATTRMAP(2,k) gives the value of the original attribute indicated by the kth new attribute. ATTRMAP(3,k) gives M, the number of states of the original attribute from which the kth new attribute was split.
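To make the NEWDATA and ATTRMAP layout concrete, here is a minimal sketch that builds both by hand for a toy matrix. It is written only from the description above and is not the author's implementation of datato1ofm; the toy data and the column order (attributes in order, states sorted ascending) are assumptions.

```matlab
% Hypothetical stand-in for datato1ofm, reconstructed from the help text only.
% Toy data (assumed): 4 items; attribute 1 has 3 states, attribute 2 has 2 states.
data = [1 0; 2 1; 3 0; 1 1];

[nitems, nattr] = size(data);
rows = []; cols = []; attrmap = zeros(3, 0);
for a = 1:nattr
    [vals, ~, idx] = unique(data(:, a));   % idx(i) is the state index of item i
    m = numel(vals);                       % number of states of attribute a
    offset = size(attrmap, 2);             % new columns allocated so far
    rows = [rows; (1:nitems)'];
    cols = [cols; offset + idx(:)];
    % Row 1: original attribute, row 2: original value, row 3: M
    attrmap = [attrmap, [a * ones(1, m); vals(:)'; m * ones(1, m)]];
end
newdata = sparse(rows, cols, 1, nitems, size(attrmap, 2));

disp(full(newdata));
disp(attrmap);
```

Each row of the resulting NEWDATA contains exactly one 1 per original attribute, and each column of ATTRMAP records which attribute and value that column stands for.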
To remove 1 of M encoding for previously binary attributes use
ii = find(~(attrmap(2,:)==1 & attrmap(3,:)==2));
newdata = newdata(:,ii); attrmap = attrmap(:,ii);
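As an illustration of what the two lines above do, the following sketch applies them to a hand-written NEWDATA and ATTRMAP. The inputs are assumed toy values, not actual output of datato1ofm, and the binary attribute is assumed to be coded with the values 1 and 2.

```matlab
% Hand-written toy inputs (assumed, not produced by datato1ofm).
attrmap = [1 1 1 2 2;    % original attribute number
           1 2 3 1 2;    % original attribute value
           3 3 3 2 2];   % number of states M of the original attribute
newdata = sparse([1 0 0 1 0;
                  0 1 0 0 1;
                  0 0 1 1 0;
                  1 0 0 0 1]);

% Keep every column except the "value 1" column of two-state attributes.
ii = find(~(attrmap(2,:)==1 & attrmap(3,:)==2));
newdata = newdata(:,ii); attrmap = attrmap(:,ii);

disp(ii);
disp(full(newdata));
```

Here the fourth column is dropped, so attribute 2 is represented by a single indicator for its value 2. Note that the condition tests for an original value of 1: if a binary attribute were coded 0/1 instead, the surviving column would be the indicator for value 0.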
To compute multinomial probabilities (simply but inefficiently) use
normmatrix = sparse([1:size(attrmap,2)],attrmap(1,:),1);
normmatrix = normmatrix*normmatrix';
probs = mean(newdata)./(mean(newdata)*normmatrix);
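The normalisation above divides each column mean by the summed means of all columns belonging to the same original attribute. As a possible alternative that avoids forming the K-by-K matrix, the sketch below groups column counts by attribute with accumarray. The toy inputs and the variable names counts, totals and probs2 are assumptions introduced for illustration; this is not part of datato1ofm.

```matlab
% Hand-written toy inputs (assumed, not produced by datato1ofm).
attrmap = [1 1 1 2 2; 1 2 3 1 2; 3 3 3 2 2];
newdata = sparse([1 0 0 1 0; 0 1 0 0 1; 0 0 1 1 0; 1 0 0 0 1]);

% Approach from the text: block normalisation matrix.
normmatrix = sparse(1:size(attrmap,2), attrmap(1,:), 1);
normmatrix = normmatrix*normmatrix';
probs = mean(newdata)./(mean(newdata)*normmatrix);

% Sketch of an alternative: sum counts per original attribute.
counts = full(sum(newdata, 1));                  % 1-by-K column counts
totals = accumarray(attrmap(1,:)', counts')';    % 1-by-A totals per attribute
probs2 = counts ./ totals(attrmap(1,:));         % normalise within each attribute

disp(full(probs));
disp(probs2);
```

Both approaches give the same probabilities on this toy input, and the entries for each original attribute sum to 1.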
See loadcell, celltonumeric
Cite As
Amos Storkey (2024). datato1ofm.m (https://www.mathworks.com/matlabcentral/fileexchange/26368-datato1ofm-m), MATLAB Central File Exchange. Retrieved .
Platform Compatibility
Windows macOS Linux
Version | Published | Release Notes | |
---|---|---|---|
1.0.0.0 |