DM Utils (data mining utils)

Version 1.9.0.0 (8.7 KB) by Przemyslaw
Tools for working with distance matrices that improve data mining capabilities
819 downloads
Updated 2016/6/21

Utility m-files that improve distance matrix usage and computation:
- parallel distance matrix computation (pair_dist_par),
- parallel distance computation using a shared-memory model, capable of handling really large matrices (pair_dist_spmd); requires sharedmatrix to be compiled and on the path,
- a function for conveniently referring to the vector form of a distance matrix (pseudo_squareform).
out=pair_dist_par(X,fun,parameters) works much like the standard pdist and requires the Parallel Computing Toolbox (PCT). Unlike pdist, the distance function (fun) works row-vs-row (instead of row-vs-submatrix): it is a handle to a distance function of the form d=fun(x,y,params), where params is a variable list of parameters and x and y are single row vectors.
You can pass either a handle to your own distance function or the name of a built-in distance function, or (better for performance) use the additional external function name2fun to obtain an anonymous function handle.
If the PCT is not available, or a worker pool cannot be obtained, pair_dist_seq is used instead.
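As a rough illustration of the row-vs-row convention, the sketch below defines a distance function of the form d=fun(x,y,params) and evaluates it in a plain sequential loop. The handle name mink, the sample data and the pdist-style ordering of the output vector are assumptions made for this example; the loop is not the package's own code.

```matlab
% Row-vs-row distance of the form d = fun(x, y, params):
% x and y are single row vectors, p is the Minkowski exponent.
% "mink" is a name chosen for this example only.
mink = @(x, y, p) sum(abs(x - y) .^ p) ^ (1 / p);

X = [0 0; 3 4; 6 8];          % three observations, one per row
m = size(X, 1);

% Sequential reference loop filling a pdist-style vector (assumed order):
% (1,2), (1,3), ..., (1,m), (2,3), ..., (m-1,m)
ref = zeros(1, m * (m - 1) / 2);
k = 0;
for i = 1:m-1
    for j = i+1:m
        k = k + 1;
        ref(k) = mink(X(i, :), X(j, :), 2);
    end
end
disp(ref)

% With DM Utils on the path, the documented call form would be:
%   out = pair_dist_par(X, mink, 2);
```

Because fun only ever sees two rows at a time, any scalar-valued comparison of two vectors can be plugged in without vectorising it over a submatrix.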
The function pair_dist_spmd is intended for rather large distance matrices. The calculation uses the SPMD model with shared memory (SM) as IPC, so it requires J. Dillon's sharedmatrix (or SharedMemory on Windows) on the path. The function returns a handle to a shared memory segment containing the results. Interleaving improves the cache hit ratio.
To use the results you need to attach to the shared variable using the returned handle.

out = sharedmatrix('attach', out_hdl);
or
out = SharedMemory('attach', out_hdl);

Remember also to remove the unused variable (there is no garbage collector):
sharedmatrix('detach', out_hdl, out);
sharedmatrix('free', out_hdl);

The pseudo_squareform function lets you access a distance matrix as if it were in square form, without converting it. The classic index operators are allowed (coordinates, whole columns/rows, or ranges, with a minor discrepancy for the colon operator).
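The idea of addressing the vector form as if it were square can be sketched with the index formula given in the pdist documentation, k = (i-1)*(m-i/2)+j-i for i<j. The helper cond_idx below is hypothetical and only illustrates the mapping; it is not the implementation or the interface of pseudo_squareform.

```matlab
% Hypothetical helper (not part of DM Utils): position of the (i,j) entry
% of the square form inside the pdist-style vector for m observations.
% Valid for i ~= j; the diagonal is zero and is not stored in the vector.
cond_idx = @(i, j, m) (min(i, j) - 1) .* (m - min(i, j) / 2) + abs(i - j);

m = 5;
X = magic(m);                      % sample data, one observation per row

% Build the vector form in pdist ordering with a plain loop.
v = zeros(1, m * (m - 1) / 2);
k = 0;
for a = 1:m-1
    for b = a+1:m
        k = k + 1;
        v(k) = norm(X(a, :) - X(b, :));
    end
end

% Single elements, without building the m-by-m matrix.
d_24 = v(cond_idx(2, 4, m));
d_42 = v(cond_idx(4, 2, m));       % same entry, the matrix is symmetric

% A whole column (here column 3); the diagonal entry stays zero.
col = zeros(m, 1);
others = [1:2, 4:m];
col(others) = v(cond_idx(others, 3, m));
```

For m observations the vector form holds m*(m-1)/2 values instead of m^2, which is why avoiding the conversion matters for large matrices.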

Further improvements will come...

P.S.
Because compiling sharedmatrix is cumbersome, win32 and lin64 mex files are also included.
P.P.S.
I have been ordered by MathWorks staff to exclude mex files from the contribution. If anyone is interested, please contact me; I can provide win32 and lin64 versions of sharedmatrix.
P.P.P.S.
If you find the work useful, please consider citing our paper, which we (at last) published.
P. Skurowski and M. Staniszewski, Parallel distance matrix computation for Matlab data mining. AIP Conf. Proc. 1738, 070004 (2016), http://dx.doi.org/10.1063/1.4951835

Cite As

Przemyslaw (2024). DM Utils (data mining utils) (https://www.mathworks.com/matlabcentral/fileexchange/34598-dm-utils-data-mining-utils), MATLAB Central File Exchange.

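MATLAB Release Compatibility
Created with R2009a
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
Find more on Parallel Computing in Help Center and MATLAB Answers
Acknowledgements

Inspired by: sharedmatrix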

Version Published Release Notes
1.9.0.0

some typos corrected
link to the paper describing pair_dist_par

1.8.0.0

updated with spmd (and excluded mex files from the submission, as required by MATLAB Central staff)

1.7.0.0

a file correction

1.2.0.0

Shared memory function included

1.0.0.0