Positive-Unlabeled Learning Tools
AlphaMax
MATLAB methods for estimating class priors in the positive-unlabeled classification setting.
Pre-requisites
Required Toolboxes
- deep learning toolbox
- optimization toolbox
- matlab-stdlib
Required Software
- svm-light - SVM Univariate Transform requires the files present at ~/Documents/svm_light, but this location can be modified using the SVMlightpath argument in transform_svm
Recommended Toolboxes
- parallel computing toolbox
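Before running the examples, it can help to confirm that the MathWorks toolboxes listed above are visible to MATLAB. The following is a minimal sketch using the built-in `ver` function. It assumes the product names reported by `ver` match the toolbox names in the lists above (compared case-insensitively), and it does not check matlab-stdlib or svm-light, which are installed separately.

```matlab
% Sketch: check which MathWorks toolboxes from the lists above are installed.
% Assumption: the names below match the product names reported by ver.
v = ver;                                  % struct array of installed products
installed = lower(string({v.Name}));      % lowercase for case-insensitive matching

required = ["deep learning toolbox", "optimization toolbox"];
recommended = "parallel computing toolbox";

% Required toolboxes that do not appear in the installed product list
notFound = required(~ismember(required, installed));

if isempty(notFound)
    disp("All required MathWorks toolboxes were found.");
else
    disp("Required toolboxes not found: " + strjoin(notFound, ", "));
end

if ~ismember(recommended, installed)
    disp("Recommended toolbox not found: " + recommended);
end
```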
Datasets
Datasets are available on Zenodo and should be downloaded to data/uci_ml_datasets.
How to run AlphaMax
The main function for running AlphaMax is runAlphaMax. See the example code below for how to estimate the class priors of a real dataset.
How to run DistCurve
The main function for running DistCurve is runDistCurve. See testdistcurve.m for an example of how to use DistCurve to estimate the class priors of a real dataset.
Example Code
Run Estimators on Pre-processed Data
% Load samples from the UCI gas dataset that have already been transformed
addpath(genpath("."));
load("data/uci_ml_datasets/gas.mat");
XM=ds.instances{1}.optimal.xm;
XC=ds.instances{1}.optimal.xc;
trueClassPrior=sum(ds.instances{1}.yM)/size(ds.instances{1}.yM,1);
% Run AlphaMax
%addpath("alphamax");
path_to_alphamax_estimator = "alphamax/estimators/alphamaxEstimator.mat";
[alphaMax_pred,alphaMax_out] = runAlphaMax(XM,XC,'transform','rt','useEstimatorNet',true,...
'estimator',path_to_alphamax_estimator);
% Run DistCurve
addpath("distcurve");
path_to_distcurve_estimator = "distcurve/estimator/network.mat";
[distCurve_pred,distCurve_curve, distCurve_aucPU] = runDistCurve(XM,XC,'transform','rt',...
'estimator',path_to_distcurve_estimator);
disp(strcat("True Class Prior: ",num2str(trueClassPrior),"; AlphaMaxNet Estimate: ",num2str(alphaMax_pred),"; DistCurve Estimate: ",num2str(distCurve_pred)))
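The Results section reports absolute error between estimated and true class priors. A minimal sketch of that comparison for a single run follows. The numeric values are placeholders chosen for illustration (hypothetical, not outputs of the estimators above); in practice they would be replaced by trueClassPrior, alphaMax_pred, and distCurve_pred.

```matlab
% Placeholder values for illustration only (hypothetical, not real outputs).
trueClassPrior = 0.30;
estimates = struct('AlphaMaxNet', 0.27, 'DistCurve', 0.34);

names = fieldnames(estimates);
absErr = zeros(numel(names), 1);
for k = 1:numel(names)
    % Absolute error of each estimate against the true class prior
    absErr(k) = abs(estimates.(names{k}) - trueClassPrior);
    fprintf('%s absolute error: %.4f\n', names{k}, absErr(k));
end
```

Averaging such errors over many PU instances of a dataset would give a mean absolute error for that dataset.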
Generate dataset from PN Data
addpath(genpath("."));
% Load data from csv files
mat.X = readmatrix('data/example/example_data_pn/X.csv');
mat.y = readmatrix('data/example/example_data_pn/y.csv');
ds = Dataset(mat,"example");
% Generate 1 PU instance from this dataset
ds.buildPUDatasets(1);
% Read Component sample and Mixture sample
XC = ds.instances{1}.optimal.xc;
XM = ds.instances{1}.optimal.xm;
trueClassPrior=sum(ds.instances{1}.yM)/size(ds.instances{1}.yM,1);
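To make the terms concrete, the sketch below builds one PU instance by hand from synthetic positive-negative data. It does not call the Dataset class, and it is not a description of what buildPUDatasets does internally. It assumes one common construction: a random subset of the positives becomes the component sample, and all remaining points form the mixture sample. The sample sizes and the synthetic data are illustrative assumptions.

```matlab
rng(0);                                   % fixed seed for reproducibility

% Synthetic PN data (illustrative): 200 positives, 300 negatives, 2 features
X = [randn(200, 2) + 2; randn(300, 2)];
y = [ones(200, 1); zeros(300, 1)];

% Choose 50 positives at random to act as the labeled component sample
nC = 50;
posIdx = find(y == 1);
cIdx = posIdx(randperm(numel(posIdx), nC));
isComponent = false(size(y));
isComponent(cIdx) = true;

XC = X(isComponent, :);                   % component sample (labeled positives)
XM = X(~isComponent, :);                  % mixture sample (unlabeled)
yM = y(~isComponent);                     % hidden labels of the mixture sample

% The class prior is the fraction of positives in the mixture sample
trueClassPrior = mean(yM);
fprintf('Mixture size: %d, true class prior: %.4f\n', size(XM, 1), trueClassPrior);
```

Here the mixture contains the 150 remaining positives among 450 points, so the true class prior is 1/3.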
Load PU Dataset
addpath(genpath("."));
% Load Data from CSV Files
XC = readmatrix('data/example/example_data_pu/XC.csv');
XM = readmatrix('data/example/example_data_pu/XM.csv');
% Optionally Run Univariate Transforms to reduce data from d-dimensions to 1 dimension
% Generate Feature matrix X and PU label matrix S
X = [XC;XM];
yC = ones(size(XC,1),1);
yM = zeros(size(XM,1),1);
S = [yC;yM];
transformResults = Dataset.transform_PU_data(X,S);
XC = transformResults.optimal.xc;
XM = transformResults.optimal.xm;
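The idea of a univariate transform can be illustrated without the repository code. The sketch below is a stand-in, not the transform used by Dataset.transform_PU_data: it projects d-dimensional data onto a Fisher-style linear discriminant direction that separates the component sample from the mixture sample, giving one score per row. The synthetic data, the averaged covariance, and the regularization constant are all assumptions made for illustration.

```matlab
rng(1);                                        % fixed seed for reproducibility

% Synthetic PU data (illustrative): 3 features
XC = randn(100, 3) + 1.5;                      % component (labeled positive) sample
XM = [randn(150, 3) + 1.5; randn(350, 3)];     % mixture of positives and negatives

% Feature matrix X and PU label vector S (1 = component, 0 = mixture)
X = [XC; XM];
S = [ones(size(XC, 1), 1); zeros(size(XM, 1), 1)];

muC = mean(X(S == 1, :), 1);
muM = mean(X(S == 0, :), 1);

% Average of the two group covariances, lightly regularized for stability
Sigma = (cov(X(S == 1, :)) + cov(X(S == 0, :))) / 2 + 1e-6 * eye(size(X, 2));

w = Sigma \ (muC - muM)';                      % discriminant direction (d-by-1)
scores = X * w;                                % one score per row of X

xc1d = scores(S == 1);                         % 1-D component sample
xm1d = scores(S == 0);                         % 1-D mixture sample
fprintf('Mean score: component %.3f, mixture %.3f\n', mean(xc1d), mean(xm1d));
```

The resulting one-dimensional samples play the same role as the XC and XM vectors passed to the estimators, although the repository's own transforms may behave differently.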
Results
Mean Absolute Error on 30 multi-dimensional datasets from UCI Machine Learning Repository
Dataset | AlphaMaxNet | AlphaMax | DistCurve |
---|---|---|---|
abalone | 0.2666 | 0.5134 | 0.3949 |
activity_recognition_s1 | 0.3460 | 0.2800 | 0.3039 |
activity_recognition_s2 | 0.0233 | 0.8467 | 0.0233 |
adult | 0.2200 | 0.1229 | 0.1634 |
airfoil | 0.2766 | 0.1393 | 0.1405 |
anuran | 0.0551 | 0.1857 | 0.0623 |
bank | 0.1805 | 0.0203 | 0.0426 |
casp | 0.2947 | 0.0407 | 0.0991 |
concrete | 0.3540 | 0.1729 | 0.1199 |
covertype | 0.0973 | 0.0157 | 0.1131 |
epileptic | 0.2314 | 0.2830 | 0.2328 |
gas | 0.0444 | 0.0087 | 0.0350 |
h1b | 0.0644 | 0.0356 | 0.0486 |
housing | 0.2305 | 0.1116 | 0.0620 |
landsat | 0.0406 | 0.0085 | 0.0569 |
molecular biology | 0.0719 | 0.0488 | 0.0335 |
mushroom | 0.0926 | 0.0231 | 0.0118 |
pageblock | 0.0856 | 0.0184 | 0.0640 |
parkinsons | 0.1164 | 0.0367 | 0.0603 |
pendigit | 0.0178 | 0.0202 | 0.0331 |
pima | 0.1576 | 0.1895 | 0.0810 |
shuttle | 0.1238 | 0.1572 | 0.2383 |
smartphone | 0.0270 | 0.0273 | 0.0727 |
spambase | 0.1741 | 0.0422 | 0.0166 |
thyroid | 0.0377 | 0.6333 | 0.0377 |
transfusion | 0.0937 | 0.1397 | 0.0688 |
waveform | 0.0513 | 0.1312 | 0.0270 |
waveformnoise | 0.0740 | 0.0588 | 0.0263 |
wilt | 0.0301 | 0.3862 | 0.0332 |
wine | 0.2073 | 0.1297 | 0.0874 |
Overall | 0.1362 | 0.1609 | 0.0930 |
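The Overall row is consistent with an unweighted mean of the 30 per-dataset values in each column. The sketch below checks this using the numbers copied from the table above; the variable names are illustrative.

```matlab
% Per-dataset mean absolute errors copied from the table above.
% Columns: AlphaMaxNet, AlphaMax, DistCurve (rows in table order).
mae = [ ...
    0.2666 0.5134 0.3949; 0.3460 0.2800 0.3039; 0.0233 0.8467 0.0233; ...
    0.2200 0.1229 0.1634; 0.2766 0.1393 0.1405; 0.0551 0.1857 0.0623; ...
    0.1805 0.0203 0.0426; 0.2947 0.0407 0.0991; 0.3540 0.1729 0.1199; ...
    0.0973 0.0157 0.1131; 0.2314 0.2830 0.2328; 0.0444 0.0087 0.0350; ...
    0.0644 0.0356 0.0486; 0.2305 0.1116 0.0620; 0.0406 0.0085 0.0569; ...
    0.0719 0.0488 0.0335; 0.0926 0.0231 0.0118; 0.0856 0.0184 0.0640; ...
    0.1164 0.0367 0.0603; 0.0178 0.0202 0.0331; 0.1576 0.1895 0.0810; ...
    0.1238 0.1572 0.2383; 0.0270 0.0273 0.0727; 0.1741 0.0422 0.0166; ...
    0.0377 0.6333 0.0377; 0.0937 0.1397 0.0688; 0.0513 0.1312 0.0270; ...
    0.0740 0.0588 0.0263; 0.0301 0.3862 0.0332; 0.2073 0.1297 0.0874];

% Unweighted mean over datasets, one value per method
overall = mean(mae, 1);
fprintf('AlphaMaxNet: %.4f, AlphaMax: %.4f, DistCurve: %.4f\n', overall);
```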
Related Repositories
DistCurve Python Implementation
Contact
Daniel Zeiberg - zeiberg.d@northeastern.edu
Cite As
Daniel Zeiberg (2024). Positive-Unlabeled Learning Tools (https://github.com/Dzeiberg/AlphaMax/releases/tag/v1.0.1), GitHub.
Platform Compatibility
Windows, macOS, Linux
Transforms
Transforms/SVM
Transforms/utilities
alphamax
alphamax/Algorithms
alphamax/distributions
alphamax/estimators
data
distcurve
distcurve/distanceMetrics
distcurve/estimator
scripts/evaluation
scripts/makeCurveScripts
syntheticDataGeneration
tests
Version | Published | Release Notes |
---|---|---|
1.0.1.0 | | See release notes for this release on GitHub: https://github.com/Dzeiberg/AlphaMax/releases/tag/v1.0.1 |
1.0.0 | | |