Positive-Unlabeled Learning Tools

版本 1.0.1.0 (3.9 MB) 作者: Daniel Zeiberg
Tools for estimating class priors in the positive-unlabeled classification setting.
16.0 次下载
更新时间 2023/2/28

AlphaMax

Matlab methods for estimating class priors in the positive-unlabeled classification setting

View Positive-Unlabeled Learning Tools on File Exchange

Pre-requisites

Required Toolboxes

Required Software

  • svm-light - SVM Univariate Transform requires the files present at ~/Documents/svm_light, but can be modified using the SVMlightpath argument in transform_svm

Recommended Toolboxes

  • parallel computing toolbox

Datasets

Datasets are available at Zenodo and should be downloaded to data/uci_ml_datasets

How to run AlphaMax

Jain et al. 2016

The main function for running AlphaMax is runAlphaMax. See below for an example of how to estimate the class priors of a real data set

How to run DistCurve

Zeiberg et al. 2020

The main function for running DistCurve is runDistCurve. See testdistcurve.m for an example of how to use DistCurve to estimate the class priors of a real data set

Example Code

Run Estimators on Pre-processed Data

% Load samples from the UCI gas dataset that have already been transformed
addpath(genpath("."));
load("data/uci_ml_datasets/gas.mat");
XM=ds.instances{1}.optimal.xm;
XC=ds.instances{1}.optimal.xc;
trueClassPrior=sum(ds.instances{1}.yM)/size(ds.instances{1}.yM,1);

% Run AlphaMax
%addpath("alphamax");
path_to_alphamax_estimator = "alphamax/estimators/alphamaxEstimator.mat";
[alphaMax_pred,alphaMax_out] = runAlphaMax(XM,XC,'transform','rt','useEstimatorNet',true,...
     'estimator',path_to_alphamax_estimator);

%Run DistCurve
addpath("distcurve");
path_to_distcurve_estimator = "distcurve/estimator/network.mat";
[distCurve_pred,distCurve_curve, distCurve_aucPU] = runDistCurve(XM,XC,'transform','rt',...
    'estimator',path_to_distcurve_estimator);

disp(strcat("True Class Prior: ",num2str(trueClassPrior),"; AlphaMaxNet Estimate: ",num2str(alphaMax_pred),"; DistCurve Estimate: ",num2str(distCurve_pred)))

Generate dataset from PN Data

addpath(genpath("."));
% Load data from csv files
mat.X = readmatrix('data/example/example_data_pn/X.csv');
mat.y = readmatrix('data/example/example_data_pn/y.csv');
ds = Dataset(mat,"example");
% Generate 1 PU instance from this dataset
ds.buildPUDatasets(1);
% Read Component sample and Mixture sample
XC = ds.instance{1}.optimal.xc;
XM = ds.instance{1}.optimal.xm;
trueClassPrior=sum(ds.instances{1}.yM)/size(ds.instances{1}.yM,1);

Load PU Dataset

addpath(genpath("."));
% Load Data from CSV Files
XC = readmatrix('data/example/example_data_pu/XC.csv');
XM = readmatrix('data/example/example_data_pu/XM.csv');
% Optionally Run Univariate Transforms to reduce data from d-dimensions to 1 dimension
%     Generate Feature matrix X and PU label matrix S
X = [XC;XM];
yC = ones(size(XC,1),1);
yM = zeros(size(XM,1),1);
S = [yC;yM];
transformResults = Dataset.transform_PU_data(X,S);
XC = transformResults.optimal.xc;
XM = transformResults.optimal.xm

Results

Mean Absolute Error on 30 multi-dimensional datasets from UCI Machine Learning Repository

Dataset AlphaMaxNet AlphaMax DistCurve
abalone 0.2666 0.5134 0.3949
activity_recogition_s1 0.3460 0.2800 0.3039
activity_recognition_s2 0.0233 0.8467 0.0233
adult 0.2200 0.1229 0.1634
airfoil 0.2766 0.1393 0.1405
anuran 0.0551 0.1857 0.0623
bank 0.1805 0.0203 0.0426
casp 0.2947 0.0407 0.0991
concrete 0.3540 0.1729 0.1199
covertype 0.0973 0.0157 0.1131
epileptic 0.2314 0.2830 0.2328
gas 0.0444 0.0087 0.0350
h1b 0.0644 0.0356 0.0486
housing 0.2305 0.1116 0.0620
landsat 0.0406 0.0085 0.0569
molecular biology 0.0719 0.0488 0.0335
mushroom 0.0926 0.0231 0.0118
pageblock 0.0856 0.0184 0.0640
parkinsons 0.1164 0.0367 0.0603
pendigit 0.0178 0.0202 0.0331
pima 0.1576 0.1895 0.0810
shuttle 0.1238 0.1572 0.2383
smartphone 0.0270 0.0273 0.0727
spambase 0.1741 0.0422 0.0166
thyroid 0.0377 0.6333 0.0377
transfusion 0.0937 0.1397 0.0688
waveform 0.0513 0.1312 0.0270
waveformnoise 0.0740 0.0588 0.0263
wilt 0.0301 0.3862 0.0332
wine 0.2073 0.1297 0.0874
Overall 0.1362 0.1609 0.0930

Related Repositories

DistCurve Python Implementation

Contact

Daniel Zeiberg - zeiberg.d@northeastern.edu

引用格式

Daniel Zeiberg (2024). Positive-Unlabeled Learning Tools (https://github.com/Dzeiberg/AlphaMax/releases/tag/v1.0.1), GitHub. 检索来源 .

MATLAB 版本兼容性
创建方式 R2022b
兼容任何版本
平台兼容性
Windows macOS Linux
标签 添加标签

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Transforms

Transforms/SVM

Transforms/utilities

alphamax

alphamax/Algorithms

alphamax/distributions

alphamax/estimators

data

distcurve

distcurve/distanceMetrics

distcurve/estimator

scripts/evaluation

scripts/makeCurveScripts

syntheticDataGeneration

tests

版本 已发布 发行说明
1.0.1.0

See release notes for this release on GitHub: https://github.com/Dzeiberg/AlphaMax/releases/tag/v1.0.1

1.0.0

要查看或报告此来自 GitHub 的附加功能中的问题,请访问其 GitHub 仓库
要查看或报告此来自 GitHub 的附加功能中的问题,请访问其 GitHub 仓库