Violinplot extending beyond data range

Hello everyone,
I’m using the violinplot function in MATLAB to create violin plots for some datasets. I am specifying the position and the data as follows:
violinplot(3, data2(5:end));
However, I’ve encountered an issue: the violin plot extends to negative values even though all my data values are positive. For another dataset I see a similar problem: the violin plot includes values that are negative or larger than the maximum value in my data.
I’ve read that this might be caused by the kernel density estimation (KDE) that violinplot uses to compute and draw the data's probability density. KDE smooths the data distribution and can assign nonzero density to values outside the actual range of the data.
I’m unsure how to resolve this issue and would greatly appreciate any advice or suggestions.
Thank you!
Angie
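A quick way to see the effect described in the question is to evaluate a default kernel density estimate just outside the range of strictly positive data. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox `kde` function (R2023b or later) with its `EvaluationPoints` argument; the variable name `outside` is only illustrative. With the default normal kernel the density at those points is small but not zero, and that is the part of the violin that spills past the data.

```matlab
% Sketch: a default kernel density estimate is nonzero outside the data range.
rng(0)                                  % reproducible random numbers
y = rand(100,1);                        % strictly positive data on (0,1)

% One evaluation point just below the smallest value, one just above the largest
outside = [min(y) - 0.05, max(y) + 0.05];
f = kde(y, EvaluationPoints=outside);   % default normal kernel and bandwidth

fprintf('Data range: [%.3f, %.3f]\n', min(y), max(y))
fprintf('Density at %.3f: %.4f\n', outside(1), f(1))
fprintf('Density at %.3f: %.4f\n', outside(2), f(2))
```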

Accepted Answer

[Edit: added ylim() so that all three plots have the same y-axis range.]
You can vary the bandwidth, the kernel function, or both. In the examples below, the data are uniformly distributed on (0,1), which is something of a worst case if you don't want the violin to extend to negative values. The violins still extend beyond the data, but these options control how far. Experiment to see whether you like the results. Depending on your data, you may not be able to keep the violin from going negative.
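ydata = rand(100,1);                  % 100 points, ~U(0,1)
figure;
% Left: default violinplot (default kernel density estimate)
subplot(131)
violinplot(ydata);
title('Default Violinplot'); ylim([-.5,1.5])
% Middle: fixed bandwidth of 0.05; a smaller bandwidth spills less past the data
[f1,xf1] = kde(ydata,Bandwidth=0.05);
subplot(132)
violinplot(EvaluationPoints=xf1,DensityValues=f1)
title('Bandwidth=0.05'); ylim([-.5,1.5])
% Right: box (uniform) kernel instead of the default normal kernel
[f2,xf2] = kde(ydata,Kernel="box");
subplot(133)
violinplot(EvaluationPoints=xf2,DensityValues=f2)
title('Box Kernel'); ylim([-.5,1.5])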
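The thread does not mention it, but another option is to bound the kernel estimate itself rather than only changing its bandwidth or kernel. A minimal sketch, assuming the data are strictly positive and that `kde` in your release accepts a `Support` argument (the older `ksdensity` does, as `ksdensity(ydata,'Support','positive')`): with `Support="positive"` the density is estimated on a transformed scale (a log transformation by default) and is zero at and below 0, so the violin cannot go negative. It can still extend past the largest data point. The names `f3` and `xf3` are just illustrative.

```matlab
% Sketch: bound the kernel density estimate below at 0 (assumes all data > 0).
rng(0)                                    % reproducible random numbers
ydata = rand(100,1);                      % strictly positive data on (0,1)

% Density with support restricted to (0, Inf); xf3 contains only positive points
[f3, xf3] = kde(ydata, Support="positive");

figure
violinplot(EvaluationPoints=xf3, DensityValues=f3)
title('Support = "positive"'); ylim([-.5,1.5])
```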

4 Comments

Thank you! I tried changing the bandwidth, but the violins still extend beyond the range of my dataset.
@Angie, you're welcome. MATLAB's violinplot uses a kernel distribution by default. My example showed how you can vary the kernel smoothing function, which gives you some control over the violinplot's appearance. But no matter which kernel smoothing function you choose, the pdf obtained from a kernel distribution will extend beyond the most extreme data points in your dataset. If this is unacceptable, you can fit your data with a different distribution and display the fitted pdf with violinplot. It might not look like a violin.
Example 1: Consider 50 random data points uniformly distributed between 1 and 10. Fit the data with a normal distribution. Then truncate the fitted distribution at 0 and at 1.5 times the maximum data point. Display the truncated distribution with violinplot.
x=1+9*rand(50,1); % 50 points, ~U(1,10)
pd1=fitdist(x,'Normal'); % fit normal distribution to data
pd2=truncate(pd1,0,1.5*max(x)); % truncated normal PDF
% Make violinplot using the truncated normal distribution
evalPts=linspace(0,1.5*max(x),100);
densVals=pdf(pd2,evalPts);
violinplot(EvaluationPoints=evalPts,DensityValues=densVals)
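To confirm what Example 1 does, the sketch below (self-contained, so it repeats the fit with a fixed random seed) checks two properties of the truncated distribution: its pdf is exactly zero outside the truncation limits, and over those limits it still integrates to about 1, because `truncate` renormalizes the density. The trapezoidal check with `trapz` and the variable name `area` are illustrative additions, not part of the original answer.

```matlab
% Sketch: check the truncated normal fit used in Example 1.
rng(0)                                    % reproducible random numbers
x = 1 + 9*rand(50,1);                     % 50 points, ~U(1,10), as in Example 1
pd2 = truncate(fitdist(x,'Normal'), 0, 1.5*max(x));

% Outside the truncation limits the pdf is exactly zero
fprintf('pdf at -0.01:              %g\n', pdf(pd2, -0.01))
fprintf('pdf just above 1.5*max(x): %g\n', pdf(pd2, 1.5*max(x) + 0.01))

% Over [0, 1.5*max(x)] the renormalized pdf integrates to about 1 (trapezoidal rule)
evalPts = linspace(0, 1.5*max(x), 100);
area = trapz(evalPts, pdf(pd2, evalPts));
fprintf('Area under the plotted density: %.3f\n', area)
```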
Example 2: Consider 50 random data points, half-normally distributed with mu=0 and sigma=1. Fit the data with a half-normal distribution and display the fitted values of mu and sigma. Then display the fitted distribution with violinplot, extending from mu to mu + 4 times the best-fit sigma.
pd=makedist('HalfNormal','mu',0,'sigma',1); % half-normal distrib
x=random(pd,50,1); % 50 randoms, ~pd
% Fit the data
pdFit=fitdist(x,'HalfNormal'); % best-fit half-normal distribution
% Display best-fit values of mu, sigma
fprintf('Half normal distribution fit: mu=%.2f, sigma=%.2f.\n',...
pdFit.mu,pdFit.sigma)
% Example output (random data, so the values vary from run to run):
% Half normal distribution fit: mu=0.00, sigma=1.10.
% Make violinplot
evalPts=linspace(pdFit.mu,pdFit.mu+4*pdFit.sigma,100);
densVals=pdf(pdFit,evalPts);
violinplot(EvaluationPoints=evalPts,DensityValues=densVals)
Good luck.
Thank you very much! Since a pdf obtained from a kernel distribution extends beyond the most extreme data points in my dataset, which I want to avoid, I was already considering other distributions. Your examples have been very helpful.
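If the requirement is only that the violin stop exactly at the smallest and largest observations, a simple option not discussed in the thread is to keep the kernel estimate but evaluate it only on a grid running from `min(ydata)` to `max(ydata)`. This is a sketch, not a standard remedy, and it assumes `kde` accepts an `EvaluationPoints` argument: the clipped density no longer integrates to 1 (the tails are simply dropped), and the ends of the violin are cut off flat instead of tapering.

```matlab
% Sketch: evaluate a KDE only between the smallest and largest observation,
% so the violin spans exactly the data range (density is clipped, not renormalized).
rng(0)                                         % reproducible random numbers
ydata = rand(100,1);

evalPts = linspace(min(ydata), max(ydata), 100);
f = kde(ydata, EvaluationPoints=evalPts);      % default normal kernel and bandwidth

figure
violinplot(EvaluationPoints=evalPts, DensityValues=f)
title('KDE clipped to data range')
```

The same clipping works for a fitted parametric distribution: build `evalPts` from the data range and pass `pdf(pd, evalPts)` as the density values.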


Category

Data Distribution Plots

Release

R2024b
