How does MATLAB generate the probability density function?

I am using the pdf function (https://nl.mathworks.com/help/stats/prob.normaldistribution.pdf.html) just to visualize the pdf of the t statistic for teaching purposes. I am wondering how MATLAB generates this continuous distribution. That is, this is the limiting distribution, so MATLAB should be taking many, many random draws from the t distribution so that the frequency distribution converges to the pdf as the limiting distribution and looks smooth, i.e. it does not look at all like a frequency distribution. But then how does this happen so fast that I get the pdf immediately when I call the pdf function? Or is this not how MATLAB generates the pdf?

Accepted Answer

Umar 2024-9-25

Hi @Snoopy,

You mentioned, “I am using the pdf function (https://nl.mathworks.com/help/stats/prob.normaldistribution.pdf.html) just to visualize the pdf of the t statistic for teaching purposes. I am wondering how MATLAB generates this continuous distribution. That is, this is the limiting distribution, so MATLAB should be taking many, many random draws from the t distribution so that the frequency distribution converges to the pdf as the limiting distribution and looks smooth, i.e. it does not look at all like a frequency distribution. But then how does this happen so fast that I get the pdf immediately when I call the pdf function? Or is this not how MATLAB generates the pdf?”

Please see my response to your comments below.

To answer your question, and having reviewed the MathWorks documentation for the pdf function: it helps to separate how MATLAB computes a PDF from how an empirical frequency distribution is built. MATLAB’s pdf function evaluates the PDF of the t-distribution directly from its mathematical formula; it does not rely on random sampling or on empirical frequency distributions. When you call pdf with a distribution name and its parameters, MATLAB evaluates the density at exactly the points you pass in. The key distinctions are explained below.

_Theoretical PDF vs. Empirical Distribution:_ The theoretical PDF is a continuous model that defines how probability is distributed over a range of values. For the t-distribution, it follows from the distribution's mathematical definition. An empirical frequency distribution, by contrast, is constructed from sampled data and reflects observed outcomes; it approaches a smooth curve only gradually as more samples are drawn.

_Immediate Computation:_ You get the PDF almost instantaneously because pdf evaluates the formula at each point in your input array (e.g., x). No random samples are generated and there is no convergence to wait for, so the computation is fast.
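For reference, the standard textbook closed form of the Student's t density with ν degrees of freedom is given below. It is stated here from the usual definition, not taken from MATLAB's source code, so treat it as the formula one would expect pdf to evaluate and not as a description of the internal implementation:

```latex
f(t;\nu) \;=\; \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
\left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}
```

Evaluating this expression costs only a few arithmetic operations per point, which is consistent with the result appearing immediately.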

The code snippets below illustrate both approaches: calculating the theoretical PDF and generating an empirical histogram.

Plotting Theoretical PDF

   % Define degrees of freedom
   df = 5;
   % Define x values range
   x = -5:0.1:5;
   % Calculate PDF using MATLAB's pdf function
   y = pdf('t', x, df);
   % Plotting
   figure;
   plot(x, y, 'LineWidth', 2);
   title('PDF of the t-Distribution (df = 5)');
   xlabel('t');
   ylabel('Probability Density');
   grid on;

Visualizing Empirical Distribution

   % Number of random samples
   n = 10000;
   % Generate random samples
   samples = trnd(df, n, 1);
   % Create histogram
   figure;
   histogram(samples, 'Normalization', 'pdf', 'BinWidth', 0.5);
   hold on;
   % Overlay theoretical PDF
   plot(x, y, 'r', 'LineWidth', 2);
   title('Empirical vs Theoretical PDF of the t-Distribution (df = 5)');
   xlabel('t');
   ylabel('Probability Density');
   legend('Empirical PDF', 'Theoretical PDF');
   grid on;

Please see attached.

In this second example, random samples are drawn from a t-distribution and plotted as a histogram to show how they approximate the theoretical PDF. As you increase the number of samples (e.g., from 10,000 to larger sizes) and narrow the bins accordingly, the histogram increasingly resembles the smooth curve of the theoretical PDF. This is the Law of Large Numbers in action: each bin's relative frequency converges to its true probability. Understanding both methods is useful in statistical analysis, whether you're modeling data using theoretical distributions or validating models through empirical evidence.
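The convergence described above can be made quantitative. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (for trnd and tcdf); the sample sizes, the seed, and the variable names ns, err, exact and emp are illustrative choices only. It compares the histogram heights against the exact per-bin density obtained from the cdf, so the reported error reflects sampling noise alone:

```matlab
rng(0);                       % fix the seed so the run is repeatable
df = 5;                       % degrees of freedom, as in the example above
edges = -5:0.5:5;             % same 0.5 bin width as the histogram above

% Exact density per bin: bin probability from the cdf divided by bin width
exact = diff(tcdf(edges, df)) ./ diff(edges);

ns  = [1e3 1e4 1e5 1e6];      % illustrative sample sizes
err = zeros(size(ns));
for k = 1:numel(ns)
    samples = trnd(df, ns(k), 1);
    counts  = histcounts(samples, edges);
    % Normalize counts to a density by hand: count / (n * bin width)
    emp     = counts ./ (numel(samples) * diff(edges));
    err(k)  = max(abs(emp - exact));
end

% Largest deviation between histogram and exact bin density for each n
disp(table(ns(:), err(:), 'VariableNames', {'n', 'maxAbsError'}));
```

The error should shrink roughly like 1/sqrt(n), which is why the histogram only slowly becomes smooth while the formula-based curve is exact at every point from the start.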

With these concepts in hand, you can better appreciate how tools like MATLAB streamline statistical computations while still understanding their foundations. Also, take a look at @Voss's comments: his comparison of PDF calculations is insightful and shows the value of validating statistical computations. He did an excellent job demonstrating that the built-in function and the manual formula yield identical results, which makes clear that pdf simply evaluates the definition.

If you have further questions or need more examples from us, feel free to ask!

More Answers (1)

Voss 2024-9-24
PDFs are calculated from their definitions. They are not simulated.
For example, calculating a standard normal PDF by hand and comparing it against what MATLAB's pdf function returns:
mu = 0;
sigma = 1;
x = linspace(-2,2,100);
y_builtin = pdf('Normal',x,mu,sigma)
y_builtin = 1×100
0.0540 0.0585 0.0633 0.0683 0.0736 0.0792 0.0851 0.0913 0.0978 0.1046 0.1116 0.1190 0.1266 0.1345 0.1426 0.1510 0.1596 0.1685 0.1775 0.1867 0.1961 0.2056 0.2152 0.2249 0.2346 0.2444 0.2542 0.2639 0.2736 0.2831
y_manual = exp(-(x-mu).^2/(2*sigma^2))/(sigma*sqrt(2*pi))
y_manual = 1×100
0.0540 0.0585 0.0633 0.0683 0.0736 0.0792 0.0851 0.0913 0.0978 0.1046 0.1116 0.1190 0.1266 0.1345 0.1426 0.1510 0.1596 0.1685 0.1775 0.1867 0.1961 0.2056 0.2152 0.2249 0.2346 0.2444 0.2542 0.2639 0.2736 0.2831
They are the same:
isequal(y_builtin,y_manual)
ans = logical
1
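The same check carries over to the t-distribution asked about in the question. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; nu = 5 and the grid are arbitrary illustration choices, and the closed form is the standard textbook density, not something read from MATLAB's source. The comparison uses a tolerance because a hand-written formula need not match the built-in result bit for bit:

```matlab
% Degrees of freedom and evaluation grid (chosen only for illustration)
nu = 5;
x  = linspace(-5, 5, 201);

% Built-in evaluation of the Student's t density
y_builtin = pdf('T', x, nu);

% Closed-form Student's t density, computed in log space for stability:
% f(x) = gamma((nu+1)/2) / (sqrt(nu*pi)*gamma(nu/2)) * (1 + x^2/nu)^(-(nu+1)/2)
logc     = gammaln((nu+1)/2) - gammaln(nu/2) - 0.5*log(nu*pi);
y_manual = exp(logc - (nu+1)/2 * log1p(x.^2/nu));

% Compare with a tolerance instead of isequal
max_abs_diff = max(abs(y_builtin - y_manual));
fprintf('Largest absolute difference: %.3g\n', max_abs_diff);
```

A difference at the level of floating-point rounding would indicate that the two computations agree.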
2 Comments
Snoopy 2024-9-25
If I had the chance to accept two answers, I would have accepted this one as well. I am sorry that this is not possible. Thanks for this very intuitive answer. It is very illustrative.


Release

R2023b
