MATLAB Answers

Scatter plot with two data sets of uneven values

Vince Clementi on 10 Jan 2020
Commented: Adam Danz on 10 Jan 2020
Hi All,
Is it possible to create a scatter plot using two datasets of unequal length? For example, D1 and D2 both have x values that span 0 to 120 and y values of different parameters (D1 = oxygen, D2 = chlorine). However, D1 consists of 80 data points, and D2 consists of ~20. Moreover, the x values for D1 and D2 do not overlap.
If not, is there a recommended solution to make this easier? The only thing I can think of is to resample the data to a common axis, but that introduces data that are not real.
Thanks!
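For reference, the resampling idea mentioned in the question could look like the minimal sketch below. The sample values and variable names (x1, y1, x2, y1AtX2) are hypothetical and are not from the thread. It linearly interpolates the longer dataset at the x positions of the shorter one, which is exactly why the resulting y values are estimates rather than measurements.

```matlab
% Hypothetical sample data (not from the thread).
x1 = [0; 10; 20; 30; 40];   y1 = [1; 2; 3; 4; 5];   % longer dataset
x2 = [12; 33];              y2 = [100; 200];         % shorter dataset

% Estimate y1 at the x positions of the shorter dataset (linear interpolation).
% These are interpolated values, not measured ones.
y1AtX2 = interp1(x1, y1, x2);

% Both vectors now have the same length and can be plotted against each other.
plot(y1AtX2, y2, 'o')
```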

Adam Danz on 10 Jan 2020
ahhhhh...... got it now.
OK, how do you expect these values to be paired? Here are some ideas.
  1. y values from dataset2 are paired with the first n values of dataset1 (this sounds arbitrary to me; I doubt this is what you want).
  2. Each y value in dataset2 is paired with the y value in dataset1 whose x value is closest. Note that this may result in more than one point in dataset2 being paired with the same point in dataset1, which is fine.
  3. Some other rule you have in mind.
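Option 2 can be sketched in base MATLAB as follows. This is only an illustration under assumed data: the sample values and the names x1, y1, x2, y2, xGap, and pairedY1 are hypothetical, and implicit expansion requires R2016b or later.

```matlab
% Hypothetical sample data (not from the thread).
x1 = [0; 10; 20; 30; 40];   y1 = [1; 2; 3; 4; 5];   % longer dataset
x2 = [12; 33];              y2 = [100; 200];         % shorter dataset

% |x1(i) - x2(j)| for every pair, via implicit expansion.
xGap = abs(x1 - x2.');              % numel(x1)-by-numel(x2)

% For each column (each dataset2 point), find the closest dataset1 row.
[gap, idx] = min(xGap, [], 1);

% dataset1 y values matched one-to-one with the dataset2 points.
pairedY1 = y1(idx(:));
```

Here `gap` holds the x distance of each pair, which is useful for judging whether a given pairing is close enough to be meaningful.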
Vince Clementi on 10 Jan 2020
I expect a fairly linear relationship between the variables. Option 2 sounds reasonable, but it would be best to mitigate any spurious data.
Adam Danz on 10 Jan 2020
"Option 2 sounds reasonable"
It sounds like this decision hasn't been thought out. The results will not be meaningful unless the pairing is meaningful. There are lots of ways to pair the two datasets, and each of them will produce a very different result with a different interpretation.


Accepted Answer

Adam Danz on 10 Jan 2020
Edited: Adam Danz on 10 Jan 2020
Here's a demo you can follow.
It produces 2 datasets per your description; then it pairs the y values from dataset1 with those from dataset2 according to the proximity of the points (or of the x values alone, if you use the commented-out line instead).
Then it plots the results. The data are random, so don't expect linearity.
% Produce 2 datasets, one longer than the other; x values range from 0:110
dataset1 = [rand(100,1)*110, rand(100,1)];
dataset2 = [rand(50,1)*110, rand(50,1)*10];
% For each point in dataset2, find the row of dataset1 that is closest
% (pdist2 requires the Statistics and Machine Learning Toolbox)
D = pdist2(dataset1,dataset2); % distance between each (x,y)
% D = pdist2(dataset1(:,1),dataset2(:,1)); % distance between each (x)
[~, minRow] = min(D); % row of dataset1 nearest to each dataset2 point
% Plot results
plot(dataset1(minRow,2), dataset2(:,2),'o')

Adam Danz on 10 Jan 2020
Vince, I'd like to add two things.
1) I added a line to my answer (it's commented-out)
% D = pdist2(dataset1(:,1),dataset2(:,1)); % distance between each (x)
I realized that the original line (the one above it) pairs the points using their full (x,y) coordinates, which might be exactly what you want. The new commented-out line does the pairing based only on the x values, in case that's what you want instead.
2) In order to see the distance between the paired values, you can add color that represents distance. This may be helpful to confirm that your pairing is reasonable.
D = pdist2(dataset1,dataset2); % distance between each (x,y)
[minDist, minRow] = min(D);
% Plot results
scatter(dataset1(minRow,2), dataset2(:,2),25,minDist,'filled')
cb = colorbar();
ylabel(cb,'nearest neighbor distance')
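Beyond coloring by distance, one possible way to limit spurious pairs is to discard any pair whose x values are too far apart. The sketch below is not part of the original answer: it pairs on x only, uses base MATLAB instead of pdist2, and the tolerance `maxGap` is an assumed value that would need to be chosen from the actual sampling of the data.

```matlab
% Minimal sketch: pair by nearest x value, then drop pairs that are too far apart.
rng(0);                                        % fixed seed for repeatability
dataset1 = [rand(100,1)*110, rand(100,1)];     % same shapes as the demo above
dataset2 = [rand(50,1)*110,  rand(50,1)*10];

% Gap in x between every dataset1 row and every dataset2 row (100x50).
xGap = abs(dataset1(:,1) - dataset2(:,1).');
[minGap, minRow] = min(xGap, [], 1);           % nearest dataset1 row per dataset2 point

maxGap = 1;                                    % ASSUMED tolerance in x units
keep = minGap <= maxGap;                       % logical mask over dataset2 points

% Plot only the pairs that pass the tolerance, colored by their x gap.
scatter(dataset1(minRow(keep),2), dataset2(keep,2), 25, minGap(keep), 'filled')
cb = colorbar();
ylabel(cb, 'x gap to nearest neighbor')
```

A tighter tolerance keeps fewer but better-matched pairs, so the choice is a trade-off between sample size and how confident you are that paired values belong together.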
Vince Clementi on 10 Jan 2020
Great modification that strengthens the method. Thank you.

