Best way to organize categorical data for plotting

I have wind speed data that I want to split and plot in several different ways. For instance, I may want a 3-D bar plot of different combinations of wind speed, month, frequency count and altitude, or a subsection of the data as a stacked bar plot or area plot. I've been looking at different ways of organizing the data so that plotting different variables and subsections is easy, but I'm not used to working with categorical data in MATLAB, and I keep running into limitations with table/timetable. I'm coming from Python (and older versions of MATLAB), and I find MATLAB 2019 just close enough to be a bit frustrating.
I'm looking for advice on the best way to organize data like this so that plotting is quick and flexible. Clearly there's some logic to this aspect of MATLAB that I've missed. Previously I simply split the data into various matrices and manipulated them for plotting, which is plain and simple, but I'm sure there are better, more advanced ways to do this if one knows how.
I've attached the file GC.mat, where the data is organized by month, wind speed (5 m/s bins), altitude (2 km bins) and count. I've also attached part of the original timetable, w_tt.csv, with 2 s wind speed data (the original data covers 6 years).
I can provide examples of my plotting attempts and try to figure out why they aren't working, but I have a feeling the main problem is that I haven't understood the best way to organize the data, so I'm starting by trying to learn why this is or isn't a good approach.
Example table:
166×6 table
monthname_Time  disc_Altitude  disc_Windspeed  GroupCount  norm        perc
______________  _____________  ______________  __________  __________  __________
January         [20000, Inf]   [0, 2)          503         4.3455e-05  0.0043455
January         [20000, Inf]   [2, 4)          1355        0.00011706  0.011706
January         [20000, Inf]   [4, 6)          2452        0.00021183  0.021183
January         [20000, Inf]   [6, 8)          2931        0.00025321  0.025321
January         [20000, Inf]   [8, 10)         3516        0.00030375  0.030375
January         [20000, Inf]   [10, 12)        3640        0.00031447  0.031447
January         [20000, Inf]   [12, 14)        3392        0.00029304  0.029304
January         [20000, Inf]   [14, 16)        2398        0.00020717  0.020717
January         [20000, Inf]   [16, 18)        2134        0.00018436  0.018436
January         [20000, Inf]   [18, 20)        2815        0.00024319  0.024319
January         [20000, Inf]   [20, 22)        3811        0.00032924  0.032924
January         [20000, Inf]   [22, 24)        4504        0.00038911  0.038911
Example timetable data:
Time                   Windspeed    Altitude
___________________    _________    ________
01/01/2015 00:17:01    27.3         18317
01/01/2015 00:17:03    27.3         18325
01/01/2015 00:17:05    27.2         18334
01/01/2015 00:17:07    27.1         18343
01/01/2015 00:17:09    27           18352
01/01/2015 00:17:11    26.9         18361
Comment by dpb, 2021-1-27
I think seeing where you got stuck, with examples, would be more helpful... I'd probably keep the dates as datetime instead of converting them to categorical, and shorten the categorical category names simply to reduce typing; you can use the optional valueset and catnames inputs of categorical for display purposes.
It could also be more advantageous to leave the altitude and wind speed data numeric and use discretize and friends to do the binning on the fly.
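A minimal sketch of the two suggestions above, assuming a timetable shaped like w_tt. The data here are synthetic random numbers, not the real measurements, and the bin edges, the short names such as 'ws0' and the month abbreviations are hypothetical choices for illustration. The raw Windspeed and Altitude stay numeric; discretize adds categorical bin labels, and categorical(x, valueset, catnames) gives short display names.

```matlab
% Synthetic stand-in for w_tt (illustration only, NOT the real data)
rng(0);                                             % reproducible random numbers
Time = datetime(2015,1,1) + seconds(0:2:2*9999)';   % 10,000 samples, 2 s apart
Windspeed = 40*rand(numel(Time), 1);                % m/s, uniform in [0, 40)
Altitude  = 30000*rand(numel(Time), 1);             % m, uniform in [0, 30000)
tt = timetable(Time, Windspeed, Altitude);          % dates stay datetime

% Assumed bin edges: 5 m/s and 2 km, with open-ended top bins
wsEdges  = [0:5:35, Inf];
altEdges = [0:2000:20000, Inf];

% Hypothetical short category names, e.g. 'ws0', 'ws5', ... (less typing later)
wsNames = arrayfun(@(a) sprintf('ws%d', a), wsEdges(1:end-1), 'UniformOutput', false);

% discretize leaves the raw numbers alone and adds categorical bin labels
tt.wsBin  = discretize(tt.Windspeed, wsEdges, 'categorical', wsNames);
tt.altBin = discretize(tt.Altitude, altEdges, 'categorical');   % default names like [0, 2000)

% categorical(values, valueset, catnames): month numbers shown as short names
monthAbbr = {'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'};
tt.Month = categorical(month(tt.Time), 1:12, monthAbbr);

% Count samples per wind speed bin; every sample lands in exactly one bin
counts = countcats(tt.wsBin);
disp(table(categories(tt.wsBin), counts, 'VariableNames', {'Bin', 'Count'}))
```

Because the bins are computed on demand, switching to 2 m/s bins only means changing wsEdges and rerunning discretize.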
rowfun is extremely powerful in conjunction with grouping variables for this kind of thing...
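A minimal sketch of rowfun with a grouping variable, using synthetic data (not the real measurements) and a hypothetical 5 km altitude binning. With 'GroupingVariables', rowfun calls the function once per group and passes that group's Windspeed values as a column vector; the output names MeanWS and MaxWS are illustrative choices.

```matlab
% Synthetic data (illustration only)
rng(1);
Time = datetime(2015,1,1) + hours(0:999)';
Windspeed = 40*rand(1000, 1);
Altitude  = 30000*rand(1000, 1);
T = table(Time, Windspeed, Altitude);

% Assumed altitude bins of 5 km, last bin open-ended
T.altBin = discretize(T.Altitude, [0:5000:25000, Inf], 'categorical');

% One call per altitude bin; deal returns the two requested outputs
stats = rowfun(@(w) deal(mean(w), max(w)), T, ...
    'GroupingVariables', 'altBin', ...
    'InputVariables', 'Windspeed', ...
    'NumOutputs', 2, ...
    'OutputVariableNames', {'MeanWS', 'MaxWS'});

disp(stats)   % columns: altBin, GroupCount, MeanWS, MaxWS
```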


Answers (1)

Avni Agrawal, 2024-5-16
I understand that you are trying to organize and visualize your wind speed data in MATLAB.
Data Organization with Tables and Timetables:
  • Tables handle mixed data types well, which makes them a good fit for binned wind speeds, altitudes and counts. Timetables are designed for time-series data: you can index by time ranges (timerange) and aggregate over time (retime).
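A short sketch of time-based indexing and aggregation with a timetable. The 6-hourly wind speeds below are synthetic and only stand in for data like w_tt.

```matlab
% Synthetic 6-hourly wind speeds for 2015 (illustration only)
rng(2);
Time = (datetime(2015,1,1):hours(6):datetime(2015,12,31,18,0,0))';
Windspeed = 20 + 10*rand(numel(Time), 1);
tt = timetable(Time, Windspeed);

% Aggregate over time: monthly mean wind speed, one row per month
monthlyMean = retime(tt, 'monthly', 'mean');

% Index by a time range: all January samples (end time excluded)
jan = tt(timerange('2015-01-01', '2015-02-01'), :);

disp(monthlyMean)
```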
Categorization and Grouping:
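  • Convert relevant variables to categorical for meaningful grouping (e.g., month, wind speed bins). Use groupsummary for quick calculations within these groups; it can also bin numeric and datetime variables itself, for example by month name or by numeric bin edges.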
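A sketch of letting groupsummary do the binning directly from raw samples, which yields variables named monthname_Time, disc_Altitude, disc_Windspeed and GroupCount like those in the question's table. The data are synthetic, the bin edges are assumptions, and the norm/perc definitions are an assumption inferred from the example values (perc is 100 times norm, and norm appears to be GroupCount divided by the total count).

```matlab
% Synthetic hourly samples for 2015 (illustration only)
rng(3);
Time = (datetime(2015,1,1):hours(1):datetime(2015,12,31,23,0,0))';
Windspeed = 40*rand(numel(Time), 1);
Altitude  = 30000*rand(numel(Time), 1);
T = table(Time, Windspeed, Altitude);

% One bin specification per grouping variable:
% month name for Time, numeric edges (assumed) for Altitude and Windspeed
GC = groupsummary(T, {'Time', 'Altitude', 'Windspeed'}, ...
    {'monthname', [0:2000:20000, Inf], [0:5:35, Inf]});

% Assumed normalisation: share of all samples, as a fraction and in percent
GC.norm = GC.GroupCount / sum(GC.GroupCount);
GC.perc = 100 * GC.norm;

head(GC)
```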
Reshaping for Visualization:
  • Reshape your data to fit each plot type. unstack pivots a long table into a wide one, with one row per group (e.g., month) and one column per category (e.g., wind speed bin), which is the matrix layout that bar3, bar and area expect.
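A toy example of the pivot step, with made-up counts and hypothetical bin names 'low'/'high'. The two Feb/low rows are combined by the 'AggregationFunction' (here @sum), and vartype('numeric') then extracts only the count columns as a plain matrix.

```matlab
% Long format: one row per (month, bin) observation; counts are made up
Month = categorical({'Jan'; 'Jan'; 'Feb'; 'Feb'; 'Feb'});
WsBin = categorical({'low'; 'high'; 'low'; 'high'; 'low'});
GroupCount = [10; 5; 7; 3; 2];
T = table(Month, WsBin, GroupCount);

% Wide format: one row per month, one column per wind speed bin
W = unstack(T, 'GroupCount', 'WsBin', ...
    'GroupingVariables', 'Month', 'AggregationFunction', @sum);

% Numeric matrix ready for bar3/bar/area (the Month column is skipped)
M = W{:, vartype('numeric')};
disp(W)
```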
Visualization Techniques:
  • 3D Bar Plots (bar3): Great for showing relationships between three variables (e.g., month, wind speed bin, and count).
  • Stacked Bar and Area Plots (bar, area): Useful for comparing parts of a whole across categories, with the data organized as 2-D matrices.
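A small sketch of both plot types from a months-by-bins matrix. The counts and bin labels are made up for illustration; the valueset argument of categorical fixes the month order on the x-axis instead of sorting it alphabetically.

```matlab
% Synthetic counts (illustration only): rows = months, columns = wind speed bins
counts = [5 3 2; 4 4 2; 6 1 3];
months = categorical({'Jan'; 'Feb'; 'Mar'}, {'Jan', 'Feb', 'Mar'});  % fixed order
binNames = {'0-5 m/s', '5-10 m/s', '10-15 m/s'};                     % hypothetical labels

% Percentage share per month, so each stacked bar sums to 100
shares = 100 * counts ./ sum(counts, 2);

fig = figure('Visible', 'off');
subplot(1, 2, 1);
hBar = bar(months, shares, 'stacked');   % one Bar object per column (bin)
ylabel('Share of samples (%)');
legend(binNames);

subplot(1, 2, 2);
hArea = area(1:3, counts);               % each column becomes a stacked area
xticks(1:3);
xticklabels(cellstr(months));
ylabel('Count');
```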
Here's a simple example of how you might start to organize and plot your data:
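% Load your data
load('GC.mat'); % Assuming this loads a table named 'GC'
% Convert to categorical if not already (groupsummary output usually is)
GC.monthname_Time = categorical(GC.monthname_Time);
GC.disc_Altitude = categorical(GC.disc_Altitude);
GC.disc_Windspeed = categorical(GC.disc_Windspeed);
% Example: plot frequency count by month and wind speed bin.
% Keep only the needed variables, then pivot to one row per month and
% one column per wind speed bin; counts are summed over altitude bins.
sub = GC(:, {'monthname_Time', 'disc_Windspeed', 'GroupCount'});
pivotTable = unstack(sub, 'GroupCount', 'disc_Windspeed', ...
    'GroupingVariables', 'monthname_Time', 'AggregationFunction', @sum);
% Column 1 holds the (categorical) month names; take only the numeric counts
counts = pivotTable{:, 2:end};
counts(isnan(counts)) = 0; % month/bin combinations with no data
% Simple 3D bar plot: columns along x (wind speed), rows along y (month)
figure;
bar3(counts);
xticks(1:size(counts, 2));
xticklabels(pivotTable.Properties.VariableNames(2:end));
yticks(1:size(counts, 1));
yticklabels(cellstr(pivotTable.monthname_Time));
xlabel('Wind Speed Bin');
ylabel('Month');
zlabel('Frequency Count');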
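To plot a subsection, such as a single altitude bin, as stacked bars, one option is to filter rows with a categorical comparison before pivoting. This sketch builds a small synthetic table that only reuses GC's variable names; the counts are made up and the selected bin '[20000, Inf]' is just an example.

```matlab
% Synthetic table with the same variable names as GC (values are made up)
monthname_Time = categorical({'January'; 'January'; 'January'; 'February'; 'February'});
disc_Altitude  = categorical({'[20000, Inf]'; '[20000, Inf]'; '[18000, 20000)'; '[20000, Inf]'; '[20000, Inf]'});
disc_Windspeed = categorical({'[0, 2)'; '[2, 4)'; '[0, 2)'; '[0, 2)'; '[2, 4)'});
GroupCount = [50; 130; 4; 30; 90];
GC = table(monthname_Time, disc_Altitude, disc_Windspeed, GroupCount);

% Keep one altitude bin and the three variables needed for the plot
sel = GC(GC.disc_Altitude == '[20000, Inf]', {'monthname_Time', 'disc_Windspeed', 'GroupCount'});

% One row per month, one column per wind speed bin
P = unstack(sel, 'GroupCount', 'disc_Windspeed', 'GroupingVariables', 'monthname_Time');
vals = P{:, vartype('numeric')};

fig = figure('Visible', 'off');
h = bar(P.monthname_Time, vals, 'stacked');
legend(P.Properties.VariableNames(2:end));
ylabel('Frequency Count');
```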
Please take a look at the documentation linked below for a better understanding:
I hope this helps!

Category: Data Preprocessing

Release: R2019b
