Populating matrix with data from a table
5 次查看(过去 30 天)
显示 更早的评论
Hello .Sorry for this questions but I am not very familiar with the table function.I am attaching a csv file .I am reading this file which basically contains some stock prices for some days .I am trying to populate a matrix with the data .For some reason it looks really cumbersome to use the table function but it might be my stupidity.Before I was using textread,except I do not know in general how many names I have so is hard (or impossible) to do that.I have read the documentation for table but nothing seems to work!
so I do something like this
opts=detectImportOptions('ii.csv');
opts.VariableNamingRule = "preserve";
P=readtable('ii.csv',opts);
nDates=length(P(:,1);
nBlIds=length(P(1,:);
closes=nan*ones(nDates,nBlIds);
% here I write the date in different way
trDates=P(:,1);tradingDates=table2array(trDates);
formatOut='yyyymmdd';
Dates=str2num(datestr(tradingDates,formatOut));
No clue how to populate the matrix using P.... All my attempts failed .Maybe is hard maybe is not but if not so obvious I fail to understand the purpose of this function if you cannot create a simple numerical matrix .Maybe useful for complicated files with weird types of variables .But in this case I just need to put the data(numerical) in some matrix .In Perl or Python this is a breeze ,unfortunately I need to use it with Matlab in this context .Any suggestion ,maybe using some other function will save me a lot of time
9 个评论
dpb
2020-11-30
Almost certainly didn't need to have done that.
You don't show what your next step is going to be, but to duplicate all the data you already have and to do so in a tight double loop is about as inefficient a use of MATLAB as is possible.
采纳的回答
dpb
2020-11-28
编辑:dpb
2020-11-28
Since have expanded somewhat, will add additional Answer to bring to forefront -- and illustrate couple other things...
Above we ended up with something like --
opt=detectImportOptions('ii.csv'); % create base import options object
opt.VariableTypes=strrep(opt.VariableTypes,'char','double'); % all are double except time
opt.PreserveVariableNames='true'; % and keep original symbol/name
opt.VariableNames(1)={'Date'}; % fix Date column name
opt=setvaropts(opt,'Date',"DatetimeFormat",'dd-MMM-uuuu'); % and a more concise display format
Now, since it is a set of time data, use the timetable object instead of just the straight table -- this gets you the date info essentially "for free" in conjunction with the data selection.
ttII=readtimetable('ii.csv',opt); % 't' or 'tt' prefix to remind of class
Now you can do addressing however wish for whatever purpose -- remember that "()" addressing on the table will return another table but "{}" or "dot" addressing the data.
With a plain table, though, the Date and data aren't the same type/class so you can't concatenate them--that's the advantage of the timetable; the date comes along for the ride...
So you can do things like--
>> ix=randperm(width(ttII),10) % pick a random set of columns/symbols
ix =
512.00 1168.00 899.00 177.00 286.00 463.00 739.00 975.00 159.00 734.00
>> format bank, format compact % since are stock prices; more legible display format
>> ttII(:,ix) % retrieve those -- note result is another table w/ Date
ans =
3×10 timetable
Date AGGLN EMGSNO SONIPL CBGLN ROGSW JPRLN FWBIM PFCLN NG_LN ATLIM
___________ _____ ______ ______ _____ ______ _____ _____ _____ _____ _____
14-Feb-2000 0.50 NaN NaN 4.25 103.46 25.31 NaN NaN 1.69 3.22
15-Feb-2000 0.50 NaN NaN 4.15 102.14 25.55 NaN NaN 1.66 3.24
16-Feb-2000 0.49 NaN NaN 4.05 102.62 25.63 NaN NaN 1.72 3.33
>> ttII{:,ix} % retrieve as array -- free of the date
ans =
0.50 NaN NaN 4.25 103.46 25.31 NaN NaN 1.69 3.22
0.50 NaN NaN 4.15 102.14 25.55 NaN NaN 1.66 3.24
0.49 NaN NaN 4.05 102.62 25.63 NaN NaN 1.72 3.33
>>
Read up on tables and timetables -- there is a lot of processing power packaged to go with them -- particularly varfun, rowfun and friends as well as time selection with timerange
Don't sell MATLAB short...takes some time and effort to learn about syntax, but it's extremely powerful when do.
ADDENDUM:
However, you'll undoubtedly run into memory and perhaps performance issues if you try to load years of data for this many variables. I suggest to make the selection of variables needed at the time you read the data and only read in those really need.
You can do this by manipulating the 'SelectedVariableNames' property in the import options object. You could have ended up with the above table by
opt.SelectedVariableNames=opt.SelectedVariableNames(ix);
to have gotten the randomized selection or by any other chosen pattern.
1 个评论
dpb
2020-11-29
" the Date and data aren't the same type/class so you can't concatenate them-"
Which is the thing then that for the most part you really won't want or need to extract the data into an array, anyway--use the tools for tables instead.
更多回答(1 个)
dpb
2020-11-27
WOWSERS!!! You've got something like 1500 variables!!!??? That's definitely hard to deal with by variable name no matter what the language.
But, it's trivial to get the data from the table--
tII=readtable('ii.csv'); % read the file
tII.Properties.VariableNames(1)={'Date'}; % make convenient date variable name
After this, simply
>> tII.SXXP
ans =
232.2856
227.4823
230.4354
>>
Returns the data from the first variable; not complicated at all.
You can also defeference by subscript; note that in that case use {} instead of () to return the data as native type; the parentheses will return a table object containing the referenced variable(s).
>> tII{:,2}
ans =
232.2856
227.4823
230.4354
>>
Given the time nature of the data, I'd suggest probably a timetable is more appropriate.
See the doc for how to reference data from table object -- there's a veritable plethora of options available depending on need--see <Access-data-in-a-table> for the details.
2 个评论
dpb
2020-11-28
You didn't convert the input to double for all the missing data that was interpreted as 'char' data where there are missing data.
Use the import options object to force that to happen...
>> opt=detectImportOptions('ii.csv');
>> unique(opt.VariableTypes)
ans =
1×3 cell array
{'char'} {'datetime'} {'double'}
shows that some values will be 'char' because there weren't enough samples in the input file for it to be able to determine really was supposed to be numeric. So, since know what want/need--
>> opt.VariableTypes=strrep(opt.VariableTypes,'char','double');
>> unique(opt.VariableTypes)
ans =
1×2 cell array
{'datetime'} {'double'}
>>
Set all of those to also be 'double' before reading...then, I didn't look at the input file to see just why but noticed that the date column title wasn't recognized so may as well fix it now, too...
>> opt.VariableNames(1)={'Date'};
Then,
tII=readtable('ii.csv',opt);
and all your data will be numeric and no cell arrays be left.
It's tough to visualize such large datasets that aren't possible to look at on screen or even in many editors. Is there really a need for so many variables all at one time?
I also didn't look at the variable names in the file -- you'll have to choose which way to treat those -- if you want to use named variables for selection of a specific (set of) symbols, you may want to do as you did earlier and preserve those names despite them not being valid MATLAB names and perhaps harder to type. If not, you have potential to not have easy way to know what MATLAB named them as...again, all depends on just how you intend to/need to address so many variables.
You can create dynamic variable names from the string name property programmatically may be one way or build pick lists and search for where certain names are to return those.
However you choose to go, having 1500 variables is a challenge no matter what system you're using for anything other than fully automated analyses--just not feasible to do by hand.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Whos 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!