innerjoin error and potential matching questions
    4 次查看(过去 30 天)
  
       显示 更早的评论
    
Hi,
1. I am matching two tables based on ticker but i receive the following error
Error using tabular/innerjoin (line 104)
Left and right key variables 'ticker' and 'ticker' include cells containing non-character vector
values.
Error in Step1Match (line 10)
Data2=innerjoin(Data1,identifiers, 'Keys', {'ticker'});
the two tables Data1 and identifiers are attached. 2. is it feasible to match based on cusip where in one table the cusip is 8 digits and in the other it is 6 digits. In other words, is it possible to match if the 6 digits in one file are part of the 8 digit number of the other file
0 个评论
采纳的回答
  Guillaume
      
      
 2017-9-8
        Well, the error message is clear, the ticker column in at least one of your table contain a mixture of char array and non-char arrays. Let's check:
>>find(~cellfun(@ischar, identifiers.ticker)) 
ans =
    0×1 empty double column vector
identifiers is fine.
>>find(~cellfun(@ischar, Data1.ticker), 3)
ans = 
    112430
    112431
    112432
    ...
    141311
Indeed, there's a problem with Data1. Let's have a look at the values:
>>Data1(~cellfun(@ischar, Data1.ticker), 1:4)
ans =
  77×4 table
      PERMNO    date     SICCD    ticker
      ______    _____    _____    ______
      78830     38411    6035     [1]   
      78830     38442    6035     [1]   
      78830     38471    6035     [1]   
      78830     38503    6035     [1]
Well, there's your problem. The tickers are just numeric 1. You'll have to get rid of these or fix your import before the innerjoin.
As for your second question, it's not possible with matlab's join functions but it wouldn't be particularly difficult to implement yourself. One problem though:
>> unique(ceil(log10(Data1.CUSIP(~isnan(Data1.CUSIP)))))'
ans =
       5     6     7     8    15
>> unique(ceil(log10(identifiers.CUSIP(~isnan(identifiers.CUSIP)))))'
ans =
       3     4     5     6
So, you have cusip in Data1 that have 5, 6, 7, 8 or 15 digits in addition to many NaNs. Similarly in identifiers, they have 3, 4, 5 or 6 digits, and again there's lot of NaNs. Again, you need to look at why this does not conform to your expectations.
0 个评论
更多回答(0 个)
另请参阅
类别
				在 Help Center 和 File Exchange 中查找有关 Data Type Conversion 的更多信息
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
