Need help transforming data from a table

Question

0 个投票

I had transaction data in a .CSV where each instance of a sale is a row of data with sku, quantity, and date are columns.

I want to create a new table with the same three columns, but so that the quantity is the sum of all instances for sales of the specific sku for that specific day. For instance, right now we would see two seperate rows for two sales of sku 1234 on March 13. I want the table to have only one row for that sku while the quantity column reads "2" as it is the sum of all sales for that sku on that day.

Thank you

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Peter Perkins 2019-4-16

在 MATLAB Online 中打开

0 个投票

Zac, if I understand correctly what you want to do, it's way simpler than all that.

I assume you know that your data file is kind of non-standard. It's hard to tell what you intend as what, but I took a guess. I recommend you take a look at detectimportoptions to deal with it, what I've shjown here is quick and dirty but not the most flexible way of reading that file.

>> fmt = '%f%f%s%s%s%s%f%f%{dd-MMM-yy}D%{dd-MMM-yy}D%{dd-MMM-yy}D%s%s%s';
>> t = readtable('sampleDataSet.csv','Format',fmt,'Delimiter','~','ReadVariableNames',false);
>> t = t(:,[3 7 9]);
>> t.Properties.VariableNames = {'SKU' 'Quantity' 'Date'};
>> t.SKU = categorical(t.SKU)
t =
  20×3 table
      SKU      Quantity      Date   
    _______    ________    _________
    9303W          1       18-Mar-13
    2031727        1       30-Apr-13
    2030117        1       30-Apr-13
    2038036        1       30-Apr-13
    2038036        1       30-Apr-13
    2032445        1       30-Apr-13
    2035549        1       30-Apr-13
    2046881        2       30-Apr-13
    38H5           1       26-Apr-13
    52C2           1       26-Apr-13
    53Y8K          1       26-Apr-13
    43Z38          1       26-Apr-13
    9595           1       26-Apr-13
    9595           1       26-Apr-13
    2031510        1       26-Apr-13
    53Y8          15       26-Apr-13
    53Y8K         15       26-Apr-13
    0001           6       26-Apr-13
    1940           1       26-Apr-13
    4001           1       26-Apr-13

Once you're read the file, there are several ways to do the grouped calculation you want, including findgroups/splitapply and groupsummary. The following uses varfun. It's one line:

  >> varfun(@sum,t,'GroupingVariables',{'SKU' 'Date'})
  ans =
  17×4 table
      SKU        Date       GroupCount    sum_Quantity
    _______    _________    __________    ____________
    0001       26-Apr-13        1               6     
    1940       26-Apr-13        1               1     
    2030117    30-Apr-13        1               1     
    2031510    26-Apr-13        1               1     
    2031727    30-Apr-13        1               1     
    2032445    30-Apr-13        1               1     
    2035549    30-Apr-13        1               1     
    2038036    30-Apr-13        2               2     
    2046881    30-Apr-13        1               2     
    38H5       26-Apr-13        1               1     
    4001       26-Apr-13        1               1     
    43Z38      26-Apr-13        1               1     
    52C2       26-Apr-13        1               1     
    53Y8       26-Apr-13        1              15     
    53Y8K      26-Apr-13        2              16     
    9303W      18-Mar-13        1               1     
    9595       26-Apr-13        2               2    

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 2

Mark Sherstan 2019-4-11

在 MATLAB Online 中打开

0 个投票

Book1.csv

Give this a try! I am sure there are more effcient ways to achieve your outcome in terms of coding or other alternatives (e.g. a pivot table in Excel and then importing the data into MATLAB). I tried it with a few sets of numbers but give it a try on your data to validate that it makes sense and everything is functioning properly.

clear all
clc
% Import using CSV function into table
A = importfile('Book1.csv');
% Find the unique dates
idDate = unique(A.DATE);
% Loop through the unique dates
count = 1;
for ii = 1:length(idDate)
    % Look at one specific day
    tempDateIdx = find(idDate(ii) == A.DATE);
    B = [A.SKU(tempDateIdx) A.QTY(tempDateIdx)];
    
    % Find the unique SKU's
    SKUs = unique(B(:,1));
    
    % Loop through skus and store the sum 
    for jj = 1:length(SKUs)
        idxSKUs = find(SKUs(jj) == B(:,1));
        out{count,1} = SKUs(jj); 
        out{count,2} = sum(B(idxSKUs,2));
        out{count,3} = idDate(ii);
        count = count + 1;
    end
end
% Output cell array
out

Autogenerated function:

function Book1 = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
%   BOOK1 = IMPORTFILE(FILENAME) Reads data from text file FILENAME for the
%   default selection.
%
%   BOOK1 = IMPORTFILE(FILENAME, STARTROW, ENDROW) Reads data from rows
%   STARTROW through ENDROW of text file FILENAME.
%
% Example:
%   Book1 = importfile('Book1.csv', 2, 8);
%
%    See also TEXTSCAN.
% Auto-generated by MATLAB on 2019/04/11 11:20:41
%% Initialize variables.
delimiter = ',';
if nargin<=2
    startRow = 2;
    endRow = inf;
end
%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block=2:length(startRow)
    frewind(fileID);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
    % Converts text in the input cell array to numbers. Replaced non-numeric
    % text with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1)
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData(row), regexstr, 'names');
            numbers = result.numbers;
            
            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if numbers.contains(',')
                thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'))
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric text to numbers.
            if ~invalidThousandsSeparator
                numbers = textscan(char(strrep(numbers, ',', '')), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch
            raw{row, col} = rawData{row};
        end
    end
end
% Convert the contents of columns with dates to MATLAB datetimes using the
% specified date format.
try
    dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
catch
    try
        % Handle dates surrounded by quotes
        dataArray{3} = cellfun(@(x) x(2:end-1), dataArray{3}, 'UniformOutput', false);
        dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
    catch
        dates{3} = repmat(datetime([NaN NaN NaN]), size(dataArray{3}));
    end
end
dates = dates(:,3);
%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [1,2]);
%% Create output variable
Book1 = table;
Book1.SKU = cell2mat(rawNumericColumns(:, 1));
Book1.QTY = cell2mat(rawNumericColumns(:, 2));
Book1.DATE = dates{:, 1};
% For code requiring serial dates (datenum) instead of datetime, uncomment
% the following line(s) below to return the imported dates as datenum(s).
% Book1.DATE=datenum(Book1.DATE);

3 个评论
显示 1更早的评论隐藏 1更早的评论

Zac Lingen 2019-4-11

在 MATLAB Online 中打开

Thanks for your answer! I went ahead and gave it a shot and got this error.

shipTable = readTable('orders.csv');
%Declaring names for column and cutting out unneeded data
Sku = shipTable(:,'Var3');
QuantityShipped = shipTable(:,'Var7');
DateOrder = shipTable (:,'Var9');
skuTable = table(Sku,QuantityShipped,DateOrder);
disp(skuTable)
% Now begins my implementation of your code: 
% Finding unique dates
idDate = unique(skuTable.DateOrder);
% Looping through unique dates
count = 1;
for i = 1:length(idDate)
    tempDateIdx = find(idDate(i) == skuTable.DateOrder);
    % Creating new table to hold values
    refactoredTable = [skuTable.Sku(tempDateIdx) skuTable.QuantityShipped(tempDateIdx)];
    
    % Find Unique skus
    uniqueSkus = unique(refactoredTable(:,1));
    
    % Loop through skus and store sum
    for j = 1:length(uniqueSkus)
        idxSKUs = find(uniqueSkus(j) == refactoredTable(:,1));
        out{count,1} = uniqueSkus(j);
        out{count,2} = sum(refactoredTable(idxSKUs,2));
        out{count,3} = idDate(i);
        count = count + 1;
    end
end
% Outputs cell array
out

At the beginning of the first loop I am getting the following error: Error using tabular/length (line 189). Unedfined function 'length" for input arguments of type 'table'. Use the HEIGHT, WIDTH or SIZE functions instead.

Rewrote it to call height function rather than length and am returning the following error: Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts is not supported. Use a row subscript and a variable subscript.

Thank you again for taking the time to review this.

Mark Sherstan 2019-4-12

I think I see whats going on but would you be able to share or make up a subset of your data for me to test the code on? Less than 10 entries should be more than enough.

Zac Lingen 2019-4-12

sampleDataSet.csv

Commenting again as my last one was flagged. Use ~ as a delimiter. Only care about columns 3 (sku), 7 (Quantity Shipped), and 9 (Date)

请先登录，再进行评论。

Need help transforming data from a table

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

更多回答（1 个）

3 个评论
显示 1更早的评论隐藏 1更早的评论

类别

标签

Community Treasure Hunt

Need help transforming data from a table

0 个评论 显示 -2更早的评论 隐藏 -2更早的评论

采纳的回答

0 个评论 显示 -2更早的评论 隐藏 -2更早的评论

更多回答（1 个）

3 个评论 显示 1更早的评论 隐藏 1更早的评论

类别

标签

另请参阅

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

3 个评论
显示 1更早的评论隐藏 1更早的评论