Need help transforming data from a table
5 次查看(过去 30 天)
显示 更早的评论
I had transaction data in a .CSV where each instance of a sale is a row of data with sku, quantity, and date are columns.
I want to create a new table with the same three columns, but so that the quantity is the sum of all instances for sales of the specific sku for that specific day. For instance, right now we would see two seperate rows for two sales of sku 1234 on March 13. I want the table to have only one row for that sku while the quantity column reads "2" as it is the sum of all sales for that sku on that day.
Thank you
0 个评论
采纳的回答
Peter Perkins
2019-4-16
Zac, if I understand correctly what you want to do, it's way simpler than all that.
I assume you know that your data file is kind of non-standard. It's hard to tell what you intend as what, but I took a guess. I recommend you take a look at detectimportoptions to deal with it, what I've shjown here is quick and dirty but not the most flexible way of reading that file.
>> fmt = '%f%f%s%s%s%s%f%f%{dd-MMM-yy}D%{dd-MMM-yy}D%{dd-MMM-yy}D%s%s%s';
>> t = readtable('sampleDataSet.csv','Format',fmt,'Delimiter','~','ReadVariableNames',false);
>> t = t(:,[3 7 9]);
>> t.Properties.VariableNames = {'SKU' 'Quantity' 'Date'};
>> t.SKU = categorical(t.SKU)
t =
20×3 table
SKU Quantity Date
_______ ________ _________
9303W 1 18-Mar-13
2031727 1 30-Apr-13
2030117 1 30-Apr-13
2038036 1 30-Apr-13
2038036 1 30-Apr-13
2032445 1 30-Apr-13
2035549 1 30-Apr-13
2046881 2 30-Apr-13
38H5 1 26-Apr-13
52C2 1 26-Apr-13
53Y8K 1 26-Apr-13
43Z38 1 26-Apr-13
9595 1 26-Apr-13
9595 1 26-Apr-13
2031510 1 26-Apr-13
53Y8 15 26-Apr-13
53Y8K 15 26-Apr-13
0001 6 26-Apr-13
1940 1 26-Apr-13
4001 1 26-Apr-13
Once you're read the file, there are several ways to do the grouped calculation you want, including findgroups/splitapply and groupsummary. The following uses varfun. It's one line:
>> varfun(@sum,t,'GroupingVariables',{'SKU' 'Date'})
ans =
17×4 table
SKU Date GroupCount sum_Quantity
_______ _________ __________ ____________
0001 26-Apr-13 1 6
1940 26-Apr-13 1 1
2030117 30-Apr-13 1 1
2031510 26-Apr-13 1 1
2031727 30-Apr-13 1 1
2032445 30-Apr-13 1 1
2035549 30-Apr-13 1 1
2038036 30-Apr-13 2 2
2046881 30-Apr-13 1 2
38H5 26-Apr-13 1 1
4001 26-Apr-13 1 1
43Z38 26-Apr-13 1 1
52C2 26-Apr-13 1 1
53Y8 26-Apr-13 1 15
53Y8K 26-Apr-13 2 16
9303W 18-Mar-13 1 1
9595 26-Apr-13 2 2
0 个评论
更多回答(1 个)
Mark Sherstan
2019-4-11
Give this a try! I am sure there are more effcient ways to achieve your outcome in terms of coding or other alternatives (e.g. a pivot table in Excel and then importing the data into MATLAB). I tried it with a few sets of numbers but give it a try on your data to validate that it makes sense and everything is functioning properly.
clear all
clc
% Import using CSV function into table
A = importfile('Book1.csv');
% Find the unique dates
idDate = unique(A.DATE);
% Loop through the unique dates
count = 1;
for ii = 1:length(idDate)
% Look at one specific day
tempDateIdx = find(idDate(ii) == A.DATE);
B = [A.SKU(tempDateIdx) A.QTY(tempDateIdx)];
% Find the unique SKU's
SKUs = unique(B(:,1));
% Loop through skus and store the sum
for jj = 1:length(SKUs)
idxSKUs = find(SKUs(jj) == B(:,1));
out{count,1} = SKUs(jj);
out{count,2} = sum(B(idxSKUs,2));
out{count,3} = idDate(ii);
count = count + 1;
end
end
% Output cell array
out
Autogenerated function:
function Book1 = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
% BOOK1 = IMPORTFILE(FILENAME) Reads data from text file FILENAME for the
% default selection.
%
% BOOK1 = IMPORTFILE(FILENAME, STARTROW, ENDROW) Reads data from rows
% STARTROW through ENDROW of text file FILENAME.
%
% Example:
% Book1 = importfile('Book1.csv', 2, 8);
%
% See also TEXTSCAN.
% Auto-generated by MATLAB on 2019/04/11 11:20:41
%% Initialize variables.
delimiter = ',';
if nargin<=2
startRow = 2;
endRow = inf;
end
%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block=2:length(startRow)
frewind(fileID);
dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for col=1:length(dataArray)
dataArray{col} = [dataArray{col};dataArrayBlock{col}];
end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains(',')
thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = textscan(char(strrep(numbers, ',', '')), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
% Convert the contents of columns with dates to MATLAB datetimes using the
% specified date format.
try
dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
catch
try
% Handle dates surrounded by quotes
dataArray{3} = cellfun(@(x) x(2:end-1), dataArray{3}, 'UniformOutput', false);
dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
catch
dates{3} = repmat(datetime([NaN NaN NaN]), size(dataArray{3}));
end
end
dates = dates(:,3);
%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [1,2]);
%% Create output variable
Book1 = table;
Book1.SKU = cell2mat(rawNumericColumns(:, 1));
Book1.QTY = cell2mat(rawNumericColumns(:, 2));
Book1.DATE = dates{:, 1};
% For code requiring serial dates (datenum) instead of datetime, uncomment
% the following line(s) below to return the imported dates as datenum(s).
% Book1.DATE=datenum(Book1.DATE);
3 个评论
Mark Sherstan
2019-4-12
I think I see whats going on but would you be able to share or make up a subset of your data for me to test the code on? Less than 10 entries should be more than enough.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Text Files 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!