Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB
To work with data in the cloud, you can upload it to Amazon S3™ and then access the data in Amazon S3 from MATLAB® or from workers in your cluster.
Set up Access
To work with remote data in Amazon S3, first set up access by following these steps:
Create an identity and access management (IAM) user using your AWS® root account. For more information, see Creating an IAM User in Your AWS Account.
Generate an access key to receive an access key ID and a secret access key. For more information, see Managing Access Keys for IAM Users.
Specify your AWS access key ID, secret access key, and the region of the bucket as system environment variables in your MATLAB command window using the setenv (MATLAB) command.

setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")

If you are using an AWS temporary token (such as with AWS Federated Authentication), you must specify your session token instead of the region.

setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")

To permanently set these environment variables, set them in your user or system environment.
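For example, on Linux or macOS, one way to make the variables permanent is to append them to a shell startup file so that every new shell, and any MATLAB session launched from it, inherits them. This is a sketch: the region value is a placeholder, and on Windows you would use setx or the System Properties dialog instead.

```shell
# Append these lines to ~/.bashrc (or ~/.zshrc) so new shells, and any
# MATLAB session launched from them, inherit the AWS credentials.
export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
export AWS_DEFAULT_REGION="us-east-1"
```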
Before R2020a: Use AWS_REGION instead of AWS_DEFAULT_REGION.

If you are using MATLAB Parallel Server on Cloud Center, configure your cloud cluster to access S3 services.
After you create a cloud cluster, copy your AWS credentials to your cluster workers. In MATLAB, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add these environment variable names: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. If you are using AWS temporary credentials, also add AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
Upload Data to Amazon S3 from Local Machine
This section shows you how to upload some data sets to Amazon S3 from your local machine. Later sections show you some ways to work with remote image and text data. To obtain these data sets on your local machine, follow these steps.
The Example Food Images data set contains 978 photographs of food in nine classes. You can download this data set to your local machine using this command in MATLAB.
fprintf("Downloading Example Food Image data set ... ")
filename = matlab.internal.examples.downloadSupportFile('nnet', 'data/ExampleFoodImageDataset.zip');
fprintf("Done.\n")
unzip(filename,"MyLocalFolder/FoodImageDataset");
To obtain the Traffic Signal Work Orders data set on your local machine, use this command.
fprintf("Downloading Traffic Signal Work Orders data set ... ")
zipFile = matlab.internal.examples.downloadSupportFile("textanalytics","data/Traffic_Signal_Work_Orders.zip");
fprintf("Done.\n")
unzip(zipFile,"MyLocalFolder/TrafficDataset");
You can upload data to Amazon S3 by using the AWS S3 web page. For more efficient file transfers to and from Amazon S3, use the command line.
To upload the Example Food Images data set and the Traffic Signal Work Orders data set from your local machine to Amazon S3, follow these steps.
Download and install the AWS Command Line Interface tool. This tool lets you run AWS-specific commands from your MATLAB command window.
Create a bucket for your data using the following command in your MATLAB command window. Replace MyCloudData with the name of your Amazon S3 bucket.

!aws s3 mb s3://MyCloudData
Upload your data using the following command in your MATLAB command window.
!aws s3 cp mylocaldatapath s3://MyCloudData --recursive
For example, to upload the Example Food Images data set from your local machine to your Amazon S3 bucket, use this command.
!aws s3 cp MyLocalFolder/FoodImageDataset s3://MyCloudData/FoodImageDataset/ --recursive
To upload the Traffic Signal Work Orders data set from your local machine to your Amazon S3 bucket, use this command.
!aws s3 cp MyLocalFolder/TrafficDataset s3://MyCloudData/TrafficDataset/ --recursive
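After both uploads, you can confirm that the objects landed where you expect by listing the bucket contents from MATLAB. This is a sketch that assumes the hypothetical bucket name MyCloudData used above.

```matlab
% List the uploaded objects; the ! prefix runs the AWS CLI from MATLAB.
!aws s3 ls s3://MyCloudData/ --recursive --human-readable --summarize
```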
Access Data from Amazon S3 in MATLAB
After you store your data in Amazon S3, you can use Data Import and Export (MATLAB) functions to read data from or write data to the Amazon S3 bucket from MATLAB. MATLAB functions that support a remote location in their filename input arguments allow access to remote data. To check whether a specific function allows remote access, refer to its function page.

For example, you can use imread (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.
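For instance, a call of the following shape reads one image straight from the bucket. The object key here is hypothetical; substitute the path of a file that actually exists in your bucket.

```matlab
% Read a single image directly from Amazon S3; imread accepts an
% s3:// URL in its filename argument.
img = imread("s3://MyCloudData/FoodImageDataset/sandwich/sandwich_1.jpg");
```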
To write data to the Amazon S3 bucket, you can similarly use Data Import and Export (MATLAB) functions that support write access to remote data. To check whether a specific function allows remote access, refer to its function page.
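As a sketch, writing works the same way: imwrite accepts a remote location in its filename argument. The image and output key below are hypothetical placeholders.

```matlab
% Create a small test image, then write it to the bucket as a PNG.
img = uint8(255*rand(64,64,3));
imwrite(img,"s3://MyCloudData/scratch/testImage.png");
```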
Read Data from Amazon S3 in MATLAB Using Datastores
For large data sets in Amazon S3, you can use datastores to access the data from your MATLAB client or your cluster workers. A datastore is a repository for collections of data that are too large to fit in memory. Datastores allow you to read and process data stored in multiple files on a remote location as a single entity. For example, use an imageDatastore (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.
Create an imageDatastore (MATLAB) object that points to the URL of the Amazon S3 bucket.

imds = imageDatastore("s3://MyCloudData/FoodImageDataset/", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
Read the first image from Amazon S3 using the readimage (MATLAB) function.

img = readimage(imds,1);
Display the image using the imshow (MATLAB) function.

imshow(img)
To use datastores to read files or data of other formats, see Getting Started with Datastore (MATLAB).
For a step-by-step example that shows how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).
Write Data to Amazon S3 from MATLAB Using Datastores
You can use datastores to write data from MATLAB or cluster workers to Amazon S3. For example, follow these steps to use a tabularTextDatastore (MATLAB) object to read tabular data from Amazon S3 into a tall array, preprocess it, and then write it back to Amazon S3.
Create a datastore object that points to the URL of the Amazon S3 bucket.
ds = tabularTextDatastore("s3://MyCloudData/TrafficDataset/Traffic_Signal_Work_Orders.csv");
Read the tabular data into a tall array and preprocess it by removing missing entries and sorting the data.
tt = tall(ds);
tt = sortrows(rmmissing(tt));
Write the data back to Amazon S3 using the write (MATLAB) function.

write("s3://MyCloudData/TrafficDataset/preprocessedData/",tt);
To read your tall data back, use the datastore (MATLAB) function.

ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
tt = tall(ds);
To use datastores to write files or data of other formats, see Getting Started with Datastore (MATLAB).