
Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB Datastore

To work with data in the cloud, you can upload it to Amazon S3™ and then use datastores to access the data in S3 from MATLAB® or from workers in your cluster.

Set Up Access

To work with remote data in Amazon S3, first set up access by following these steps:

  1. Create an AWS® Identity and Access Management (IAM) user using your AWS root account. For more information, see Creating an IAM User in Your AWS Account.

  2. Generate an access key to receive an access key ID and a secret access key. For more information, see Managing Access Keys for IAM Users.

  3. Specify your AWS access key ID, secret access key, and the region of the bucket as system environment variables in your MATLAB command window using the setenv (MATLAB) command.

    setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
    setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
    setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")
    
    If you are using an AWS temporary token, you must also specify your session token.

    setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")

    To permanently set these environment variables, set them in your user or system environment.

    Before R2020a: Use AWS_REGION instead of AWS_DEFAULT_REGION.

  4. If you are using MATLAB Parallel Server™ on Cloud Center, configure your cloud cluster to access S3 services.

    After you create a cloud cluster, copy your AWS credentials to your cluster workers. In MATLAB, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add these environment variable names: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. If you are using AWS temporary credentials, also add AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
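
    Alternatively, you can copy these environment variables from your client session to the workers when you start a parallel pool. The following is a minimal sketch, assuming a cluster profile named "MyCloudCluster" (a placeholder); it uses the EnvironmentVariables option of parpool (Parallel Computing Toolbox) to copy the named variables to each worker.

    % Minimal sketch: copy the AWS credentials set on the client to each
    % worker when the pool starts. "MyCloudCluster" is a placeholder name.
    envVars = ["AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","AWS_DEFAULT_REGION"];
    pool = parpool("MyCloudCluster",EnvironmentVariables=envVars);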

Upload Data to Amazon S3 from Local Machine

You can upload data to Amazon S3 using the Amazon S3 web console. For more efficient file transfers to and from Amazon S3, use the command line. Follow these steps to upload data:

  1. Download and install the AWS Command Line Interface tool. This tool enables you to run AWS-specific commands from your MATLAB command window.

  2. Create a bucket for your data using the following command in your MATLAB command window:

    !aws s3 mb s3://mynewbucket

  3. Upload your data using the following command in your MATLAB command window:

    !aws s3 cp mylocaldatapath s3://mynewbucket --recursive

    For example, upload the CIFAR-10 images data set from your local machine to Amazon S3:

    !aws s3 cp path/to/cifar10/in/the/local/machine s3://MyExampleCloudData/cifar10/ --recursive
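
    To confirm that the upload completed, you can list the contents of the bucket. This sketch assumes the example bucket and folder used above.

    !aws s3 ls s3://MyExampleCloudData/cifar10/ --recursive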

Read Data from Amazon S3 in MATLAB

After you store your data in Amazon S3, you can use datastores to access the data from your MATLAB client or your cluster workers. For example, use an imageDatastore (MATLAB) to read images from an S3 bucket. Replace "s3://MyExampleCloudData/cifar10" with the URL of your S3 bucket.

  1. Create an imageDatastore object that points to the URL of the S3 bucket.

    imds = imageDatastore("s3://MyExampleCloudData/cifar10", ...
        IncludeSubfolders=true, ...
        LabelSource="foldernames");

  2. Read the first image from Amazon S3 using the readimage (MATLAB) function.

    img = readimage(imds,1);

  3. Display the image using the imshow (MATLAB) function.

    imshow(img)
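
For example, to check how many images the datastore found in each class, you can summarize the labels with the countEachLabel (MATLAB) function. This assumes the folder names in the bucket encode the class labels, as in the CIFAR-10 layout above.

    % Tabulate the number of images per label in the datastore.
    labelCounts = countEachLabel(imds)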

You can use an imageDatastore object to read data from the cloud in the MATLAB client, or when you run code on your cluster workers. For details, see Work with Remote Data (MATLAB).
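
For example, one way to read from the datastore on your cluster workers is to partition it so that each worker processes a different subset of the images. The following is a minimal sketch, assuming an open parallel pool whose workers have the AWS environment variables set as described in Set Up Access.

    % Minimal sketch: partition the datastore across the workers of the
    % current parallel pool so that each worker reads a different subset.
    pool = gcp;
    n = numpartitions(imds,pool);
    parfor idx = 1:n
        subds = partition(imds,n,idx);
        while hasdata(subds)
            img = read(subds);
            % ... process img on the worker
        end
    end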

For a step-by-step example that shows how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).

Write Data to Amazon S3 from MATLAB

You can use datastores to write data from MATLAB or cluster workers to Amazon S3. For example, to use a tabularTextDatastore (MATLAB) object to read tabular data from Amazon S3 into a tall array, preprocess it, and then write it back to Amazon S3, follow these steps.

  1. Create a datastore object that points to the URL of the S3 bucket.

    ds = tabularTextDatastore("s3://bucketname/dataset/airlinesmall.csv", ...
        TreatAsMissing="NA",SelectedVariableNames="ArrDelay");

  2. Read the tabular data into a tall array and preprocess it by removing missing entries and sorting the data.

    tt = tall(ds);
    tt = sortrows(rmmissing(tt));

  3. Write the data back to Amazon S3 using the write (MATLAB) function.

    write("s3://bucketname/preprocessedData/",tt);
    

  4. To read your tall data back, use the datastore (MATLAB) function.

    ds = datastore("s3://bucketname/preprocessedData/");
    tt = tall(ds);
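
    Tall arrays are evaluated lazily, so this step does not immediately transfer the data. A minimal sketch, assuming the preprocessed data written above, is to call the gather (MATLAB) function to trigger evaluation and bring a small result into memory.

    % gather triggers evaluation and downloads the result into memory;
    % head limits the transfer to the first few rows.
    firstRows = gather(head(tt))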
    

To use datastores to read and write files or data of other formats, see Getting Started with Datastore (MATLAB).
