Automatically Copy and Transfer Files between Cloud Storage Services like Dropbox, Box, Amazon S3 and Google Drive

Copy Files between Cloud Storage Accounts

Migrate Files between your Cloud Storage Accounts Programmatically

It’s almost hard to remember the world before cloud-based storage and backups. Now cloud storage is a ubiquitous, core utility, provided by services like Dropbox, Box, Amazon S3, Google Drive, Azure Storage and OneDrive.

But, while it’s easy to get data into a cloud storage service, it isn’t always as trivial to transfer it out to another service. For instance, you might need to:

  • Migrate to a different cloud storage provider
  • Back up certain files to a secondary cloud storage provider
  • Copy certain files from a main store to an individual user’s store
  • Transfer files to a third-party vendor’s cloud storage account
  • Sync one storage account with the latest files from another

Thankfully, third-party APIs have come to the rescue. This tutorial provides a simple example of how to use the Flex.io API to copy files or directories from one cloud storage account to another, filter files by criteria like name, extension or size, and schedule the process so that future transfers happen automatically.

Demonstration: Example and Code

Before jumping into the tutorial, here is a GitHub repo of what we’re about to build:

Fork the Source Code | Source Data

Summary: This data feed pulls all files from an S3 directory and copies them to a Dropbox folder. See below for additional permutations on filtering by file name, type, etc.

Let’s Start Building the Cloud Storage Transfer Data Feed

In this tutorial, we’ll do the following:

  1. Create a connection for two different cloud storage accounts.
  2. Access the list of files from your input directory.
  3. Build a simple pipe to move one file between the storage services.
  4. Build the loop to read files from the input data store and write to the output data store.
  5. Pull it all together and deploy the pipe.

To get started, you’ll need a Flex.io account and API key, along with the cloud storage accounts you want to connect (in this tutorial, AWS S3 and Dropbox).

Step 1: Create your Cloud Storage Connections

For this tutorial, we’ll need a connection for input (in this case an AWS S3 directory) and a connection for output (in this case a Dropbox folder).

The Flex.io application has a Connections keychain for storing credentials and referencing them in your code via an alias. The Flex.io documentation includes a guide on setting up a connection.

Here are the example aliases we’ll reference in our code snippets below:

  • AWS S3 Connection Alias: tutorial-s3
  • Dropbox Connection Alias: tutorial-dropbox

Once your connections are set up, you’re ready to create your data feed; simply replace the default aliases in the snippets below with your own connection aliases.
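Once an alias is in place, it works as a path prefix inside a pipe. As a quick sanity check (assuming your aliases match the ones above), you can list each connection’s root directory, as we’ll do in the steps below:

// List the root of the S3 connection by its alias
Flexio.pipe()
  .list('/tutorial-s3')

// List the root of the Dropbox connection by its alias
Flexio.pipe()
  .list('/tutorial-dropbox')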

Step 2: Access your Files from your Input Directory

For the input in this tutorial, we’ll be using the sample AWS S3 directory that is automatically provisioned when you sign up for a Flex.io API key.

Here’s the pipe to get the list of files from the AWS S3 root directory using the connection alias from Step 1:

Flexio.pipe()
  .list('/tutorial-s3')

If you run this pipe as is, you’ll get a result like the following, listing the nine files in that directory (a mix of CSV files and images):

[
  {
    "name": "cat-01.jpg",
    "path": "/tutorial-s3/cat-01.jpg",
    "size": 42113,
    "modified": "2018-01-31T18:46:47+00:00",
    "type": "FILE"
  },
  {
    "name": "cat-02.jpg",
    "path": "/tutorial-s3/cat-02.jpg",
    "size": 58366,
    "modified": "2018-01-31T18:46:48+00:00",
    "type": "FILE"
  },
  {
    "name": "cat-03.jpg",
    "path": "/tutorial-s3/cat-03.jpg",
    "size": 24188,
    "modified": "2018-01-31T18:46:48+00:00",
    "type": "FILE"
  },
  {
    "name": "contact-list-1.csv",
    "path": "/tutorial-s3/contact-list-1.csv",
    "size": 71050,
    "modified": "2018-01-31T18:46:49+00:00",
    "type": "FILE"
  },
  {
    "name": "contact-list-2.csv",
    "path": "/tutorial-s3/contact-list-2.csv",
    "size": 71565,
    "modified": "2018-01-31T18:46:50+00:00",
    "type": "FILE"
  },
  {
    "name": "dog-01.jpg",
    "path": "/tutorial-s3/dog-01.jpg",
    "size": 49828,
    "modified": "2018-01-31T18:46:51+00:00",
    "type": "FILE"
  },
  {
    "name": "dog-02.jpg",
    "path": "/tutorial-s3/dog-02.jpg",
    "size": 55655,
    "modified": "2018-01-31T18:46:52+00:00",
    "type": "FILE"
  },
  {
    "name": "dog-03.jpg",
    "path": "/tutorial-s3/dog-03.jpg",
    "size": 142628,
    "modified": "2018-01-31T18:46:52+00:00",
    "type": "FILE"
  },
  {
    "name": "sales-funnel.csv",
    "path": "/tutorial-s3/sales-funnel.csv",
    "size": 1330,
    "modified": "2018-01-31T18:46:53+00:00",
    "type": "FILE"
  }
]

Step 3: Build a Pipe to Copy One File Between the Cloud Storage Services

Now that we’ve confirmed our data access to S3, let’s copy a single file, contact-list-1.csv, to our Dropbox account (in this case, the root folder):

Flexio.pipe()
  .read('/tutorial-s3/contact-list-1.csv')
  .write('/tutorial-dropbox/contact-list-1.csv')

The read task reads the file from the S3 directory and the write task copies it to Dropbox. Run the pipe and you should see the new file appear in Dropbox.
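The destination path doesn’t have to mirror the source. As a quick variation (the backups folder and the new file name here are just illustrative), you could write the same file into a Dropbox subfolder under a different name:

// Copy the S3 file into a Dropbox subfolder, renaming it on the way;
// the folder is created on write, as in the timestamped backup folder used later
Flexio.pipe()
  .read('/tutorial-s3/contact-list-1.csv')
  .write('/tutorial-dropbox/backups/contacts-copy.csv')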

Step 4: Build a Loop to Copy all Files between Cloud Storage Account Directories

Now that we have a single file being copied, let’s move everything over. We’ll create a loop that reads each file in S3 and copies it into a timestamped Dropbox folder named backup-${process.time.unix}:

Flexio.pipe()
  .list('/tutorial-s3')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

This code uses a foreach task to loop through the files in our list and, for each one, applies the same read-from-S3, write-to-Dropbox logic as in the previous step. Run your pipe and your files will be transferred.

Once you’ve run your pipe, you can view your new folder by utilizing the list task again, but this time with your Dropbox connection:

Flexio.pipe()
  .list('/tutorial-dropbox')

Step 5: Deploy and Schedule the Cloud Storage Transfer Data Feed

Now that you have a working pipe, you can deploy it to run whenever the data in your S3 account is refreshed. For the setup in this tutorial, each run simply transfers the current contents of the directory into a new timestamped backup folder.

The pipe can be saved in your code or in the Flex.io app, and it can be called via an API endpoint or scheduled to run as needed. See the Flex.io documentation for a guide to deployment options.
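As a minimal sketch of the first option, you could keep the pipe definition in a small helper in your own code and build it whenever you want to run, deploy or schedule it (the function name below is a hypothetical example, not part of the Flex.io SDK; see the deployment guide for how to execute a saved pipe):

// Sketch: wrap the backup pipe in a reusable builder so the same definition
// can live in your codebase and be deployed or scheduled via Flex.io.
function buildBackupPipe() {
  return Flexio.pipe()
    .list('/tutorial-s3')
    .foreach(
      Flexio.pipe()
        .read('/tutorial-s3/${item.name}')
        .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
    )
}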

Additional Permutations

To extend the tutorial above, here are some additional permutations you can try:

Copy a Filtered Set of Files Based on File Extension

Instead of transferring all files from a directory, you can select specific file extensions by adding a wildcard (in this case, all .csv files):

Flexio.pipe()
  .list('/tutorial-s3/*.csv')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

Copy a Filtered Set of Files Based on File Name Variations

Instead of transferring all files from a directory, you can select files whose names match a pattern by adding a wildcard. For example, if you have a directory of timestamped files that all follow the pattern file-20180827:10:23:03.csv, you can pull out all the 2017 files using file-2017*.csv:

Flexio.pipe()
  .list('/tutorial-s3/file-2017*.csv')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

Or, you could pick out specific filenames regardless of extension (in this case, any file named myfile):

Flexio.pipe()
  .list('/tutorial-s3/myfile.*')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

Copy a Filtered Set of Files Based on File Size

Instead of transferring all files from a directory, you can select files based on size by adding filter criteria (in this case, all files 50 KB or smaller):

Flexio.pipe()
  .list('/tutorial-s3')
  .filter('size <= 50000')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

Copy a Filtered Set of Files Based on Date/Time Modified

Instead of transferring all files from a directory, you can select files based on when they were last modified by adding filter criteria (in this case, all files modified on or after 2018-02-19):

Flexio.pipe()
  .list('/tutorial-s3')
  .filter('modified >= "2018-02-19"')
  .foreach(
    Flexio.pipe()
      .read('/tutorial-s3/${item.name}')
      .write('/tutorial-dropbox/backup-${process.time.unix}/${item.name}')
  )

Need Any Help?

We hope you found this tutorial useful. If you have any questions, shoot us a note using the chat button below. We’re happy to help and look forward to seeing what you can build!