Automatically Copy and Transfer Files between Cloud Storage Services like Dropbox, Box, Amazon S3 and Google Drive
It’s almost hard to remember the world before cloud-based storage and backups. Now cloud storage is a ubiquitous, core utility, provided by services like Dropbox, Box, Amazon S3, Google Drive, Azure Storage and OneDrive.
But, while it’s easy to get data into a cloud storage service, it isn’t always as trivial to transfer it out to another service. For instance, you might need to:
- Migrate to a different cloud storage provider
- Back up certain files to a secondary cloud storage provider
- Copy certain files from a main store to an individual user’s store
- Transfer files to a third-party vendor’s cloud storage account
- Sync one storage account with the latest files from another
Thankfully, third-party APIs have come to the rescue. This tutorial will provide you with a simple example of how to utilize the Flex.io API to copy files and/or directories from one cloud storage account to another, filter files based on criteria like name, extension type or size, and schedule your process so that future transfers happen automatically.
Before jumping into the tutorial, here is a GitHub repo of what we’re about to build:
Summary: This data feed pulls all files from an S3 directory and copies them to a Dropbox folder. See below for additional permutations on filtering by file name, type, etc.
In this tutorial, we’ll do the following:
- Create a connection for two different cloud storage accounts.
- Access the list of files from your input directory.
- Build a simple pipe to move one file between the storage services.
- Build the loop to read files from the input data store and write to the output data store.
- Pull it all together and deploy the pipe.
To get started, you’ll need the following:
- Access to at least one cloud storage account, like Dropbox, AWS S3, Box, Google Drive, etc.
- A Free Flex.io API Key and the Flex.io SDK
For this tutorial, we’ll need a connection for input (in this case an AWS S3 directory) and a connection for output (in this case a Dropbox folder).
The Flex.io application has a Connections keychain for storing credentials and referencing them in your code via an alias. Here is a guide on setting up a connection in Flex.io.
Here are the example aliases we’ll reference in our code snippets below:
- AWS S3 Connection Alias:
- Dropbox Connection Alias:
Once your connections are set up, you’re ready to create your data feed. Simply swap out the default aliases with your own connection aliases.
For the input in this tutorial, we’ll be using the sample AWS S3 directory that is automatically provisioned when you sign up for a Flex.io API key.
Here’s the pipe to get the list of files from the AWS S3 root directory using the connection alias from Step 1:
If you run this pipe as is, you’ll get a result like this, listing the nine files in that directory, including CSV files and images:
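The original pipe code isn’t reproduced here, but the shape of a list step can be sketched in plain Python. The `s3_store` dict and `list_files` helper below are illustrative stand-ins of our own, not Flex.io API calls:

```python
# Hypothetical stand-in for the S3 directory: file name -> file contents.
s3_store = {
    "contacts.csv": b"name,email\nAnn,ann@example.com\n",
    "logo.png": b"\x89PNG\r\n",
}

def list_files(store):
    """Return the file names in a store, as a pipe's list task would."""
    return sorted(store)

print(list_files(s3_store))  # → ['contacts.csv', 'logo.png']
```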
Now that we have confirmed our data access to S3, let’s simply copy one file, contacts.csv, to our Dropbox account (in this case, our root folder):
The read task reads the file from the S3 directory and the write task copies it to Dropbox. Run the pipe and you should see the new file appear in Dropbox.
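As a rough sketch of what that read/write pair does, again with in-memory dicts standing in for the two connections rather than the Flex.io SDK:

```python
def copy_file(src, dst, name):
    """Read one file from the source store and write it to the destination."""
    dst[name] = src[name]  # the read step, then the write step

# Hypothetical stand-ins for the two storage connections.
s3 = {"contacts.csv": b"name,email\n"}
dropbox = {}

copy_file(s3, dropbox, "contacts.csv")
print(sorted(dropbox))  # → ['contacts.csv']
```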
Now that we have a single file being copied, let’s move everything over. We’ll now create a loop that reads each of the files in S3 and copies them to Dropbox into a timestamped folder we’ll call
This code uses a foreach task to loop through the files in our list and process them. Using the same logic as in the previous step, we simply read from S3 and write to Dropbox for each file. Run your pipe and your files will be transferred.
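Conceptually the loop looks like the sketch below. `copy_all` and the timestamped folder-name format are our own illustrative choices, with dicts once more standing in for the connections:

```python
from datetime import datetime, timezone

def copy_all(src, dst, folder=None):
    """Copy every file in src into a timestamped folder in dst."""
    if folder is None:
        folder = datetime.now(timezone.utc).strftime("files-%Y%m%d-%H%M%S")
    for name in src:                         # the foreach step
        dst[f"{folder}/{name}"] = src[name]  # read from S3, write to Dropbox
    return folder

dropbox = {}
copy_all({"contacts.csv": b"...", "logo.png": b"..."}, dropbox,
         folder="files-20180827")
print(sorted(dropbox))
# → ['files-20180827/contacts.csv', 'files-20180827/logo.png']
```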
Once you’ve run your pipe, you can view your new folder by using the list task again, but this time with your Dropbox connection:
Now that you have a working pipe, you can deploy it to run whenever the data in your S3 account is refreshed. For the setup in this tutorial, each run would simply overwrite the previously transferred files with the latest versions.
The pipe can be saved in your code or in the Flex.io app. It could be called via API endpoint or scheduled to run as needed. Click here for a guide on Flex.io deployment options.
To extend the tutorial above, here are some additional permutations you can try:
Instead of transferring all files from a directory, you can select specific file extensions by adding a wildcard (in this case, all
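For example, using `.csv` as a hypothetical extension, the filtering logic is a glob-style match. Python’s `fnmatch` is shown here purely as an illustration, not the Flex.io syntax:

```python
import fnmatch

names = ["contacts.csv", "sales.csv", "logo.png", "notes.txt"]
csv_files = fnmatch.filter(names, "*.csv")  # keep only the .csv files
print(csv_files)  # → ['contacts.csv', 'sales.csv']
```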
Instead of transferring all files from a directory, you can select specific file name variations by adding a wildcard (in this case, let’s say you have a directory of timestamped files that all follow the pattern file-20180827:10:23:03.csv; you can pull out all the 2017 files using
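In glob terms, that’s a prefix match on the timestamped names (illustrative Python, not Flex.io syntax):

```python
import fnmatch

names = [
    "file-20170214:09:00:00.csv",
    "file-20171101:16:45:12.csv",
    "file-20180827:10:23:03.csv",
]
files_2017 = fnmatch.filter(names, "file-2017*")  # only names starting file-2017
print(files_2017)
# → ['file-20170214:09:00:00.csv', 'file-20171101:16:45:12.csv']
```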
Or, you could pick out specific filenames regardless of extension (in this case, any files that are named
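Since the original example name was omitted, `report` below is a hypothetical base name; matching a name across any extension is again a glob:

```python
import fnmatch

names = ["report.csv", "report.xlsx", "contacts.csv"]
report_files = fnmatch.filter(names, "report.*")  # any extension, same base name
print(report_files)  # → ['report.csv', 'report.xlsx']
```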
Instead of transferring all files from a directory, you can select specific files based on size by adding filter criteria (in this case, all files smaller than 50KB):
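The size filter amounts to a simple comparison against each file’s byte count. The listing below is hypothetical sample data, not output from the tutorial’s S3 directory:

```python
# Hypothetical file listing: name -> size in bytes.
sizes = {"small.csv": 12_000, "medium.csv": 48_000, "big.bin": 200_000}

under_50kb = [name for name, size in sizes.items() if size < 50 * 1024]
print(under_50kb)  # → ['small.csv', 'medium.csv']
```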
Instead of transferring all files from a directory, you can select specific files based on the date the file was modified by adding filter criteria (in this case, all files modified after
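Likewise, a modified-date filter is a timestamp comparison. Both the listing and the cutoff date below are hypothetical, since the original cutoff was omitted:

```python
from datetime import datetime

# Hypothetical listing: name -> last-modified timestamp.
modified = {
    "old.csv": datetime(2017, 6, 1),
    "new.csv": datetime(2018, 8, 27),
}
cutoff = datetime(2018, 1, 1)  # hypothetical cutoff date

recent = [name for name, ts in modified.items() if ts > cutoff]
print(recent)  # → ['new.csv']
```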
We hope you found this tutorial useful. If you have any questions, shoot us a note using the chat button below. We’re happy to help and look forward to seeing what you can build!