Unzip, clean and transfer data files from SFTP

Introduction

In this tutorial, you’ll learn how to access files on an FTP site, clean them, and then send the processed data to another FTP site.

Specifically, you’ll learn how to:

  • Connect to source and destination SFTP sites
  • Unzip an SFTP file and read it into a data stream
  • Convert, clean and transform the data
  • Write the data out to another SFTP site
  • Run the process daily

For this example, let’s say you have a gzip CSV affiliate data feed file on a particular FTP site that looks like this:

"item", "category", "description"
"123", "diamond", "1 caret, E color, round shape"
"465", "diamond", "2 caret, A color, square shape"

In this tutorial, we’ll convert the file to JSON format and then send it on to another FTP site for staging, prior to updating an e-commerce website:

[
{ "item": "123", "category": "diamond", "color": "E", "weight": "1", "shape": "round" },
{ "item": "465", "category": "diamond", "color": "A", "weight": "2", "shape": "square" }
]

Setup

To complete this tutorial, you’ll need:

  • A Flex.io account and API key
  • Access to an FTP location containing the data feed file you’d like to process
  • Access to an FTP or HTTP location where you’d like to put the processed data

Connect to source and destination SFTP sites

  1. To connect to these SFTP services, navigate to the Connection page and add your Connections.
  2. For the SFTP service you’ll copy from, give the connection the alias example-process-sftp-file-source. This will be used as part of the path in the code that performs a transfer, similar to a network drive mapping.
  3. For the SFTP service where you’ll put the processed data, give the connection the alias example-process-sftp-file-target.

Unzip an SFTP file and read it into a data stream

First, load the file from the source SFTP site.

  1. Create a new pipe in Flex.io

  2. In the pipe, you’ll see an initial Execute Task; make sure the language option is set to Python, and then overwrite the example “Hello, World” code with the following:

Change the input_path value to the path of the file you’d like to load. For example, if you’d like to load a file in a particular folder /temp/my-feed-file.zip, you’d set the value of input_path to example-process-sftp-file-source:/temp/my-feed-file.zip.

  1. After saving your changes, press the “TEST” button in the upper right of the Pipe Builder to run the pipe. You should see the first part of the file in the pipe output:
PK
  1. Now, you’ll modify the existing execute task. This updated code adds the logic for reading the file inside the zipped file. Copy the code below and replace your existing code with the following:

Convert, clean and transform the data

Next, transform the data into the output format. For this tutorial, we’ll assume the following output from the previous step, but your actual data may differ:

"item", "category", "description"
"123", "diamond", "1 caret, E color, round shape"
"465", "diamond", "2 caret, A color, square shape"
  1. Add a second Execute Task below your current unzip Execute Task. Make sure the language option is set to Python and then overwrite the example “Hello, World” code with the following:
  1. Run the pipe again by clicking the “TEST” button and you’ll see something like:
[
{
"item": 123,
"category": "diamond",
"description": "1 caret, E color, round shape"
},
{
"item": 465,
"category": "diamond",
"description": "2 caret, A color, square shape"
}
]
  1. Now, modify the execute task you just added, adding the logic for transforming the data:
  1. Run the pipe again by clicking the “TEST” button and you’ll see something like:
[
{
"category": "diamond",
"weight": "1",
"shape": "round",
"color": "E",
"item": 123
},
{
"category": "diamond",
"weight": "2",
"shape": "square",
"color": "A",
"item": 465
}
]

Write the data out to another SFTP site

Now that you have your clean JSON, you can write the data out to another SFTP site.

  1. Add a third Execute Task below the transformation step and paste in the code below. This task will write the output from that task to an SFTP file:

Change the output_path value to the path of the file you’d like to load. For example, if you’d like to load a file in a particular folder /temp/my-feed-data.txt, you’d set the value of input_path to example-process-sftp-file-target:/temp/my-feed-data.txt

  1. Run the pipe again by clicking the “TEST” button, and you should see the “success” response in the pipe output, as well as the output file in your target SFTP site.

Run the process daily

Finally, you can schedule the process to run daily.

  1. First, turn on your pipe to activate it.

  2. In your pipe, scroll down to the bottom Deployment section and check the box next to Run on a schedule.

  3. Finally, click on “Edit” and set up the schedule you would like to use. You can turn off your pipe at any time to terminate the scheduling.