Unzip and clean an affiliate data feed file from FTP

Introduction

In this tutorial, you’ll learn how to uncompress and transform an affiliate data feed file on an FTP site, send the processed data to another FTP site, and run this process daily.

Specifically, you’ll learn how to:

  • Download an affiliate data feed file from an FTP site
  • Unzip or uncompress the data feed file
  • Convert the data feed file format
  • Clean and transform the data
  • Send the processed data feed to another FTP
  • Run this process daily

For example, suppose you have a gzipped CSV affiliate data feed file on an FTP site that looks like:

"item", "category", "description"
"123", "diamond", "1 caret, E color, round shape"
"465", "diamond", "2 caret, A color, square shape"

When you’re done, you’ll be processing this file from one FTP site and sending the following uncompressed, converted, transformed data to another FTP or HTTP site every day:

[
{
"item": "123",
"category": "diamond",
"color": "E",
"weight": "1",
"shape": "round"
},
{
"item": "465",
"category": "diamond",
"color": "A",
"weight": "2",
"shape": "square"
}
]

Setup

To complete this tutorial, you’ll need the following:

  • A Flex.io API key
  • Access to an FTP location containing the data feed file you’d like to process
  • Access to an FTP or HTTP location where you’d like to put the processed data
  • The affiliate data feed file you’d like to process

To get a Flex.io API key, sign up for Flex.io, or if you already have an account, sign in. When you sign in, you’ll see your API listed in the “API Keys” section on the dashboard where you’re directed after you sign in.

Build it

To build the data feed, we need to do three basic things: 1. get the data out of a gzipped file on an FTP site, 2. convert and transform it, and 3. write the final data out to another FTP location.

First, let’s focus on #2 and build a basic pipe going that converts and transforms the data from the format of the affiliate data feed into a format we want. Once we have, this basic pipe working, we’ll build out steps #1 and #3 so that the data is pulled from the gzipped file and routed to the destination.

Create a pipe to transform the data

  1. Sign in to Flex.io
  2. Select “Pipes” from the list on the left
  3. In the pipe list, click on “New Pipe” in the upper right
  4. In the new pipe, give the pipe the name “Affiliate Feed” or something appropriate
  5. In the pipe configuration, paste the following code:
  1. Click “Save” in the upper right
  2. Click “Run” in the configuration area

In the pipe output, you should see the following:

"item", "category", "description"
"123", "diamond", "1 caret, E color, round shape"
"465", "diamond", "2 caret, A color, square shape"

This pipe simply echos some sample CSV data to simulate the feed.

Next, let’s convert it into a table and split out the product information in the description into separate fields.

  1. In the pipe configuration, paste the following code:
  1. Click “Save” in the upper right
  2. Click “Run” in the configuration area

In the pipe output, you should see the following:

[
{
"shape": "round",
"category": "diamond",
"weight": "1",
"color": "E",
"item": "123"
},
{
"shape": "square",
"category": "diamond",
"weight": "2",
"color": "A",
"item": "465"
}
]

When you run this pipe, 1) the “echo” task outputs sample data feed product info in CSV format, 2) the “convert” task this echoed output and converts it to a table, and 3) the “python” task reads through each row of the table, splits out the item description into separate fields, appends these to an array, then converts this whole array into JSON format and writes it to the output.

Now, let’s hook up this pipe to real data.

Modify the pipe to read and write to files on FTP

  1. In the pipe configuration, paste the following code:
  1. Click “Save” in the upper right

This pipe is similar to the pipe that transformed the data earlier, but it adds steps for connecting to the input and output SFTP sites (“connect”), reading the input data feed file (“read”), unzipping it (the task with the “unarchive” operation), and then writing it to the output (“write”).

To use this pipe, we need to fill out the SFTP connection info for the input and output files, as well as the input and output filenames.

  1. Edit the pipe configuration.
  2. Change the information in the “connect” step that has the “sftp-input” alias as follows: replace <input sftp host>, <input sftp username>, and <input sftp password> with the host, username, and password for the sftp site containing the data you would like to process.
  3. Change the <input filename> in the “read” step to the path of the file in the sftp input that you would like to process.
  4. Change the information in the “connect” step that has the “sftp-output” alias as follows: replace <output sftp host>, <output sftp username>, and <output sftp password> with the host, username, and password for the sftp site where you would like to send the data.
  5. Change the <output filename> in the “write” step to the path of the file in the sftp output where you would like to write out the processed data.
  6. Click “Save” in the upper right
  7. Click “Run” in the configuration area

When you run the pipe, you should see the process data from the input file appear in the output location.

Deployment

Once you’ve built the pipe, you’re ready to schedule it to run each day.

To schedule the pipe to run daily:

  1. In the pipe properties, click on “Options” next to “Scheduled” to open the Schedule dialog.
  2. In the Schedule dialog, select “Every Day” from the “Frequency” list and then set the time you would like to run the pipe.
  3. Click “Save Changes” to save the schedule.
  4. In the pipe properties, turn on the schedule by toggling the button next to “Scheduled”.