How to Embed a Live, Refreshable D3.js Chart into GitHub Pages

June 08, 2017 by Ken Kaczmarek

We recently had the honor of participating in PyCon (the preeminent annual Python conference) in Portland as an invitee to Startup Row. It was truly wonderful to be part of this fantastic community of coders and have a front-row seat to the bleeding edge of Python innovation.

Oh, and we gave out a lot of swag.

Flex.io swag and team

At our booth, we demonstrated a few different ways to pipe data to and from the cloud and manipulate it with Python. The pipe that got the most attention powered a live D3.js chart sitting on a static GitHub Pages site.

In short, Flex.io gave us a way to pass variables from the GitHub page to the pipe and have the pipe feed the result back to the D3.js bubble chart, all without the hassle of setting up a web server.

So, let’s see how we made it.

The Text Analysis App

First, let’s give the app a quick try:

Bubble chart app

  1. Click here to see the website.
  2. Enter a URL like: http://gameofthrones.wikia.com/wiki/Category:Characters
  3. Click the submit button and you’ll get a bubble chart that highlights the most-used words on the page.
  4. Move the slider’s minimum threshold to 4 to show only words used 4 or more times; slide the maximum threshold down to 18 to remove the most-repeated words (those used more than 18 times).

The Static Website

The code for the static website is pretty straightforward; you can find the source here. Beyond the boilerplate, two main components do the work:

D3.js Bubble Chart

All the dataviz magic happens with D3.js. For this one, we poked around this awesome D3 gallery and decided to use the Bubble Chart.

We simply grabbed the script snippet provided and pasted it into the page. It was easy peasy to make any minor tweaks from there.

Slider Component

Rather than use a fixed word count threshold, we thought it would be fun to add a slider so you could pick the min/max word count and refresh the chart on the fly. For this, we used the Ion.RangeSlider component and made a few CSS modifications to simplify it for our needs.

The Pipe and the Python

The pipe itself uses a bit of Python to do its magic. Click here to copy this pipe into your own account and play with it.

In a nutshell, here’s what it does:

Step 1. First, the pipe takes the URL you submitted on the website as its input file:

input file: ${url}
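The ${url} token is a pipe variable that gets replaced with the value posted from the page. We haven’t documented how Flex.io performs that substitution internally, so treat this as an illustration only: Python’s standard string.Template happens to use the same ${name} syntax, which makes it handy for previewing what the pipe commands look like once the form values are filled in. The render_commands helper is a hypothetical name.

```python
from string import Template

# Pipe commands as they appear in this post, with ${...} placeholders.
# The substitution below is an illustrative stand-in, not Flex.io's own code.
PIPE_COMMANDS = [
    "input file: ${url}",
    "filter where: to_number(value) >= ${min_threshold} and to_number(value) <= ${max_threshold}",
]

def render_commands(commands, variables):
    """Replace ${name} placeholders with the values posted from the page.

    Template.substitute raises KeyError when a variable is missing, so a
    form field that was never sent shows up as an error, not a blank.
    """
    return [Template(cmd).substitute(variables) for cmd in commands]

rendered = render_commands(
    PIPE_COMMANDS,
    {
        "url": "http://gameofthrones.wikia.com/wiki/Category:Characters",
        "min_threshold": "4",
        "max_threshold": "18",
    },
)
for line in rendered:
    print(line)
```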

Step 2. Then we use the Python library Beautiful Soup to scrape the visible text from that page:

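from bs4 import BeautifulSoup

def flexio_file_handler(input, output):
    # Read the raw HTML fetched from the submitted URL
    content = input.read()
    soup = BeautifulSoup(content, "html.parser")

    # Drop <script> and <style> elements so their code isn't counted as words
    for script in soup(["script", "style"]):
        script.extract()
    text = soup.get_text()

    # Strip each line, break lines apart on double spaces, drop empty chunks
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = '\n'.join(chunk for chunk in chunks if chunk)

    # Emit plain UTF-8 text for the next step in the pipe
    output.content_type = 'text/plain'
    output.write(text.encode('utf-8', 'ignore'))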
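If you want to experiment with this step without installing Beautiful Soup, here is a rough standard-library sketch built on html.parser.HTMLParser. It is a swapped-in technique, not what the pipe runs, and it only approximates get_text(): it skips the contents of script and style elements and then applies the same line and double-space clean-up as the handler above. TextExtractor and extract_text are hypothetical names.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Same clean-up as the pipe's handler: strip lines, split on double
    # spaces, and drop empty chunks.
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk)

sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Winter</h1>\n<p>is  coming</p><script>var x = 1;</script></body></html>"
)
print(extract_text(sample))  # prints Winter, is, coming on separate lines
```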

Step 3. We limit the results to 20,000, so we aren’t buried under, say, the 587,287 words of War and Peace.
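The pipe command for this step isn’t shown, so here is only a Python approximation of the idea. It assumes the cap counts words; if the limit actually applies to lines of the scraped text, the same islice approach works on text.splitlines() instead. MAX_WORDS and cap_words are hypothetical names.

```python
import re
from itertools import islice

MAX_WORDS = 20_000  # cap from Step 3; counting words (not lines) is our assumption

def cap_words(text, limit=MAX_WORDS):
    """Return at most `limit` whitespace-separated words, one per line.

    re.finditer yields matches lazily, so islice stops scanning once the
    limit is reached instead of splitting the entire page first.
    """
    words = (match.group() for match in re.finditer(r"\S+", text))
    return "\n".join(islice(words, limit))

capped = cap_words("word " * 30_000)
print(len(capped.split()))  # 20000
```

Joining with newlines keeps the output compatible with the next step, which splits on any whitespace.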

Step 4. We then use the Python container defaultdict to count how many times each word longer than 3 characters is used, keeping only the words that appear more than once:

import json
from collections import defaultdict

def remove_non_ascii(text):
    # Replace any non-ASCII character with a space
    return ''.join([i if ord(i) < 128 else ' ' for i in text])

def flexio_file_handler(input, output):
    # Read the plain text from the previous step and normalize case
    content = input.read()
    content = content.decode()
    content = content.lower()

    # Count every whitespace-separated word longer than 3 characters
    d = defaultdict(int)
    for word in content.split():
        if len(word) > 3:
            d[word] += 1

    # Keep words used more than once, in the id/value shape the chart uses
    result = []
    for key, value in d.items():
        if value > 1:
            i = {"id": remove_non_ascii(key), "value": value}
            result.append(i)

    output.content_type = "application/json"
    output.write(json.dumps(result))
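collections.Counter is a dict subclass built for exactly this kind of tally, so the counting could also be written more compactly. This is an alternative we’re suggesting, not the code the pipe runs; the defaults reproduce the pipe’s len(word) > 3 and value > 1 rules, and count_keywords is a hypothetical name.

```python
import json
from collections import Counter

def remove_non_ascii(text):
    # Same helper as the pipe: replace non-ASCII characters with spaces.
    return "".join(ch if ord(ch) < 128 else " " for ch in text)

def count_keywords(text, min_length=4, min_count=2):
    """Count lowercase words of at least `min_length` characters and keep
    those that appear at least `min_count` times."""
    counts = Counter(
        word for word in text.lower().split() if len(word) >= min_length
    )
    return [
        {"id": remove_non_ascii(word), "value": n}
        for word, n in counts.items()
        if n >= min_count
    ]

print(json.dumps(count_keywords("Stark stark Lannister STARK Snow the the")))
# [{"id": "stark", "value": 3}]
```

Like the original, this splits on whitespace only, so "stark," and "stark" are counted as different words.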

Step 5. We convert the JSON to CSV, since that is the format the D3 Bubble Chart expects.
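The JSON from Step 4 already uses id and value keys, so a reasonable assumption is that those become the two CSV columns the chart reads. Here is a standard-library sketch of that conversion (keywords_json_to_csv is a hypothetical name, not a pipe command):

```python
import csv
import io
import json

def keywords_json_to_csv(json_text):
    """Convert [{"id": ..., "value": ...}, ...] into CSV with an id,value header."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    # DictWriter raises ValueError if a row has keys beyond id and value
    writer = csv.DictWriter(buffer, fieldnames=["id", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(keywords_json_to_csv('[{"id": "stark", "value": 3}, {"id": "snow", "value": 2}]'))
```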

Step 6. For the final step, we filter the results by the minimum and maximum values the user selects with the slider on the website:

filter where: to_number(value) >= ${min_threshold} and to_number(value) <= ${max_threshold}
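For readers following along in Python, the filter’s logic amounts to an inclusive range check on a number parsed from each row. This sketch uses made-up sample rows and the thresholds from the app walkthrough (4 and 18); filter_by_threshold is a hypothetical name, and using float() for to_number is our assumption.

```python
def filter_by_threshold(rows, min_threshold, max_threshold):
    """Keep rows whose numeric value lies in [min_threshold, max_threshold].

    Both bounds are inclusive, matching the >= and <= in the pipe's filter.
    Values arrive as strings when rows are read from CSV, so convert first.
    """
    lo, hi = float(min_threshold), float(max_threshold)
    return [row for row in rows if lo <= float(row["value"]) <= hi]

# Made-up sample rows for illustration
rows = [
    {"id": "stark", "value": "25"},
    {"id": "snow", "value": "7"},
    {"id": "throne", "value": "3"},
]
print(filter_by_threshold(rows, "4", "18"))  # [{'id': 'snow', 'value': '7'}]
```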

Tying it All Together with the API

There are two sections of code that tie the GitHub page and the pipe together.

Variables

The pipe uses three variables, ‘url’, ‘min_threshold’ and ‘max_threshold’, which the page defines as form fields in lines 57-63 of its code:

<form class="mt3 mb2">
  <div class="f5 dib pr1">Enter URL:</div>
  <input id="input_url" class="input-reset ba b--black-20 pa2 f6 w5" style="width: 400px" name="url" placeholder='"https://www.flex.io", etc.'>
  <button type="button" class="btn-submit border-box no-select ba ttu b f6 ph3 pv2 br1 white bg-blue b--blue darken-10">Submit</button>
  <input id="input_min" type="hidden" name="min_threshold" value="0">
  <input id="input_max" type="hidden" name="max_threshold" value="100000">
</form>
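The page sends these fields to the pipe with jQuery’s serialize(), which produces a URL-encoded query string of the named inputs (the button has no name, so it isn’t included). As a sketch, Python’s urllib.parse.urlencode builds the equivalent body for the example URL and the form’s default thresholds; for values containing characters such as ! or ( the two encoders can differ slightly.

```python
from urllib.parse import urlencode

# Field names and default values from the form above; the URL is the
# example from the walkthrough.
form_fields = [
    ("url", "http://gameofthrones.wikia.com/wiki/Category:Characters"),
    ("min_threshold", "0"),
    ("max_threshold", "100000"),
]

body = urlencode(form_fields)
print(body)
# url=http%3A%2F%2Fgameofthrones.wikia.com%2Fwiki%2FCategory%3ACharacters&min_threshold=0&max_threshold=100000
```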

API call

The API call, which starts at line 214, passes the variables to the pipe and, on success, hands the pipe’s output to the D3 Bubble Chart. We also do a client-side calculation to set the slider’s min and max thresholds:

$.ajax({
  type: 'post',
  url: 'https://www.flex.io/api/v1/pipes/flexio-text-keywords-v1/run?stream=0',
  beforeSend: function(xhr) {
    xhr.setRequestHeader('Authorization', 'Bearer nmgzsqppgwqbvkfhjdjd');
  },
  data: $('form').serialize(),
  dataType: "json",
  success: function (content) {
    updateMinMax(content);
    updateSvg(content);
    ...
  }
});
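To poke at the pipe outside the browser, the same call can be assembled in Python with urllib.request. This sketch mirrors the $.ajax options above (endpoint, bearer header, form-encoded POST body) but hasn’t been checked against the live API, and the token is a placeholder. Since the body of updateMinMax isn’t shown, slider_range is only our guess at the kind of calculation it does; build_pipe_request, slider_range and API_TOKEN are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import Request

PIPE_URL = "https://www.flex.io/api/v1/pipes/flexio-text-keywords-v1/run?stream=0"
API_TOKEN = "YOUR_API_TOKEN"  # hypothetical placeholder; substitute your own key

def build_pipe_request(fields, token=API_TOKEN):
    """Build (but don't send) a POST like the one the page makes with $.ajax."""
    return Request(
        PIPE_URL,
        data=urlencode(fields).encode("ascii"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

def slider_range(rows):
    """Guess at updateMinMax: the smallest and largest word counts.

    `rows` must be non-empty; min() and max() raise ValueError otherwise.
    """
    values = [int(row["value"]) for row in rows]
    return min(values), max(values)

req = build_pipe_request(
    {"url": "https://www.flex.io", "min_threshold": "0", "max_threshold": "100000"}
)
print(req.get_method(), req.full_url)
print(slider_range([{"id": "stark", "value": 25}, {"id": "snow", "value": 7}]))  # (7, 25)
```

Sending it would be a matter of passing req to urllib.request.urlopen and parsing the response with json.load, assuming the endpoint returns JSON as the dataType option above suggests.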

Go Forth and Modify

This example app is fairly simple, using off-the-shelf components like D3 and Python modules to do its magic. The Flex.io data pipe service adds refreshability and repeatability without the annoyance of provisioning or maintaining a server.

So, feel free to copy the pipe template and the HTML and modify them with a different dataviz, data source or your own Python magic. If you have any questions, just ping us and we’ll be more than happy to help troubleshoot.