Sentiment analysis from GDELT data

Added on by Dario Villanueva.

For the last couple of weeks I've been working on a small collaboration with KPMG: a quick demo dashboard that displays sentiment analysis of cyber crime news from the GDELT project.

Snapshot of the Sentiment display pane


We filtered the GDELT dataset to keep only the articles that refer to cyber crime. The data science team then clustered these articles by topic to identify individual stories, each made up of several related articles.
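A minimal sketch of this filter-then-cluster step. The post doesn't describe the data science team's method, so everything here is an assumption: articles are plain objects with `themes`, `people` and `organisations` arrays (the fields the dashboard later shows), `CYBER_THEMES` holds hypothetical GKG-style theme codes, and the clustering is a simple greedy "leader" grouping on Jaccard similarity, swapped in purely for illustration.

```javascript
// Hypothetical theme codes used to flag cyber crime coverage.
const CYBER_THEMES = new Set(["CYBER_ATTACK", "HACKING"]);

function isCyberCrime(article) {
  return article.themes.some((t) => CYBER_THEMES.has(t));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Greedy "leader" clustering: each article joins the first story whose
// leader (first article) is similar enough, otherwise it starts a new story.
function groupIntoStories(articles, threshold = 0.5) {
  const stories = [];
  for (const article of articles.filter(isCyberCrime)) {
    const features = new Set([...article.themes, ...article.people, ...article.organisations]);
    const story = stories.find((s) => jaccard(s.leader, features) >= threshold);
    if (story) story.articles.push(article);
    else stories.push({ leader: features, articles: [article] });
  }
  return stories;
}
```

A real pipeline would more likely use a proper clustering algorithm over richer features; the greedy version just makes the idea of "stories as groups of related articles" concrete.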

For each day, the guys computed average positive and negative sentiment, and my task was to plot them. I drew a lot of inspiration from Moritz Stefaner's Emoto project, which analysed overall online sentiment during the 2012 London Olympics.
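As an illustration of the daily averages, here is a sketch assuming each record carries a GDELT Global Knowledge Graph date string starting with `YYYYMMDD` and the GKG tone field, a comma-separated list whose first three values are the average tone, the positive score and the negative score. The record shape, and the assumption that the team used these particular fields, are illustrative only.

```javascript
// Parse a GKG-style tone string: "tone,positive,negative,polarity,...".
function parseTone(toneField) {
  const [tone, positive, negative] = toneField.split(",").map(Number);
  return { tone, positive, negative };
}

// Average positive and negative scores per day.
// `records` is assumed to look like { date: "20150301120000", tone: "..." }.
function dailySentiment(records) {
  const byDay = new Map();
  for (const { date, tone } of records) {
    const day = date.slice(0, 8); // YYYYMMDD
    const { positive, negative } = parseTone(tone);
    const acc = byDay.get(day) ?? { positive: 0, negative: 0, count: 0 };
    acc.positive += positive;
    acc.negative += negative;
    acc.count += 1;
    byDay.set(day, acc);
  }
  return [...byDay.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // chronological order
    .map(([day, acc]) => ({
      day,
      positive: acc.positive / acc.count,
      negative: acc.negative / acc.count,
    }));
}
```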

The piece uses Poisson-disc sampling to progressively fill each sentiment arc with evenly spaced points, and then applies a Delaunay triangulation to obtain an organic-looking tessellation of the space, reinforcing the idea that sentiment is organic and ever-changing.
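The sampling step can be sketched in a few dozen lines. Below is a minimal Bridson-style Poisson-disc sampler restricted to an annular sector, assuming the arc is described by inner/outer radii and start/end angles in radians measured anticlockwise from the x axis (unlike D3's arc generator, which measures from 12 o'clock, clockwise). The parameter names and the `k = 30` default are illustrative, not taken from the dashboard's code.

```javascript
// Bridson-style Poisson-disc sampling inside an annular sector centred on
// (0, 0). Assumes startAngle <= endAngle, angles in radians.
function poissonDiscArc({ innerRadius, outerRadius, startAngle, endAngle, minDist, k = 30, rng = Math.random }) {
  const cellSize = minDist / Math.SQRT2; // each grid cell can hold at most one sample
  const cols = Math.floor((2 * outerRadius) / cellSize) + 1;
  const grid = new Array(cols * cols).fill(null);
  const points = [];
  const active = [];

  const cellOf = (x, y) => [
    Math.floor((x + outerRadius) / cellSize),
    Math.floor((y + outerRadius) / cellSize),
  ];

  function inArc(x, y) {
    const r = Math.hypot(x, y);
    if (r < innerRadius || r > outerRadius) return false;
    let a = Math.atan2(y, x);
    while (a < startAngle) a += 2 * Math.PI; // shift into [startAngle, startAngle + 2π)
    while (a >= startAngle + 2 * Math.PI) a -= 2 * Math.PI;
    return a <= endAngle;
  }

  function farEnough(x, y) {
    const [col, row] = cellOf(x, y);
    // Any sample closer than minDist lies at most two cells away.
    for (let j = Math.max(0, row - 2); j <= Math.min(cols - 1, row + 2); j++) {
      for (let i = Math.max(0, col - 2); i <= Math.min(cols - 1, col + 2); i++) {
        const p = grid[j * cols + i];
        if (p && Math.hypot(p[0] - x, p[1] - y) < minDist) return false;
      }
    }
    return true;
  }

  function add(x, y) {
    const p = [x, y];
    const [col, row] = cellOf(x, y);
    grid[row * cols + col] = p;
    points.push(p);
    active.push(p);
  }

  // Seed with one uniformly random point inside the sector.
  const r0 = Math.sqrt(rng() * (outerRadius ** 2 - innerRadius ** 2) + innerRadius ** 2);
  const a0 = startAngle + rng() * (endAngle - startAngle);
  add(r0 * Math.cos(a0), r0 * Math.sin(a0));

  while (active.length > 0) {
    const idx = Math.floor(rng() * active.length);
    const [px, py] = active[idx];
    let placed = false;
    for (let n = 0; n < k && !placed; n++) {
      // Candidate in the ring between minDist and 2 * minDist around the active point.
      const theta = rng() * 2 * Math.PI;
      const dist = minDist * (1 + rng());
      const x = px + dist * Math.cos(theta);
      const y = py + dist * Math.sin(theta);
      if (inArc(x, y) && farEnough(x, y)) {
        add(x, y);
        placed = true;
      }
    }
    if (!placed) {
      active[idx] = active[active.length - 1]; // retire the point (swap-remove)
      active.pop();
    }
  }
  return points; // generation order: spreads outwards from the seed
}
```

Because new samples are only placed next to existing ones, revealing the points in the returned order gives a growth-like fill. The triangulation could then be handed to a library such as d3-delaunay, e.g. `Delaunay.from(points).triangles`, which lists vertex indices in triples.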

A detail of the "growth" animation, showing the arcs growing and a case where there is no data for the selected date.


The visualisation is the main focus of the piece, a dashboard that lets users choose dates and has two information areas: the visualisation above and a list of stories, akin to tweets.

The stories are presented as a continuous scrollable list on the right-hand side of the dashboard, each titled with one of the headlines of the articles in its story cluster.
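The post doesn't say how that headline is picked. One plausible heuristic, sketched here purely as a hypothetical, is to choose the headline whose words overlap most, on average, with the other headlines in the cluster (Jaccard similarity on word sets).

```javascript
// Lower-cased word set for a headline, ignoring words of two letters or fewer.
function words(title) {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2));
}

function jaccard(a, b) {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Pick the headline with the highest mean similarity to the others.
// Ties go to the earliest headline.
function representativeTitle(titles) {
  if (titles.length === 1) return titles[0];
  const sets = titles.map(words);
  let best = 0;
  let bestScore = -1;
  sets.forEach((s, i) => {
    let score = 0;
    sets.forEach((t, j) => {
      if (i !== j) score += jaccard(s, t);
    });
    score /= sets.length - 1;
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return titles[best];
}
```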

Quick overview of the main dashboard, with the sentiment display, the stories and a datepicker chart


The stories invite the user to explore the topic further, and list the people, themes and organisations involved, drawn from the articles that the clustering algorithm grouped together. Clicking a story takes the user to a story focus page, where she can study it in detail:

Detail story view


The story page displays the people mentioned in the articles, a treemap of the story's themes, a map of the locations where the story took place, and a list of articles for the user to go off and read.
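For the treemap, one natural choice (an assumption; the post doesn't say how tiles are sized) is to make each theme's area proportional to how many of the story's articles mention it. D3 provides this layout out of the box (`d3.treemap`, which uses `d3.treemapSquarify` by default in current versions); the self-contained sketch below implements the squarified treemap idea to show what such a layout does, with a hypothetical article shape `{ themes: [...] }`.

```javascript
// Count how many articles mention each theme; these counts size the tiles.
function countThemes(articles) {
  const counts = new Map();
  for (const article of articles) {
    for (const theme of article.themes) counts.set(theme, (counts.get(theme) || 0) + 1);
  }
  return [...counts].map(([name, value]) => ({ name, value }));
}

// Squarified treemap: fill rows along the shorter side of the free rectangle,
// closing a row when the next item would worsen its worst aspect ratio.
function squarify(items, x, y, width, height) {
  const positive = items.filter((d) => d.value > 0);
  const total = positive.reduce((s, d) => s + d.value, 0);
  const nodes = positive
    .map((d) => ({ ...d, area: (d.value / total) * width * height }))
    .sort((a, b) => b.area - a.area);
  const rects = [];
  const sumOf = (row) => row.reduce((s, d) => s + d.area, 0);

  // Worst aspect ratio of a row laid along a side of length `side`.
  function worst(row, side) {
    const sum = sumOf(row);
    const max = Math.max(...row.map((d) => d.area));
    const min = Math.min(...row.map((d) => d.area));
    return Math.max((side * side * max) / (sum * sum), (sum * sum) / (side * side * min));
  }

  // Place a finished row against the shorter side and shrink the free rectangle.
  function layoutRow(row) {
    const sum = sumOf(row);
    if (width >= height) {
      const colWidth = sum / height; // vertical column on the left
      let cy = y;
      for (const d of row) {
        rects.push({ name: d.name, value: d.value, x, y: cy, width: colWidth, height: d.area / colWidth });
        cy += d.area / colWidth;
      }
      x += colWidth;
      width -= colWidth;
    } else {
      const rowHeight = sum / width; // horizontal row along the top
      let cx = x;
      for (const d of row) {
        rects.push({ name: d.name, value: d.value, x: cx, y, width: d.area / rowHeight, height: rowHeight });
        cx += d.area / rowHeight;
      }
      y += rowHeight;
      height -= rowHeight;
    }
  }

  let row = [];
  for (const node of nodes) {
    const side = Math.min(width, height);
    if (row.length === 0 || worst([...row, node], side) <= worst(row, side)) {
      row.push(node);
    } else {
      layoutRow(row);
      row = [node];
    }
  }
  if (row.length > 0) layoutRow(row);
  return rects;
}
```

Usage would be along the lines of `squarify(countThemes(storyArticles), 0, 0, 300, 200)`, where `storyArticles` and the pixel size are placeholders.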

GDELT does not provide images of the people mentioned in the stories, so I wrote a Node.js Express app whose routes let the front end use <img> tags with a src that points straight at a Bing image search. The images are cached (obviously); when a new name comes up, a query is sent to the Bing Images API. The search is tailored so that it always returns portrait images. The results are about 90% good.
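A minimal sketch of the routing-plus-cache idea, written against Node's built-in `http` request/response objects (Express's `req`/`res` extend these, so the handler shape carries over). The image search itself is left as a hypothetical `search(name)` function returning a promise of an image URL (in the original app, a Bing query tailored to portrait photos), and the in-memory memoisation of lookups is an assumption about how the real app caches.

```javascript
// Wrap an image search function with an in-memory cache keyed by name.
// The promise itself is cached, so concurrent requests for the same person
// trigger a single search.
function createPortraitResolver(search) {
  const cache = new Map();
  return function lookup(name) {
    const key = name.trim().toLowerCase();
    if (!cache.has(key)) {
      // The executor runs `search` synchronously; a thrown error becomes a rejection.
      const pending = new Promise((resolve) => resolve(search(name))).catch((err) => {
        cache.delete(key); // don't cache failures, so a later request can retry
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Handler for URLs like /portrait?name=Ada%20Lovelace, so the front end can
// simply write <img src="/portrait?name=...">. It redirects to the image URL.
function createPortraitHandler(lookup) {
  return async function handlePortrait(req, res) {
    const url = new URL(req.url, "http://localhost");
    const name = url.searchParams.get("name");
    if (url.pathname !== "/portrait" || !name) {
      res.writeHead(404);
      res.end();
      return;
    }
    try {
      const imageUrl = await lookup(name);
      res.writeHead(302, { Location: imageUrl });
    } catch (err) {
      res.writeHead(502); // upstream search failed
    }
    res.end();
  };
}

// Usage (searchPortrait is hypothetical):
// const http = require("http");
// http.createServer(createPortraitHandler(createPortraitResolver(searchPortrait))).listen(3000);
```

Redirecting rather than proxying the image bytes keeps the server light; caching the image files themselves would be the alternative if the upstream URLs are not stable.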

Can you guess what this story was about by just looking at the pictures?


The picture collage instantly invites the user to connect the people in the pictures to a theme, and users relate to it straight away. The only downside is that GDELT sometimes identifies places as people, which then turn into strange portraits, so a certain amount of editorial oversight was required.

One of the dashboard's requirements was a map of the locations where the articles' events took place. Instead of a flat rectangular projection, I decided to use an orthographic projection, which gives a 3D effect with an orbiting globe. It performs surprisingly well on desktops, but phones would need an alternative (e.g. rendering it on canvas), as the orbiting animation runs very slowly on my Nexus 5:

Slow orbit adds to the attractiveness of the dashboard

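The orthographic projection itself is a short formula. The sketch below projects longitude/latitude onto a globe centred on (`lon0`, `lat0`), hides points on the far hemisphere, and advances the centre longitude over time for the orbit; the function and option names are illustrative. With D3 v4 or later, the equivalent is `d3.geoOrthographic().rotate([-lon0, -lat0])`, updated on each animation frame.

```javascript
const DEG = Math.PI / 180;

// Orthographic projection of (lon, lat) in degrees onto a globe of radius r,
// drawn centred at screen point (cx, cy) and viewed from above (lon0, lat0).
// Returns null for points on the far side of the globe.
function orthographic(lon, lat, { lon0, lat0, r, cx, cy }) {
  const phi = lat * DEG;
  const phi0 = lat0 * DEG;
  const dLambda = (lon - lon0) * DEG;
  // cos of the angular distance from the view centre; negative means hidden.
  const cosC = Math.sin(phi0) * Math.sin(phi) + Math.cos(phi0) * Math.cos(phi) * Math.cos(dLambda);
  if (cosC < 0) return null;
  const x = r * Math.cos(phi) * Math.sin(dLambda);
  const y = r * (Math.cos(phi0) * Math.sin(phi) - Math.sin(phi0) * Math.cos(phi) * Math.cos(dLambda));
  return [cx + x, cy - y]; // screen y grows downwards
}

// Centre longitude for a slow orbit, wrapped into [-180, 180).
function orbitLongitude(startLon, elapsedMs, degPerSecond) {
  const lon = startLon + (degPerSecond * elapsedMs) / 1000;
  return ((((lon + 180) % 360) + 360) % 360) - 180;
}
```

On each frame, `lon0 = orbitLongitude(start, elapsed, speed)` and every visible location is re-projected; redrawing many SVG paths per frame is a likely reason the animation struggles on phones, which is where canvas rendering could help.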

Tools used (in no particular order):

  • Gulp for tasks (my first time using it instead of Grunt; I like it, but the watch task needs to stop dying every time I forget a comma in my JS!)
  • Browserify for dependencies in the browser. npm >> Bower for sure, although I had a few headaches getting some things to work (nudge nudge, Masonry)
  • React - it's just so quick to sketch things, evolve designs, etc.
  • D3 - well yeah.

Thanks for reading! You can have a look at it here.

 

Visual Identity Prints

Added on by Dario Villanueva.

The prints from my Visual Identity workshop at the floating cinema are now available. I've selected 24 of the audio fingerprints and created a limited run of 20 prints of them. They all come from people who attended the workshop and volunteered to have their speech analysed and turned into colours.

The printing was done by The Printer of Dreams, who did a cracking job of reproducing the RGB colours on matt-coated Xativa 230gsm neutral-white paper. The posters look great in the "flesh", with vivid tones and attention-grabbing shades.