(image from NBCNews)

Has the power of maps gotten out of control?

How long has it been since you stumbled upon a COVID-19-related map? I wouldn’t be surprised if it’s only been a few hours. As the World Health Organization stated, the COVID-19 outbreak has been accompanied by a massive infodemic. Thousands of coronavirus maps have been produced, and more will continue to be produced on a variety of sub-topics. We saw them on futuristic dashboards, in scientific papers, on television screens, in the interactive stories of online newspapers, and shared them on social networks faster than a virus.

Maps are among the most effective storytelling devices ever created because they often hide…

Visualization can transcend the limited grammar of numbers

Imagine being an abolitionist in 18th-century England and encountering data of this type:

Traders knew that many of the Africans would die on the voyage and would therefore pack as many people as possible on to their ships — in total there were 609 enslaved men, women and children on board this ship. […] Each person occupied […] spaces just 10 inches high and were often chained or shackled together in pairs, making movement even more difficult.

If your goal was to persuade others to reject and publicly condemn slavery, consider the effectiveness of these two examples in influencing…

Data labs may well be the corporate invention of the decade. With the unstoppable ascension of artificial intelligence and its nearest cousin, machine learning, the words data scientist, deep learning, natural language processing and t-SNE visualizations are on the lips of every CEO.

Well, maybe not the last one, though it looks so darn cool.

Hand-written digits visualized using t-SNE. See how they are already nicely separated even in two dimensions? (image from Towards Data Science)

Even if definitions sometimes differ, the datalab is essentially the company’s in-house data startup: it can move very fast, qualify the company’s data and develop machine learning projects. One of the core conditions for it to work is that it sits at the center of the company’s ecosystem: the data, the business and the client.

But then, why would a datalab need data storytellers?

1. Data is ALWAYS about the story behind it

“Stories are just data with souls”, says Dr Brené Brown in her TED Talk. I love this colorful sentence! Machine learning is usually filled with numbers, not so much with emotions... …

The #MakeOverMonday challenge of December 5th was about divergent opinions on the status of the partnership between Germany and the United States.

Original visualisation (article, data)

A few observations on this original plot:

  • It is composed of two images to answer basic questions about German respondents and US respondents
  • Two images make it impossible to answer all possible questions in a single instant of perception (we need to look back and forth to compare Germany and the USA, which defeats the entire purpose of the visualization)
  • It is redundant because the sum of the percentages is always 100%
  • There is a lot of non-data…

In the previous article, I explained how we could build a graph database of the Twitter stream using twitter4j and Apache TinkerPop on HGraphDB. If you are not familiar with these concepts, I suggest you first read this piece.

Today, I will show how we can compute a few statistics on the graph using Gremlin, the traversal language of Apache TinkerPop, and Gelly, the graph processing engine of Apache Flink, in a Java application.

Prerequisites: Java, the Gremlin Console.

Part 1. What is Gremlin and how to use it?

On the website, we can read “Apache TinkerPop™ is a graph computing framework for both graph databases (OLTP) and graph…

Today we’ll look into the creation of a Twitter graph, using the capabilities of HBase, the distributed storage system of Hadoop, and twitter4j, the Java implementation of the Twitter API.

Prerequisites: Java and an IDE (preferably IntelliJ), HBase on a cluster or in local mode.

Part 1: The graph schema

First things first, let’s dive into the schema of the graph. The interested reader can look into the details of the Twitter API to better understand the User, Tweet and Entities objects, which are implemented in Twitter4j.
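To make the idea of a graph schema concrete, here is a minimal in-memory sketch in plain Java. It is not the schema used in the article: the vertex labels ("user", "tweet", "hashtag"), the edge labels ("posted", "tagged") and the class name `TwitterGraphSketch` are all assumptions chosen for illustration, and no twitter4j, HBase or TinkerPop call is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: labels and fields are illustrative assumptions,
// not the schema described in the article.
final class TwitterGraphSketch {

    // A vertex is identified by a label ("user", "tweet", "hashtag") and a key.
    record Vertex(String label, String key) {}

    // A directed, labelled edge between two vertices.
    record Edge(Vertex from, String label, Vertex to) {}

    private final List<Edge> edges = new ArrayList<>();

    // Adds one "posted" edge (user -> tweet) and one "tagged" edge
    // (tweet -> hashtag) per hashtag; hashtags are lower-cased.
    void addTweet(String screenName, long tweetId, List<String> hashtags) {
        Vertex user = new Vertex("user", screenName);
        Vertex tweet = new Vertex("tweet", Long.toString(tweetId));
        edges.add(new Edge(user, "posted", tweet));
        for (String tag : hashtags) {
            Vertex hashtag = new Vertex("hashtag", tag.toLowerCase(Locale.ROOT));
            edges.add(new Edge(tweet, "tagged", hashtag));
        }
    }

    // Counts the edges that carry the given label.
    long countEdges(String label) {
        return edges.stream().filter(e -> e.label().equals(label)).count();
    }

    public static void main(String[] args) {
        TwitterGraphSketch graph = new TwitterGraphSketch();
        graph.addTweet("alice", 1L, List.of("DataViz", "Maps"));
        graph.addTweet("bob", 2L, List.of("maps"));
        System.out.println("posted edges: " + graph.countEdges("posted"));
        System.out.println("tagged edges: " + graph.countEdges("tagged"));
    }
}
```

In a real graph database the vertices and edges would be persisted rather than held in a list, but the same question applies: which objects become vertices, and which relationships become labelled edges.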

Graph databases are of great importance today and we can mention among the most popular the…

Mathieu Guglielmino

Data Storyteller.
