ScienceOnline2012 Tag Cloud

I’ve always had a thing for tag clouds. I find tag clouds a useful visual representation of metadata; if they’re designed right, tag clouds can also look really good.

I captured all the tweets from ScienceOnline2012 last week, starting on Tuesday, January 17th around 2:00 am and ending on Sunday, January 22nd around 10:30 pm. In total, there were 17,800 tweets comprising 302,592 words. I then set to work building a ScienceOnline2012 tag cloud.

Of the 302,592 words, 17,827 were unique. I calculated the frequency of all 17,827 words and then “cleaned” the data, removing numbers, words shorter than 4 characters, common words (such as “that”, “from” or “your”) and gibberish (consisting principally of URL strings from shared links). I also removed the top two terms, scio12 (mentioned 18,087 times) and rt (mentioned 8,059 times), since their frequency was so much higher than that of the third term, science (mentioned 2,518 times), that they distorted the tag cloud.
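For anyone who wants to try something similar, here is a minimal Python sketch of the counting and cleaning steps. It is not the script I used: the function name, the stopword list beyond the three words quoted above, and the choice to strip links before counting (rather than weeding out URL fragments afterwards) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; only "that", "from" and "your" come from the post.
COMMON_WORDS = {"that", "from", "your", "with", "this", "have"}

# The two terms dropped because their counts dwarf everything else.
DOMINANT_TERMS = {"scio12", "rt"}


def term_frequencies(tweets):
    """Count cleaned terms across an iterable of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        # Assumption: remove shared links up front so URL fragments
        # never show up as gibberish "words".
        text = re.sub(r"https?://\S+", " ", tweet.lower())
        for word in re.findall(r"[a-z0-9_]+", text):
            if len(word) < 4:      # words shorter than 4 characters
                continue
            if word.isdigit():     # plain numbers
                continue
            if word in COMMON_WORDS or word in DOMINANT_TERMS:
                continue
            counts[word] += 1
    return counts


# Two made-up tweets, just to exercise the filters.
sample = [
    "RT @boraz: Great science session at #scio12 http://t.co/abc123",
    "Science is great, thanks from 2012 #scio12",
]
frequencies = term_frequencies(sample)
```

Calling `frequencies.most_common(300)` on the full set of tweets would then give the ranked list of terms to feed into a tag cloud generator.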

The top 300 terms were then imported into Wordle and a weighted tag cloud was generated. Feel free to download any of the files below and reshare.
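If you want to prepare a weighted list like this yourself, the sketch below shows one way to pick the top terms and write them out one per line. The function name is hypothetical, and the `term:weight` line format is an assumption about what a weighted tag cloud tool will accept, so check the import format of whichever tool you use.

```python
def top_terms_as_weighted_lines(frequencies, limit=300):
    """Return the `limit` most frequent terms as "term:weight" lines.

    `frequencies` maps each term to its count. Ties are broken
    alphabetically so the output is reproducible.
    """
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{term}:{count}" for term, count in ranked[:limit])


# The three most frequent terms from this post, in arbitrary order.
example = {"session": 884, "science": 2518, "scientists": 958}
weighted_lines = top_terms_as_weighted_lines(example, limit=2)
# weighted_lines is "science:2518\nscientists:958"
```

Sorting on the negated count first and the term second keeps the ordering stable between runs, which makes it easier to compare two exports.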

ScienceOnline2012 tag cloud

Here are the top 10 terms from the cloud above:

Term Frequency
science 2518
scientists 958
session 884
people 720
great 655
good 618
boraz 591
maggiekb1 521
thanks 501
mireyamayor 489

All data and images are available for download: low-resolution image, high-resolution image or raw data set of 300 words with frequencies.

Walter Jessen is a digital strategist, writer, web developer and data scientist. You can typically find him behind the screen of something with an internet connection.

  • http://mistersugar.com Anton Zuiker

    This is great. I’m quite happy that ‘thanks’ is in the top 10.