I’ve always had a thing for tag clouds. I find tag clouds a useful visual representation of metadata; if they’re designed right, tag clouds can also look really good.
I captured all the tweets from ScienceOnline2012 last week, starting on Tuesday, January 17th around 2:00 am and ending on Sunday, January 22nd around 10:30 pm. In total, there were 17,800 tweets comprising 302,592 words. I then set to work building a ScienceOnline2012 tag cloud.
Of those 302,592 words, 17,827 were unique. I calculated the frequency of all 17,827 words and then “cleaned” the data, removing numbers, all words shorter than four characters, and words that were either common (such as “that”, “from” or “your”) or gibberish (consisting principally of URL strings from shared links). I also removed the top two terms, scio12 (mentioned 18,087 times) and rt (mentioned 8,059 times). Their frequency was so much higher than that of the third most frequent term, science (mentioned 2,518 times), that they distorted the tag cloud.
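For anyone who wants to try something similar, the counting and cleaning steps above can be sketched roughly as follows in Python. This is only an illustration, not the script I actually used: the function names, the short stop-word list and the tokenizing pattern are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop list; the real list of common words was longer.
COMMON_WORDS = {"that", "from", "your", "this", "with", "have"}
# Dominant terms removed by hand because they distort the cloud.
EXCLUDED_TERMS = {"scio12", "rt"}
# Strip shared links so URL fragments do not show up as gibberish words.
URL_PATTERN = re.compile(r"https?://\S+")


def count_words(tweets):
    """Count how often each lowercased word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        text = URL_PATTERN.sub(" ", tweet.lower())
        counts.update(re.findall(r"[a-z0-9']+", text))
    return counts


def clean(counts, min_length=4):
    """Drop short words, pure numbers, common words and excluded terms."""
    return Counter({
        word: n
        for word, n in counts.items()
        if len(word) >= min_length
        and not word.isdigit()
        and word not in COMMON_WORDS
        and word not in EXCLUDED_TERMS
    })


def top_terms(tweets, limit=300):
    """Return the `limit` most frequent cleaned terms as (word, count) pairs."""
    return clean(count_words(tweets)).most_common(limit)
```

Calling `top_terms(tweets)` on a list of tweet strings returns the ranked terms with their counts, which can then be used as weights in a tag cloud tool.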
The top 300 terms were then imported into Wordle and a weighted tag cloud was generated. Feel free to download any of the files below and reshare.
Here are the top 10 terms from the cloud above:
Walter Jessen is a digital strategist, writer, web developer and data scientist. You can typically find him behind the screen of something with an internet connection.