
SSIS Junkie

The Longest Tweet

I read an article today called The Longest Tweet In History, which explained how it is now possible to send Tweets via Twitter that are longer than the supposed 140-character limit.

As it happens I’ve written a noddy app called Tweetpoll, hosted on Windows Azure, that periodically polls Twitter’s public timeline to determine the distribution of Tweet lengths and displays that distribution at http://tweetpoll.cloudapp.net/. Here’s a screenshot of the latest distribution:

[Image: screenshot of the Tweetpoll tweet-length distribution]

This is based on a sample of 2,940,150 tweets (so far). Unsurprisingly it’s a fairly smooth curve (I can’t explain the peaks, though; probably a bug in my code). Before I started I expected the frequency of tweet lengths to increase exponentially, and clearly that’s not the case: there is a sustained increase around 30-50 characters before the frequency drops off again.
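For what it’s worth, the tallying behind a chart like this is simple. Here is a minimal sketch in Python; it is not Tweetpoll’s actual code (which isn’t shown in this post), and the function name `length_distribution` and the sample tweets are made up purely for illustration:

```python
from collections import Counter

def length_distribution(tweets):
    """Map each tweet length (in characters) to the number of tweets of that length."""
    return Counter(len(t) for t in tweets)

# Made-up sample data, purely for illustration.
sample = ["hello", "world", "a longer tweet", "hi"]
dist = length_distribution(sample)

# Print the distribution in order of tweet length.
for length in sorted(dist):
    print(length, dist[length])
```

Plotting the counts against the lengths would give a curve of the same kind as the one in the screenshot.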

Anyway, I digress. It’s been running for a couple of months now, and soon after launching it I was puzzled to start getting results that were greater than 140 characters (as you can see on the graph above). I didn’t have an explanation for those numbers so I set about finding one. Three days ago I deployed an update to the app so that it now explicitly captures all tweets longer than 140 characters and stores them in a separate table. Thanks to my new best friend LINQPad and the following query:

// Pull the captured long tweets into memory, then rank them by length (longest first).
var results = svc.LongTweetsTable.ToList()
                 .Select(r => new { r.RowKey, Length = r.RowKey.Length })
                 .OrderByDescending(r => r.Length);

I can now find out what’s going on in those tweets:

[Image: LINQPad query results listing the longest captured tweets]

Notice anything about those tweets? They all contain either “&lt;” or “&gt;”, which are the escaped forms of the less-than and greater-than symbols in markup, and hence the mystery is explained: the markup for a tweet may well be longer than the tweet itself. Pretty logical if you think about it, although it didn’t occur to me until I actually examined the data. That’s an important lesson learned: make sure you know your data intimately.
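The effect is easy to reproduce. The sketch below uses Python’s standard `html` module as a stand-in; I am assuming the timeline data is escaped in the same general way, and the sample tweet and the helper name `escaped_length` are invented for illustration:

```python
import html

def escaped_length(text):
    """Length of the text once &, < and > are HTML-escaped (quotes left alone)."""
    return len(html.escape(text, quote=False))

# A made-up tweet that is exactly 140 characters as typed.
tweet = "x" * 138 + "<>"
escaped = html.escape(tweet, quote=False)

print(len(tweet))                   # 140
print(escaped_length(tweet))        # 146: "<" and ">" each grow from 1 to 4 characters
print(len(html.unescape(escaped)))  # 140 again once the markup is decoded
```

So an app that wants true tweet lengths should decode the markup before counting characters.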

@Jamiet

Published Friday, July 10, 2009 11:59 AM by jamie.thomson
