In a system that depends on multiple other remote systems (as is common in a service-oriented architecture) it's often necessary to make multiple requests and combine the results. The simplest possible code for this makes each request serially, one after the other, and combines the results. This is easy to write and easy to understand but scales poorly. Most people seem to agree that a parallel scatter/gather approach is more scalable, but there is a lack of Ruby examples of the pattern.
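As a rough illustration of the pattern, here is a minimal sketch using plain Ruby threads. It is not the code from the post itself: `fetch` is a hypothetical stand-in for a remote call (it only sleeps), and the service names and `scatter_gather` helper are made up for the example.

```ruby
# Hypothetical stand-in for a remote call; a real version would make a
# network request instead of sleeping.
def fetch(service)
  sleep 0.1 # simulate network latency
  "#{service}-result"
end

# Scatter: start one thread per request so they run concurrently.
# Gather: wait for each thread in turn and collect its return value.
def scatter_gather(services, timeout: 2)
  threads = services.map do |service|
    Thread.new { fetch(service) }
  end

  threads.map do |thread|
    # Thread#join returns nil if the thread is still running after `timeout`
    # seconds. The timeout applies to each join, not to the whole batch.
    thread.join(timeout) ? thread.value : nil
  end
end

results = scatter_gather(%w[users orders inventory])
```

Because the three simulated requests overlap, the whole call should take roughly as long as the slowest one rather than the sum of all three. A request that misses the timeout shows up as `nil` in its slot, so the caller can decide how to handle a partial result.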
Yeah, I know. There are 1,001 tutorials on CSS-only progress bars. They are mostly the same thing re-explained in a slightly different way. Not in a bad way, either. People learn in different ways and blog posts provide different descriptions that might strike a chord with someone. This post is one more description of one more approach, with one feature I have found lacking in most other tutorials: sub-glyph coloring.
While interviewing prospective engineering candidates for my previous jobs I got thinking about the components of the resume. I've been thinking about a few common resume/interview components that struck me as ridiculous when I was younger but I've come to understand over time. Some of this introspection is thanks to Greg Pass with whom I never discussed this but who got me thinking about what's important in interviewing and hiring outside of technical qualifications. This post is about the Hobbies and Interests section of the resume.
I'm currently in training for a half marathon. I've been training for a month and the race is in two weeks. My distance has gone from a slow 4 mile run to 11 miles in an hour and a half (just under 8:00 / mile). This might lead you to believe I like running but you'd be wrong. I hate running. I'm not getting paid to do it so the Yuppie Nuremberg defense is right out the window – as it usually should be.
There is no shortage of people calling for the end of software patents. For the most part large corporations are in support of software patents while smaller companies and individuals are on the whole against them. There are exceptions but that seems to be the breakdown. I'm shocked to see the agile software adherents talking about a delete/re-write of a system they only partially understand. We need to start with incremental changes and see if we can move toward a working system before we consider razing the boats. Personally I see some value in software patents but believe the bar should be higher for what is patentable. I also think there are some limits that can be put in place without needing to destroy the whole system.
I've made fun of Node and argued against its use in a production environment. I've questioned its architecture and its entire reason for being. I've mocked the brogrammers hawking it as magic scaling sprinkles (as I did to the Erlang proponents before them). My main issue with Node at the start was that it had simply not been tested to withstand the type of abuse a high-scale production site would put on it. Now that I don't work on a high-scale, performance-critical site I can take a step back and read up on some of the things I've been shunning all of these years.
I'm thrilled to announce libcld, a stand-alone C++ library for the Chromium Compact Language Detector. As someone who works mostly in higher level languages I'm also thrilled to say that bindings for Ruby, Python and Node are also included with Java coming soon. This is based on the awesome work by Mike McCandless who extracted the CLD code and wrote the Python bindings. I've since focused on an improved build configuration, higher level language bindings and a Homebrew formula. Read on for an introduction to installing and using
I stream my music and store my pictures online. Home stereos moved from single CD to the multi-disc carousel but DVD players have remained oddly behind the times. There are a few DVD changers on the market but that's only a stopgap solution and it requires that you plan ahead for what you're going to watch. I was in search of a way to have every movie I own just a few clicks away. I've found a solution via Apple TV, my Mac and RipIt. I spent time searching online and there wasn't much out there so I hope this post answers at least one person's question before they buy an Apple TV. This is aimed at people who don't want to play with encoding options and just want things to work out of the box.
After a minor accident I needed to replace the rear fender on my 2001 Vespa ET4. The person who rear-ended me was curious what was entailed in replacing the part but we couldn't find a good time for him to visit while I made the repairs. I busted out my handy iPhone and took pictures of my repair progress so I could share. This post is really aimed at that one person but should anyone need to replace the fender on an ET series (especially one with crash bars) hopefully this helps.
Browser incompatibility is so 1999, isn't it? Well, while we spend our time fretting about IE version incompatibility and cross-browser issues we often overlook the version issues of other browsers. Over the past week I've been working on the twitter-text-js support for hashtags in Russian, Korean, Japanese and Chinese. Along the way I ran into two bugs in some versions of Safari that surprised me. I didn't find much online about it so I wanted to take a moment and jot this down.
I'm proud to announce the release of what is possibly the smallest Ruby gem I've ever worked on, R2 (R2rb on github, simply r2 on rubygems.org). Anybody who has read my older posts knows that I'm interested in Arabic, and more specifically Arabic information processing. While talking about something unrelated I found out that Dustin Diaz has written a Node.js module called R2 for mirroring the appropriate CSS values needed to alter the directionality of a page. While this isn't a silver bullet it does do a very good job on pages that have successfully separated presentation from markup (read as: don't use inline CSS styles).
This post is about how I have come to use the words "crowdsourced" and "community" to distinguish different, but related, activities. I've been working on Twitter's community translation tools since before they were launched and this is a lesson I've learned during that time. This all started with my reply to a Quora topic and much of the information was already covered there. But since Quora is a smaller community than the web at large I wanted to re-format the information for widest consumption and change some of the examples to be a little bit clearer.
Much like my previous posts about omnivores and MC Hammer this is something I've told many people in person but I'm only just now putting down in writing. Many people ask me why I think Twitter is popular, thinking there is some part of it they have yet to see. The 'killer feature' isn't some page on twitter.com they haven't seen yet but rather it's the simplicity of what they already see. It's not about something complicated but rather the sum of the uncomplicated parts … not unlike the internet itself. I'm way ahead of myself. This post is about the evolution of people's self-expression on the internet, people's internet-identity, and how I see Twitter in that context. This isn't some lofty vision from a Twitter founder or executive. This is the view from a guy who just happened upon all of this and is still trying to explain it however he can – to himself most of all.
I used MySQL for a great many projects over the years with the assumption that
a charset of utf8 and a collation of utf8_unicode_ci was going to support all
of UTF-8 and that was all I needed to do. I was sorely mistaken but there
was no point in writing until now, because MySQL 5.5 has finally helped
rectify the issue. Up until MySQL 5.5 (released December of 2010) the UTF-8
support was severely hobbled. With MySQL 5.5 the server can now support the
full range of characters that UTF-8 allows but it's not the default behavior.
There are still plenty of pitfalls for the naïve developer starting out with
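
To make the limitation concrete: MySQL's utf8 charset stores at most three bytes per character, while UTF-8 itself uses up to four, which is why MySQL 5.5 added a separate utf8mb4 charset. That detail is background added here rather than something stated in the excerpt above. The Ruby sketch below only checks whether a string contains such a four-byte character; the helper name `needs_four_byte_utf8?` is made up for illustration and it does not talk to MySQL at all.

```ruby
# True if the string contains a character that takes 4 bytes in UTF-8,
# i.e. a code point above U+FFFF (outside the Basic Multilingual Plane).
# Assumes the string is valid UTF-8.
def needs_four_byte_utf8?(text)
  text.each_char.any? { |char| char.bytesize == 4 }
end

two_byte   = "na\u00EFve"          # "naïve": the "ï" takes 2 bytes
three_byte = "\u65E5\u672C\u8A9E"  # "日本語": each character takes 3 bytes
four_byte  = "\u{1F600}"           # an emoji: takes 4 bytes

needs_four_byte_utf8?(two_byte)    # => false
needs_four_byte_utf8?(three_byte)  # => false
needs_four_byte_utf8?(four_byte)   # => true
```

A check like this could run before an insert to flag text that a three-byte column cannot hold, though switching the table to a four-byte charset is the more durable fix.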
If you've seen me talk at a conference or meet-up about Twitter you've likely heard about MC Hammer (@mchammer on Twitter). Well, since I don't do that many speaking engagements I wanted to take a moment and record my story about what MC Hammer taught me about Twitter … and I already worked there.
So the new Twitter redesign (a.k.a. #NewTwitter) is out in the wild at last, even if it's only for a small percentage of users. Soon enough we'll all have access but even before that I wanted to write about customizing #NewTwitter using Greasemonkey. Much has been said about the new right side "Details Pane" real estate as a platform but I don't know about any of that. I suspect that annotations and the Details Pane will be a match made in heaven but that's not something I heard at the office, just my personal view as a former Platform team member, and former third-party Twitter developer. What I'm interested in right now is customizing the Details Pane for myself using Greasemonkey.
I've been working for a pretty early stage and popular start up for a few years now and I've learned some things. None of what I've learned is news to people who have been through the start up mania, and I bet there are better posts out there on the internet. These are my personal ramblings about my experience and might not reflect anyone else's experience. Having said that, when I was making my decision to join a start up I wanted an informal description of this arc and all I found were venture capitalists and people yearning for the bygone '90s bubble. I'm neither of those. I'm just a guy who likes to play with badly formed analogies.
I have many friends who are vegetarian to one degree or another and I'm happy to accommodate that. I respect their choice in the same way they respect mine. In asking around about people's dietary restrictions over the years I've found a group who annoy me: The Dishonest Omnivore.
Like all aspects of computers Unicode has its own security issues. And like all Unicode issues most engineers spend their entire professional career trying to avoid dealing with them. It's ok, you can be honest, I understand. When I gave my talk about Twitter International at Chirp (the Twitter developer conference) I mentioned some of these issues. After that talk I was surprised how many people who know more about internationalization than I do said they hadn't considered some of these issues.
After reading Alex Payne's post on heroism (Don't Be A Hero) I have to say I was a little irked. I disagree somewhat on the details of what defines a hero in this context and that seems to be the crux of my discomfort. I don't think heroes have to work until four in the morning. Nor do I think a hero creates inherently lower quality software. A hero is someone so dedicated and passionate about what they are doing that they are willing to work hard and deliver when other people are not (and for some people, what they are passionate about is not low-quality "feature work"). For some "heroes" this becomes late nights, for others early mornings, and for still others it's a during-the-day activity with no extra time. I'll be honest, that last case is pretty rare, because the passionate usually see time as flexible and success as a rigid goal.
Unicode support in Ruby doesn't get much attention. Most of the information about it focuses on MySQL more than it does on actual Ruby support. Ruby can read and write Unicode data without much trouble but actually working with it, and moreover making sure it does not get corrupted, is one of the lesser visited back-alleys of Ruby. Hopefully I can make some more time to blog about other Ruby/Unicode interaction but I have to start somewhere so Regular Expressions are as good a place as any. Perhaps better since they're their own dark art.
Tokenization refers to splitting any data into chunks, and in the case of this
post I'm focusing on splitting text into words. The process of turning
free-form text into individual pieces of information (words, phrases, sentences,
etc.) is something that natural language processing (NLP) researchers have been
interested in for years. There is a whole field of study on the subject that
this post does not hope to even touch on. For developers with no language
experience this process is usually overlooked as absurdly simple, I mean
split(/\W+/), right? If you nodded then this is for you. If you think that
was overly simple this will probably be old hat.
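
To show why `split(/\W+/)` is not as simple as it looks, here is a small Ruby sketch. The sample strings are made up for illustration, and the Unicode-aware pattern is just one possible alternative, not necessarily the approach the post goes on to describe. In Ruby, `\W` only treats ASCII letters, digits and underscore as word characters, so accented letters and apostrophes act as separators.

```ruby
# "naïve café", written with escapes so the example does not depend on
# the source file's encoding.
text = "na\u00EFve caf\u00E9"

# \W treats the accented letters as non-word characters and splits on them.
ascii_tokens = text.split(/\W+/)
# => ["na", "ve", "caf"]

# Matching runs of Unicode letters, combining marks and numbers keeps the
# accented words whole.
unicode_tokens = text.scan(/[\p{L}\p{M}\p{N}]+/)
# => ["naïve", "café"]

# Even plain English trips up the simple version: the apostrophe is a
# non-word character too.
contraction_tokens = "don't".split(/\W+/)
# => ["don", "t"]
```

The Unicode-aware pattern still assumes words are separated by something, so it does not help with languages such as Chinese or Japanese that are written without spaces between words.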
When English-speaking developers first encounter languages like Hebrew or Arabic where things are written from right to left they react in one of two ways. Either they see this as insurmountable to support in their application or they feel the opposite and assume that since they have UTF-8 everything will just work. While most modern programming languages support UTF-8 encoding that does not mean that everything does it correctly, and often the right-to-left layout is an overlooked part of UTF-8 support. This post hopes to clarify a little bit about right-to-left processing and Arabic in particular since I speak some of that and it inspired this post. For the more detail-oriented please note that I've skipped any discussion of endian-ness.
There have been a few questions on the Twitter API development list asking about how search.twitter.com is able to detect the language of a tweet. The methods used are nothing new to the field of natural language processing (NLP), but most developers haven't studied much NLP. I'll cover the industry standard method we're using, as well as the shortcomings. I'm a language geek but not a linguist or NLP scientist so I started with a knowledge of programming but not of the existing techniques for language detection. I was able to recognize spoken and written languages I didn't speak and that sparked my interest in what I was gleaning that information from. I'm no prodigy so there must be some simple mental process. I thought language-specific search would be nice so I read up and started on the code.