HubSpot Dev Blog


My favorite kind of Subversion commit - removing dead code


There are only a few things that make me happier than removing old, dead code.

One of them is when a colleague beats me to it!

I always feel like dead, unused, old code is like a heavy coat around my body on a warm summer day.  I get this overwhelming desire to ditch it, throw it aside, get rid of it.

Unfortunately, it's not always easy to find or spot dead code.  As the amount of code grows large, and some of it moves into "legacy" or "just barely maintained" mode, this gets more challenging.

Even though HubSpot has only been around a little more than three years, we have some such code.  We're fairly productive at cranking out code, and not bad at refactoring when we need to (which is regularly, as in most agile organizations).  But we haven't been great at removing old / dead / unused code.

Which is why yesterday's Subversion commit, by @katzj, made my day:


Remove some old code that isn't at all relevant any more

 

As Patrick said "The last remnants of the original business model. And
code I wrote my senior year in college."

(Fri, 22 Jan 2010 18:11:25 GMT) 


 

 

Writing A Google Chrome Extension


 

When Google released its Chrome web browser, a lot of people (including me) loved it but missed Firefox-like extensions.  Those days are over: Google's browser now supports extensions.

So I decided to write a simple extension to see whether the delay in the public release of extensions was for a good reason.  Extensions have been present in Chrome for a while, but only in the Beta or Developer versions.

I wrote a simple extension that generates a short URL for whatever web page you are viewing, in any tab, using HubSpot's own URL shortener service, Hub.tm.  You can install it in your browser by going to https://chrome.google.com/extensions/detail/jhbjofkkhbgdgpbfkppblkpgkbefafkg

 It took less than an hour to write the extension, a really short time.   The developer guide is very clear, and the framework is built nicely. 

 The framework is all JavaScript- and JSON-based, which makes things easy.  You are using JavaScript to do all your logic, and using JSON for configuration and message passing between your extension code blocks.

I loved how JSON is used as a configuration format, as we already do at HubSpot in some places.  In my opinion, the JSON configuration format is more readable than XML, assuming you are carefully naming keys in your JSON.

Other things I liked about the Chrome extension framework:

1- Extension files are the same as in any web application: images, .css files, .html files, .js files.  Even the configuration file is .json, so there are no funky or new file formats.

2- Permission logic: you can control which URLs your extension works on, and which JavaScript code is loaded and when.

3- Built-in JSON serializer.

4- Attaching to browser-specific events (like tab opened, tab closed, extension icon clicked...).

5- Isolated-world script execution environment, which means your extension JavaScript code will never conflict with any code included in any page viewed in the browser.  This is also good for security.

6- How easy it is to include external JavaScript and CSS libraries with your extension code.  I use jQuery in my extension, and it's trivial.

7- You can make cross-domain AJAX calls (thanks to HTML5 support in the browser).

8- HTML5 local storage.

 

Nonetheless, it's not all trivial.  I also have a couple of tips for shortening the extension development learning curve:

1- If your extension works for any URL, you are doing a "Browser" action.

2- If your extension works for specific URLs, or for pages that must match some sort of pattern (e.g. pages that have an RSS feed), you are doing a "Page" action.

3- If you want to show some output to the user, use a "Browser" action "Popup", which is a box that can contain an HTML page (any HTML you want, including CSS, JavaScript, etc.).

4- If you want to organize your JavaScript code, you will be using "Content Scripts", but remember they have limits.  Most notably, they can't do any AJAX requests, but they can access the DOM of viewed pages.

5- If you want to have long-running tasks, or save state information, use the "Background" page.  This page is always on and always running, even if the user did not interact with your extension.

6- If you ever want to check how an extension was built, you will find all the files of that extension on your disk inside the Chrome folder ;)

 

Overall, when you are writing an extension you will feel like you are writing a simple static web application (a bunch of CSS, a bunch of HTML, a bunch of JS code, images, etc.), and everything you will be doing is something you know you have done a hundred times before (e.g. manipulating the DOM of a page).

That's it.  Happy Chrome Extension-ing ;)

Take as much time off as you need, and autonomy


As of today, there is a new HubSpot vacation / time-off policy: take as much time off as you need, when you need it.

No paperwork, no forms, no special accounting.  Let your teammates know.  That's it.  No fine print, no exceptions. 

Why did we do this?  I could try to explain with my own reasoning, but I would not do as good a job as Dan Pink (@DanielPink), in this awesome and inspiring TED Global 2009 talk:


 

I am a big believer in what Pink says.  Especially for the type of work we do on the engineering team at HubSpot, which is highly creative, we should trust our teammates with autonomy as much as possible.  This is one big step in that right direction.

This specific talk I actually didn't know about until yesterday, when my colleague Prashant Kaw pointed it out on our wiki.  Thanks, Prashant!

But there are two related references, one directly called out in the TED presentation and one not, that I'm familiar with and highly recommend.  The first is Dan Ariely, who was one of my teachers at MIT, and his range of talks, videos, blog posts, and publications.  Ariely has an active blog, and practically all his papers are publicly available, including this one from 2005 that is referenced in the presentation.

The other resource not in this TED talk is the Netflix culture / philosophy presentation, available on Slideshare, which we've talked about in the past.  It, too, is excellent and well worth reading through.  Coincidentally, HubSpot now has the same vacation policy as Netflix ;)

 

 

Many companies will tell you a lot of things you want to hear when you think about working there.  But at very few companies is the level of flexibility, enlightenment, and open-mindedness not just high, but pervasive in the whole organization, including the complete management team.

That's what it takes to make a difference in people's work lives which translates into personal lives.  It's not some HR-friendly talk about "work-life balance," but the real thing.

And this is just one of many reasons I'm happy to work here.

Eric Ries Lean Startup talk at MIT -- videos!


Last month a bunch of us HubSpotters went next door to MIT to listen to Eric Ries talk about his Lean Startup framework.  The talk was organized by Tom Summitt of Genotrope, another local startup like HubSpot -- thanks, Tom!

For those of you not familiar with Eric Ries, please stop reading this blog.  Go read his blog, Startup Lessons Learned, instead.  Really.  I'm not joking.  You'll do yourself a favor, and be happy for it.

Eric helped found IMVU and some other companies, has a bunch of operating experience, and also serves as an advisor to some startups.  He is a gifted writer, and as I found out, a gifted speaker as well.

He has so much valuable stuff to say about startups, growing companies, improving the product, running experiments, and using validated learning about customers to move forward.  It's impossible to summarize.  In fact, I think he's writing a book to collect the thoughts from his blog.

In case you thought I was joking, I really mean it: go read his blog.

With Eric's permission, we video-taped his talk at MIT.  The resolution is not great, since we were a bit far away with the camera.  I apologize about that.  Nonetheless, we hope you enjoy.

Thanks to @KarenRubin and @Abdinoor for taping the talk, and to @DanMil for organizing the whole thing.  And of course, thanks to @EricRies for sharing his thoughts, experiences, and approach with all of us.

First half:

Second half:

Who Loves the Magic Undocumented Hive Mapjoin? This Guy.


So, I've got this nice Hive join statement, joining a tiny little partition from one table against a sizable set of partitions from another.  And I'm running it, and it's taking a while.  And I can tell, from looking at the job, that it's doing the join reduce-side -- meaning the mappers just emit rows from both tables, and everything gets sent over to the reducers, which do the actual join.

But, clearly, this is a perfect fit for a map-side hash join (meaning, hold the entire tiny partition in memory in each map task + run no reducers at all).  If I were coding it myself, I could make this happen via a bunch of coding + some configuration trickery.  But, surely, Hive will make it easier, no?

The docs had little to tell me, but I saw Jira tickets about adding this ability, and finally found a mailing list message which had the magic incantation.  It's a hint within the statement.  Just convert this:

  SELECT t1.portal_id, t2.lead_id, t1.visit_time,

to this:

  SELECT /*+ MAPJOIN(t2)*/ t1.portal_id, t2.lead_id, t1.visit_time,

Done, and now my entire job is running in the mapper and is taking about 30% of the time it used to.  Woo.  Big points for Hive, for damn sure.
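For anyone curious what a map-side hash join boils down to, here is a minimal sketch in plain Python.  It is not Hive's implementation, just the idea: index the small table in memory, then stream the big table past it, with no shuffle and no reducer.  The column names loosely follow the query above; the function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Index the small table by join key; each map task would hold this in memory."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_side_join(big_rows, small_table, key):
    """Stream the big table past the in-memory hash table and emit joined rows.

    Rows with no match in the small table are dropped (inner join).
    """
    for big in big_rows:
        # .get() avoids inserting empty lists into the defaultdict on a miss.
        for small in small_table.get(big[key], []):
            joined = dict(small)
            joined.update(big)
            yield joined


# Hypothetical sample data: a tiny "leads" table and a larger "visits" table.
leads = [{"portal_id": 1, "lead_id": 10}, {"portal_id": 2, "lead_id": 20}]
visits = [
    {"portal_id": 1, "visit_time": "2010-01-01"},
    {"portal_id": 1, "visit_time": "2010-01-02"},
    {"portal_id": 3, "visit_time": "2010-01-03"},
]

leads_by_portal = build_hash_table(leads, "portal_id")
joined_rows = list(map_side_join(visits, leads_by_portal, "portal_id"))
```

The trade-off is memory: this only works when the small side really does fit in each map task, which is why it suits a tiny partition joined against a big one.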

Why do we have an img element in HTML? Because shipping code wins.


My colleague Steve L pointed out this great blog post, documenting the history of the img tag in HTML.  It's written like an investigative journal account, not a boring technical manual, although the technical details are all there.

Mark Pilgrim is the author.  Thanks, Mark.  I've just subscribed to your blog ;) 

It boils down to shipping code.  Get it out the door.  Release early and often.  Remember that facts exist outside the building, while opinions live inside.  That's why you want actual feedback from actual customers, not would-be / theoretical folks.  Your baby is ugly, but it will get prettier over time if you do this.


The first ever Boston Hadoop User Group (BHUG) meetup


Last night HubSpot hosted the first-ever Boston Hadoop User Group (BHUG) meetup.  Organized by Dan Milstein and hosted at our office, the event had a good turnout and great talks.

I think we had about 45 people RSVP on meetup.com, and about 40 showed, so that's pretty good.  There was a lot of decent socialization, pizza, and beer.  In fact, Dan kind of had to get people to shut up so we could have a talk or two ;)


Ryan from ScanScout gave a fun talk about Apache Hive, which generated some discussion, and we also had a few quick "lightning" talks, although I couldn't stay around long enough to hear all of them.  I think overall the level of energy was fairly high, and there is a lot of evident interest in these topics.

We've been getting positive feedback today and expressions of interest, so we'll definitely do another Hadoop meetup.  Whether we host it or someone else does, the group has promise, and you should join if you care about processing large amounts of data. 

Useful links: 

Boston Hadoop User Group on Meetup.com 

Hadoop itself (and Hive)

Ryan's talk slides

 

Useful script for backing up MySQL on an Amazon EBS block


At HubSpot, we use MySQL a lot (and more every day), and most of our MySQL servers live on Amazon's Elastic Compute Cloud (EC2).  We use Amazon's Elastic Block Store (EBS) service to store the database files and data.

I wanted to point out a very useful set of scripts, from Eric Hammond and Assaf Arkin (my fellow Apache committer).  These scripts, one in Perl and one in Ruby, will take care of creating snapshots of MySQL EBS volumes for you, post the snapshots to Amazon S3 if you want, and discard old snapshots if you'd like.

The posts are Eric's "Creating Consistent EBS Snapshots with MySQL and XFS on EC2" and Assaf's "MySQL backups with EBS snapshots."

I hope you find these as useful as we have.  Thanks, Assaf and Eric!
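To give a feel for why a script helps here, below is a minimal sketch of the ordering that a consistent snapshot usually needs.  This is my assumption about the general technique, not a description of what either script actually does: block MySQL writes, freeze the filesystem, take the snapshot, then undo both in reverse order even if something fails.  Every function name is hypothetical, and the callables are stand-ins that only record the steps; real ones would talk to MySQL, the filesystem, and the EC2 API.

```python
from contextlib import contextmanager


@contextmanager
def paused(pause, resume):
    """Call pause() now and guarantee resume() afterwards, even on error."""
    pause()
    try:
        yield
    finally:
        resume()


def consistent_snapshot(lock_tables, unlock_tables, freeze_fs, thaw_fs, take_snapshot):
    """Take a snapshot while MySQL writes are blocked and the filesystem is frozen."""
    with paused(lock_tables, unlock_tables):
        with paused(freeze_fs, thaw_fs):
            return take_snapshot()


# Stand-in callables (assumptions): they only log what a real step would run.
steps = []
snapshot_id = consistent_snapshot(
    lock_tables=lambda: steps.append("FLUSH TABLES WITH READ LOCK"),
    unlock_tables=lambda: steps.append("UNLOCK TABLES"),
    freeze_fs=lambda: steps.append("xfs_freeze -f"),
    thaw_fs=lambda: steps.append("xfs_freeze -u"),
    take_snapshot=lambda: steps.append("create snapshot") or "snap-hypothetical",
)
```

The nesting matters more than the details: the lock and the freeze are released in reverse order, and a failed snapshot never leaves the database locked.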

Why do all deployment systems suck?


At HubSpot, we have a pretty wide array of different things being used for the webapps running behind the scenes. This isn't surprising. There are also some home-grown scripts (in Python, as that's the scripting language of choice... something I'm not complaining about) to take care of deploying the various webapps. It works, but I really want the scripts to do a bit more so that they're more useful, and to share more code so that we can improve one place and get the benefits everywhere.

Given that this seemed like a pretty typical problem, I figured I'd take a look and see what open source projects exist out there to see if any of them were suitable or could be at least close to a good fit for what we need and want. Unfortunately, I was kind of disappointed...

  • Capistrano seems to be the big player in this arena. It was originally written for Rails and still very, very strongly shows that heritage. This isn't necessarily bad, but it makes it a lot harder to get working if you're not doing something that's Rails-like. Some people have gotten things working with Java app deployments for Tomcat, but they all feel a bit hacky. The other downside for me/us is that Capistrano is very much Ruby-based, both in how its own deployment language looks and in some of the "how it depends on things working" aspects. Also, the fact that it's written in Ruby, and thus a little more difficult for us to hack on if/when we run into problems, is a point against. So it's probably a non-starter for now, or at least a pretty difficult sell.
  • Fabric is written in Python and seems to be following in the footsteps of Capistrano. Right now, it's far, far simpler. This is in some ways good, but some of the pieces that we'd want (e.g., SCM integration) aren't there, so I'd have to write them. And I'm not sure if the Fabric devs are really interested in expanding in that way; I haven't sent email yet, but I'm planning to tomorrow to feel it out.
  • Config Management + Binary deployment is the approach taken in Fedora Infrastructure for app deployment, and it seems to be working pretty well there. It might be something to get to eventually, but that's going to be a longer-term thing, and I'm not actually convinced that it's really the best approach. For Fedora, it grew out of only a couple of things which could be considered "webapps" and a lot of system config that much later turned into more webapps. From the work I did there, it also presupposes a more homogeneous environment than we use at HubSpot.
  • Func is something that a few people have been working on that I keep wanting to find a use for, but it seems a little less well suited to doing a lot of Java app building/deployment, given that it's more HTTPS/XML-RPC based than shell based.
  • Roll your own is what we're doing now, and it seems to be pretty common. I don't necessarily like this, but it's certainly the path of least resistance.

So, what am I missing? Is there some great tool out there that I haven't come across that you're using for Java (and more) webapp deployments? Bonus points if it's Python-based and pretty extensible.

List of startup-oriented events, meetups, and people in Boston


A lot of people ask us how we found out about HubSpot when it was a tiny, unknown startup.  We're still fairly small and not that well-known, but we keep hearing this question more and more often.

Many of us found HubSpot through friends or through professional networking.  There are HubSpotters at many Boston-area startup-oriented events.

Don Dodge compiled an excellent list of these events.  Check it out on his blog at http://dondodge.typepad.com/the_next_big_thing/2009/10/boston-startup-events-resources-people-you-need-to-know.html .

Don is too humble to mention himself as a nice guy and a great connector.  If you see him, say hi ;)  He will always be happy to listen and possibly help you out.

The other person I'd mention as a connector in this list is our co-founder, Dharmesh Shah.  He, too, is too humble to mention himself in any kind of list like this, so we have to do it for him.

If you're looking to find a hot startup in Boston, join one, form one, or just network with other entrepreneurs, this is a great way.  Maybe the best way.

Are there any events you like that are missing from this list?  The only ones that came to my mind were the occasional co-working sessions and parties at BetaHouse.
