Who Loves the Magic Undocumented Hive Mapjoin? This Guy.

So, I've got this nice Hive join statement, joining a tiny little partition from one table against a sizable set of partitions from another. And I'm running it, and it's taking a while. And I can tell, from looking at the job, that it's doing the join reduce-side, meaning the mappers are just tagging rows with the join key and shipping everything over to the reducers, which do the actual matching.

But, clearly, this is a perfect fit for a map-side hash join (meaning, hold the entire tiny partition in memory in each map task and run no reducers at all). If I were coding it myself, I could make this happen via a bunch of code plus some configuration trickery. But, surely, Hive will make it easier, no?

The docs had little to tell me, but I saw Jira tickets about adding this ability, and finally found a mailing list message that had the magic incantation. It's a hint within the statement. Just convert this:

SELECT t1.portal_id, t2.lead_id, t1.visit_time,

to this:

SELECT /*+ MAPJOIN(t2)*/ t1.portal_id, t2.lead_id, t1.visit_time,

Done, and now my entire job is running in the mapper and is taking about 30% of the time it used to. Woo. Big points for Hive, for damn sure.
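
For anyone who wants to see the hint in the context of a whole statement, here's a rough sketch. To be clear, this is not my actual query: the table names (visits, leads), the dt partition column, the join key, and the dates are all made up for illustration. The one thing that matters is that the alias inside MAPJOIN(...) is the small side, the one you want each map task to hold in memory.

```sql
-- Hypothetical tables: visits (large, many partitions) and leads (small).
-- The table names, the dt partition column, the join key, and the dates
-- are assumptions for illustration only; they are not from the real query.
SELECT /*+ MAPJOIN(t2) */
       t1.portal_id,
       t2.lead_id,
       t1.visit_time
FROM visits t1
JOIN leads t2
  ON t1.portal_id = t2.portal_id
WHERE t1.dt >= '2010-01-01'    -- the sizable set of partitions (streamed through the mappers)
  AND t2.dt = '2010-01-15';    -- the one tiny partition (loaded into memory in each map task)
```

If the hinted table is too big to fit in a map task's memory, expect the job to fall over rather than speed up, so it's worth checking the size of the small side first.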
