Merging Syntax

This is a little programming thing, and if that’s not your thing, please skip it and have a great weekend.

Like most longtime programmers, I prize brevity. That is: I am lazy. Given a long sequence of data-frame merging operations to perform in R, I thought a binary-operator shorthand for the natural join of two data frames sharing a single common column might be handy. (That’s certainly a special case of merging, but not an uncommon one.)

So something like:

joined_df  <- df1 | df2 | df3;

in lieu of a long sequence of merge() calls or equivalent subsetting logic. I attached a short file that implements the “pipe” shorthand as an S3 subclass with an Ops override. Internally I used subsetting logic rather than merge() (probably faster), but either would be OK. Examples are included, in case you want to take a look. (Subclassing the data frames is necessary because | already has a meaning for data frames, so there is a constructor that works just like data.frame().) In my test cases (400K rows), performance was acceptable.
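For readers who want the idea without opening the file, here is a minimal sketch of how such an override could look. It is not the attached implementation: the names joinable and `|.joinable` are made up for illustration, and it calls merge() internally for simplicity instead of hand-written subsetting.

```r
# Hypothetical constructor: takes the same arguments as data.frame()
# and tags the result with an extra S3 class.
joinable <- function(...) {
  df <- data.frame(...)
  class(df) <- c("joinable", class(df))
  df
}

# S3 method for `|` on the subclass: natural (inner) join on the
# single column name the two operands share.
`|.joinable` <- function(e1, e2) {
  key <- intersect(names(e1), names(e2))
  if (length(key) != 1L) {
    stop("expected exactly one common column, found ", length(key))
  }
  # as.data.frame() drops the subclass before handing off to merge().
  out <- merge(as.data.frame(e1), as.data.frame(e2), by = key)
  # Re-tag the result so the operator can be chained.
  class(out) <- c("joinable", class(out))
  out
}

df1 <- joinable(id = c(1, 2, 3), a = c("x", "y", "z"))
df2 <- joinable(id = c(2, 3, 4), b = c(10, 20, 30))
df3 <- joinable(id = c(3, 2), c = c(TRUE, FALSE))

joined_df <- df1 | df2 | df3
print(joined_df)
```

Both operands need to carry the subclass. If one side is a plain data frame, R finds two different methods for | and falls back to the built-in behavior with a warning.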


So, can you implement the shorthand in R? You can. As for whether something like this is a good idea in general, I’m not entirely sanguine. Syntax can be too brief if it hides relevant context (here, the joining column). But for a lot of simple joins, it is handy.
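For comparison, here is a rough sketch of a longer form that keeps the joining column visible, using only base R. The data frames and the column name id are invented for the example.

```r
# Plain data frames sharing a single column, "id" (illustrative data).
df1 <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
df2 <- data.frame(id = c(2, 3, 4), b = c(10, 20, 30))
df3 <- data.frame(id = c(3, 2), c = c(TRUE, FALSE))

# Fold merge() over the list; the join column is spelled out in `by`.
explicit_df <- Reduce(
  function(x, y) merge(x, y, by = "id"),
  list(df1, df2, df3)
)
print(explicit_df)
```

It is wordier than df1 | df2 | df3, but a reader can see at a glance which column drives the join.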
