ruby_dig Gem Adds Hash#dig and Array#dig from Ruby 2.3 to Earlier Versions

Introducing `ruby_dig`

It may take us some time to upgrade to Ruby 2.3, but we’d like to start using `dig` right away. The `ruby_dig` gem solves this by adding a `dig` method to `Array` and `Hash` that behaves like the one Ruby 2.3+ provides natively.

The gem is available on RubyGems and GitHub.

Why Do We Need `dig`?

With the ever-growing popularity of JSON-based APIs, we all find ourselves writing code to “dig” through a parsed JSON response of nested hashes and arrays. This can be error-prone and tedious, so Ruby 2.3 added the `dig` instance method to Array and Hash to simplify the process.

For a simple example, let’s take some code that uses GitHub’s API to get the assignee of the first pull request for a given repo:

require 'net/http'
require 'json'

# Replace :owner and :repo with a real repository
uri = URI.parse("https://api.github.com/repos/:owner/:repo/pulls")
pulls_response = Net::HTTP.get_response(uri)
pulls_response.code == "200" or raise "Got non-200 response code: #{pulls_response.inspect}"

pulls = JSON.parse(pulls_response.body)

first_assignee = pulls[0]['assignee']['login']

But what if the response doesn’t come back in the expected format? Any of the `[]` operators above might return `nil`, and then the next `[]` would raise the dreaded, and nearly useless, Ruby exception:

NoMethodError: undefined method `[]' for nil:NilClass

Here is the last line rewritten to use `dig` and to raise a useful exception if the format is unexpected:

#   first_assignee = pulls[0]['assignee']['login']
first_assignee = pulls.dig(0, 'assignee', 'login') or raise "Got unexpected response #{pulls.inspect}"
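To see the difference without calling the API, here is a minimal sketch using a hypothetical, trimmed-down response in which the first pull request is unassigned (GitHub reports that as `"assignee": null`). It assumes Ruby 2.3+ or the `ruby_dig` gem, so that `Array#dig` is available.

```ruby
require 'json'

# Hypothetical payload: one pull request with no assignee.
pulls = JSON.parse('[{"number": 1, "assignee": null}]')

# Chained [] hits the nil "assignee" and raises an unhelpful NoMethodError.
chained_error =
  begin
    pulls[0]['assignee']['login']
  rescue NoMethodError => e
    e
  end

# dig stops at the first nil and returns nil instead of raising.
dug = pulls.dig(0, 'assignee', 'login')

puts chained_error.class  # prints NoMethodError
p dug                     # prints nil
```

With the `or raise` form above, this payload produces the descriptive "Got unexpected response" error instead of the bare `NoMethodError`.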

Implementation Notes

The `dig` method is implemented by calling `self.[]`, so it works with classes that derive from `Array` or `Hash`, most notably `ActiveSupport::HashWithIndifferentAccess`. That means `dig` works fine in a Rails application when looking into `params`:

params.dig(:user, :emails, 0, :friendly_name)
params.dig("user", "emails", 0, "friendly_name") # equivalent to the above

Negative array indexes work too:

params.dig(:user, :emails, -1, :friendly_name) # find the last email friendly name
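The delegation to `self.[]` described above can be sketched as follows. This is not the gem’s actual source: it is written as a hypothetical `DigSketch.dig` module function (so it doesn’t collide with the built-in `dig` on Ruby 2.3+), and `IndifferentHash` is a hypothetical, much-simplified stand-in for `ActiveSupport::HashWithIndifferentAccess`.

```ruby
# Minimal sketch of a dig backport that delegates to obj[key] at every level.
# Hypothetical: not the ruby_dig gem's source.
module DigSketch
  def self.dig(obj, *keys)
    keys.each do |key|
      return nil if obj.nil?  # a missing level ends the dig with nil
      obj = obj[key]          # dynamic dispatch: a subclass's [] override is honored
    end
    obj
  end
end

# Hypothetical stand-in for HashWithIndifferentAccess:
# keys are stored as strings, and [] converts Symbol keys before the lookup.
class IndifferentHash < Hash
  def [](key)
    super(key.is_a?(Symbol) ? key.to_s : key)
  end
end

emails = [
  IndifferentHash[{ "friendly_name" => "Home" }],
  IndifferentHash[{ "friendly_name" => "Work" }]
]
params = IndifferentHash[{ "user" => IndifferentHash[{ "emails" => emails }] }]

p DigSketch.dig(params, :user, :emails, 0, :friendly_name)     # prints "Home"
p DigSketch.dig(params, "user", "emails", 0, "friendly_name")  # prints "Home"
p DigSketch.dig(params, :user, :emails, -1, :friendly_name)    # prints "Work"
p DigSketch.dig(params, :user, :phones, 0)                     # prints nil
```

Because every step goes through `obj[key]`, a subclass’s key conversion and `Array#[]`’s negative indexes behave exactly as they would with chained brackets.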

However, neither the Ruby 2.3 documentation nor the tests in the commit make it clear what `Array#dig` should do if you pass a non-numeric index. This can easily happen if the response has an array where you expected a hash. It seems in keeping with the spirit of `dig` that it should return `nil` in this case rather than raising an exception:

TypeError: no implicit conversion of Symbol into Integer
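Here is a minimal sketch of that proposed behavior, written as a hypothetical standalone helper (`lenient_dig`) rather than the gem’s code: when the current object is an `Array` and the key is not an `Integer`, it returns `nil` instead of letting `Array#[]` raise the `TypeError` above.

```ruby
# Hypothetical helper illustrating the proposal: an Array indexed with a
# non-Integer key yields nil rather than raising TypeError.
def lenient_dig(obj, *keys)
  keys.each do |key|
    return nil if obj.nil?
    return nil if obj.is_a?(Array) && !key.is_a?(Integer)
    obj = obj[key]
  end
  obj
end

# The caller expected a Hash at the top level but received an Array.
response = [{ "user" => { "name" => "Ada" } }]

p lenient_dig(response, 0, "user", "name")  # prints "Ada"
p lenient_dig(response, :user, "name")      # prints nil

begin
  response[:user]                           # plain Array#[] with a Symbol
rescue TypeError => e
  puts e.message                            # the TypeError message quoted above
end
```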

Building a Command Line Tool as a Ruby Gem, Part I

With the volume of work that software engineers do at the command line, writing command line tools that simplify and DRY up repetitive work is an essential part of the development cycle. From the simple (adding alias ll='ls -laG' to your .bashrc) to the obtuse (alias searchCode='find -iname \*.rb | xargs grep $1') to the incomprehensible…

#!/usr/bin/perl
# fix_logs.pl
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       \r |
       (?:\e\[|\x9b) [ -?]* [@-~] |
       (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) |
       (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) |
       \e.|[\x80-\x9f] //xg;
       1 while s/[^\b][\b]//g;
    print;
}

…the modern developer would do well to build out their command line toolbelt.

Anyone who has struggled with proper tabbing in a Makefile or the correct spacing in defining a bash variable can attest to the difficulty (and learning curve) of writing command line tools as shell scripts. Fortunately, modern high-level programming languages like Python and Ruby provide structure to quickly and painlessly (really!) build command line tools. As Invoca is a Ruby shop, I’ll focus on Ruby.

Command line tools in Ruby are at the core of the work we do. A quick perusal of ~/.rbenv/shims reveals familiar names like bundle, gem, knife, rails, and rake. Each of these tools can be invoked from the command line, is implemented using the Ruby programming language, and typically eschews shell script altogether.

I recently used Ruby to build a command line SIP client. The tool is a wrapper around another command line SIP client, pjsua.

Now, why would I write a wrapper around an existing command line tool? Well, consider these two commands, which perform precisely the same action (call a number on a SIP server, wait 5 seconds, send a DTMF of 1, wait 5 seconds, hang up):

$ (sleep 5; echo "# 1"; sleep 5; echo h) | pjsua sip:8555550053@sip-server.net --log-level=0 --use-cli --id sip:8055551212@sip-client.com --playback-dev=2

$ invoke_call sip --promo-number=8555550053 --ringswitch-node=sip-server.net --call-scenario="wait 5, press 1, wait 5" --client-number=8055551212

The first slouches toward obfuscation (is that a bash subshell being piped into pjsua?), and it is not obvious which of its flags are essential. The second is hardly concise either. Fortunately, simply typing the bare command clarifies usage:

=> ~/ invoke_call
Commands:
  invoke_call help [COMMAND] # Describe available commands or one specific command
  invoke_call sip --call-scenario=CALL_SCENARIO --promo-number=PROMO_NUMBER --ringswitch-node=RINGSWITCH_NODE  # place a SIP call

Building out this tool in Ruby takes a complex, nested command that throws unclear errors when options are not passed correctly, and turns it into a straightforward command line tool with clear usage documentation and reasonable error handling.
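To make that translation concrete, here is a hypothetical sketch of how a wrapper could turn a --call-scenario string into the input that the first command pipes into pjsua. It is not invoke_call’s actual source, and the "wait N" / "press D" step grammar is an assumption inferred only from the example above.

```ruby
# Hypothetical sketch: turn a call scenario like "wait 5, press 1, wait 5" into
# the shell sequence piped into pjsua in the one-liner above.
# Step grammar (assumed): "wait <seconds>" and "press <DTMF digit>".
def scenario_to_steps(scenario)
  steps = scenario.split(",").map(&:strip).map do |step|
    case step
    when /\Await (\d+)\z/      then [:sleep, Integer($1)]
    when /\Apress ([0-9*#])\z/ then [:send, "# #{$1}"]  # "# <digit>" sends the DTMF in the one-liner
    else raise ArgumentError, "unknown call-scenario step: #{step.inspect}"
    end
  end
  steps << [:send, "h"]  # always hang up at the end
end

# Render the steps as the subshell body from the pjsua one-liner.
def steps_to_shell(steps)
  steps.map { |kind, arg| kind == :sleep ? "sleep #{arg}" : %(echo "#{arg}") }.join("; ")
end

puts steps_to_shell(scenario_to_steps("wait 5, press 1, wait 5"))
# prints: sleep 5; echo "# 1"; sleep 5; echo "h"
```

Raising ArgumentError on an unrecognized step is one way to get the reasonable error handling described above: the user is told which step was wrong instead of having to decode pjsua’s behavior.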

The beauty of Ruby is that this is quick and easy to do with the extensive toolkit built by the Ruby community. I used bundle (specifically, bundle gem) to generate my new gem’s directory structure and to build the final package to be installed by gem. My Cli class inherits from Thor, so it gets the thor gem’s extensive shortcuts for building command line tools.

Over the next few posts we’ll explore these Ruby tools, specifically rbenv, bundler, and thor, and walk step by step through building a simple command line tool using Ruby.

Postscript

If you want something pithy, do what I did:

function callmy {
  invoke_call sip --call-scenario="$2" --promo-number=$1 --ringswitch-node=$MY_COLLAPSED_IP
}

Then try:

$ callmy 8555551212 "wait 5, press 1, wait 5"

Feedback on magic comment ‘immutable: string’

Charles Nutter, the mastermind behind JRuby, had some feedback on the magic ‘immutable: string’ comment we proposed in Ruby issue 9278. (BTW, our proposal is built directly on his Ruby 2.1.0 contribution, Ruby issue 9042.)

I figured it was worth copying the response here, where the formatting is richer.

From Charles:

A magic comment should not completely change the semantics of a literal type. Encoding magic comments do not suffer from the same issue since they only change how the bytes of the strings are interpreted by the encoding subsystem…they do not change semantics.

I view this `immutable: string` comment as less intrusive than the encoding magic comment. The magic encoding comment was typically required in order to get code to compile at all in 1.9. In this case the magic `immutable: string` comment is just a tip to Ruby saying “Feel free to optimize string literals here”. The comment is completely optional, as is the optimization. But it’s worth it because the optimization has a big speed payoff: 1.6X in this benchmark. I think most projects will run at least 1.1X faster.

A magic comment is far removed from the actual literal strings, meaning that every developer that enters a file will have to keep in mind whether the strings have been forced to be immutable before doing any work with literal strings.

I expect that the `immutable: string` magic comment would typically be dropped at the top of all files in a project. (As many did with the encoding comment when porting to 1.9.) Any project that has automated test coverage is going to want to put the comment throughout as a project policy, because of the speed payoff.  [BTW to make it easy to add this magic comment, we’ve cloned the magic_encoding gem into a magic_immutable gem here.]

Although this eliminates or reduces calls to .freeze, it causes the opposite effect to get a mutable string in the same file…specifically, you have to call .dup.

Yes, exactly! This magic comment eliminates a giant/infeasible/ugly change (`.freeze` after every string literal) and replaces it with a tiny/feasible one. That is certainly its goal.

Consider Rails. I just did a quick, approximate regex search to count string literals in Rails 3.2.12 and found on the order of 50,000 string literals. How many of those do you think are mutated? I’m going to guess no more than 10 places across all of Rails. (I’m hoping to get some actual stats using a hack of this branch, but for a quick estimate I audited about 150 cases that mutate a string and didn’t find a single case where the receiver was a string literal.)

So if we add the magic comment in Rails, we’ll need to add `.dup` (or change to String.new) in up to 10 places instead of adding `.freeze` after 50,000 string literals!

I think it would be better to consider adding the String#f method proposed during the rework of .freeze optimizations. Adding a .f to a literal string is not very much to ask, and in your particular script, it would actually add *fewer* characters to the code than the magic comment.

I’m with Paul Graham, who argues that expressiveness is counted in elements (~tokens), not bytes. So `.f` is no more expressive than `.freeze`. Whether `.f` or `.freeze`, it’s still at least 1 element to be added after *every* string literal. One of Ruby’s main differentiators is its beauty and English-like readability. That’s certainly what drew me to Ruby. No one who feels that way is going to put `.freeze` or `.f` after every string literal.

In summary, there will certainly be projects that don’t bother with the `immutable: string` magic comment (for example, those that lack automated test coverage, or don’t run on MRI). But many projects will use it because they will get a big, immediate performance payoff in Ruby 2.1. We’re already looking forward to it!

Magic comment ‘immutable: string’ makes Ruby 2.1’s “literal”.freeze optimization the default

In “Mutable strings in Ruby” we saw that making Ruby strings immutable with .freeze can remove a common source of bugs. We also saw that Ruby’s allowance for string mutation can cause significant performance degradation, because string literals must be allocated (and later garbage collected) every time that code is run.

Ruby 2.1 takes a step towards addressing this with its `"literal".freeze` optimization (formerly the `"literal"f` suffix). It removes the performance problem described in Part 1 whenever you write .freeze directly after a string literal:

def log_message(message)
  puts message + "[EOL]".freeze
end

The above change can make a big difference in performance. But you have to remember to put .freeze after every string literal, which is ugly and impractical to do broadly.
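One way to see the optimization at work is to check object identity. This is a minimal sketch, assuming MRI 2.1 or later and a file without any frozen-string-literal magic comment; the method names are illustrative. The plain literal yields a new String on every call, while `"[EOL]".freeze` yields the same frozen object each time.

```ruby
# Without .freeze, each evaluation of the literal allocates a new String.
def plain_suffix
  "[EOL]"
end

# With .freeze directly on the literal, MRI 2.1+ returns one shared frozen String.
def frozen_suffix
  "[EOL]".freeze
end

p plain_suffix.equal?(plain_suffix)    # prints false
p frozen_suffix.equal?(frozen_suffix)  # prints true
p frozen_suffix.frozen?                # prints true
```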

# -*- immutable: string -*- Makes “literal”.freeze the Default

To be practical, we need a way to mark an entire source file so that its string literals are frozen by default. This pull request does just that with a magic ‘immutable: string’ comment at the top of the file. Here’s a test that shows its usage:

# -*- immutable: string -*-

require 'minitest/autorun'
require_relative 'test2.rb' # file with a mutable string

heredoc_string = <<-EOS
  Hello World
EOS

strings = [
  "hello",
  %{Hello},
  %Q{Hello},
  %q{Hello},
  heredoc_string
]

def mutate(str)
  str.slice!(1, 2)
end

def log(message)
  "I'm logging: #{message}"
end

describe "strings defined in this file" do
  strings.each do |s|
    it "should raise an error" do
      -> {
        mutate(s)
      }.must_raise(RuntimeError).message.must_match(/can't modify frozen String/)
    end
  end
end

describe "string interpolation" do
  it "should fail" do
    -> {
      str = log("blah blah")
      mutate(str)
    }.must_raise(RuntimeError).message.must_match(/can't modify frozen String/)
  end

  it "should succeed" do
    str = "foo#{log('bar')}"   # building an interpolated string must not raise
    str.must_equal "fooI'm logging: bar"
  end
end

describe "strings not defined in this file" do
  it "should be mutable" do
    mutate(Foo::CONSTANT)
    Foo::CONSTANT.must_equal "SING"
  end
end

describe "static strings" do
  it "should always have the same object_id" do
    def some_string
      "A nice frozen string!"
    end
    some_string.object_id.must_equal some_string.object_id
  end
end

# test2.rb
module Foo
  CONSTANT = "STRING"
end

Overriding immutable: string

Sometimes in a file with the magic comment you may actually need a mutable string. That is easy to get with String.new or ''.dup:

# -*- immutable: string -*-

def concatenate(*args)
  result = String.new     # or:  result = ''.dup
  args.each do |arg|
    result << arg
  end
  result
end
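Here is a runnable sketch of why the override is needed. Since the magic comment is only a proposal here, the sketch stands in for a literal frozen by the comment with an explicit .freeze; `FROZEN_EMPTY` and `concatenate_broken` are hypothetical names.

```ruby
# Stand-in for an empty literal that the magic comment would have frozen.
FROZEN_EMPTY = "".freeze

# Appending to the frozen literal itself fails at run time.
def concatenate_broken(*args)
  result = FROZEN_EMPTY
  args.each { |arg| result << arg }
  result
end

# A fresh copy is mutable, so << works (String.new would work too).
def concatenate(*args)
  result = FROZEN_EMPTY.dup
  args.each { |arg| result << arg }
  result
end

p concatenate("cat", " ", "dog")  # prints "cat dog"

begin
  concatenate_broken("cat")
rescue RuntimeError => e          # FrozenError (a RuntimeError subclass) on Ruby 2.5+
  puts e.message                  # message starts with "can't modify frozen String"
end
```

One difference between the two overrides: String.new with no arguments returns an empty string in binary (ASCII-8BIT) encoding, while ''.dup keeps the source file’s encoding (UTF-8 by default since Ruby 2.0).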

Making Ruby More Functional

This new magic comment should allow some big performance gains and lessen bugs at the same time, taking Ruby one step closer to Functional Programming.  We sincerely hope this pull request will be accepted into the main Ruby 2.1 branch.

Questions or comments?  Feel free to post them here, email colin@invoca.com, or tweet to @colindkelley.

Mutable strings in Ruby

When people ask why we use Ruby instead of Python, I usually mention the beautiful, nearly-invisible English-like syntax, the ease of writing DSLs, trivially simple block arguments, meta-programming including reflection… But I have to admit that Python has a few clearly superior features.  First on the list: Python strings are better because they are immutable.

The Perils of Mutation

What’s so bad about Ruby’s mutable strings?  Consider this innocuous-looking Ruby method that puts “[EOL]” at the end of the message that is being logged:

def log_message(message)
  message << "[EOL]"
  puts message
end

LOOP_MESSAGE = "In the loop"

3.times do
  log_message(LOOP_MESSAGE)
end

Here’s the output:

In the loop[EOL]
In the loop[EOL][EOL]
In the loop[EOL][EOL][EOL]

What are those extra [EOL]s doing in there?!  Drop into pry and have a look at LOOP_MESSAGE after the loop has finished:

LOOP_MESSAGE
=> "In the loop[EOL][EOL][EOL]"

Uggh! Every pass through the loop mutated that “constant” LOOP_MESSAGE string.  Not good.

Avoid Mutating Methods and Operators

Let’s banish the nasty mutating “String#<<” operator:

def log_message(message)
  puts message + "[EOL]"
end

With that, the bug is fixed!

In the loop[EOL]
In the loop[EOL]
In the loop[EOL]

Besides String#<<, most of the other mutating methods to avoid end in !, like slice! and downcase!. Notable exceptions without the ! include concat, insert, prepend, replace, and clear.

Run-time Immutability with Object#freeze

Ruby has always had a run-time form of immutability.  If you call “.freeze” on an object, the Ruby runtime will raise an exception if any code tries to mutate that object:

>> pets = "cat"
=> "cat"
>> pets << " dog"
=> "cat dog"
>> pets.freeze;
>> pets << " canary"
RuntimeError: can't modify frozen String
 from (irb):8

So if you remember to freeze your string literals, you can avoid bugs with mutating shared references. You may have seen this done in your favorite Ruby gem or in Rails itself:

CONTENT_TYPE = "Content-Type".freeze

Performance Problems Caused by Mutability

Every time log_message is called, the string literal “[EOL]” is allocated again on the heap. If this method were called inside the tightest loop in our code, the time to dynamically allocate this string literal (and later garbage collect it) could start to dominate the performance of the application.
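To make the allocation cost visible, here is a rough sketch that counts object allocations with GC.stat(:total_allocated_objects) (available in MRI 2.2+). The exact counts vary by Ruby version, so treat the numbers as illustrative. It compares the body of log_message, minus the puts, with a variant that freezes the literal; both helper names are hypothetical.

```ruby
# Count objects MRI allocates while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def with_plain_literal(message)
  message + "[EOL]"         # allocates "[EOL]" and the concatenated result
end

def with_frozen_literal(message)
  message + "[EOL]".freeze  # allocates only the concatenated result
end

message = "In the loop"
plain   = allocations { 10_000.times { with_plain_literal(message) } }
frozen  = allocations { 10_000.times { with_frozen_literal(message) } }

puts "plain literal:  #{plain} allocations"
puts "frozen literal: #{frozen} allocations"
```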

The only reason Ruby goes to all this effort to allocate string literals every time the code is run is to allow the string to be mutated.  For example:

def concatenate(*args)
  result = ''
  args.each do |arg|
    result << arg
  end
  result
end

Non-Mutation is the Functional Way

Mutating objects is bug-prone, particularly in a language with shared references like Ruby, because side effects from one method can change the behavior of another. More generally, it is much harder for a programmer to understand the behavior of code that mutates, because the name of an object is insufficient to predict its content; the programmer must also know the point in the object’s lifetime.

It may be less intuitive, but non-mutating code can actually perform better because objects that are never mutated can be freely shared, both by the language and the application.

Non-mutation is a cornerstone of the functional programming style.