Author Archives: colin

ruby_dig Gem Adds Hash#dig and Array#dig from Ruby 2.3 to Earlier Versions

Introducing `ruby_dig`

It may take us some time to upgrade to Ruby 2.3.  But we’d like to be able to start using `dig` right away.  The `ruby_dig` gem solves this by adding the `dig` method to `Array` and `Hash` just like Ruby 2.3+ has natively.

The gem can be found on ruby_gems and on github.

Why Do We Need `dig`?

With the ever-growing popularity of JSON-based APIs, we all find ourselves writing code to “dig” through a parsed JSON response of nested hashes and arrays. This can be error-prone and tedious, so Ruby 2.3 added the `dig` instance method to Array and Hash to simplify the process.

For a simple example, let’s take some code that uses Github’s API to get the assignee of the first Pull Request for a given repo:

uri = Uri.parse("https://github.com/repos/:owner/:repo/pulls")
pulls_response = Net::HTTP.get_response(uri)
pulls_response.code == 200 or raise "Got non-200 response code: #{response.inspect}"

pulls = JSON.parse(pulls_response.body)

first_assignee = pulls[0]['assignee']['login']

But what if the response doesn’t come back in the expected format? Any of the `[]` operators above might return `nil` and then next `[] would raise the dreaded—and nearly useless—Ruby exception:

NoMethodError: undefined method `[]' for nil:NilClass

Here is the last line rewritten to use `dig` and to raise a useful exception if the format is unexpected:

#   first_assignee = pulls[0]['assignee']['login']
first_assignee = pulls.dig(0, 'assignee', 'login') or raise "Got unexpected response #{pulls.inspect}"

Implementation Notes

The dig method is implemented by calling `self.[]` so it will work with classes that derive from `Array` or `Hash`.  Most notably, `ActiveSupport::HashWithIndifferentAccess`.  Therefore `dig` will work fine in a Rails application looking in `params`:

params.dig(:user, :emails, 0, :friendly_name)
params.dig("user", "emails", 0, "friendly_name") # equivalent to the above

Also negative array indexes will work fine:

params.dig(:user, :emails, -1, :friendly_name) # find the last email friendly name

However, neither the Ruby 2.3 documentation nor the tests in the commit make it clear what should happen in `Array#dig` if you pass non-numeric index.  This can happen easily, if the result you got had a hash where you expected an array.  It seems in keeping with the spirit of `dig` that it should return `nil` in this case rather than raising an exception:

TypeError: no implicit conversion of Symbol into Integer

Feedback on magic comment ‘immutable: string’

Charles Nutter, the mastermind behind JRuby, had some feedback on the magic ‘immutable: string’ comment we proposed in Ruby issue 9278,  (BTW our proposal is directly built on his Ruby 2.1.0 contribution, Ruby issue 9042.)

I figured it was worth copying the response up here where the formatting is more rich.

From Charles:

A magic comment should not completely change the semantics of a literal type. Encoding magic comments do not suffer from the same issue since they only change how the bytes of the strings are interpreted by the encoding subsystem…they do not change semantics.

I view this `immutable: string` comment as less intrusive than the encoding magic comment. The magic encoding was typically required in order get code to compile at all in 1.9. In this case the magic `immutable: string` comment is just a tip to Ruby saying “Feel free to optimize string literals here”. The comment is completely optional, as is the optimization. But it’s worth it because the optimization has a big speed payoff. 1.6X in this benchmark. I think most projects will run at least 1.1X faster.

A magic comment is far removed from the actual literal strings, meaning that every developer that enters a file will have to keep in mind whether the strings have been forced to be immutable before doing any work with literal strings.

I expect that the `immutable: string` magic comment would typically be dropped at the top of all files in a project. (As many did with the encoding comment when porting to 1.9.) Any project that has automated test coverage is going to want to put the comment throughout as a project policy, because of the speed payoff.  [BTW to make it easy to add this magic comment, we’ve cloned the magic_encoding gem into a magic_immutable gem here.]

Although this eliminates or reduces calls to .freeze, it causes the opposite effect to get a mutable string in the same file…specifically, you have to call .dup.

Yes, exactly! This magic comment eliminates a giant/infeasible/ugly change (`.freeze` after every string literal) and replaces it with a tiny/feasible one. That is certainly its goal.

Consider Rails. I just did a quick/approximate regex search to count string literals in Rails 3.2.12 and found on the order of 50,000 of string literals. How many of those do you think are mutated? I’m going to guess no more than 10 places across all of Rails. (I’m hoping to get some actual stats using a hack of this branch, but for a quick estimate I audited about 150 cases that mutate a string and didn’t find a single case where the receiver was a string literal.)

So if we add the magic comment in Rails, we’ll need to add `.dup` (or change to String.new) in up to 10 places instead of adding `.freeze` after 50,000 string literals!

I think it would be better to consider adding the String#f method proposed during the rework of .freeze optimizations. Adding a .f to a literal string is not very much to ask, and in your particular script, it would actually add *fewer* characters to the code than the magic comment.

I’m with Paul Graham, who argues that expressiveness is counted in elements (~tokens), not bytes. So `.f` is no more expressive than `.freeze`. Whether `.f` or `.freeze`, it’s still at least 1 element to be added after *every* string literal. One of Ruby’s main differentiators is its beauty and English-like readability. That’s certainly what drew me to Ruby. No one who feels that way is going to put `.freeze` or `.f` after every string literal.

In summary, there will certainly be projects that don’t bother with the `immutable: string` magic comment (for example, those that lack automated test coverage, or don’t run on MRI). But many projects will use it because they will get a big, immediate performance payoff in Ruby 2.1. We’re already looking forward to it!

Magic comment ‘immutable: string’ makes Ruby 2.1’s “literal”.freeze optimization the default

In Mutable strings in Ruby we saw that making Ruby strings immutable with .freeze can remove a common source of bugs.  We also saw that Ruby’s allowance for string mutation can cause significant performance degradation because string literals must be allocated (and later garbage collected) every time that code is run.

Ruby 2.1 takes a step towards addressing the performance problem with its “literal”.freeze (formerly “literal”f suffix) optimization. This addresses the performance problem described in Part 1, when you write ‘.freeze’ after your string literals:

def log_message(message)
  puts message + "[EOL]".freeze
end

The above change can make a big difference in performance.  But you have to remember to put ‘.freeze” after every string literal, which is ugly and impractical to do broadly.

# -*- immutable: string -*- Makes “literal”.freeze the Default

To be practical, we need a way to mark entire code files as having their string literals as frozen by default.  This pull request does just that with a magic ‘immutable: string’ comment at the top of the file.  Here’s a test that shows its usage:

# -*- immutable: string -*-

require 'minitest/autorun'
require_relative 'test2.rb' # file with a mutable string

heredoc_string = <<-EOS
  Hello World
EOS

strings = [
  "hello",
  %{Hello},
  %Q{Hello},
  %q{Hello},
  heredoc_string
]

def mutate(str)
  str.slice!(1, 2)
end

def log(message)
  "I'm logging: #{message}"
end

describe "strings defined in this file" do
  strings.each do |s|
    it "should raise an error" do
      -> {
        mutate(s)
      }.must_raise(RuntimeError).message.must_match(/can't modify frozen String/)
    end
  end
end

describe "string interpolation" do
  it "should fail" do
   -> {
      str = log("blah blah")
      mutate(str)
    }.must_raise(RuntimeError).message.must_match(/can't modify frozen String/)
  end

  it "should succeed" do
    -> { str = "foo#{some_string}" }
  end
end

describe "strings not defined in this file" do
  it "should be mutable" do
    mutate(Foo::CONSTANT)
    Foo::CONSTANT.must_equal "SING"
  end
end

describe "static strings" do
  it "should always have the same object_id" do
    def some_string
      "A nice frozen string!"
    end
    some_string.object_id.must_equal some_string.object_id
  end
end
# test2.rb
module Foo
  CONSTANT = "STRING"
end

Overriding immutable: string

Sometimes in a file with the magic comment you may actually need a mutable string.  That is easy to do with String.new or ”.dup:

# -*- immutable: string -*-

def concatenate(*args)
  result = String.new     # or:  result = ''.dup
  args.each do |arg|
    result << arg
  end
  result
end

Making Ruby More Functional

This new magic comment should allow some big performance gains and lessen bugs at the same time, taking Ruby one step closer to Functional Programming.  We sincerely hope this pull request will be accepted into the main Ruby 2.1 branch.

Questions or comments?  Feel free to post them here, email colin@invoca.com, or tweet to @colindkelley.

Mutable strings in Ruby

When people ask why we use Ruby instead of Python, I usually mention the beautiful, nearly-invisible English-like syntax, the ease of writing DSLs, trivially simple block arguments, meta-programming including reflection… But I have to admit that Python has a few clearly superior features.  First on the list: Python strings are better because they are immutable.

The Perils of Mutation

What’s so bad about Ruby’s mutable strings?  Consider this innocuous-looking Ruby method that puts “[EOL]” at the end of the message that is being logged:

def log_message(message)
  message << "[EOL]"
  puts message
end

LOOP_MESSAGE = "In the loop"

3.times do
  log_message(LOOP_MESSAGE)
end

Here’s the output:

In the loop[EOL]
In the loop[EOL][EOL]
In the loop[EOL][EOL][EOL]

What are those extra [EOL]s doing in there?!  Drop into pry and have a look at LOOP_MESSAGE after the loop has finished:

LOOP_MESSAGE
=> "In the loop[EOL][EOL][EOL]"

Uggh! Every pass through the loop mutated that “constant” LOOP_MESSAGE string.  Not good.

Avoid Mutating Methods and Operators

Let’s banish the nasty mutating “String#<<” operator:

def log_message(message)
  puts message + "[EOL]"
end

With that, the bug is fixed!

In the loop[EOL]
In the loop[EOL]
In the loop[EOL]

Besides String#<<, most of the other mutating operators to avoid end in !, like slice!, downcase!, etc.

Run-time Immutability with Object#freeze

Ruby has always had a run-time form of immutability.  If you call “.freeze” on an object, the Ruby runtime will raise an exception if any code tries to mutate that object:

>> pets = "cat"
=> "cat"
>> pets << " dog"
=> "cat dog"
>> pets.freeze;
>> pets << " canary"
RuntimeError: can't modify frozen String
 from (irb):8

So if you remember to freeze your string literals, you can avoid bugs with mutating shared references. You may have seen this done in your favorite Ruby gem or in Rails itself:

CONTENT_TYPE = "Content-Type".freeze

Performance Problems Caused by Mutability

Every time this code is called, the string literal “[EOL]” will be allocated again off the heap.  If this method were called inside the tightest loop in our code, the time to dynamically allocate this string literal (and later garbage collect it) can start to dominate the performance of the application.

The only reason Ruby goes to all this effort to allocate string literals every time the code is run is to allow the string to be mutated.  For example:

def concatenate(*args)
  result = ''
  args.each do |arg|
    result << arg
  end
  result
end

Non-Mutation is the Functional Way

Mutating variables is bug-prone, particularly in a language with shared references like Ruby, because side-effects from one method can change the behavior of another.  More generally, it is much harder for a programmer to understand the behavior of code that mutates because the name of an object is insufficient to predict its content—the programmer must also know the point in the object’s lifetime.

It may be less intuitive, but non-mutating code can actually perform better because objects that are never mutated can be freely shared, both by the language and the application.

Non-mutation is a cornerstone of the functional programming style.