Ruby 2.3: Working with immutable strings

By John Cinnamond on March 3, 2016

If you’re upgrading to Ruby 2.3, you’re in the right place. Among the many changes in this version is the confusingly named “Frozen String Literal Pragma.” This feature lets you add a magic comment to a Ruby source file that makes every string literal in that file frozen, meaning you won't be able to modify it. Let’s take a look at what this means and how it will affect everyday programming in Ruby.

What will change for Ruby developers?

First, it’s important to note that this change is just the first step towards all string literals becoming immutable in Ruby 3. String literals are simply strings that are created using Ruby's built-in string syntax. This includes those created using double quotes ("), single quotes ('), or the special forms %Q and %q. Traditionally, these forms have all created strings that can be modified. For example, the following code could be used to create a string containing a greeting, and then to modify the greeting to make it more personal if a name is available.

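     name = "John" # hypothetical input; could also be nil or ""
     greeting = "Hello"

     # Plain-Ruby equivalent of Rails' name.present?
     if name && !name.strip.empty?
       greeting << " #{name}"
     end

     puts greeting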

Starting with Ruby 2.3, if you add the magic comment # frozen_string_literal: true at the top of the file, before any code, the string literal assigned to greeting is automatically frozen (meaning it can't be modified). So if name is present and we try to personalize the greeting, Ruby will raise a RuntimeError.
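To see this in action, here is a minimal, self-contained sketch (the name "John" is only a placeholder). It assumes the magic comment sits in the file's leading comment block, and it rescues RuntimeError because that is what Ruby 2.3 raises here; Ruby 2.5 and later raise FrozenError, which is a subclass of RuntimeError, so the same rescue works on both.

```ruby
# frozen_string_literal: true

# Sketch: the magic comment above must appear before any code.
# It freezes every string literal in this file.
greeting = "Hello"
puts greeting.frozen? # prints true

error = nil
begin
  greeting << " John" # attempt to modify the frozen literal in place
rescue RuntimeError => e
  # Ruby 2.3/2.4 raise RuntimeError; Ruby 2.5+ raise FrozenError,
  # a subclass of RuntimeError, so this rescue handles both.
  error = e
end

puts error.class
puts greeting # the original literal is unchanged: Hello
```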

What are the benefits?

At first, this probably seems like a strange move for Ruby; after all, it breaks some existing behavior in the language. But the primary motivation for making this change is performance. Since Ruby 2.1, calling freeze directly on a string literal has avoided duplicate object allocation: identical frozen string literals always refer to the same object in memory. For example, in the following code both 'a' and 'b' refer to the same object:

     a = "hello".freeze
     b = "hello".freeze
     a.object_id # => 70330438276980 (the exact value varies)
     b.object_id # => 70330438276980 (same object as a)

Allocating just a single object for both 'a' and 'b' reduces the amount of memory used and this, in turn, reduces pressure on the garbage collector.
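The magic comment applies this same deduplication to every literal in the file, without an explicit call to freeze. The following sketch (the method name label is hypothetical) checks that repeated evaluations of one frozen literal return the same object, while String.new still allocates a fresh, mutable string each time.

```ruby
# frozen_string_literal: true

# Sketch: under the magic comment, evaluating the same literal
# repeatedly returns one shared, frozen object.
def label
  "hello"
end

first = label
second = label
puts first.equal?(second) # prints true: no new allocation per call
puts first.frozen?        # prints true

# String.new allocates a new, mutable String on every call.
copy_a = String.new("hello")
copy_b = String.new("hello")
puts copy_a.equal?(copy_b) # prints false
puts copy_a.frozen?        # prints false
```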

Is this all about performance?

There is another reason why you might want immutable strings: Immutability can make code easier to reason about. For example, consider the following piece of code:

     name = "John Cinnamond"
     save(name)

     # What is the value of name now?

If strings were immutable, we could guarantee that calling 'save' has not changed the value of 'name'. But if strings can be modified, then we can't be sure. What would happen if the definition of 'save' contained some code like:

     def save(data)
       # Normalize the data
       data.gsub!(/\s+/, '-')
       data.downcase!

       # ...
     end

In this case the value of 'name' would be modified by the call to 'save', which may lead to unexpected results. Of course, this is a very short code example and it's easy to see what's going on. You could even argue that the code at the start of 'save' is bad code and that we shouldn't write such things. But as systems become more complex, the risk of accidentally creating these side effects increases. By making strings immutable, a whole class of errors like this becomes impossible.
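A small sketch makes the side effect concrete. It reuses the 'save' method from above with one assumption added: it returns 'data' so the result is easy to inspect. Called with a mutable string, it silently changes the caller's value; called with a frozen one, it raises instead.

```ruby
# Sketch: the 'save' method from above (returning data is an added
# assumption for illustration).
def save(data)
  # Normalize the data, modifying the caller's string in place
  data.gsub!(/\s+/, '-')
  data.downcase!
  data
end

name = String.new("John Cinnamond") # explicitly mutable
save(name)
puts name # prints john-cinnamond: the caller's string was changed

frozen_name = "John Cinnamond".freeze
error = nil
begin
  save(frozen_name)
rescue RuntimeError => e # FrozenError (a RuntimeError subclass) on Ruby 2.5+
  error = e
end
puts frozen_name # prints John Cinnamond: the side effect was blocked
```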

What are the downsides?

Extra performance is great but, as we noted earlier, this change breaks existing behavior. This is why immutable string literals are being introduced gradually, starting with a magic comment that lets you opt in file by file in Ruby 2.3. Quite a few libraries and some existing code will be affected by the change. Any code that uses '<<' to append to a string literal, or that calls methods like 'gsub!' and 'downcase!' that modify a string in place, will need to be changed. That includes methods, like 'save' above, that modify a string passed in by a caller whose file has frozen its literals.
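Often the simplest change is to switch to the non-destructive counterparts, which return a new string instead of modifying the receiver. This sketch assumes the magic comment is in effect; the variable names are illustrative only.

```ruby
# frozen_string_literal: true

# Sketch: non-destructive methods build new strings, so they work on
# frozen literals; their in-place (bang) variants raise.
greeting = "Hello"

personal = greeting + " John" # + returns a new, unfrozen String
normalized = "John Cinnamond".gsub(/\s+/, "-").downcase # no bang, no mutation

puts personal   # prints Hello John
puts normalized # prints john-cinnamond

error = nil
begin
  "Hello".downcase! # in-place method on a frozen literal
rescue RuntimeError => e # FrozenError on Ruby 2.5+, a RuntimeError subclass
  error = e
end
puts error.class
```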

Thankfully, the fix for this is simple. If you want to modify a string literal, call 'dup' on it first to get a new instance of the string. The string that 'dup' returns is unfrozen, so you can go on modifying it. For example, the 'save' method from above could be rewritten as:

     def save(original_data)
       # Make an unfrozen copy of the string
       data = original_data.dup

       # Normalize the data
       data.gsub!(/\s+/, '-')
       data.downcase!

       # ...
     end
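Here is a minimal sketch of the rewritten method in use. As before, returning the normalized copy is an assumption added for illustration. It also shows unary plus (String#+@), which Ruby 2.3 introduced as a shorthand that returns an unfrozen copy of a frozen string, or the string itself if it is already unfrozen.

```ruby
# frozen_string_literal: true

# Sketch of the dup-based 'save' (returning data is an added assumption).
def save(original_data)
  data = original_data.dup # unfrozen copy; the caller's string is untouched
  data.gsub!(/\s+/, '-')
  data.downcase!
  data
end

name = "John Cinnamond" # frozen, because of the magic comment
saved = save(name)

puts saved        # prints john-cinnamond
puts name         # prints John Cinnamond
puts name.frozen? # prints true

# Unary plus (Ruby 2.3+) gives an unfrozen copy of a frozen literal.
mutable = +"Hello"
mutable << ", world"
puts mutable # prints Hello, world
```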

Aside from having to update every piece of code and every library that modifies strings, the introduction of frozen string literals itself feels a bit strange. There’s something slightly odd about string literals—and only string literals—becoming immutable by default. This brings us to the next big question…

Should other objects be immutable by default?

If immutability is OK for strings, should we also consider it for arrays and hashes? The short answer is, probably not.

In making this change, the Ruby core developers have done a lot of work to estimate how much performance benefit it will really bring. Identical string literals are relatively common, particularly in large frameworks; some estimates suggest that allocations for as many as 30 percent of all string literals could be avoided. So this change introduces some pain, but we can be reasonably sure that it will also bring real benefits.

But this doesn’t mean that the same holds true for other types of data. How often do we have identical arrays? Or identical hashes? Or even identical objects? Not very often. So making them immutable by default probably won't bring any performance benefits, but it will certainly bring some pain.

What about the benefit of making code easier to reason about by preventing accidental modification? Well, that benefit is already available; you just need to use it. You can call 'freeze' on any object to prevent it from being modified. For example, if you have a hash with some data and you want to make sure you don't accidentally modify it, just call 'freeze' on it. Any subsequent attempt to modify it will raise a RuntimeError. That's still a problem you have to deal with, but it's better than unexpected inconsistencies in your data. Keep in mind that 'freeze' is shallow: objects stored inside a frozen hash can still be modified unless you freeze them as well.
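A short sketch of freezing a hash, using hypothetical configuration data. It also demonstrates that 'freeze' is shallow: the hash itself rejects changes, but an array stored inside it can still be modified.

```ruby
# Sketch: freezing a Hash to guard against accidental modification.
# The keys and values here are hypothetical.
config = { "host" => "localhost", "port" => 8080 }.freeze

error = nil
begin
  config["port"] = 9090 # attempt to modify the frozen hash
rescue RuntimeError => e # FrozenError on Ruby 2.5+, a RuntimeError subclass
  error = e
end

puts config["port"] # prints 8080
puts error.class

# freeze is shallow: the hash is frozen, but the objects inside it are not.
settings = { "tags" => ["a", "b"] }.freeze
settings["tags"] << "c"
p settings["tags"] # prints ["a", "b", "c"]
```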

Takeaway

As with most updates, there will be some short-term pain as Ruby moves toward making string literals immutable by default. The good news is that it will happen gradually, and it won’t be difficult to detect and fix existing code that’s affected by this change. In return, we’re going to get a version of Ruby that uses less memory and thus runs a bit quicker. Given the low overhead of having to call 'dup' when we want an unfrozen string, this is certainly a tradeoff worth making.


Contributor

John Cinnamond

@jcinnamond

John is a Ruby developer, conference speaker and Pluralsight author. Having written software commercially for more than 10 years, and more years than he’d care to admit to as a hobby, John is still fascinated by learning new ways to improve writing software.