Symbols are one of the most mysterious things for Ruby beginners. When I was starting out with Ruby, I also sometimes wondered what was going on and what these symbols actually were. I’m not a programming veteran, but today I know a little more about them and want to share some thoughts with you. This time it’s about Ruby symbols, which I called “things” above on purpose. We’ll start with the basics: what symbols are and when to use them. Then we’ll dive deeper and look at how they are managed by the Ruby GC.
Symbols vs. Strings
Bob: What is a Symbol? Is it a String?
Alice: No.
Bob: An Integer?
Alice: No.
Bob: So what is it?
Alice: It’s a… an object.
What a surprise! Yes, a Symbol is an Object, just like a String or an Integer. It is much easier to point out the differences between strings and integers, though, and that’s why it’s hard to define what a Symbol is. We can say that symbols are a bit like strings and integers at the same time. Ruby does not create a new Object every time we refer to a Symbol; instead, it maps each Symbol to an Integer ID.
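Here is a minimal sketch of what that looks like in practice. The symbol name :ozimeu is just an example, and the actual id values will differ on your machine, so the code only compares them.

```ruby
# Ask for the object_id of the same symbol literal three times.
first_id  = :ozimeu.object_id
second_id = :ozimeu.object_id
third_id  = :ozimeu.object_id

# Every reference points at the very same object.
puts first_id == second_id && second_id == third_id # prints true
puts :ozimeu.equal?(:ozimeu)                        # prints true
```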
So if you use the same Symbol, it always has the same object_id. It refers to an Integer and thus it’s immutable. That’s why symbols are mostly not garbage collected (more on that later). Imagine that a Symbol was removed from memory and recreated the next time you wanted to use it. It would have a different object_id, and that would not be the expected Ruby behaviour. This is an important fact that makes symbols seem so strange to newbies, and it is one of the most significant differences between a Symbol and a String.
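To compare, here is a small sketch with three strings that have identical content. It assumes the file does not use the `# frozen_string_literal: true` magic comment, because with that comment Ruby may reuse one object for identical literals. The helper name string_object_ids is made up for this example.

```ruby
# Build three strings with identical content and record their ids.
# Assumes no `# frozen_string_literal: true` magic comment in this file.
def string_object_ids
  ["ozimeu".object_id, "ozimeu".object_id, "ozimeu".object_id]
end

ids = string_object_ids
puts ids.uniq.size             # prints 3: three separate objects
puts "ozimeu" == "ozimeu"      # prints true: the content is equal
puts "ozimeu".equal?("ozimeu") # prints false: different objects
```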
You see? We had the same id when we used symbols; here we have three strings, each with a different object_id.
Another huge difference is that symbols lack many of the methods you know from strings. Try to invoke + on a Symbol and you’ll get a NoMethodError.
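A quick sketch of that difference. The helper try_concat is a made-up name for this example; it simply reports the error instead of letting the program crash.

```ruby
# Try to join two values with + and report a NoMethodError if it happens.
def try_concat(left, right)
  left + right
rescue NoMethodError => error
  "#{error.class} raised"
end

puts try_concat("foo", "bar") # prints foobar
puts try_concat(:foo, :bar)   # prints NoMethodError raised
```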
Still not enough? Try to hardcode a Symbol that starts with a digit, just like you can with a String.
Of course, you can do that using quotes, but in my opinion wrapping symbols in quotes is not the best idea.
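The following sketch shows which of these literals Ruby accepts. A syntax error would normally stop the whole file from loading, so the snippets are passed to eval as strings and SyntaxError is rescued. The helper name parses? is an assumption made for this example.

```ruby
# Returns true when Ruby can parse and evaluate the given source snippet.
def parses?(source)
  eval(source)
  true
rescue SyntaxError
  false
end

puts parses?('"1abc"')  # prints true:  a String may start with a digit
puts parses?(':abc1')   # prints true:  a digit later in the name is fine
puts parses?(':1abc')   # prints false: a bare Symbol cannot start with a digit
puts parses?(':"1abc"') # prints true:  the quoted form works
```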
When to use symbols
Hash keys are an excellent example of using symbols. Hash keys are things that should remain unchanged in your app forever, and symbols are exactly what you need in this case: after you use a Symbol for the first time, you do not need more memory when using the same object again. Of course, premature optimization is usually a waste of time, but using symbols in certain situations is definitely a good choice.
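As a small illustration, here are two hashes that share the same symbol keys. The hash contents are made up for this example.

```ruby
bob   = { name: "Bob",   role: "admin" }
alice = { name: "Alice", role: "guest" }

puts bob[:name]   # prints Bob
puts alice[:role] # prints guest

# The :name key is one and the same object in both hashes.
puts bob.keys.first.equal?(alice.keys.first) # prints true
```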
Symbols are obviously not a silver bullet, and sometimes it’s better to use strings, especially when you want to use the String class’s instance methods. But that’s not the only disadvantage of using symbols: if you have, e.g., a Rails app running on a Ruby version below 2.2, there’s one more important thing to be aware of.
The DoS problem using symbols in Ruby < 2.2
I assume you know what a Denial of Service attack is. Imagine that you have a method in your controller that takes the user params and converts all of them to symbols.
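Such a method could look roughly like the sketch below. This is plain Ruby, not real Rails code: symbolize_params and the sample params hash are hypothetical stand-ins for a controller action and the request params.

```ruby
# Hypothetical stand-in for a controller helper: turns every
# user-supplied key into a Symbol.
def symbolize_params(params)
  params.each_with_object({}) do |(key, value), result|
    result[key.to_sym] = value # one symbol per distinct key sent by the user
  end
end

user_params = { "name" => "Bob", "page" => "2" }
p symbolize_params(user_params).keys # prints [:name, :page]
```

Every distinct key the user sends becomes a new Symbol, and that is exactly where the trouble starts.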
In Ruby versions below 2.2, symbols created this way are never garbage collected. What does that mean? It means that if you allow users to flood your app with params, at some point you will run out of memory.
As you know, a live web app is a long-running process, so the created symbols will stay in memory for some time: a day, a month, maybe a few years. If someone doesn’t like you and purposely submits a million params to your app, it can die very quickly. Read on if you’re still curious.
What is this garbage collecting?
Garbage collection is a way of managing memory automatically. You probably use it all the time, even if you don’t know it. It lets the programmer forget about memory limits, at least at the beginning of a project. In short, the mechanism, called the Garbage Collector, removes objects from memory when it decides they are no longer in use. It’s quite a complicated part of programming, so it’s better for you if you never have to dig into the dark depths of garbage collection.
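If you want to see the Garbage Collector at work, here is a tiny sketch. It creates a pile of throwaway strings, forces a run with GC.start and checks that the run counter reported by GC.count went up. The number of throwaway strings is arbitrary.

```ruby
runs_before = GC.count

# Create a lot of short-lived objects that nothing refers to afterwards.
100_000.times { "temporary string".dup }

GC.start # explicitly ask for a garbage collection run
runs_after = GC.count

puts runs_after > runs_before # prints true
```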
GC of symbols in Ruby >= 2.2
Hardcoded symbols keep their mapped Integer ID all the time, so they’re never purged by the Garbage Collector. By hardcoded symbols I mean all the elements of the Ruby language: variable names, method names and constants.
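A minimal sketch of this behaviour, using the literal :ozimeu as the hardcoded symbol. The exact symbol counts depend on your Ruby version and on what is loaded, so the code prints them without promising any particular number.

```ruby
id_before = :ozimeu.object_id
puts Symbol.all_symbols.size # total number of symbols before the GC run

GC.start

puts Symbol.all_symbols.size # total number of symbols after the GC run
id_after = :ozimeu.object_id

puts id_before == id_after                # prints true: same object as before
puts Symbol.all_symbols.include?(:ozimeu) # prints true: still in the table
```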
Every time we run GC.start, it flushes all the unnecessary things from memory, but hardcoded objects like :ozimeu are not garbage collected. The number of symbols remains the same even after the Garbage Collector’s “cleaning” run. Now take a look at dynamically created symbols.
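Here is a rough sketch of such an experiment for Ruby 2.2 or later. The helper symbol_counts_around_gc and the symbol name are made up for this example. Ruby’s GC is conservative, so a single run is not guaranteed to remove the symbol; treat the last number as something to observe, not as a promise.

```ruby
# Hypothetical helper: counts all symbols before creating a dynamic symbol,
# right after creating it, and after a GC run.
def symbol_counts_around_gc(name)
  GC.start
  before = Symbol.all_symbols.size

  name.to_sym # dynamically created symbol, no reference is kept
  created = Symbol.all_symbols.size

  GC.start
  after_gc = Symbol.all_symbols.size

  { before: before, created: created, after_gc: after_gc }
end

counts = symbol_counts_around_gc("ozimeu_dynamic_demo")
puts counts[:before]   # baseline number of symbols
puts counts[:created]  # one more than the baseline
puts counts[:after_gc] # usually back at the baseline on Ruby >= 2.2
```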
We started with 3382 symbols, then we created one with to_sym, and Symbol.all_symbols.size went up by 1. Next, we ran the Garbage Collector, and it removed the dynamically created Symbol from memory. That’s good news with regard to the potential DoS attack, and I think it’s a really nice improvement.
Using a Symbol instead of a String is a good idea when you don’t plan to change the Object too often, preferably never :-) Symbols are immutable and improve memory usage. It’s “DoS-safe” to convert user input to symbols when you use Ruby 2.2 or later. The lack of some String instance methods on Symbol may be annoying, but quite often it means you want to change the Object, so you should consider using a String instead.