Working with Strings in C#

String manipulation is a more complex and much larger topic than it appears on the surface. There are many complicated scenarios when using strings, and their very nature is often misunderstood.

What makes strings different?

Strings are immutable, which means they are read-only. When you alter a string in any way, behind the scenes the .NET framework creates a new string with the modified value, and the old version becomes eligible for garbage collection once nothing references it. This is costly for a couple of reasons. If you have a string that has one million characters and you alter that string in a loop ten times, you have now left ten different strings of roughly one million characters each for the garbage collector to clean up. Also, since strings are a reference type and not a value type, every new version is a fresh heap allocation whose characters have to be copied from the old one, so altering a string many times can be a slower operation than you might expect.
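To make the immutability point concrete, here is a minimal sketch. The class and method names (ImmutabilityDemo, AppendCreatesNewInstance) are made up for this illustration. It keeps a second reference to a string, "modifies" the first one, and checks that the second reference still sees the old value.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true when appending to a string produces a new object
    // and leaves the original instance untouched.
    public static bool AppendCreatesNewInstance()
    {
        string original = "My dog";
        string alias = original;        // second reference to the same object

        original += " is a good pet";   // builds a new string and reassigns the variable

        // The alias still holds the old text, and the two variables
        // no longer point to the same object.
        return alias == "My dog" && !ReferenceEquals(original, alias);
    }

    public static void Main()
    {
        Console.WriteLine(AppendCreatesNewInstance()); // True
    }
}
```

The variable named original was reassigned, but the string it used to point to never changed.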

Strings are an important storage mechanism for just about every application so there are options available to help get around the possible performance issues.

First, there is the StringBuilder object. The StringBuilder is not immutable, which means that modifications to the string value inside the StringBuilder do not create a new instance each time. The StringBuilder object handles this by creating a buffer that can handle expansions to the string. When you modify the string, if the buffer can handle the modification, it does so. Otherwise, a new, larger buffer is created. Proceed with caution, however. You should not just always replace string with StringBuilder while coding. For small strings and a handful of changes, plain strings are usually faster than the StringBuilder. Choosing the StringBuilder over the string should be restricted to larger strings and strings that are manipulated many times, such as in a loop.
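As a rough sketch of the loop scenario, the example below builds a comma-separated list with a StringBuilder. The class and method names (StringBuilderDemo, JoinNumbers) are invented for this illustration.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Builds "1, 2, 3, ..." up to count using a single StringBuilder,
    // so the loop does not allocate a new string on every pass.
    public static string JoinNumbers(int count)
    {
        var builder = new StringBuilder();

        for (int i = 1; i <= count; i++)
        {
            if (i > 1)
            {
                builder.Append(", ");
            }
            builder.Append(i);
        }

        // The final string is created once, here.
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinNumbers(5)); // 1, 2, 3, 4, 5
    }
}
```

With only five items the difference is not worth worrying about. The benefit shows up when the loop runs many times or the string gets large.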

Second, the compiler is smart about string manipulation. The example below illustrates how it handles the creation of a string.

string someValue = "My dog " + name + " is " + age.ToString() + " years old and is a " + adjective + " pet!";

How many strings does the concatenation in the above example create? If you said seven, you would be wrong. The compiler typically turns the whole expression into a single call to String.Concat, so only one combined string is created (plus the small string returned by age.ToString()) instead of a new copy for every + operator. This is a common scenario and, like I said, the compiler is smart. What about the example below?
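Here is a sketch of roughly what that single expression amounts to. The exact code the compiler generates depends on the compiler version, so treat the WithConcat method as an approximation and not as the literal output. The class name, method names, and sample values are made up for this illustration.

```csharp
using System;

public static class ConcatDemo
{
    // The one-line concatenation from the article.
    public static string WithOperator(string name, int age, string adjective)
    {
        return "My dog " + name + " is " + age.ToString() + " years old and is a " + adjective + " pet!";
    }

    // Approximately the same work written by hand: one call that
    // receives all seven pieces and builds the result once.
    public static string WithConcat(string name, int age, string adjective)
    {
        return string.Concat(new string[]
        {
            "My dog ", name, " is ", age.ToString(), " years old and is a ", adjective, " pet!"
        });
    }

    public static void Main()
    {
        Console.WriteLine(WithOperator("Rex", 3, "loyal"));
        Console.WriteLine(WithOperator("Rex", 3, "loyal") == WithConcat("Rex", 3, "loyal")); // True
    }
}
```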

string someValue = "My dog ";
someValue += name;
someValue = someValue + " is ";
someValue += age.ToString();
someValue = someValue + " years old and is a ";
someValue += adjective;
someValue += " pet!";

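In the above example, seven different strings are assigned to someValue: the initial literal plus a new string for each of the six concatenations. Each statement is compiled as its own concatenation, so the compiler cannot fold them into a single operation the way it did with the one-line expression. The example uses two different forms of concatenation: += and = … +. They compile to the same thing, however.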
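If a string really does have to be built one statement at a time, a StringBuilder avoids the intermediate copies. The sketch below rewrites the step-by-step example that way. The class name, method name, and sample values are assumptions made for this illustration, and for a string this small the one-line version is probably the better choice.

```csharp
using System;
using System.Text;

public static class StepByStepDemo
{
    // Same pieces as the seven-statement example, but each step
    // appends to one buffer instead of creating a new string.
    public static string Describe(string name, int age, string adjective)
    {
        var builder = new StringBuilder("My dog ");
        builder.Append(name);
        builder.Append(" is ");
        builder.Append(age);
        builder.Append(" years old and is a ");
        builder.Append(adjective);
        builder.Append(" pet!");

        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Rex", 3, "loyal"));
        // My dog Rex is 3 years old and is a loyal pet!
    }
}
```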

Methods of string manipulation

There are several different methods of string manipulation available to us that can make working with strings a very nice experience. The joy of the experience, however, is dependent on the method used.

String concatenation

String concatenation has been used in both examples above. You are adding one string to another. Both += and =…+ are examples of string concatenation. While string concatenation has been around since the very beginning, it is my least favorite method to use for any scenario where I need to combine more than three or four different strings or if there is punctuation involved. It feels like an uphill battle.

String.Format()

String.Format() is a much cleaner method for complicated manipulations.

string someValue = string.Format("My dog {0} is {1} years old and is a {2} pet!", name, age, adjective);

String.Format() allows you to create the string using indexed placeholders ({0}, {1}, {2}) which correspond, in order, to the list of arguments that follows (name, age, adjective). I prefer this method over string concatenation because it allows me to write the string in a more natural state without worrying too much about spacing and punctuation. It feels like a more natural flow.
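Two details of the placeholders are easy to miss, so here is a small sketch of them: an index can be used more than once, and a placeholder can carry a format specifier after a colon. The class name, method names, and sample values are made up for this illustration.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // {0} appears twice and is filled with the same argument both times.
    public static string Describe(string name, int age, string adjective)
    {
        return string.Format(
            "My dog {0} is {1} years old and is a {2} pet! Good dog, {0}!",
            name, age, adjective);
    }

    // "D3" formats an integer with at least three digits, padding with zeros.
    public static string PadAge(int age)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:D3}", age);
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Rex", 3, "loyal"));
        // My dog Rex is 3 years old and is a loyal pet! Good dog, Rex!
        Console.WriteLine(PadAge(3)); // 003
    }
}
```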

String Interpolation

String interpolation is similar to string.Format() but is even more natural.

string someValue = $"My dog {name} is {age} years old and is a {adjective} pet!";

String interpolation allows a similar experience to string.Format() but rids us of the indexed placeholders and instead allows us to insert the variables directly into the string. In nearly all cases, I prefer interpolation over format. It just feels better using it. The dollar sign is what allows interpolation to work. It must immediately precede the opening double quote.
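The braces are not limited to plain variables. As a quick sketch, the example below puts a method call, a small calculation, and a format specifier inside them. The class name, method names, and sample values are assumptions made for this illustration.

```csharp
using System;

public static class InterpolationDemo
{
    // Any expression can go inside the braces, not just a variable name.
    public static string Describe(string name, int age)
    {
        return $"My dog {name.ToUpperInvariant()} is {age} years old and will be {age + 1} next year.";
    }

    // A format specifier follows a colon, just like in string.Format().
    public static string PadAge(int age)
    {
        return $"{age:D2}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Rex", 3));
        // My dog REX is 3 years old and will be 4 next year.
        Console.WriteLine(PadAge(3)); // 03
    }
}
```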

Escape Sequences

There are times when you need characters in your string that are otherwise not allowed there. For example, you might want to add double quotes around the dog’s name in the examples so far. You do this with an escape sequence.

string someValue = $"My dog \"{name}\" is {age} years old and is a {adjective} pet!";

The backslash character starts an escape sequence. It tells the compiler to treat the very next character, the double quote, as part of the string instead of as the end of it. In this case, the double quotes around the name variable will be displayed. Please note that the backslash always starts an escape sequence whether or not you intend it that way. This is why you will see four backslashes at the start of a network path. Each pair of backslashes produces a single literal backslash: the first and third are escape characters, and the second and fourth are the ones that end up in the string.
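Here is a short sketch of the network path case. The server and share names are placeholders, and the class and method names are invented for this illustration. It also shows a verbatim string (the @ prefix), which is not covered above but is a common way to avoid doubling every backslash.

```csharp
using System;

public static class EscapeDemo
{
    // Each "\\" in the source becomes one backslash in the string.
    public static string EscapedPath()
    {
        return "\\\\server\\share";
    }

    // With the @ prefix, backslashes are taken literally.
    public static string VerbatimPath()
    {
        return @"\\server\share";
    }

    // \" puts a double quote inside the string.
    public static string Quoted(string name)
    {
        return $"My dog \"{name}\"";
    }

    public static void Main()
    {
        Console.WriteLine(EscapedPath());                   // \\server\share
        Console.WriteLine(EscapedPath() == VerbatimPath()); // True
        Console.WriteLine(Quoted("Rex"));                   // My dog "Rex"
    }
}
```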

You can find a list of the valid escape sequences in the C# language documentation.

Conclusion

Strings are a strange beast in the .NET framework, and it pays to understand them as well as possible so that you do not feel like you are fighting a battle whenever you use them beyond the simplest scenarios. The .NET framework gives us great tools for dealing with them if we take the time to learn those tools.