Title Case or Capitalize a Sentence or Word
The process of capitalising a string doesn’t require much thought. Just imagine your word and remember what capitalising actually means: the first letter should be uppercase and the rest should be lowercase (how strictly you apply this depends on how much you trust the input and whether you want to take uppercase abbreviations into account too). That way, a word like `daRKness` becomes `Darkness`. Most programming languages have a helper function to uppercase or lowercase a letter or string and another helper to extract portions of a string.
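A minimal JavaScript sketch of that idea, using the built-in `toUpperCase()`, `toLowerCase()` and `slice()` string methods (the name `capitalize` is just an illustrative choice):

```javascript
// Capitalize a single word: first character uppercase, the rest lowercase.
function capitalize(str) {
  if (str.length === 0) {
    // Nothing to capitalize, so return an empty string.
    return '';
  }
  // Uppercase the first character and lowercase everything after it.
  return str.charAt(0).toUpperCase() + str.slice(1).toLowerCase();
}

console.log(capitalize('daRKness')); // Darkness
```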
And here’s the ES6 version with a ternary operator just in case:
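A hedged sketch of such an arrow function, where the ternary returns an empty string for empty input and the capitalized word otherwise (again, `capitalize` is only an illustrative name):

```javascript
// ES6 arrow function: '' for an empty string, otherwise the capitalized word.
const capitalize = (str) =>
  str.length === 0 ? '' : str[0].toUpperCase() + str.slice(1).toLowerCase();

console.log(capitalize('daRKness')); // Darkness
```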
The function first checks whether the string passed to it is empty, in which case it returns an empty string. If it has at least one character, it performs the operation: uppercase the first letter and append the rest of the string in lowercase (lowercasing the rest is optional, but you do need to append it).
I title case every subtitle in my posts; as you can see above, “Title Casing a Sentence” has every word capitalized, meaning that the first letter of each word is uppercase. For some reason, most algorithms that title case a sentence capitalize every single word, including short ones like “or” and “a”. That’s really up to the needs of the developer, but I find it a good challenge to handle the cases where you want those short words NOT capitalized. I’ve seen bad attempts at this using RegExp, and I’m going to share one method I came up with while struggling with this algorithm.
What if I want to split my sentence not only by spaces but also by dashes, opening exclamation or question marks (`¡` and `¿`, used in Spanish and other languages), forward slashes or even underscores? See what could happen if I don’t take this edge case into account:
`hi super-man, you look amazing/cool` would turn into `Hi Super-man, You Look Amazing/cool` instead of the desired `Hi Super-Man, You Look Amazing/Cool`. Check this example as well: `hola, ¿dónde Está el ¡¡baño!!?` would become `Hola, ¿dónde Está El ¡¡baño!!?` instead of the obviously expected `Hola, ¿Dónde Está El ¡¡Baño!!?`.
I also don’t want to capitalize filler words like “for, to, and, or, onto, in, on, into” and so on. You can learn more about these words here.
The first edge case can be easily solved with a simple RegExp (regular expression pattern) and a little helper function that escapes the separators so they can be used inside a RegExp character class such as `/[abc]/`. I won’t look into the second case, but if you’re up for the challenge, go ahead!
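One way to take up that challenge is to keep an array of filler words and leave them lowercase. This is a hedged sketch: `FILLER_WORDS` and `titleCaseSkippingFillers` are hypothetical names, the word list (including “a”, “an”, “the” and “of”) is only an example, it splits on spaces only, and it always capitalizes the first word, a common title-case convention assumed here:

```javascript
// Hypothetical list of filler words to keep lowercase (extend as needed).
const FILLER_WORDS = ['a', 'an', 'the', 'of', 'for', 'to', 'and', 'or', 'onto', 'in', 'on', 'into'];

const capitalize = (str) =>
  str.length === 0 ? '' : str[0].toUpperCase() + str.slice(1).toLowerCase();

// Title case a space-separated sentence, keeping filler words lowercase
// unless they are the first word.
function titleCaseSkippingFillers(sentence) {
  return sentence
    .split(' ')
    .map((word, index) => {
      const lower = word.toLowerCase();
      // Only exact matches are skipped: "in," with a comma is still capitalized.
      if (index > 0 && FILLER_WORDS.includes(lower)) {
        return lower;
      }
      return capitalize(word);
    })
    .join(' ');
}

console.log(titleCaseSkippingFillers('the lord of the rings')); // The Lord of the Rings
```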
Tip: Use an array of filler words you want to avoid capitalising inside the
If you’re not familiar with `String.prototype.replace()`, I recommend reading MDN’s documentation article about it. This strategy is simple but can get messy if you don’t know the basics of regular expressions. Basically, what we are going to replace is every group of one or more consecutive characters that doesn’t contain any of your separators. So if your separators are `_ -/¿¡` and your example string is `¿¿ dun, hello world-hehe yo/hey na_naa`, running the RegExp will produce the following groups: `['dun,', 'hello', 'world', 'hehe', 'yo', 'hey', 'na', 'naa']`. These groups are your words to be capitalized! (The comma stays attached to `dun,` because it isn’t one of the separators.)
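To see those groups for yourself, `String.prototype.match()` with a global pattern returns every match as an array. This sketch writes the negated character class by hand rather than building it from a separators string:

```javascript
// Match runs of one or more characters that are NOT one of the separators:
// underscore, space, dash, forward slash, ¿ and ¡.
const wordPattern = /[^_ \-\/¿¡]+/g;

const example = '¿¿ dun, hello world-hehe yo/hey na_naa';

// With the g flag, match() returns all matches instead of just the first one.
console.log(example.match(wordPattern));
// returns ['dun,', 'hello', 'world', 'hehe', 'yo', 'hey', 'na', 'naa']
```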
If we use `replace` on our sentence with this RegExp, it will capture these words and capitalize them if we pass our capitalizing function as the replacement. The `escape` helper function replaces EVERY single character in the string with itself preceded by a backslash. String interpolation is nice in ES6; for those who don’t use ES6, it’s just the equivalent of `'\\' + c`. The `g` flag is needed because it means “global”: without it, only the first occurrence of the pattern is replaced.
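Based on that description, the escape helper can be sketched as a one-liner (the name `escape` follows the text; the caveat in the comment is an addition worth keeping in mind):

```javascript
// Put a backslash in front of every character so the separators can be
// dropped into a RegExp character class. The g flag makes replace() visit
// every character, not just the first one.
// Caveat: only safe for non-alphanumeric separators; escaping a letter such
// as "d" or "w" would produce \d or \w, which mean something else in a RegExp.
const escape = (str) => str.replace(/./g, (c) => `\\${c}`);

console.log(escape('_ -/¿¡')); // \_\ \-\/\¿\¡
```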
Our new function declares a `wordPattern` that determines how words are selected from the string. It is built with the RegExp constructor, which assembles `/[^\ \_\-\¡\¿\/]+/g` behind the scenes. The caret right after the opening bracket negates the character class: it tells the RegExp engine to match any character except the ones listed. It’s an inverse selector that, together with the `+`, basically means “grab one or more characters that aren’t these”.
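Putting the pieces together, a hedged sketch of such a function could look like this. The name `titleCase`, its `separators` parameter and the default separator string are assumptions; `capitalize` and `escape` are the helpers described earlier, redefined here so the block runs on its own:

```javascript
const capitalize = (str) =>
  str.length === 0 ? '' : str[0].toUpperCase() + str.slice(1).toLowerCase();

// Backslash every character (safe here because the separators are not letters).
const escape = (str) => str.replace(/./g, (c) => `\\${c}`);

// Title case a sentence, treating every character in `separators` as a word boundary.
// Assumed default separators: space, underscore, dash, ¡, ¿ and forward slash.
function titleCase(sentence, separators = ' _-¡¿/') {
  // With the default separators this assembles /[^\ \_\-\¡\¿\/]+/g.
  const wordPattern = new RegExp(`[^${escape(separators)}]+`, 'g');
  // replace() calls capitalize once per matched word.
  return sentence.replace(wordPattern, capitalize);
}

console.log(titleCase('hi super-man, you look amazing/cool'));
// Hi Super-Man, You Look Amazing/Cool
console.log(titleCase('hola, ¿dónde Está el ¡¡baño!!?'));
// Hola, ¿Dónde Está El ¡¡Baño!!?
```

Because `replace` passes the matched word as the first argument to the callback, `capitalize` can be handed over directly; the extra offset and full-string arguments are simply ignored.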
My two cents? For me, the mapping and joining approach isn’t that bad. Nowadays, computers have plenty of memory to handle extra arrays, and the use cases of this algorithm are unlikely to need any significant optimization; besides, RegExp can be daunting for newcomers. Also, the first strategy is complicated if you don’t understand recursion, while the second one seems more elegant to me and, in my opinion, handles the edge cases better. If you have more edge cases to consider or any other suggestion, let me know. For now, I’ll put the code that only handles splitting and joining by whitespace down below:
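A minimal sketch of that whitespace-only version, assuming words are separated by single spaces. Note that it reproduces the `Super-man` and `Amazing/cool` problem from the examples earlier, since it never looks at dashes or slashes:

```javascript
const capitalize = (str) =>
  str.length === 0 ? '' : str[0].toUpperCase() + str.slice(1).toLowerCase();

// Split on spaces, capitalize each word, then join them back with a space.
// Repeated spaces yield empty strings, which capitalize leaves empty,
// so the original spacing survives the round trip.
const titleCase = (sentence) => sentence.split(' ').map(capitalize).join(' ');

console.log(titleCase('hi super-man, you look amazing/cool'));
// Hi Super-man, You Look Amazing/cool
```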
In modern languages, things get easier or harder depending on how they implement functional programming concepts and native string methods. For example, in Elixir I could do something like this:
Beautiful, isn’t it? If you’re new to Elixir but currently learning it, the code just prints the result of taking the sentence, splitting it by whitespace (if you want to see the full implementation of strategies one and two in Elixir, give me a heads up in the comment section), mapping it to the capitalized words and then joining the words with a space.