I had an issue logged from a client the other day about address entries in their system getting automatically converted to title case when they didn't want this to happen, e.g. de Lorean Drive would get auto updated to De Lorean Drive.
I was hoping this was something being done at the application layer, but no such luck and it turned out there was stored procedure doing the title case conversion. This proc was taking a relatively simple approach and just finding a PATINDEX
match for a space followed by a letter, or the first letter in the string, and capitalising that letter.
Now, I'd much prefer that this sort of thing was handled in the front end, and title casing can be a very complex beast, but in this case the rules were pretty simple, and it seemed an interesting problem to solve so I set about writing some code that would capitalise each word unless that word was present in an exclusion list. I recalled I had a code snippet lying around that I could use as a starting point, so I dug that out and set to work. Not too much time later this was the basic result:
declare @t table(char_sequence varchar(10))
insert into @t(char_sequence) values('de'), ('te')
declare @input_string varchar(255) = 'de Lorean dr' -- just for testing
declare @pos int = 1
declare @next_pos int = 0
declare @out varchar(256) = '', @word varchar(255)
while @pos <= len(@input_string)
begin
set @next_pos = charindex(' ', @input_string + ' ', @pos + 1)
select @word = substring(@input_string, @pos, @next_pos - @pos)
select @out +=
case when not exists(select * from @t where char_sequence = @word)
then stuff(@word, 1, 1, upper(left(@word, 1))) else @word
end
+ ' '
set @pos = @next_pos + 1
end
select rtrim(@out)
I'll break it down. First up was to create a temp table to hold the exclusion words. This could be a permanent table, temp table or table variable. I not a big fan of table variables but I used one in my dev code out of laziness. In the final production code this was replaced with a temp table, and I used a couple of exclusions to make sure the code could handle more than 1. I also have a test string here for use during development, this is removed for the final code and replaced with a parameter.
declare @t table(char_sequence varchar(10))
insert into @t(char_sequence) values('de'), ('te')
declare @input_string varchar(255) = 'de Lorean dr' -- just for testing
Next up some variables to keep track of the position as we make our way along the string. @pos will be the starting position of a word and @next_pos will be the end of the word/start of the next one. @out holds our formatted output and @word is the word currently being checked for exclusions. Note that setting the varchars to an empty string is important for string addition. Using a concat function might have got around this but, hey, whatever works.
declare @pos int = 1
declare @next_pos int = 0
declare @out varchar(256) = ''
declare @word varchar(255) = ''
Rather than looking for space-letter patterns as the original code did, my approach was to effectively split the string into words, then check to see if the word was in the exclusion table. If so it's added to the output unchanged, if not then the first letter gets capitalised before adding to the output string.
I'm using a while loop to loop through the string until no more words are found. Adding a space to the end of the input means the while loop will stop at the end rather than entering an endless loop looking for that last word. So I set @pos to the CHARINDEX
of the start of the word, set @next_pos to the CHARINDEX
of the next instance of a space, then get a SUBSTRING
that starts at @pos and has a length of @next_pos - @pos. For example if the start of the word was at character 3 and the end of the word was at character 10 then SUBSTRING
would look for a string starting at 3 and 7 characters long.
while @pos <= len(@input_string)
begin
set @next_pos = charindex(' ', @input_string + ' ', @pos + 1)
select @word = substring(@input_string, @pos, @next_pos - @pos)
Once we have the string, check to see if it exists in the exceptions table. It it doesn't then STUFF
the capitalised first letter back into the word. If it is in the table then leave it alone, then add the word and a space to the output string.
select @out +=
case when not exists(select * from @t where char_sequence = @word)
then stuff(@word, 1, 1, upper(left(@word, 1))) else @word
end
+ ' '
set @pos = @next_pos + 1
end
Finally we can return the final string, trimming off the extra space on the end
select rtrim(@out)
That's it. It performs well, and does the, admittedly simple, job. Does it have limitations? Sure, but it's still relatively simple, is scalable in terms of exclusions and performs well in conjunction with the application. Enhancements could be to allow for an input string of separation characters, allow passing exclusions in (and then updating a permanent exclusion table so it's self maintaining), but that's a job for future me.
No comments:
Post a Comment