Split Full Name with Format: {Last, First Middle} Comprehensive Cases

Myles Baker picture Myles Baker · Aug 5, 2014 · Viewed 14.1k times · Source

My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don't serve my purposes because they either (1) use a different format, or (2) don't handle edge cases very well. See the examples below:

  1. Nightingale, Florence -> Florence Nightingale
  2. Bond, James Bond -> James Bond Bond
  3. Abbott, Edwin A. -> Edwin A. Abbott

Can someone help me write a SQL Server script that splits a string into the various pieces I'm looking for?

Answer

Myles Baker picture Myles Baker · Aug 5, 2014

Please note the following:

  1. Always request normalized data to ensure the highest data quality. I tried to enumerate all possible cases for last, first, and middle name combinations but I'm sure I did not get all of them.
  2. My script requires the format: LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName, but can be easily altered for other formats.
  3. This script does not separate tiles like Dr., or handle suffixes.
  4. Credit to MemKills for the idea of the test data set, which I expanded.

>

DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
SET @DELIMITER1 = ','
SET @DELIMITER2 = ' '
SET @MAX_LENGTH = 50

SELECT  [Name],
    SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                   -- Less one char for @DELIMITER1
    SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle,   -- Plus two for @DELIMITER1 and @DELIMITER2
    CASE 
        -- Middle name follows two-name first names like Mary Ann 
        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
            THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)
        ELSE NULL
    END AS MiddleName,

    CASE 
        -- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2. 
        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
            Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2, 
                 (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH))
                 - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH))))
        ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)
    END AS FirstName
FROM 
(
    SELECT  [Name] = 'Zzz, A' UNION ALL
    SELECT  'de Zzz, Aaa' UNION ALL
    SELECT  'Zzz, Aaaa' UNION ALL
    SELECT  'Zzz, A B' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb' UNION ALL
    SELECT  'de Zzz, Aaaa' UNION ALL
    SELECT  'de Zzz, Aaaa B' UNION ALL
    SELECT  'van Zzz, Aaaa B' UNION ALL
    SELECT  'Yyy-Zzz, Aaaa B' UNION ALL
    SELECT  'd''Zzz, Aaaa B' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb C' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb Cccc'
) AS X