I am sure, many of you would be wondering what this is all about, and some might have heard it also. So you don’t have to search it further. I am going to give you some heads up:
- Soundex is a phonetic pattern-matching system created mainly for the American census. It was developed earlier by Margaret K. Odell and Robert C. Russell and also was awarded for their efforts.
- The main function of Soundex is to order the same sounding names together, which also turns out to be very useful for dealing with the database application containing the contact informaion.
- For e.g., was it Smith or Smythe? Chapple, Chapel or Chapelle? The Soundex Functions makes it easy to recognize these difficulties.
- This system uses a simple phonetic algorithm to reduce each name to a four character alphanumeric code.
- The first letter of the code corresponds to the first letter of the name. The remainder of the code consists of three digits derived from the syllables of the word according to the following code:
- 1 = B, F, P, V
- 2 = C, G, J, K, Q, S, X, Z
- 3 = D, T
- 4 = L
- 5 = M,N
- 6 = R
- The double letters with the same Soundex code, A, E, I, O, U, H, W, Y, and some prefixes are being disregarded. Like “Piyush” becomes “P200” by the following method:
- The P is stored.
- The i, y and u are disregarded.
- The s is stored as Soundex code 2.
- The h is disregarded.
- As you know to reduce each name to a 4-character alphanumeric code, then in the rest two places put 0’s
- You can see more examples like:
- Elephant = E415 (l=4, p=1, n=5)
- Computer = C513 (m=5, p=1, t=3)
- Database = D312 (t=3, b=1, s=2)
- SQL Server includes two Soundex-related functions, SOUNDEX () and DIFFERENCE (). We discuss each of them briefly:
Using the SOUNDEX () function:
- The SOUNDEX (string) function calculates the Soundex code for a string in SQL Server as follows:
- So we can use it easily in the SELECT statement and in the WHERE clause, as follows:
USE TEST SELECT FName, LName FROM dbo.Students WHERE SOUNDEX('Pareshan') = SOUNDEX(FName);
The result can be seen as follows where similar sounding Names are in one column:
- This implementation has the smaller impact on the data schema, but it will cause performance issues as the data size grows because the SOUNDEX() function must execute for every row in the database, due to which an index on the name column cannot be used with an efficient seek operation, but only with a much more expensive scan.
- Thus, this issue can be cured if the name with the same first letter is selected in a WHERE clause. This allows SQL Server to use any indexes to narrow the search, so fewer rows must be read and the SOUNDEX() function must be performed only for rows selected by the index as follows:
USE TEST SELECT FName, LName FROM dbo.Students WHERE SOUNDEX('Piyush') = SOUNDEX(FName) AND FName LIKE 'P%';
The results can be seen as same as the above what the big difference these two queries are in performance terms.
The first query have taken of about 40 miliseconds to execute while second one have taken just about 7 milliseconds. So you can see the difference between these two queries, which can be turned out to be very handy while accessing big databases.
Using the DIFFERENCE () Soundex function:
- The DIFFERENCE(String) function returns the Soundex difference between the two strings in the form of a ranking from 0 to 4, with 4 representing a perfect Soundex match:
USE TEST SELECT FName, DIFFERENCE('Piyush',FName) AS DIFF_NAME FROM dbo.Students ORDER BY DIFFERENCE('Piyush',FName) DESC;
The result can be seen as:
- The main advantage of DIFFERENCE() function is that it broadens the search keyword beyond the first letters.
Well this was all about Soundex function.
Hope You like it 🙂