MySql: Select Distinct for words in different order
up vote
1
down vote
favorite
I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get that result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.
Any thoughts?
mysql select distinct
|
show 10 more comments
up vote
1
down vote
favorite
I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get that result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.
Any thoughts?
mysql select distinct
WhyCasey Dean Davidson
changed toCasey Davidson
?
– Madhur Bhaiya
Nov 10 at 18:14
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
how do you assure that which one in the correct one. I meanMarshall Wilson
ORWilson Marshall
?
– FatemehNB
Nov 10 at 18:21
1
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
1
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57
|
show 10 more comments
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get that result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.
Any thoughts?
mysql select distinct
I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get that result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.
Any thoughts?
mysql select distinct
mysql select distinct
edited Nov 10 at 18:16
asked Nov 10 at 18:09
fioljnw
83
83
WhyCasey Dean Davidson
changed toCasey Davidson
?
– Madhur Bhaiya
Nov 10 at 18:14
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
how do you assure that which one in the correct one. I meanMarshall Wilson
ORWilson Marshall
?
– FatemehNB
Nov 10 at 18:21
1
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
1
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57
|
show 10 more comments
WhyCasey Dean Davidson
changed toCasey Davidson
?
– Madhur Bhaiya
Nov 10 at 18:14
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
how do you assure that which one in the correct one. I meanMarshall Wilson
ORWilson Marshall
?
– FatemehNB
Nov 10 at 18:21
1
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
1
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57
Why
Casey Dean Davidson
changed to Casey Davidson
?– Madhur Bhaiya
Nov 10 at 18:14
Why
Casey Dean Davidson
changed to Casey Davidson
?– Madhur Bhaiya
Nov 10 at 18:14
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
how do you assure that which one in the correct one. I mean
Marshall Wilson
OR Wilson Marshall
?– FatemehNB
Nov 10 at 18:21
how do you assure that which one in the correct one. I mean
Marshall Wilson
OR Wilson Marshall
?– FatemehNB
Nov 10 at 18:21
1
1
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
1
1
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57
|
show 10 more comments
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
View on DB Fiddle
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
View on DB Fiddle
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
add a comment |
up vote
1
down vote
accepted
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
View on DB Fiddle
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
View on DB Fiddle
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
View on DB Fiddle
edited Nov 10 at 19:33
answered Nov 10 at 19:19
Madhur Bhaiya
16k52136
16k52136
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
add a comment |
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241947%2fmysql-select-distinct-for-words-in-different-order%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Why
Casey Dean Davidson
changed toCasey Davidson
?– Madhur Bhaiya
Nov 10 at 18:14
Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19
how do you assure that which one in the correct one. I mean
Marshall Wilson
ORWilson Marshall
?– FatemehNB
Nov 10 at 18:21
1
Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26
1
@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57