Friday, December 26, 2014

Find/parse a string from within a string

So I noticed a few different questions and posts about parsing a string out of another string recently. While some solutions included creating new functions and etc it can also be done within a single query in some cases.

For example, let us say that we are looking to pull out the domain from a URL. I will try to go into detail as to why and how this works.
We have the following table.

CREATE TABLE `parse_example` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `urldemo` varchar(150) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
+----+----------------------------+
| id | urldemo                    |
+----+----------------------------+
|  1 | http://www.mysql.com/      |
|  2 | http://www.percona.com/    |
|  3 | https://tools.percona.com/ |
|  4 | https://mariadb.com/       |
|  5 | http://planet.mysql.com/   |
|  6 | http://dev.mysql.com/doc/  |
+----+----------------------------+


The goal for this example is to disregard the http:// or https:// and anything after the .com. So we use LOCATE to find the locations.

The .com reference is easy since that is constant so we can start with that.

SELECT LOCATE('.com', urldemo), urldemo FROM parse_example;
+-------------------------+----------------------------+
| LOCATE('.com', urldemo) | urldemo                    |
+-------------------------+----------------------------+
|                      17 | http://www.mysql.com/      |
|                      19 | http://www.percona.com/    |
|                      22 | https://tools.percona.com/ |
|                      16 | https://mariadb.com/       |
|                      20 | http://planet.mysql.com/   |
|                      17 | http://dev.mysql.com/doc/  |
+-------------------------+----------------------------+


OK so we want to remove the / , what location is that?

SELECT LOCATE('.com', urldemo) as start, LOCATE('.com', urldemo) +4 as end, SUBSTRING(urldemo FROM LOCATE('.com', urldemo) + 4 )  AS resulting , urldemo FROM parse_example;
+-------+-----+-----------+----------------------------+
| start | end | resulting | urldemo                    |
+-------+-----+-----------+----------------------------+
|    17 |  21 | /         | http://www.mysql.com/      |
|    19 |  23 | /         | http://www.percona.com/    |
|    22 |  26 | /         | https://tools.percona.com/ |
|    16 |  20 | /         | https://mariadb.com/       |
|    20 |  24 | /         | http://planet.mysql.com/   |
|    17 |  21 | /doc/     | http://dev.mysql.com/doc/  |
+-------+-----+-----------+----------------------------+

This gives us our end position, I only put the field aliases in to make the results easier to follow.

Now sorting out after http and https is actually very easy as well as they both have :// after them, so we just need the location of the second / in the string.


SELECT LOCATE('/', urldemo) as first, LOCATE('/', urldemo) +1 as second, urldemo
FROM parse_example;
+-------+--------+----------------------------+
| first | second | urldemo                    |
+-------+--------+----------------------------+
|     6 |      7 | http://www.mysql.com/      |
|     6 |      7 | http://www.percona.com/    |
|     7 |      8 | https://tools.percona.com/ |
|     7 |      8 | https://mariadb.com/       |
|     6 |      7 | http://planet.mysql.com/   |
|     6 |      7 | http://dev.mysql.com/doc/  |
+-------+--------+----------------------------+


These queries are just showing what the different aspects of the final query will be doing. So let us put it all together.


SELECT
TRIM(TRAILING SUBSTRING(urldemo FROM LOCATE('.com', urldemo) + 4 )
FROM SUBSTRING(urldemo FROM LOCATE('/', urldemo) + 2 ) )  AS parsed_domain ,
urldemo as original_url
FROM parse_example;
+-------------------+----------------------------+
| parsed_domain     | original_url               |
+-------------------+----------------------------+
| www.mysql.com     | http://www.mysql.com/      |
| www.percona.com   | http://www.percona.com/    |
| tools.percona.com | https://tools.percona.com/ |
| mariadb.com       | https://mariadb.com/       |
| planet.mysql.com  | http://planet.mysql.com/   |
| dev.mysql.com     | http://dev.mysql.com/doc/  |
+-------------------+----------------------------+


Now hopefully that helps you be able to parse out whatever you need. This example is limited to a url. But since some examples of functions already here is my example of a function that you can use to parse whatever you need.



CREATE FUNCTION PARSE_STRING(delimiterA VARCHAR(50), delimiterB VARCHAR(50),  passed_string VARCHAR(255) )
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
TRIM(TRAILING SUBSTRING(passed_string FROM LOCATE(delimiterB, passed_string)  )
FROM SUBSTRING(passed_string FROM LOCATE(delimiterA, passed_string) + CHAR_LENGTH(delimiterA) ) ) ;

SELECT PARSE_STRING('//','.com', urldemo) FROM parse_example;
+------------------------------------+
| PARSE_STRING('//','.com', urldemo) |
+------------------------------------+
| www.mysql                          |
| www.percona                        |
| tools.percona                      |
| mariadb                            |
| planet.mysql                       |
| dev.mysql                          |
+------------------------------------+


Pull a last name from a full name field:

SELECT PARSE_STRING('John ','', 'John Smith') ;
+----------------------------------------+
| PARSE_STRING('John ','', 'John Smith') |
+----------------------------------------+
| Smith                                  |
+----------------------------------------+


Pull the first name

SELECT PARSE_STRING('',' Smith', 'John Smith') ;
+-----------------------------------------+
| PARSE_STRING('',' Smith', 'John Smith') |
+-----------------------------------------+
| John                                    |
+-----------------------------------------+


Granted with the name examples you would need to know the delimiter values. But this is just an example you can build on.

1 comment:

  1. Good example, personally i prefer substring_index()

    select substring_index(substring_index(urldemo, '/', 3), '/', -1) AS parsed_domain,
    urldemo as original_url
    from parse_example;

    ReplyDelete

@AnotherMySQLDBA