Manage Information Using the SELECT DISTINCT SQL Query

written by: S. R. Obbayi•edited by: Michele McDonough•updated: 5/31/2011

You might be having trouble with inconsistent data, and more specifically with duplicate information. One solution is the DISTINCT modifier of the SQL SELECT statement. The rest of this article explains how it works.

    SQL, or Structured Query Language, is widely used to manage data stored in relational database management systems. The data is held in internal formats that are not meant to be read directly. To extract it from these databases in a meaningful form, we use the SQL SELECT command.

    The SQL SELECT Statement

    The SQL SELECT command does not work in a vacuum. It combines with other commands, clauses, functions and operators to provide very powerful snapshots of your stored data. The combination of all these together is called a SELECT statement. SQL statements are interpreted by the database engine, which in turn extracts the data and returns it to the caller. Before we move on, it is important to note that the keywords in SQL statements are not case sensitive.

    A simple SQL statement can read as follows:

    SELECT * FROM students

    This is as simple as it gets. The SELECT keyword begins the statement and is followed by the list of columns to retrieve from the table that stores your data. In the case above, the asterisk is shorthand for all columns. The FROM keyword follows and names the table those columns belong to. In short, the statement above extracts all the data in all the columns of a table named students.
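    As a minimal sketch, the statement can be tried from Python with the standard sqlite3 module and an in-memory database. The columns of the students table (id, name, grade) and its two rows are assumptions made up for illustration, since the article does not define them.

```python
import sqlite3

# In-memory SQLite database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Amina", "A"), (2, "Brian", "B")],
)

# The asterisk expands to every column of the table.
cursor = conn.execute("SELECT * FROM students")
all_columns = [col[0] for col in cursor.description]
all_rows = cursor.fetchall()

# Naming a column explicitly returns only that column.
names_only = conn.execute("SELECT name FROM students").fetchall()

# Keywords are not case sensitive, so this is the same query.
lowercase_rows = conn.execute("select * from students").fetchall()

print(all_columns)
print(all_rows)
print(names_only)
```

    Listing columns by name instead of using the asterisk keeps the result limited to the data you actually need.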

    What Else Can SELECT Statements Do?

    The SQL SELECT statement is responsible for extracting the data you require, either with a simple call like the one in the example above or based on criteria embedded as parameters within the statement. SELECT statements can do more than simple extraction of data, and the subject can get quite complex. To mention just one example, a SELECT statement can be used as a calculator:

    SELECT 1+1

    This would return 2 as the result.

    With built-in functions, these statements can do calculations on dates and on values in separate columns, among other things.
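    The sketch below illustrates these uses with Python's sqlite3 module. The date() function shown is SQLite's built-in; other database engines spell date arithmetic differently. The order_items table and its columns are hypothetical and exist only to show a calculation across two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A SELECT without a FROM clause simply evaluates the expression.
total = conn.execute("SELECT 1+1").fetchone()[0]

# SQLite's built-in date() function applied to a fixed date.
next_week = conn.execute(
    "SELECT date('2011-05-31', '+7 days')"
).fetchone()[0]

# Hypothetical table used to calculate a value from two columns.
conn.execute(
    "CREATE TABLE order_items (item TEXT, quantity INTEGER, unit_price REAL)"
)
conn.execute("INSERT INTO order_items VALUES ('leash', 3, 2.5)")
line_totals = conn.execute(
    "SELECT item, quantity * unit_price FROM order_items"
).fetchall()

print(total)        # 2
print(next_week)    # 2011-06-07
print(line_totals)  # [('leash', 7.5)]
```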

    The SQL DISTINCT Keyword

    The SQL SELECT DISTINCT statement is a variation of the SELECT statement that removes duplicate rows from the result, giving the user more specific information from the database. The DISTINCT keyword introduced here acts as a modifier to an existing SELECT statement. In many cases it can be added to a SELECT statement without changing anything other than the removal of duplicate rows.

    What Does DISTINCT in SQL Statements Do?

    The DISTINCT keyword modifies the results of a SELECT statement so that it returns only unique rows in its result set. For example, suppose we have a list of pets' names and their species in a pet store, and we want to know which species of animals are in the store. Assume we have a table named pets with two columns, name and species, with the data as shown in the accompanying diagram.

    Let's run the following SQL statement on the table:

    SELECT species FROM pets;

    This would give us the result in the following form:

    cat

    cat

    cat

    dog

    dog

    The repetition makes the result harder to read than it needs to be. Using the DISTINCT modifier like so:

    SELECT DISTINCT species FROM pets;

    We get our refined results as:

    cat

    dog

    In the second case the results are easier to read because nothing is repeated in the result set. What about running the following query?

    SELECT name, species FROM pets;

    This would return the entire table as is. Suppose we run the DISTINCT version:

    SELECT DISTINCT name, species FROM pets;

    This would return the same result as though the DISTINCT keyword had never been used. What happened here? The database engine evaluates each row as a single unit and looks for distinct combinations of name and species across all rows. Because every combination of name and species in the table is already unique, DISTINCT makes no difference in this case. It can still be wise to use the DISTINCT version, as you do not know how many rows you are dealing with. Your store could house thousands of pets, and a data entry clerk could easily enter the same pet twice. DISTINCT would then filter out the duplicate entry.
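    The behaviour described above can be reproduced with a short sketch using Python's sqlite3 module. The pets table has three cats and two dogs, matching the results shown earlier, but the pet names are invented for illustration. The COUNT(DISTINCT species) query is an addition not found in the article, shown as one way to count the species directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [
        # Pet names are hypothetical; only the species mix follows the article.
        ("Fluffy", "cat"),
        ("Tom", "cat"),
        ("Misty", "cat"),
        ("Rex", "dog"),
        ("Buddy", "dog"),
    ],
)


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# One column: DISTINCT collapses the repeated species values.
species_all = run("SELECT species FROM pets")
species_unique = run("SELECT DISTINCT species FROM pets ORDER BY species")
species_count = run("SELECT COUNT(DISTINCT species) FROM pets")[0][0]

# Two columns: every (name, species) pair is already unique.
pairs_before = run("SELECT DISTINCT name, species FROM pets")

# Simulate the clerk entering the same pet twice.
conn.execute("INSERT INTO pets VALUES ('Rex', 'dog')")
pairs_plain = run("SELECT name, species FROM pets")
pairs_distinct = run("SELECT DISTINCT name, species FROM pets")

print(len(species_all), species_unique, species_count)
# 5 [('cat',), ('dog',)] 2
print(len(pairs_before), len(pairs_plain), len(pairs_distinct))
# 5 6 5
```

    After the double entry, the plain query returns six rows while the DISTINCT version still returns five, because the two identical Rex rows are treated as one.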
