String datatypes are, by far, the most commonly used datatypes in any type of database. They’re also probably the easiest to understand. As the name suggests, string datatypes are composed of character strings, either of fixed or varying length. Examples of columns that would use strings of variable length include items such as name, city, and country.
Fixed-length strings are used by columns that contain data where all possible entries have exactly the same length. Columns such as telephone number, state postal abbreviation, and social security number are perfect candidates to define as a fixed-length string.
Although fixed-length strings are a subset of variable-length strings, it’s always a good idea to define the datatype as fixed-length if possible. Database management systems handle fixed-length strings much more efficiently than variable-length strings. This may not seem like such a big deal if the size of the database you are working with is relatively small, but it can make a huge difference when sorting and retrieving data from larger structures.
SQL Syntax for String Datatypes
Not all database management systems use exactly the same syntax for SQL datatypes, but here is a listing of some of the more common characterizations.
CHAR(X) – This datatype is used to define a fixed-length character string with length X. For example, a column designated to hold the social security number of an individual would be assigned the datatype CHAR(9).
NCHAR(X) – Similar to the CHAR datatype, the NCHAR datatype will support Unicode string data rather than just standard character strings. This Unicode option allows data to contain characters from the Universal Character Set in addition to standard characters.
While standard characters are included within the set of Unicode characters, it is still preferable to use CHAR over NCHAR for efficiency reasons whenever possible. Due to the additional number of elements available for Unicode data, NCHAR fields take up twice as much storage space as CHAR fields.
VARCHAR(X) – This datatype defines a variable-length character string with length less than or equal to X. A column set up to hold the last name of a person might make the assumption that each possible entry will have no more than 25 letters and use the datatype VARCHAR(25).
The TEXT datatype is used similarly to the VARCHAR datatype in some database management systems. If you’re working in a system that supports both the VARCHAR and TEXT datatypes, such as Microsoft SQL Server, it’s best to use the VARCHAR option. Not only is this true for efficiency reasons, but Microsoft has announced that the TEXT datatype will be removed in a future version of SQL Server.
NVARCHAR(X) – This is the Unicode equivalent to the VARCHAR datatype. Just as with its CHAR counterparts, it’s always better to use VARCHAR rather than NVARCHAR if possible.
This post is part of the series: Datatypes Used in SQL Programming
Choosing the right datatypes for your database columns in the initial design can save you from having a lot of headaches later on. In this series, we explore the definitions of datatypes used in SQL programming and when it is best to use each one.