Before we get down to discussing best practices, it would be good to note that the tips that will be referred to here are desirable for most major database management systems.
Databases are notorious for manifesting degradation in performance as they grow, especially if left unchecked. Generally, databases with tables containing rows in the millions may not be considered large as many can be found containing rows in the billions. So let us dive right in and see what can be done to make your large database perform at its optimum level.
The first thing to make sure is that you have the right hardware with plenty of RAM. It will also be beneficial if you have a 64-bit operating system and array disks. Data spread over several disks can be read faster as opposed to data being queried on a single storage disk. If possible, have dedicated machines for your database as well. On older systems, make sure you are using the right file system. You do not want a file system that will limit your file size to 2 GB and end up freezing the scaling of your tables.
Right Database Server Software
You will need to choose the right database server. Fortunately most of the major vendors today have the muscle to handle large databases. The configuration of the server is what becomes an issue. Take MySQL for example. Depending on what kind of queries, updates or inserts that will be happening on your database you may need to choose the right storage engine. The wrong storage engine can seriously cripple your database. Also consider using Grid or Clustering and Table Partitioning. You can partition your tables horizontally by row or vertically by column.
Log and Statistics Management
Statistics logging on a database for a busy web server can have serious performance penalties. Such insert and update intensive work is best performed on a separate server, or on plain text files saved to the file system. If you must save your log into the database, it is best to have a table designed without indexes to allow for fast inserts. The logs should be processed on a separate server as well.
Optimize Your Tables
Make sure you have the right indexes set up for your tables. You will need to determine the roles of your tables – for example, is your table primarily inserting data or being updated versus it primarily being queried as read only? This will determine how to set up the indexes for best performance. The rule of thumb is heavily read tables should be indexed well while tables primarily with inserts should have less emphasis on indexing.
Database Schema Design
Plan and design your schema well in advance. You will need to determine how much normalization you want in your tables. Here you will also decide how many tables you need and the number of columns you need in your tables. These design decisions can also factor in whether you may need multiple databases as opposed to a single database. You also need to be aware that as the database scales, you may need to make some changes to the schema. A good design will make this process less painful.
Splitting Up of Data
As your database scales you may notice via the logs that the majority of the queries are run against a minimal percentage of the data. In this case you might want to split up your data into tables containing old data and another containing current data, for something like the past week. This current data table will be much smaller and can give better performance results.
Optimize Your Queries
You can use a query profiler for the database server you are using. This will give you a snap shot into what kind of queries are hitting your database, what queries are slow and what queries are fast. The logging will also give you an insight into what times of the day and night your traffic comes in. With this information, you will be able to optimize your queries.
Use automated tools to monitor the performance of your server at all times, right from the hardware and the operating system to the database server itself. Database maintenance tasks should be schedules to a time when there are the least queries on your server. Do incremental backups on your server. Old data does not need to be backed up over and over again and the same goes for data that does not change.
There you have it, with these guidelines you now have the best practices for large SQL database management. All in all, optimization of your database and performance monitoring is a continuous process that will aid you in keeping your large databases running as efficiently as possible.
- MySQL Docs, https://dev.mysql.com/doc/refman/5.1/en/partitioning.html
- Image Credit: Wikimedia Commons/Gtgray1948c