Current Backups, and the Changes Already Here
As a recent InfoStor article points out, tape-based backups have traditionally been chosen for enterprise-level backup and restore, primarily because tape storage has, until now, been cheaper than disk-based storage. Tape also provided portability that fixed, on-site backups lacked.
But now, with the advent of USB-attached ATA (SATA and PATA) drives, which are easily detached after a backup, as well as SAN-based drives, those reasons are no longer major constraints. In addition, the cost of hard disk storage has plummeted to the point that hard drives are now virtually cost-comparable with tape, so they can serve as next-generation server backup media in place of slow, sequential tape media (first-generation backup solutions). Disk-based backups are also considerably faster than tape-based backups and provide random access: direct access to the data being backed up or restored, rather than having to load an entire tape’s worth of data or an entire sequential backup set.
Even though portability is no longer an issue, some combination of disk-based and tape-based backups might still prove to be an optimal next-generation server backup solution. All things considered, though, any such hodge-podge, user-assembled mix is at best a second-generation backup solution, at least without a vendor-provided, integrated solution such as IBM’s Tivoli TSM.
Some next-generation server backup and restore solutions, such as IBM’s Tivoli TSM (formerly ADSM), are fully integrated and have been in existence for some time. Because they marry the best of all current worlds, they may be classified as third-generation backup solutions. TSM uses a robotic tape assembly containing a library of sequential tapes. Those tapes are controlled via the TSM backup client software placed on each server.
Several factors which make TSM a potential next-generation server backup solution are:
- The sequential tapes provide for dual-writing to two tapes simultaneously, so that one tape can be ejected from the tape silo, for off-site storage.
- The tapes use a helical-scan technology and also start at the tape “mid-point” instead of at one end of the tape, virtually doubling backup and restore read/write speed. Depending on the file location, speed can more than double.
- TSM allows the use of a “sideline storage” pool of hard drives, to which data can initially be backed up. Since backup to hard drives is much faster than to tapes, this reduces backup and restore time and reduces or eliminates any downtime window.
- The sideline storage pool can be set for tailored archiving; for example, “When backup from server to sideline storage completes and/or after a certain time period, the sideline storage pool is backed up onto tapes.” This also virtually eliminates downtime for backups.
- TSM provides for both hardware and software compression, thus producing smaller backup sets.
- TSM now provides a companion Hierarchical Storage Management (HSM) solution for host-based data archiving, based on “data-use aging” or other user-tailored parameters; for example, if the file has not been accessed (or modified) in x days (where ‘x’ is user-defined, such as 30, 60, or 90), then “archive the data, but leave a small file-name stub header with an index pointer” (this is called ‘stubbing’ the data). Thus, a 45GB data file that has not been used in 6 months might now consume only 2KB on the enterprise SAN disk, since it has been “stubbed” and sent to tape. When a user attempts to re-open the stubbed file, it is automatically restored from tape.
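
The “data-use aging” rule described in the last item can be sketched in a few lines of Python. This is only an illustration of the policy decision, not TSM’s or HSM’s actual interface: the names `FileInfo`, `should_stub`, and `select_for_stubbing` are hypothetical, and the sketch assumes the last-use time is already known for each file.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileInfo:
    """Hypothetical record for one file on the managed disk."""
    path: str
    size_bytes: int
    last_used: float  # last access or modification, in seconds since the epoch


def should_stub(info: FileInfo, max_idle_days: int, now: float) -> bool:
    """Return True when the file has been idle for at least max_idle_days."""
    idle_days = (now - info.last_used) / SECONDS_PER_DAY
    return idle_days >= max_idle_days


def select_for_stubbing(files, max_idle_days: int, now: float):
    """Pick the files an aging policy would archive and replace with stubs."""
    return [f for f in files if should_stub(f, max_idle_days, now)]
```

With a 90-day threshold, a file last used six months ago would be selected for stubbing, while a file used last week would stay on disk untouched.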
New Generations of Backups
Other next-generation server backup solutions (some still evolving) are as follows:
Fourth-generation backup and restore solutions: Products such as ExaGrid use only hard-disk and/or SAN-based backups, in conjunction with “data de-duplication,” that is, prevention of backing up identical files in separate locations on the target backup device. This is accomplished by scanning the files within the group of data being backed up and comparing CRC/checksum information; if a file is a duplicate, a ‘stub pointer’ or ‘checkpoint’ is created that points to the copy already noted as the ‘primary copy,’ and only that indexed stub pointer is backed up, potentially saving tremendous amounts of storage.
For example, imagine backing up an MS-Exchange exported mail store. That store might contain “Joe’s Joke of the Day,” duplicated to large numbers of users throughout the enterprise. De-duplication savings can be seen in an example of a 6MB image file, sent to 3,000 users:
6,000,000 (6MB image file)
x 3,000 users
= 18,000,000,000 = 18GB storage required
With data de-duplication, only the initial copy of the 6MB file would be stored, thus saving approximately 18GB of storage (the full 18GB less the single 6MB copy that is kept).
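
The checksum comparison described above can be sketched as follows. This is a minimal, file-level illustration and not ExaGrid’s actual method: it assumes SHA-256 from Python’s standard `hashlib` in place of the unspecified “CRC/checksum,” and the function names `deduplicate` and `storage_saved` are hypothetical. The demo is scaled down to a 6,000-byte file so that it runs instantly.

```python
import hashlib


def deduplicate(files):
    """De-duplicate a list of (name, content) pairs by content checksum.

    Returns (store, index): store maps checksum -> primary copy,
    index maps each file name -> checksum (the 'stub pointer').
    """
    store = {}
    index = {}
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy seen
        index[name] = digest            # every name points at that copy
    return store, index


def storage_saved(files, store):
    """Bytes saved: raw size of all files minus the size of the primary copies."""
    raw = sum(len(data) for _, data in files)
    kept = sum(len(data) for data in store.values())
    return raw - kept


if __name__ == "__main__":
    image = b"x" * 6_000  # scaled-down stand-in for the 6MB image
    mailboxes = [(f"user{i}/joke.jpg", image) for i in range(3_000)]
    store, index = deduplicate(mailboxes)
    print(len(store), storage_saved(mailboxes, store))  # prints: 1 17994000
```

At full scale the same subtraction gives 18,000,000,000 - 6,000,000 = 17,994,000,000 bytes saved, or roughly 18GB. Any file can still be restored by following its entry in `index` to the primary copy in `store`.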
Note that ExaGrid’s solution also recommends sending a backup copy off-site and/or replicating the backed-up, de-duplicated data to a hot site or other off-site backup location. Since such backups are de-duplicated and compressed, the required across-the-wire bandwidth is significantly lower than for old-fashioned FTP of full image backups or for full, uncompressed, site-to-site replication.
Envisioning fifth-generation backup and restore: Encrypted, de-duplicated, compressed, secure, cloud-based backups; and that, dear reader, is an article unto itself. Basically, data to be backed up would be compressed, de-duplicated, and archived/backed up locally, then a copy would be sent via SSL and/or site-to-site VPN to a vetted third-party (that is, “cloud-based”) company for secure off-site storage and retrieval, much as physical tapes traditionally are sent to off-site secure storage companies. This truly will be the next-generation server backup solution.
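
As a rough illustration of the local preparation step only, the sketch below compresses a backup set and records a checksum so that a copy retrieved from off-site storage can be verified on restore. It uses Python’s standard `zlib` and `hashlib`; the function names are hypothetical, and encryption and the SSL/VPN transfer are deliberately left out, since those would depend on the provider and tooling chosen.

```python
import hashlib
import zlib


def prepare_offsite_payload(data):
    """Compress a backup set and record a digest of the original bytes."""
    digest = hashlib.sha256(data).hexdigest()  # kept locally for later verification
    payload = zlib.compress(data, 9)           # 9 is zlib's strongest compression level
    return payload, digest


def restore_offsite_payload(payload, expected_digest):
    """Decompress a retrieved payload and confirm it matches the recorded digest."""
    data = zlib.decompress(payload)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("restored data does not match the recorded digest")
    return data
```

Highly repetitive data shrinks dramatically under this kind of compression, which is where the reduced across-the-wire bandwidth comes from.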