Wednesday, March 12, 2008

Microsoft SQL Server 2008 - Get your cheese moved!

Hi comrades,

Below are the new features of Microsoft SQL Server 2008 that I discussed at a seminar held at Microsoft Mumbai by MumbaiITPro
(groups.msn.com/mumbaiitpro/) on 8th March 2008

New Variable Declaration

We are all well versed with the variable declaration syntax in SQL Server: we declare a variable with a datatype and, if required, a size.
Like

Declare @variable int

And to initialize that variable we write a separate SET statement. Now with SQL Server 2008, both these activities (i.e. variable declaration and initialization) are merged into one line.
Like

Declare @variable1 int = 10
Declare @variable2 int = @variable1 + 30

If you recall your old C days, we used compound assignment to perform a mathematical operation at the time of assignment. This ease now ships with SQL Server 2008, where you can assign to a variable like

Declare @variable int = 10
SET @variable += 1

So we can now cut short our initialization code when writing for SQL Server 2008.

MERGE

Back before SQL Server 2005 released, one feature was highly discussed: MERGE, also called "UPSERT", which has the ability to insert and update table data in a single DML statement. Due to some glitches it didn't make it into the RTM release of SQL Server 2005, but thankfully it has made it into SQL Server 2008.

As just mentioned, the new MERGE statement is a single DML statement that helps you insert new data into a target table and update or delete existing data in that target table, directly or through a view.

Syntax:

MERGE <target table> [AS alias]
USING <source table> [AS alias]
ON <merge condition>

[WHEN MATCHED [AND <condition>]
THEN { UPDATE ... | DELETE }]

[WHEN NOT MATCHED [BY TARGET] [AND <condition>]
THEN INSERT ...]

[WHEN NOT MATCHED BY SOURCE [AND <condition>]
THEN { UPDATE ... | DELETE }];

Note:
Although a semicolon as statement terminator is not mandatory in most T-SQL statements, in MERGE the semicolon is a must.

Example:

MERGE Customer As C
USING NewCustomer As NC
ON C.CustomerId = NC.CustomerId
WHEN NOT MATCHED THEN
INSERT (FName, LName, Phone)
VALUES(NC.FName, NC.LName, NC.Phone)

WHEN MATCHED THEN
UPDATE SET
FName = NC.FName,
LName = NC.LName,
Phone = NC.Phone;

Though the example is fairly self-explanatory, a little explanation: it merges the NewCustomer table data into the existing Customer table. While doing this, the MERGE statement checks whether the customer already exists, based on CustomerId. If the record is not found (WHEN NOT MATCHED), a new record is inserted. If the record is available (WHEN MATCHED), the rest of the columns are updated for the same CustomerId.
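To round out the picture, here is a sketch (using the same hypothetical Customer/NewCustomer tables) that also deletes customers no longer present in the source, via the WHEN NOT MATCHED BY SOURCE clause:

```sql
-- Fully synchronize Customer with NewCustomer:
-- insert new rows, update existing ones, delete the rest.
MERGE Customer AS C
USING NewCustomer AS NC
    ON C.CustomerId = NC.CustomerId
WHEN MATCHED THEN
    UPDATE SET FName = NC.FName,
               LName = NC.LName,
               Phone = NC.Phone
WHEN NOT MATCHED THEN
    INSERT (FName, LName, Phone)
    VALUES (NC.FName, NC.LName, NC.Phone)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- row exists in Customer but not in NewCustomer
```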

CDC (Change Data Capture)

As avid users of SQL Server 2000, we are habituated to writing triggers whenever we want to spy on the ongoing activities over a table. We track all the DML statements for the table and use them for auditing purposes. Now, with the advent of CDC (Change Data Capture), SQL Server 2008 keeps this log on your behalf, so you no longer need to write explicit triggers for the same.
Technically, SQL Server Agent writes this log, so before enabling the CDC feature for your database you need to start SQL Server Agent if it is not running.
To check whether your database is CDC enabled, execute the following statement.

select [name], database_id, is_cdc_enabled from sys.databases

The value of is_cdc_enabled will be 1 if the respective database is CDC enabled, and 0 otherwise. To make the database CDC enabled, execute the following system stored procedure:

Use <database name>
exec sys.sp_cdc_enable_db


When CDC is enabled on a database, a new user named "cdc", a schema named "cdc", and the following tables are created in the related database.

cdc.captured_columns
cdc.change_tables
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping


After enabling the database for CDC, you can configure the tables whose modifications you wish to track by running the sys.sp_cdc_enable_table system stored procedure.

Before running sys.sp_cdc_enable_table, you can check whether the related table is already enabled for change data capture by querying the is_tracked_by_cdc column in sys.tables for the related database.

select [name], is_tracked_by_cdc from sys.tables
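Enabling a table then looks something like the sketch below (assuming a dbo.Country table, as in the example that follows):

```sql
-- Enable CDC on the dbo.Country table.
-- @role_name = NULL means access to the change data is not
-- restricted to members of a gating database role.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Country',
    @role_name     = NULL;
```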

Example:
If we make the Country table CDC enabled, as mentioned above, one more table is created in the cdc schema with the name dbo_Country_CT (syntactically schemaname_tablename_CT). Apart from the columns of the tracked table, this table contains some metadata columns prefixed with __$. It carries the audited details for that table. The most important of these is __$operation, which signifies the kind of DML operation that occurred. The denotations are as follows.

1 – DELETE statement, 2 – INSERT statement, 3 – record before UPDATE, 4 – record after UPDATE
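Rather than reading the change table directly, you can query changes through the generated table-valued function. A sketch, assuming the dbo_Country capture instance from above:

```sql
-- Read all changes captured for dbo.Country between the earliest
-- and latest log sequence numbers (LSNs) available.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Country');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Country(@from_lsn, @to_lsn, N'all');
```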

One obvious fear is that logging data in this fashion may leave us running short of disk space. To this end, we have the freedom to decide how long we want to keep the data in the cdc schema. By default it is retained for 3 days (4320 minutes) and then wiped out automatically; the retention period can be changed with the sys.sp_cdc_change_job system stored procedure. To clean up the change tables manually, execute the following system stored procedure.

exec sys.sp_MScdc_cleanup_job

Minimally Logged INSERT

Normally, when you write data to a database, you must write it to disk twice: once to the log, and once to the database itself. This is because the database system uses an undo/redo log so it can rollback or redo transactions when needed. But it’s possible to write the data to disk only once in some important cases that involve inserting data into existing tables. This is what the new minimally logged INSERT feature does in SQL Server 2008.

Minimal logging consists of logging only the information that is required to rollback the transaction without supporting point-in-time recovery. Minimal logging is only available under the bulk logged and simple recovery models. When a minimally logged transaction is committed, a checkpoint is issued to flush the dirty data pages to disk and truncate the log. Minimal logging greatly improves large scale INSERT operations by increasing performance and reducing the amount of log space required. For a discussion of table requirements for minimal logging, see SQL Server Books Online. In particular, you must use table locking (TABLOCK) on the target table.
Operations that can be minimally logged in SQL 2005 include bulk import operations, SELECT INTO, and index creation and rebuild. SQL 2008 extends the optimization to INSERT INTO…SELECT FROM T-SQL operations that insert a large number of rows into an existing table under either of the following conditions:

1. Inserting into an empty table that has a clustered index and no nonclustered indexes.
2. Inserting into a heap that has no indexes, but that can be non-empty.
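A sketch of such an insert (hypothetical Sales/SalesStaging tables; note the TABLOCK hint on the target, which is required for minimal logging):

```sql
-- Bulk-load rows into a heap with table locking so the insert
-- can be minimally logged (bulk-logged or simple recovery model).
INSERT INTO dbo.Sales WITH (TABLOCK)
SELECT *
FROM dbo.SalesStaging;
```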

Data Compression

Other than backup compression, SQL Server 2008 has two different approaches to shrinking data on disk. I would love to address backup compression in future posts; right now we will look into the other two approaches.

ROW Compression: It enables storing fixed-length types in a variable-length storage format. For example, int is a fixed-length type; any column declared as int always occupies 4 bytes. So assume a table with only an int column and 5 records in it: the total size would be 20 bytes. Now assume the values in this table are between 0 and 255, so each value technically requires only 1 byte, but because they are declared as int they occupy 3 extra bytes per record. ROW compression addresses this issue: columns occupy only the required number of bytes, varying in nature unlike fixed-length types (int, float, bigint). The amount of disk space you save depends on your schema design and the data distribution across the table. The very important thing to note here is that you can enable this on an existing database without affecting or changing the application. The precursor of this idea, the vardecimal storage format, is available from SQL Server 2005 SP2 onwards, but applies only to decimal and numeric columns.
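Enabling it is a one-statement rebuild, sketched here for a hypothetical dbo.Orders table:

```sql
-- Rebuild the table with ROW compression enabled.
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW);
```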

Page Compression: Page Compression is built on top of ROW compression. It minimizes storage of redundant data on the page by storing commonly occurring byte patterns on the page once and then referencing these values for respective columns. The byte pattern recognition is type-independent. One example of such a compression is as follows.
Suppose a table has a varchar column with a default assigned, so for newly inserted records where no data is given for that column, the database holds the default value. Imagine the table has a huge number of records with this default. In this scenario SQL Server 2008 stores the byte pattern of the default only once on the page, and the rest of the records refer to that one value as long as they are not changed.
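Page compression is enabled the same way as row compression, and you can estimate the savings beforehand. A sketch, again using a hypothetical dbo.Orders table:

```sql
-- Estimate how much space PAGE compression would save.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Orders',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild the table with PAGE compression enabled.
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE);
```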


Transparent Database Encryption

Transparent Database Encryption (TDE) is a new encryption feature introduced in Microsoft SQL Server 2008. It is designed to protect the entire database without affecting existing applications. Traditionally (including in SQL Server 2005), database encryption involves complicated application changes such as modifying table design and removing functionality, all of which leads to performance compromises. For example, in SQL Server 2005 the column data type must be changed to varbinary; range and equality searches are not allowed on encrypted data; and the application must call built-in functions to handle encryption and decryption. Consequently, database performance degrades. Even basic database elements such as indexes or foreign keys often do not work with cell-level or column-level encryption schemes, because the use of these features inherently leaks information. TDE solves these problems by simply encrypting everything. Thus, all data types, keys, indexes, and so on can be used to their full potential without sacrificing security or leaking information on the disk.

To enable TDE, you must have the normal permissions associated with creating a database master key and certificates in the master database. You must also have CONTROL permissions on the user database.

To enable TDE
Perform the following steps in the master database:
1. If it does not already exist, create a database master key (DMK) for the master database. Ensure that the database master key is encrypted by the service master key (SMK).

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'some password';

2. Either create or designate an existing certificate for use as the database encryption key (DEK) protector. For the best security, it is recommended that you create a new certificate whose only function is to protect the DEK. Ensure that this certificate is protected by the DMK.

CREATE CERTIFICATE tdeCert WITH SUBJECT = 'TDE Certificate';

3. Create a backup of the certificate with the private key and store it in a secure location. (Note that the private key is stored in a separate file—be sure to keep both files). Be sure to maintain backups of the certificate as data loss may occur otherwise.

BACKUP CERTIFICATE tdeCert TO FILE = 'path_to_file'
WITH PRIVATE KEY (
FILE = 'path_to_private_key_file',
ENCRYPTION BY PASSWORD = 'cert password');


4. Optionally, enable SSL on the server to protect data in transit.
Perform the following steps in the user database. These require CONTROL permissions on the database.

5. Create the database encryption key (DEK) encrypted with the certificate designated from step 2 above. This certificate is referenced as a server certificate to distinguish it from other certificates that may be stored in the user database.

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE tdeCert


6. Enable TDE. This command starts a background thread (referred to as the encryption scan), which runs asynchronously.

ALTER DATABASE myDatabase SET ENCRYPTION ON

To monitor progress, query the sys.dm_database_encryption_keys view (the VIEW SERVER STATE permission is required) as in the following example:

SELECT db_name(database_id), encryption_state
FROM sys.dm_database_encryption_keys


When TDE is enabled on any user database, encryption is also automatically enabled for the temporary database (tempdb). This prevents temporary objects that are used by the user database from leaking to disk. System databases other than tempdb cannot currently be encrypted by using TDE.

When TDE is enabled on a database, all backups are encrypted. Thus, special care must be taken to ensure that the certificate that was used to protect the DEK is backed up and maintained with the database backup. If this certificate (or certificates) is lost, the data will be unreadable. Back up the certificate along with the database. Each certificate backup should have two files; both of these files should be archived (ideally separately from the database backup file for security).
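If the database ever has to be restored on another server, the certificate must be restored to that server's master database first. A sketch, reusing the placeholder paths and password from the backup step above:

```sql
-- On the recovery server: recreate the TDE certificate in master
-- from the backed-up certificate and private key files.
USE master;
CREATE CERTIFICATE tdeCert
FROM FILE = 'path_to_file'
WITH PRIVATE KEY (
    FILE = 'path_to_private_key_file',
    DECRYPTION BY PASSWORD = 'cert password');
```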

Simple but useful facts

GROUPING SETS – This new SQL construct is a single equivalent to combining ROLLUP, CUBE and UNION ALL. Do note that those are not replaced by this new construct; they are still supported, while GROUPING SETS is ANSI SQL 2006 compliant.
One more function is added to the rich library of SQL: GROUPING_ID(), which returns the level of grouping. It is an enhancement over the GROUPING() function, which returns a Boolean value indicating whether the grouping is on a set of records or not.
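A small sketch of both together (hypothetical dbo.Sales table with Region, Product and Amount columns):

```sql
-- One query producing subtotals by Region, subtotals by Product,
-- and a grand total, without writing UNION ALL.
SELECT Region, Product, SUM(Amount) AS Total,
       GROUPING_ID(Region, Product) AS GroupLevel
FROM dbo.Sales
GROUP BY GROUPING SETS ((Region), (Product), ());
```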
Another beautiful addition is the concept of SPARSE columns. Ordinarily SQL Server supports 1024 columns per table, but with SPARSE columns and column sets, a wide table can go up to 30,000 columns.
IntelliSense – Like other user-friendly GUI interfaces, the SQL editor now has the ability to sense your intellect, popularly known as IntelliSense.
Number of CPUs supported: With SQL Server 2008 you can plug in virtually any number of CPUs, provided that number is supported by the OS. This is true for the Enterprise edition of the product, while the Standard edition supports up to 4 CPUs.
While plugging in an extra CPU, you no longer need downtime for your database system; it is hot-add with SQL Server 2008, just as SQL Server 2005 already had the ability to hot-add memory.

I hope you had a good session with it!