Sunday, May 4, 2014

Change the COLLATION of a SQL Server Column

Problem
You've installed SQL Server with all the defaults, created your database and schema then loaded a bunch of data into your database. Now you realize that some or all of the columns in your database needed to have the COLLATION set to be case sensitive. This tip will take you through the steps required to change the column COLLATION of your tables from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
Solution

Table Setup

Let's first setup a couple of tables in order to walkthrough these steps required to update the column collation. The following script will create two tables as well as some constraints, indexes and statistics in order to illustrate each of the steps that may be required depending on the design of your schema. For this example we won't add any data to our tables as it wouldn't have any effect on the update process since we are going from case insensitive to case sensitive. It's important to note that if you were performing the reverse update that is from case sensitive to case insensitive, you would also have an extra step of resolving any data issues as the update could result in duplicate data in columns with unique constraints. That said, here is the complete setup script listing.
CREATE TABLE dbo.MainTable
 (PKColumn int NOT NULL IDENTITY (1, 1),
  CharColumn char(10) NULL,
  NCharColumn nchar(10) NULL,
  VarcharColumn varchar(50) NULL,
  NVarcharColumn nvarchar(50) NULL,
  VarcharMaxColumn varchar(MAX) NULL,
  NVarcharMaxColumn nvarchar(MAX) NULL,
  ComputedColumn  AS CharColumn + VarcharColumn)
GO

ALTER TABLE dbo.MainTable ADD CONSTRAINT CK_Table_NVarCharMaxColumn
   CHECK (NVarCharMaxColumn IN ('Apple','Pear','Orange','Banana'))
GO

ALTER TABLE dbo.MainTable ADD CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (PKColumn)
GO

CREATE NONCLUSTERED INDEX IX_Table_NVarcharColumn ON dbo.MainTable (NVarcharColumn) 
GO

CREATE STATISTICS Stats_MainTable_VarcharMaxColumn
    ON dbo.MainTable (VarcharMaxColumn) WITH FULLSCAN
GO

CREATE TABLE dbo.FKTable
 (CharColumn char(10) NOT NULL,
  DataColumn int NULL)
GO

ALTER TABLE dbo.FKTable ADD CONSTRAINT PK_FKTable PRIMARY KEY CLUSTERED (CharColumn) 
GO

ALTER TABLE dbo.MainTable ADD CONSTRAINT
 FK_MainTable_FKTable FOREIGN KEY
 (CharColumn) REFERENCES dbo.FKTable
 (CharColumn)
  ON UPDATE  NO ACTION 
  ON DELETE  NO ACTION 
GO
Identify columns
Depending on your situation you may need to only change the collation of a few columns or possible every column in your database. You can use the following query which will list any columns in your database that have the default collation.
SELECT t.name "Table Name",
       c.name "Column Name",
       c.collation_name "Collation"
  FROM sys.tables t INNER JOIN
       sys.columns c ON c.object_id=t.object_id INNER JOIN
       sys.types s ON s.user_type_id=c.user_type_id
 WHERE c.collation_name LIKE 'SQL_Latin1_General_CP1_CI_AS'
   AND t.type like 'U'
   AND t.name not like 'spt%'
   AND t.name not like 'MSrep%'
For this tip let's assume that we are changing the collation of every column in our database. If you read the following link, Set or Change the Column Collation, from TechNet you'll notice that there are some restrictions on when you can change the collation of a column. Basically, if a column is referenced by any of the following objects you will not be able to change the collation without first removing these references.
  • A computed column
  • An index
  • Distribution statistics, either generated automatically or by the CREATE STATISTICS statement
  • A CHECK constraint
  • A FOREIGN KEY constraint
Identify referencing objects
Let's now identify which objects need to be removed before we can make the collation update to our columns. The following queries, one for each category listed above, will identify objects that have a dependency on a column that we are going to be updating. Since we are updating all the columns in the database the only filter I have is "WHERE c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'". If your requirements are different you can expand on this as required. I've also include a sample output after each query for reference. One other item to note is with the foreign key section. We need to remove the referenced primary key index in order to make the collation update to that column as well so that the section includes two queries. One to identify the foreign key and the other for the primary key.
-- computed columns
SELECT OBJECT_NAME(c.object_id) "Table Name",
       COL_NAME(sd.referenced_major_id, sd.referenced_minor_id) "Column Name",
       c.collation_name "Collation",
       definition "Definition"
  FROM sys.computed_columns cc INNER JOIN
       sys.sql_dependencies sd ON cc.object_id=sd.object_id AND 
                                  cc.column_id=sd.column_id AND 
                                  sd.object_id=sd.referenced_major_id INNER JOIN
       sys.columns c ON c.object_id=sd.referenced_major_id AND 
                        c.column_id = sd.referenced_minor_id
 WHERE c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
   AND sd.class=1

Table NameColumn NameCollationDefinition
MainTableCharColumnSQL_Latin1_General_CP1_CI_AS([CharColumn]+[VarcharColumn])
MainTableVarcharColumnSQL_Latin1_General_CP1_CI_AS([CharColumn]+[VarcharColumn])

--foreign keys
SELECT f.name "Foreign Key Name",
       OBJECT_NAME(f.parent_object_id) "Table Name",
       COL_NAME(fc.parent_object_id,fc.parent_column_id) "Column Name",
       c1.collation_name "Collation",
       OBJECT_NAME (f.referenced_object_id) "Reference Table Name",
       COL_NAME(fc.referenced_object_id,fc.referenced_column_id) "Reference Column Name",
       c2.collation_name "Collation"
  FROM sys.foreign_keys AS f INNER JOIN
       sys.foreign_key_columns AS fc ON f.OBJECT_ID = fc.constraint_object_id INNER JOIN
       sys.columns c1 ON c1.object_id=fc.parent_object_id AND
                         c1.column_id=fc.parent_column_id INNER JOIN
       sys.columns c2 ON c2.object_id=fc.parent_object_id AND
                         c2.column_id=fc.parent_column_id
 WHERE c1.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
    OR c2.collation_name like 'SQL_Latin1_General_CP1_CI_AS'

-- primary keys
SELECT i.name AS "Primary Key Name",
       OBJECT_NAME(ic.object_id) "Table Name",
       COL_NAME(ic.object_id,ic.column_id) "Column Name",
       c.collation_name "Collation"
  FROM sys.indexes AS i INNER JOIN 
       sys.index_columns AS ic ON i.object_id = ic.object_id AND 
                                  i.index_id = ic.index_id INNER JOIN
       sys.columns c ON ic.object_id=c.object_id AND
                        c.column_id=ic.column_id
 WHERE i.is_primary_key=1
   AND c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'

Foreign Key NameTable NameColumn NameCollation
FK_MainTable_FKTableMainTableCharColumnSQL_Latin1_General_CP1_CI_AS
Reference Table NameReference Column NameCollation
FKTableCharColumnSQL_Latin1_General_CP1_CI_AS

Primary Key NameTable NameColumn NameCollation
PK_FKTableFKTableCharColumnSQL_Latin1_General_CP1_CI_AS

--indexes
SELECT i.name AS "Index Name",
       OBJECT_NAME(ic.object_id) "Table Name",
       COL_NAME(ic.object_id,ic.column_id) "Column Name",
       c.collation_name "Collation"
  FROM sys.indexes AS i INNER JOIN 
       sys.index_columns AS ic ON i.object_id = ic.object_id AND 
                                  i.index_id = ic.index_id INNER JOIN
       sys.columns c ON ic.object_id=c.object_id AND
                        c.column_id=ic.column_id
 WHERE c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
   AND i.is_primary_key <> 1
   AND OBJECT_NAME(ic.object_id) NOT LIKE 'sys%'

Index NameTable NameColumn NameCollation
IX_Table_NVarcharColumnMainTableNVarcharColumnSQL_Latin1_General_CP1_CI_AS

-- statistics
SELECT s.name "Statistics Name",
       OBJECT_Name(c.object_id) "Table Name",
       COL_NAME(c.object_id,c.column_id) "Column Name",
    c.collation_name "Collation"
  FROM sys.stats s INNER JOIN
       sys.stats_columns sc on s.stats_id=sc.stats_id AND 
                               s.object_id=sc.object_id INNER JOIN
       sys.columns c ON c.object_id=sc.object_id AND 
                        c.column_id=sc.column_id 
 WHERE c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
   AND s.user_created=1
   AND OBJECT_NAME(c.object_id) NOT LIKE 'sys%'
   AND OBJECT_NAME(c.object_id) NOT LIKE 'MSrep%'

Statistics NameTable NameColumn NameCollation
Stats_MainTable_VarcharMaxColumnMainTableVarcharMaxColumnSQL_Latin1_General_CP1_CI_AS

-- check constraints
SELECT OBJECT_NAME(cc.object_id) "Constraint Name",
       OBJECT_Name(c.object_id) "Table Name",
       COL_NAME(sd.referenced_major_id, sd.referenced_minor_id) "Column Name",
       c.collation_name "Collation",
       definition "Definition"
  FROM sys.check_constraints cc INNER JOIN
       sys.sql_dependencies sd ON cc.object_id=sd.object_id INNER JOIN
       sys.columns c ON c.object_id=sd.referenced_major_id AND 
                        c.column_id = sd.referenced_minor_id
 WHERE c.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
   AND cc.type = 'C'
   AND sd.class=1

Constraint NameTable NameColumn NameCollation
CK_Table_NVarCharMaxColumnMainTableNVarcharMaxColumnSQL_Latin1_General_CP1_CI_AS
Definition
([NVarCharMaxColumn]='Banana' OR [NVarCharMaxColumn]='Orange' OR [NVarCharMaxColumn]='Pear' OR [NVarCharMaxColumn]='Apple')

Drop referencing objects

Now that we've identified everything that we need to remove we can use any tool, including SSMS, to create drop scripts for these objects. Remember to at the same time create a script that will recreate these objects so we can run that after our column definition update is done. Here is the script to drop the objects identified above.
ALTER TABLE dbo.MainTable DROP COLUMN ComputedColumn
ALTER TABLE dbo.MainTable DROP CONSTRAINT FK_MainTable_FKTable
ALTER TABLE dbo.FKTable DROP CONSTRAINT PK_FKTable
DROP INDEX dbo.MainTable.IX_Table_NVarcharColumn
DROP STATISTICS dbo.MainTable.Stats_MainTable_VarcharMaxColumn
ALTER TABLE dbo.MainTable DROP CONSTRAINT CK_Table_NVarCharMaxColumn

Update the collation

After all that work we can now issue simple ALTER TABLE ... ALTER COLUMN ... statements to update the collation for each column in our database. Here is the script to perform this operation.
ALTER TABLE dbo.MainTable
 ALTER COLUMN NCharColumn nchar(10) COLLATE SQL_Latin1_General_CP1_CS_AS NULL
ALTER TABLE dbo.MainTable
 ALTER COLUMN CharColumn char(10) COLLATE SQL_Latin1_General_CP1_CS_AS NULL
ALTER TABLE dbo.FKTable
 ALTER COLUMN CharColumn char(10) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
ALTER TABLE dbo.MainTable
 ALTER COLUMN VarcharColumn varchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NULL
ALTER TABLE dbo.MainTable
 ALTER COLUMN NVarcharColumn nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NULL
ALTER TABLE dbo.MainTable
 ALTER COLUMN VarcharMaxColumn varchar(MAX) COLLATE SQL_Latin1_General_CP1_CS_AS NULL
ALTER TABLE dbo.MainTable 
 ALTER COLUMN NVarcharMaxColumn nvarchar(MAX) COLLATE SQL_Latin1_General_CP1_CS_AS NULL

Recreate referencing objects

Once the schema update completes we can run the script we generated above to recreate all of the referencing objects that we dropped earlier. Here is the script for our example.
ALTER TABLE dbo.MainTable ADD ComputedColumn AS CharColumn + VarcharColumn
GO
ALTER TABLE dbo.MainTable ADD CONSTRAINT CK_Table_NVarCharMaxColumn
   CHECK (NVarCharMaxColumn IN ('Apple','Pear','Orange','Banana'))
GO
CREATE NONCLUSTERED INDEX IX_Table_NVarcharColumn ON dbo.MainTable (NVarcharColumn) 
GO
CREATE STATISTICS Stats_MainTable_VarcharMaxColumn
    ON dbo.MainTable (VarcharMaxColumn) WITH FULLSCAN
GO
ALTER TABLE dbo.FKTable ADD CONSTRAINT PK_FKTable PRIMARY KEY CLUSTERED (CharColumn) 
GO
ALTER TABLE dbo.MainTable ADD CONSTRAINT
 FK_MainTable_FKTable FOREIGN KEY
 (CharColumn) REFERENCES dbo.FKTable
 (CharColumn)
  ON UPDATE  NO ACTION 
  ON DELETE  NO ACTION 
GO

0 التعليقات:

Post a Comment