Search This Blog

January 27, 2010

Mastering the Metadata Part 1 (Tables #2)

n the last Metadata-themed post to my SQL blog, I discussed some of the metadata used to define tables – with particular emphasis on their storage.  I didn’t go deep into what the metadata available in the storage-specific tables describes, but I did give a brief description of their purpose and the kind of information they contain.  Today, I want to focus in more on the logical structure of tables, with a view towards being able to reverse engineer DDL statements to create a copy of the table.

I’ve already covered sys.tables and sys.objects, so I won’t rehash that ground.  Each table object logically contains (although strictly speaking, is made up of) one or more columns.  Defining columns starts with sys.columns.

Object_id refers to the parent object, which for the purposes of this discussion will be a table.  If you only want to get the object name for a single table, you can dereference the object_id to an object name using the Object_Name(<object_id>) system function as a means to avoid having to do a join against sys.tables or sys.objects.  The name column specifies the name of the column.  The column_id field gives you the ordinal for a given column in the table definition.  e.g. if we were to build the following table we would see 1 as the column_id for the field MyTable_ID and 2 as the column_id for MyTable_StringContent.

If we were to remove the WHERE clause from this query, we could return these results for all objects in the database that have columns defined in the sys.columns view.

System_type_id and user_type_id refer to data types that can be retrieved from the sys.types catalog view.  Max_Length, Precision and Scale are used to specify metadata specifying the length of the data type in each column.  Max_Length applies to character data and binary objects.  Precision and Scale apply to Numeric (AKA Decimal) data types, where precision defines the total number of digits that can be stored, and Decimal defines the number of digits to the right of the decimal point.

A number of bit-flags showing the boolean state of various column options follows the length metadata.  Most of these can be ignored in the column definitions if they are not actively set, with the exception of the Is_Null flag.  T-SQL table definitions require that any NOT NULL constraints are specified inline in their column specifications.  NULLable (1) is the default behaviour here, so we need to make sure we capture 0s rather than 1s in this column for scripting and configuration capture purposes.

Finally, the integer fields at the far right of the sys.columns view are used to specify external object dependencies (e.g. XML schema collections, Rule objects and parent objects), while the final two bit columns specify the boolean column-state with respect to SQL Server 2008’s sparse column feature.

I made mention of another important catalog view above – sys.types – which describes the physical data types of a given column.  The system_type_id and user_type_id columns refer to fields in this view, and can be used as lookup keys to determine the data-type primitives and any user-type dependencies we need to consider before scripting tables.

Think for a second about the following example.

I’ve created a SQL Server user type based on one of the base server types, then assigned that user type to the MyTable_String_Content column.  Using the inner joins listed in the query which link the user_type_id from sys.columns to the name in sys.types, and the system_type_id in sys.columns to the user_type_id of the primitive for that type, you can see that myTable contains a myStringContent typed column as expected, and that it is based on an nvarchar base system type.  We could get the length of the user type by querying for Max_Length in the sys.types table.

We can also query the is_nullable flag to see if the type is nullable, is_user_defined to see if we need to perform the secondary join to get a primitive base type, and to see if the type includes a T-SQL Rule object binding.  Note: Many of the system types are specific variations on a more basic type.  As such, it can be worthwhile adding the second join anyway.

Some of you might be wondering now, how is this useful?  Well… let’s say I want to build a database configuration snapshot tool.  I can use queries such as the ones above to build an inventory of the column definitions for each table, and in conjunction with system catalog views discussed last week, I can also build a base definition for the table itself.  Anyone who’s found that the backups for their SQL Server instance weren’t in great shape immediately understands the value of being able to run a set of T-SQL table creation scripts to generate copies of your tables on a nice fresh SQL Server install.

Other practical applications for this kind of metadata are as follows:

  • Querying user databases to list base tables and primitive data types when pulling together information about source systems for a data warehouse, data quality management system or enterprise system bus solution.
  • Generating OO data layer components from table definitions using an ORM tool such as LLBGen, MyGeneration, NHibernate or similar products.

In the case of a data warehouse solution, the table definitions from source database schemas can be “decorated” with additional metadata that supports the management extract, transform and load (ETL) operations, system auditing, idempotent load operations (load operations can be reprocessed multiple times without having to clean up previous load attempts) and slowly changing diimensions (SCDs).

The next post in this series will be the final one on table metadata – I’ll be looking at constraint and index definitions, and highlighting the catalog views that allow us to capture and/or script the behavioural elements of table design, as opposed to the table’s structure and physical storage properties.  In the meantime, thanks for your continued interest, and Code Well, Code Often!

No comments:

Post a Comment

There was an error in this gadget