Search This Blog

Loading...

January 13, 2010

Mastering The Metadata: Part 1 (Tables #1)

In this post, I’m going to outline the core metadata SQL Server 2008 uses for Tables, and how you might be able to use this metadata in a useful manner.  In this post, I’ll only go as far as outlining the potential for using table metadata to discover storage characteristics of the table.  I’ll provide more detailed examples in my next post of how you can put this to work.  Before we begin, a brief history lesson to do some context setting.

Once upon a time in the distant past, a relational database product called SQL Server was birthed unto the world.  Twice.  Once by a company called Sybase, and later, by an upstart operating system developer called Microsoft.  Microsoft helped Sybase and Ashton-Tate (developers of the dBase DBMS) to port SQL Server to their LAN-Manager 2.0 operating system, which later evolved into Windows NT 3.0.  When Ashton-Tate dropped out of the mix, Microsoft obtained a license from Sybase to develop the product on the MS Windows platform.  In the meantime, Sybase continued to develop their product for various *NIX flavours and other PC-based platforms.
As the products evolved, both Microsoft and Sybase both described new functionality through additions to the system catalog.  Microsoft went through an major leap at the SQL Server 7.0 release, which moved away from the old Sybase code-base to one that was more open, extensible and standards-compliant. Sybase and Microsoft took different approaches to managing catalog changes.  Microsoft maintained a copy of the old catalog, first as tables, to which additional fields could be appended using a shared PK; then moving on to creating new tables to replace the old ones, but adding views with the name of the old catalog tables.  Between SQL Server 7.0 and 2008 there’s been a number of changes to the approach.  In SQL Server 2008, all system metadata is now stored in the resource database (including user DB schemas).  Schema information for each database is referenced via views in the “sys” schema.  A version of the old “sys<blah>” SQL Server 6.x schema objects remains for backward compatibility purposes, but I’ll be describing metadata in the newer format. 
Sorry for the diversion, but sometimes a little background makes the present situation much easier to understand.  Onwards to Table Metadata!

Here we are, back at the original topic of table metadata.  Let’s start with sys.objects.

Sys.objects

Sys.Objects contains a row for each object in the database.  In data modelling terms, it serves as a master type for a number of more specific types, which we know as tables, views, stored procedures, constraints, keys, functions, triggers, etc.  The single most important thing to know is that sys.objects lets us get an object_id for our object using a join (instead of a function) which can yield better performance if we’re loading information about a LOT of objects at a time.  If there’s any overloads on object names (e.g. objects with the same in different schemas/owners within a database, or views over a table with the same name) we can also extract keys for the schema, owner and object type from this view.  Ultimately, what we’re after in most cases is a single unique set of identifiers for the objects we want information on.

The rest of the metadata in this view provides details on the object’s lineage (create/modified date, whether it’s been created by Microsoft or a third party) and whether the object  The is_published and is_schema_published columns are replication metadata.  If is_published = 1 (true) then the object is published for replication.  If is_schema_published = 1 (true) then the object definition itself is also published via replication.

Sys.tables

In data modelling terms, the sys.tables view is effectively one of several subtypes of the sys.objects view.  As such, it contains information that is common with its logical parent entity.

You can see in the SSMS snip above that name, object_id, and so on are all identical to the records in sys.objects down as far as is_schema_published.  From there on, all of the columns are specific to table objects.  These columns are described in detail in the MSDN Library for SQL Server 2008.
A number of SQL Server 2008 specific columns are worth pointing out.
The following columns in sys.tables are useful for identifying the placement of table data of various types (on either a file-group or partition schema).
  • lob_data_space_id
  • file_stream_data_space_id
These fields can both be joined against the data_space_id column in the following catalog views:
  • sys.filegroups (if you want to find the File Group name)
  • sys.partition_schemes (if you want to find the name of the partitioning scheme to bind the table to)
  • sys.indexes (if you want to identify the file-groups that indexes attached to this table are defined on)
  • sys.allocation_units (useful when you want to find the number of pages allocated to the table, its indexes and/or out-of-row LOB storage)
  • sys.fulltext_catalogs (useful for finding paths to system catalogs – be warned though – this view is due to be changed in a future release, so don’t hard-code references to data_space_id in this view into applications)
  • sys.data_spaces (useful for getting a name back for a data space if you don’t need to distinguish between file-groups and partition schemes)
  • sys.destination_data_spaces (useful if you want to get information about the number of potential destinations associated with a partition scheme for a given table)
  • sys.master_files (useful for enumerating all of the files that a given table will be written to and/or the types of those files – note this view returns file records for ALL databases on the server)
  • sys.database_files (functionally identical to sys.master_files except only files within the current database context can be shown)
These columns can also be joined against the backupfilegroup table in the MSDB database, using filegroup_id as the join key (useful to identify whether a given table was backed up during a given backup). 
Now… have a think about where all this could be useful.  If you were writing an script/application whose purpose was to enumerate a specified set of tables, and identify
To be continued… 

What’s Next?

Sadly, I’ve run out of time, and this post is already quite large.  In the next post, I’ll dive into some examples of how you can put some of the table storage lookups listed above to work, and why they might be important.

Useful references

Microsoft have included a good article on how to use some of the system metadata in the schema catalog views in this article.  The following links lead to the MSDN library pages for the catalog views and tables mentioned in this post.

1 comment:

  1. thanks, useful blog. Always nice to revisit sql history

    ReplyDelete

UpTweet