
January 15, 2010

A Brief Diversion on Database Design

My promised follow-up on the sys.tables post is still in the skunk works.  I got distracted by an interesting post by Buck Woody of Microsoft over on his blog at SQLBlog.com.  The questions he posed are as follows:

1. What process do you follow?

2. How important are the business requirements?

3. What tool do you use to create the design, do you need it to diagram, do you even care about diagrams?

4. What’s your biggest pain-point about designing? Not with SQL Server, mind you, just the designing part.

If you’re a DBA, Data Architect or Data Modeller, feel free to head over to Buck’s blog and post your own response!

Given that my current blogging focus here is on metadata and data modelling for SQL Server, I thought I’d share my response.  Without further ado, here it is! 


1) I'm pretty much a traditionalist when it comes to DB modelling.  I'm sure these approaches will be pretty familiar to you.

My DB design work routinely starts with a big (A3 or bigger) piece of paper or a whiteboard and a domain map - for example, in a traditional order processing system I'd have a big box for Sales, another box for Products, another for Order Fulfilment, one for Employees, one for Suppliers, etc.

I then put boxes inside (or spanning) those regions, starting at the intersects with bridging entities (e.g. Orders bridges Order Fulfilment and Sales, Products connects to Orders, Suppliers connects to Products, Employees connect to Sales, etc.).  I'll then work back towards the centre of each domain box with more specific entities (e.g. Order Details, Product Options, Employee Type, Sales Channel Type, etc.).
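To make "bridging entity" concrete, here's a minimal T-SQL sketch of how the Orders/Products intersect eventually lands as a physical bridge table (the table and column names are hypothetical, not from any particular model):

CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerID int NOT NULL,      -- links back to the Sales domain
    OrderDate  datetime NOT NULL
);

CREATE TABLE dbo.Products (
    ProductID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Products PRIMARY KEY,
    ProductName varchar(100) NOT NULL
);

-- Order Details bridges Orders and Products.
CREATE TABLE dbo.OrderDetails (
    OrderID   int NOT NULL REFERENCES dbo.Orders (OrderID),
    ProductID int NOT NULL REFERENCES dbo.Products (ProductID),
    Quantity  decimal(9, 2) NOT NULL,
    CONSTRAINT PK_OrderDetails PRIMARY KEY (OrderID, ProductID)
);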

Once I've sketched out the top-level entities (Len Silverston's reference models are really handy for this), I then sit down with the clients/end-users and attack the model.  What bits have I missed?  What bits don't you need?  What are the business processes supported by each of these subject areas?  That usually leads to more entities, types, sub-types, etc., and tends to surface parent-child hierarchies.

With the extra detail about the entities, I'll then start adding some meat to the bones.  What do you need to know about an order?  What information about an order do you need to fulfil it correctly?  What are the decision points around order processing?  Likewise for the other core entities identified.

By the time I'm done with this, my model will typically be in ERwin or Visio.  I usually have a pretty good handle on cardinality, required business information fields (e.g. names, phone numbers, order quantities, etc.) and business metadata fields required to support business automation (e.g. order IDs, invoice numbers, payment IDs).  At this point I'll do a normalization and refactoring sweep to do things like converting subtypes to type detail tables with a type-parent intersect table.
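As an illustration of that subtype refactoring (a sketch only - the entity names and attributes are hypothetical), an Employee subtype might end up as a type lookup on the parent plus a detail table holding the type-specific attributes:

CREATE TABLE dbo.EmployeeType (
    EmployeeTypeID int NOT NULL CONSTRAINT PK_EmployeeType PRIMARY KEY,
    TypeName       varchar(50) NOT NULL
);

CREATE TABLE dbo.Employee (
    EmployeeID     int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Employee PRIMARY KEY,
    EmployeeTypeID int NOT NULL REFERENCES dbo.EmployeeType (EmployeeTypeID),
    FullName       varchar(100) NOT NULL
);

-- Type-specific attributes live in a detail table keyed back to the
-- parent, rather than as nullable columns on Employee itself.
CREATE TABLE dbo.SalariedEmployeeDetail (
    EmployeeID   int NOT NULL CONSTRAINT PK_SalariedEmployeeDetail PRIMARY KEY
                 REFERENCES dbo.Employee (EmployeeID),
    AnnualSalary decimal(12, 2) NOT NULL
);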

The next step is adding unique keys.  I'll probably already have an Int/Decimal(N, 0) of some description doing duty as a surrogate key, which also has a physical PK constraint defined on it.  I'll define unique constraints against natural keys.  I'll also start thinking about data domains (check constraints, defaults, rules, etc.) and start adding additional "state" metadata - e.g. IsCourierDeliveryOnly on Product, Is401KEligible on Employee, etc.  I'll also have data-types on pretty much all of the fields in the model.
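Pulling those pieces together, the keys-and-domains pass on something like Product might look as follows (the constraint names and domains are illustrative, not prescriptive):

CREATE TABLE dbo.Product (
    ProductID             int IDENTITY(1,1) NOT NULL
                          CONSTRAINT PK_Product PRIMARY KEY,            -- surrogate key
    ProductCode           varchar(20) NOT NULL
                          CONSTRAINT UQ_Product_ProductCode UNIQUE,     -- natural key
    ProductName           varchar(100) NOT NULL,
    UnitPrice             decimal(10, 2) NOT NULL
                          CONSTRAINT CK_Product_UnitPrice CHECK (UnitPrice >= 0),
    IsCourierDeliveryOnly bit NOT NULL
                          CONSTRAINT DF_Product_CourierOnly DEFAULT (0) -- "state" metadata
);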

At this point, I'll put the model back in front of the stakeholders and do a walkthrough.  I'll incorporate clarified requirements, add new columns/constraints, refactor where necessary and make sure that all auditing requirements are clearly captured.  This usually gives me enough to freeze the model at a point where I can hand it to a dev team to start coding against it.

This is where I'll do a cut to SQL files, import it into VSTS Dev/DB edition and start working on code-gen for standard CRUD procs, triggers and functions.  I'll keep tinkering with the model - adding metadata to support the generation of CRUD procs/functions/triggers (e.g. extended properties, etc.) - and move from managing the model as a single artefact to managing individual physical objects, syncing the changes through source control on a per-item basis.
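For the extended-property metadata that drives the code-gen, sys.sp_addextendedproperty does the attaching; the property name below (CodeGen_GenerateCrud) is a convention invented for illustration, not a built-in:

-- Tag a table with a code-gen hint.
EXEC sys.sp_addextendedproperty
    @name = N'CodeGen_GenerateCrud', @value = N'true',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Product';

-- The generator can read the hints back via sys.fn_listextendedproperty.
SELECT objtype, objname, name, value
FROM sys.fn_listextendedproperty(
    NULL, 'SCHEMA', 'dbo', 'TABLE', 'Product', NULL, NULL);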

2) Business requirements are critical, but you can't capture them all at once.  In my experience, end-users and stakeholders typically don't understand enough about their own processes to surface all of the required information in a single sitting.  I'm convinced you need at least two, and up to as many as five, detailed passes on the model (if it's a big app) to get to a point where the model stands up to whatever the devs and users throw at it.

3) Not only will I put the model into a diagram, but I'll try to make the diagram as readable as possible by eliminating crossing relationship lines wherever I can.  Readability is really important: when a business user comes to me and says "Did we capture XYZ?", I can find the entities concerned and trace their relationships quickly without having to navigate a huge snarl of overlapping relationship lines.  As discussed, ERwin and Visio are my tools of choice - but if you have ANY pull with the Visio people, can you please get them to unseal the DB modelling classes so we can automate against them in the same way we can in ERwin?

4) Biggest pain points?

A) The lack of policy-based design in most modelling tools - e.g. it would be fantastic to have Visio not only tell me that I have a subtype relationship that needs refactoring, but automatically refactor it for me when I move to physical model mode, using a configurable refactoring policy for which I can control the templates.

B) The lack of XMI-formatted export capabilities in most modelling tools.  From memory, it was possible to export SQL Server 2000 Metadata Services models to XMI, but since that product has been deprecated, we're stuck with forward engineering to SQL and exporting XSDs via "SELECT ... FOR XML AUTO, XMLSCHEMA".
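For reference, that inline-XSD workaround looks like this (the XMLSCHEMA directive works with the RAW and AUTO modes; the target namespace here is made up):

SELECT ProductID, ProductName
FROM dbo.Product
FOR XML AUTO, XMLSCHEMA('urn:example:product');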

If the "M" language in Oslo fixes these issues, then I'll be thrilled.  As a data architect, I'm excited by what SQL Server Metadata Services offers, not only for SQL Server data design, but for pushing data designs deeper into the solution stack - e.g. generating BizTalk schemas and XSL transformation definitions for common transactions based on updateable views, generating WSDL method signatures for those XSDs, generating data-grid controls from those views, generating data access layers from SQL (or model) metadata, etc.


Stay tuned for more on table metadata in the next day or so.

 
