Yesterday I wrote a blog post defining Data Architecture as a field of endeavour. Today’s entry is the first of several that will drill down into the individual concerns that data architecture addresses. I’m going to start with Database Infrastructure Architecture, as it probably has the fewest dependencies on other areas of data architecture.
Defining The Challenge
What is database infrastructure architecture? The short answer is that it’s all about the platform, and not about the content. Database infrastructure architecture addresses very concrete requirements around making sure that data can be accessed by the right people at the right time in a reliable manner within acceptable performance parameters. To these ends, database infrastructure requires that the BATOG principles address the following functional requirements:
- Data platform selection
- Data platform host environment selection
- Data platform capacity management
- Data platform configuration management
- Data platform support & problem management
- Data platform performance management
- Data platform security & access control
- Data platform backup & recovery
- Data platform license management
- Data platform application & integration interfaces
As discussed, this blog mainly addresses these issues from a SQL Server perspective, so we’re going to work through them in the context of Microsoft SQL Server 2008.
Obviously, each of the bullet points expands into a sizeable body of knowledge in its own right, so today I’m only going to set the context for later posts on each topic.
Data Platform Selection
Choosing a data platform is a foundational piece in any data-related solution design. There are a number of different options for types of platforms, including “Document Databases”, marked-up structured file storage (e.g. XML), serialized binary object stores (AKA object databases), and relational database management systems – each with their own strengths and weaknesses. However, over the last 15 years the IT industry has moved far more decisively towards relational databases. This is because relational DBMS platforms tend to have a broader mix of strengths and fewer weaknesses than any of the alternatives, particularly with respect to data retrieval performance (thanks to general-purpose indexing capabilities), data protection (backup and recovery) and system scalability. In a society where the net amount of data being stored grows by 15-20% per annum, scalability has become a critical factor in platform choice.
As discussed, my focus within this blog is on Microsoft SQL Server, so for my purposes SQL Server can be considered the “default” data platform selection, but there are still decisions to make about which edition of SQL Server to install. That choice will be driven to some degree by corporate standards, but there are other factors, such as the scalability required to meet the expected business workload and which SQL Server extended features are required to manage and deliver the workload (e.g. Reporting Services, Integration Services, Analysis Services and other bolt-ons introduced in SQL Server 2008 R2).
Data Platform Host Environment Selection
Now that you’ve picked your database platform of choice, you need to start thinking about how it’s going to be hosted. SQL Server only runs on Windows, but that still leaves you with a number of choices. For example, are you going to build dedicated servers or run SQL Server within virtual machines to get higher utilization out of your hardware? Are you going to opt for a console-only version of the Windows Server operating system or a fully GUI-enabled one? What about server operating system editions? You have price and scalability factors that need to be considered – for example, the RAM and CPU scalability supported by each edition of Windows – and your choice of SQL Server edition can also influence your operating system choices. There are also questions about availability requirements – do you need to use failover clustering or some sort of equivalent virtualization technology to ensure high availability? This is already a lot of questions, and we haven’t even started to talk about the difference between scale-up, scale-out and logical vs. physical virtualization! [I’ll tackle those in future posts.]
Data Platform Capacity Management
Now that you’ve figured out the broad brush strokes of what you’re going to install, you’re going to have to figure out how much storage you’re going to need for your data. Here are some things to think about. The most critical element of capacity for SQL Server is disk or SAN storage. You need enough storage to hold all the data that is currently in the database, as well as for future requirements. A starting point for this kind of estimation is to estimate the maximum expected number of rows in your tables and multiply that by the average width of data in those tables. However, storage estimation is not as simple as it seems.
Each data page in SQL Server is 8KB in size. If your rows are not exactly 8KB in size, this means there’s going to be quite a bit of dead space in your database. This is not a bad thing – the consistent page size makes tuning I/O for SQL Server quite simple, but it means that you need to factor in the white-space in your data pages when sizing the number of pages required for a given quantity of rows in that table. E.g. if you have data rows in TableX with an average of 6KB per row, and you need to store 1,000,000 rows, you’re not going to need 6KB x 1,000,000 = 6GB of storage. Because a row can’t span pages, only one 6KB row fits on each 8KB page, so you’ll actually need 8KB x 1,000,000 = 8GB of storage. You also need to factor in index storage, LOB storage, full text index storage and XML storage in your space estimates. You need to size not just for current data loads, but for future volumes – and if you’re going to say “We’ll move to SAN storage only when we really need to,” you must make sure that you have the required capacity in your SAN ahead of this need. Also be aware that business needs change, and the capacity growth forecast by the business unit using your data server may well grow or shrink depending on business conditions. This has an impact on your database capacity, so you need to keep an eye on the rate of change of data-server capacity over time – you can’t just trust your initial estimates and hope that they’ll stick.
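The page arithmetic above can be sketched as a quick estimator. This is a deliberately simplified model – it ignores fill factor, per-row overhead, indexes and LOB storage, and assumes in-row data only – so treat it as a back-of-the-envelope starting point, not a substitute for proper sizing tools:

```python
import math

PAGE_SIZE_KB = 8  # SQL Server data pages are always 8KB


def estimate_table_storage_kb(avg_row_kb, row_count):
    """Rough in-row storage estimate in KB.

    A row cannot span pages, so only floor(8 / avg_row_kb) whole
    rows fit on each 8KB page; the remainder is dead space.
    """
    rows_per_page = max(1, math.floor(PAGE_SIZE_KB / avg_row_kb))
    pages = math.ceil(row_count / rows_per_page)
    return pages * PAGE_SIZE_KB


# The TableX example: with 6KB rows, only one row fits per page,
# so 1,000,000 rows need ~8GB rather than the naive 6GB.
print(estimate_table_storage_kb(6, 1_000_000))  # 8000000 (KB), i.e. ~8GB
```

The gap between the naive 6GB figure and the real 8GB figure is exactly the per-page white-space described above.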
Another key consideration is storage for data that sits outside the boundaries of the DBMS itself. This includes log files, historical configuration snapshots and performance monitoring/trace data, backups (an important one – don’t just plan for your full backups – plan for your incremental and transaction log backups as well), and mirrored/log-shipped instances on “warm standby” hardware. It is not difficult to find yourself needing 5-6 times as much storage for a high-availability system as is consumed by the user databases once you account for all of the storage requirements mentioned above. Keep in mind that some data is more important than other data, so you can run multiple storage tiers accordingly. E.g. SQLDIAG output could easily be written to a non-redundant SATA drive instead of high performance SCSI/SAS disks. On the flip-side, database backups should be written to tier 1 storage at some point to prevent data loss. This might seem like an expensive choice, but if your backups aren’t available when you need them then you’re carrying a large risk.
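To give a feel for how those extra storage categories add up to a 5-6x multiple, here’s a sketch with purely illustrative ratios – the real numbers depend entirely on your backup retention, transaction log volume and HA topology, so plug in your own:

```python
def estimate_total_storage_gb(user_db_gb,
                              full_backup_copies=2,   # retained full backups
                              log_backup_ratio=0.5,   # txn-log backups vs DB size
                              standby_instances=1,    # mirrored/log-shipped copies
                              monitoring_ratio=0.2):  # traces, snapshots, SQLDIAG
    """Illustrative total footprint for one user database, in GB.

    Every ratio here is an assumption for the example,
    not a measured or recommended value.
    """
    backups = user_db_gb * (full_backup_copies + log_backup_ratio)
    standby = user_db_gb * standby_instances
    monitoring = user_db_gb * monitoring_ratio
    return user_db_gb + backups + standby + monitoring


# A 100GB user database can easily need several times that overall:
print(estimate_total_storage_gb(100))  # 470.0
```

Even with these fairly conservative assumptions, a 100GB database balloons to nearly 5x its size once backups, standby copies and monitoring output are counted.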
Of course, capacity management also needs to take into account factors such as I/O bus bandwidth, CPU load, RAM required to maintain high cache hit ratios, network bandwidth and limitations arising from low level algorithms for managing CPU instruction queues. These capacity requirements are rarely static. They will almost always have peaks and troughs that offer constraints and opportunities for savvy system designers.
Data Platform Configuration Management
One thing I’ve learnt over the years is that installing a database server is not difficult – any mug can do it, and to a larger extent than you’d realize, any mug does! But we’re talking about professional IT here, so there are expectations on database professionals that go well beyond just getting the data-server installed. The data architect needs to mandate (or at the very least, review and approve) policies for how data platforms are to be configured within the enterprise. This includes things like standardizing the collation to be installed on each server, transaction log and database backup policies, surface area configuration policies, disk layout standards, monitoring/pulse-check configuration, standardized patch levels, consistent SQL Server install/database/log/backup directories, and all manner of other SQL Server configuration best practices. The single most important design factor here is that the configuration of every server must be recoverable in an emergency.
Not only do these policies and procedures need to be created, but they also need to be maintained, versioned, and applied as necessary to non-conformant systems. SQL Server provides some great tools to assist with this – especially the Policy Based Management feature shipped with SQL Server 2008 – but there are other options out there for doing the same thing. You can use System Center to apply configuration policies to a named set of servers, and to monitor the SQL Server environment. The SQLH2 tool can be used to perform standardized monitoring of SQL Server instances. Additionally, you also have 3rd party options such as CA’s Unicenter and HP’s SA+OO (system automation + operations orchestration) toolsets to coordinate these kinds of tasks. Standards are good, but tools that apply standards for you are even better!
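To make the idea of “applying standards to non-conformant systems” concrete, here’s a minimal sketch of configuration drift detection – the kind of check that Policy Based Management or System Center does properly at scale. The baseline settings, server names and values below are all hypothetical, invented purely for the example:

```python
# Hypothetical baseline policy; setting names and values are
# illustrative only, not real recommendations.
BASELINE = {
    "collation": "Latin1_General_CI_AS",
    "backup_directory": r"D:\SQLBackups",
    "patch_level": "SP1",
}


def find_drift(server_config):
    """Return (setting, expected, actual) tuples wherever a
    server's config deviates from the baseline policy."""
    return [(key, expected, server_config.get(key))
            for key, expected in BASELINE.items()
            if server_config.get(key) != expected]


# Example: a hypothetical server still running the RTM build.
drift = find_drift({
    "collation": "Latin1_General_CI_AS",
    "backup_directory": r"D:\SQLBackups",
    "patch_level": "RTM",
})
print(drift)  # [('patch_level', 'SP1', 'RTM')]
```

The value of tooling like Policy Based Management is that it runs this comparison continuously across every registered instance, rather than relying on a DBA remembering to check.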
Data Platform Support & Problem Management
If you’ve been through all the steps above, you probably have a database or two installed by now, and you now need to start thinking about who’s going to look after it. This is where we start asking whether we need a dedicated Database Administrator (DBA) with skills in the platform we’re using (SQL Server in my case) or whether a DBA from a different platform (e.g. Oracle, DB2) can afford the time to get up to speed with a new DBMS – or possibly whether a Windows administrator can deliver the necessary level of support. In small office environments you quite often have a single IT person supporting all of the infrastructure technologies in use, but in large office environments a more tightly specialized workforce is often hired.
Having figured out who’s going to support your data platform, you now need to make sure they know what they’re doing. Have they passed any SQL Server certifications? Are they aware of the Sarbanes Oxley (SOX), ISO 900X and ITIL standards? Do they have all the tools required to support the environment effectively? Do you trust them to get the system back up and running when things go pear-shaped? Or is your system sufficiently non-critical that you can afford to be off the air for a couple of days while a consultant is engaged to help with recovery? Do you need to get training provided for the person you’ve hired? Do they need some non-delivery time to focus on ongoing learning? These are all important factors to consider when defining the human side of data platform support.
When things go wrong, do you have a process for how to address common issues? Does your DBA know how to get a database back online in emergency mode to allow data recovery to occur? Is a disaster recovery (DR) environment available to get users running while you fix the production environment? Do you have procedures in place for failing over to DR and back to production? Have they been tested? What are the organizational processes that come into play when the DBA identifies a problem with the system? Are you following ITIL? Do you have a problem management board in place to communicate problems to the business and give the techos enough space to stay focused on the problem?
As you can see… this area of data infrastructure architecture requires careful thought and consideration, as it needs to ensure that the costs of preventing or minimizing the impact of system outages do not outweigh the business impact of having the system offline. The architect needs to have enough awareness of the cashflow impacts of outages to make the best choices on their clients’ behalf.
To Be Continued…
I’ve covered a lot here so far, but I need a break from typing and by now I’m sure any readers of this entry will also need a break from reading! I’ll continue with another instalment in the next 24 hours or so.