[2019] SQL Server Health Check Checklist
Some clients want a FULL SQL Health Check, while some only want to go deep on analysis for a SINGLE database.
This is a SINGLE database health check.
Even though it’s a single db health analysis, we still check for some server/instance level items. That’s intentional.
After analysis is done and presented to you, we will go over everything on the report until you run out of questions.
All of the checks are non-intrusive, very light weight and will not affect production SQL Server. Data collection can be done during normal business operating hours.
This is about 90% of what will happen. The remaining 10% will be spent on items discovered during analysis, requiring more analysis. So we will drill into that more as we find stuff.
Here is what a database health check will (approximately) check for:
Few SERVER level checks (I know this may not make sense, but these data points will direct where problems may exist):
isolate top server waits since sql instance start
I/O latencies for all drives
write and read stalls at each db file level
are there any I/O warnings?
which db consumes most CPU, RAM and I/O (not always obvious)
how heavy is tempdb usage?
review server waits for 30-60 min (ideally during production workload)
how well do parallelism settings match workload?
look into SQL Error log, is there anything unusual?
is deadlocking happening?
DB level – architectural design overview
db filegroup and file layout. how is this utilized? are files placed on best matching drives?
find top used objects: look at the sample data, data types in use, indexes, foreign keys, etc.
check virtual log files counts (VLFs)
drive latencies and how well db files are placed across storage drives
file growth settings
how often do db files grow? what causes the delay?
db properties: multiple db options checked
I/O statistics per each file (number of reads & writes, stalls per file)
read vs write ratio at db file level (this will show if workload is read or write intensive, which changes how tuning is done)
how majority of TSQL is coming in? is it ad hoc SQL queries, or stored procs?
DB level – Top objects and performance checks
what are the tables with the most activity?
review top objects (usually up to 5-10 is sufficient) to see how well data type choices were made, indexing, constraints, triggers, statistics on those tables
review top 5-10 tables/views and see how well indexing is used (is column order OK, etc.)
review top 5-10 worst performers
what are the largest tables?
fill factor check
which TSQL runs most frequently?
which TSQL takes longest to execute?
which TSQL consumes the most CPU, RAM and I/O (disk)?
which TSQL is in cache now? anything that costs too much in resources?
which TSQL execution time elapsed has high variance (if TSQL duration is sometimes fast and sometimes very slow, this is an opportunity for tuning)
how up to date are statistics
fragmentation levels (review top objects with more attention)
are plan guides being used?
are there single-use plans for one query that are wasting memory?
are there tables without clustering keys?
is there implicit data conversion happening?
are there any foreign keys that are not trusted?
are forced join hints? in use?
are table triggers in use? if yes, review how bad.
are there UDFs in use? if yes, review how bad.
DB indexing – or often the fastest way to add speed to db performance
find missing indexes
find useless indexes (that cost more to maintain than provide value)
are there any disabled indexes?
are there any indexes that can be merged into one?
how much data is in data vs indexes?
how many indexes per each object?
are any foreign keys not indexed?
missing index warnings for cached plans
are any tables over-indexed?
most frequently modified indexes and statistics
ix compression opportunities
DB reliability:
how are db backups set up for main db?
last corruption check date
is transaction log larger than db data file?
Here are some items that go into a FULL database health check, that will not be done on a SINGLE:
skip: Win OS settings
skip: Group Policy checks
skip: SQL Service Pack and Cumulative Update
skip: sql service and configuration manager settings
skip: storage/SAN load testing
skip: SQL memory dumps
skip: SQL Instance settings
skip: Trace flag usage
skip: memory config
skip: NUMA config
skip: how SQLAgent jobs are setup?
skip: security
skip: elevated permissions
skip: excessive sysadmin privilege
skip: maintenance plans (backups, index/statistics optimization, corruption checks, cleanup)
skip: clustering setup
skip: virtualization setup checks
skip: high availability setup
skip: hardware config
skip: drive setup for speed
skip: buffer pool extension
skip: suspected/corrupted pages
skip: CPU schedulers
skip: log check (Event Viewer, cluster log, etc)
skip: number of connections to SQL Server
skip: page life expectancy
skip: db owners
skip: SQL Agent job owners
Want a sample report? Send an email using our contact us form, and say that you saw this post, and I will send you a sample report.
For fellow DBAs – what did I miss? What can be done better?
コメント