This guide provides a set of best practices that will help you with the task of
documenting a SAS data set. These best practices were established by the
developers at Adivo who have extensive
experience in technical documentation, database design and development.
Start With a Good Database Design
Good database design can make documenting your SAS data set a much easier task.
Although it might not always be an option, if you have any influence over the
database design, here are some key design areas that you should look out for:
- Naming conventions - The lack of naming conventions can result in very
inconsistent and cryptic names for the tables, columns and other database
objects making it more difficult to document them properly. Applying good
naming conventions to the database design will promote better readability of the
- Primary keys - While every table should have a primary key that defines
which columns uniquely identify a record, most databases do not require them.
Not only does this affect database performance and data integrity, knowing what
the primary key is for a table is key to understanding the data model and an
essential part of the database documentation.
- Table relationships - Another key to understanding the data model is
documenting the relationships between tables. And just like primary keys,
they are not required to be defined.
And even if you do not have any control over the database design, a quick review
will help you better estimate how much time it will take and set expectations
for the completion of the documentation.
Embed Comments inside Your SAS Data Set
SAS supports the LABEL keyword for adding comments to tables and
columns. These comments are stored as metadata
within the database definition, making them easy to maintain as the database
structure changes over time.
By embedding comments inside your SAS data set, you create a
"self-documenting" database that will be easier for DBAs
and database programmers to understand.
For more information on embedding comments see
Documenting SAS Data Sets using SAS Workbench.
Comment What You Can and Annotate Everything Else
For SAS data sets, you will want to describe the following database objects:
- Tables - Most tables are typically used to store logical representations
of real world objects. For these types of tables you should
describe what that real world object is. Your database may also include
tables, sometimes referred to as "link tables" that are used to create
many-to-many relationships between tables. For these types of tables, you
should describe the purpose of the relationship.
- Columns - As tables typically represent real world objects, the columns
within those tables are used to store properties related to those objects.
Column descriptions should include what those properties are and how the column
The main purpose of the comments that you embed in your SAS data set should be
to provide descriptions for these database objects. These descriptions should be concise and
consist of one or two sentences. If there is additional information that
is important to document, you should consider structuring your comments into two
parts: a summary description followed by a remarks section. The remarks
section can be used to include the additional information.
Embedding comments in your SAS data set using the LABEL keyword does have some limitations. These include:
- SAS labels can be no longer than 40 characters.
- SAS labels are expected to be very short descriptions as the are used as column headings for reports.
To get around these limitations,
you can use a tool like TechWriter
to annotate SAS data sets with external comments that can be merged together with
the embedded comments to produce the complete documentation.
Use a Documentation Generator
Having a self-documented SAS data set is helpful for database design and maintenance,
but there are many instances where you are going to need actual database
documentation. Here are a few examples:
- System Documentation - If you are creating a SAS data set for a client, a
complete set of system documentation is typically a required deliverable.
- End-User Documentation - While your DBA and database programmer may have
full access to the database definition, you may allow some users restricted access
that only allows them to execute database queries. These users typically
will not have the access or the skills necessary to read and understand a database
definition. You will need to provide them with some form of documentation that
will help them create their database queries.
- Design Reviews - Not all participants in a design review are going to
have the same level of SAS knowledge and expertise. Your subject
matter experts may not be technical at all. To facilitate such a design
review, you will need to provide documentation that anybody can read and
Fortunately there are tools like TechWriter
that can be used to generate the database documentation automatically, leveraging the database definition
and the comments you've embedded.
Make Documentation Part of Your Development Process
If your project is using automated build tools like Ant, NAnt and
MSBuild, you can make generating the documentation part of the automated build
process for the system. By having the documentation as up-to-date as
the system itself, your project will benefit in the following areas:
- Knowledge Capture - While some development methodologies such as Agile
ascribe to a "document late" philosophy, projects can suffer greatly when they
lose a team member. The knowledge of a system literally walks out the
door. Documenting as you go minimizes the impact of losing a team member
and makes it easier for new team members to ramp up.
- Communication - Documentation can help communication among project team
members. The project manager can better assess the status of the project.
Subject matter experts can verify that developers are on the right track.
And mistakes can be identified earlier in the project life-cycle.
- Documentation Quality - Scrambling at the end of the project to throw the
documentation together for the final deliverable can result in the database not
being documented properly, if documented at all.
By making documentation part of your development process you will ensure
that the documentation is always in sync with your SAS data set, even in extreme
programming environments where the database is constantly changing.
For more information please visit: