About this project

ARCHIVED -- The final collections database suite that used SQL Server. The repository remains here as a historical reference.

Collections Database Suite History

Nimbus $\rightarrow$ Agate $\rightarrow$ Hoarder (current)

Overview

Agate is the latest version of the application suite that manages the Nancy Street collections database. This project has been evolving for 23 years and has used many technologies, languages and tools during its long and slow evolution. Previous projects have used Excel , Windows Forms , SQLite , Azure Table Storage , SQL Server Express , Entity Framework , netTiers , ASP.NET Web Forms , Silverlight , SOAP , WCF and WPF .

The fact that previous incarnations of this project have taken so many forms and used so many different technologies is a testament to how software development has changed over the last 20 years. Many of the tools and technologies have turned out to be evolutionary dead ends (Windows Forms, Silverlight, netTiers); some were too clumsy and inflexible (SQLite); some were inappropriate (Azure Tables); some were tedious and difficult to use (ASP.NET , WCF). After many years of journeys and coding experiments, the current version of the application has returned somewhat to well-known and familiar tools. The latest incarnation of the project is an application suite called Agate which is based upon SQL Server and Entity Framework 6 as the database storage layer and WPF as the presentation layer. Although the current application suite is basically two-tiered, it can easily be expanded to include a service over the database to support web or mobile clients.

Database History

Several years ago I had a web page containing a simple overview of relational database normalisation and how I had applied the principles to my collections database to reduce it to Third Normal Form (3NF) with the structure seen in the diagram. That page was surprisingly popular and produced a lot of feedback from students who stumbled upon the page by accident and said it had helped their understanding of the subject. Getting any database to 3NF is an important skill, because as the Wikipedia article says:

Informally, a relational database table is often described as "normalized" if it meets Third Normal Form. Most 3NF tables are free of insertion, update, and deletion anomalies.

The Wikipedia articles describe normalisation in a lot more detail, but this quote nicely summarises the reason you should get your data to 3NF. When you get to 3NF there are no duplicated keys, all data depends on its key and CRUD operations do not cause side-effects over multiple tables. There is a kind of mathematical elegance and neatness about data in 3NF.

The following sample table shows how the collection database data looked when it was first entered as 'flat' data into an Excel spreadsheet back in 1993.

Flat Table

For speed of bulk data entry during many man-hours of typing there was no normalisation and all data rows were filled with possibly duplicated data. Excel saved keystrokes thanks to its early form of intellisense which predicted cell contents. Once bulk data entry was complete, normalisation began into Access 2.0 tables. After identifying important key data such as Owner, Media and Location I used some VBA scripts to import the Excel sheet and normalise the data into multiple tables which looked something like the following diagram.

Access 2.0 Diagram

All tables have been given an IDENTITY (auto-increment) primary key called Id. Unfortunately, none of the collections tables have a column of data with values suitable for use as a "natural" primary key. User initials, catalogue numbers, titles are all unsuitable for use as primary keys. IDENTITY columns are frowned upon by some database purists, but they are integrated into SQL Server and very easy to use from .NET applications.

The Owners, Locations and Media types in the flat table have been normalised into their own tables. However, the "Titles" in the vintage flat table have not simply been turned into a single corresponding "Titles" table, they were normalised into a pair of tables called Volume and Instance. This was done to solve the particular problem of having multiple copies of an item in the collection. Different copies (Instances) of the same item (Volume) may have different owners, storage locations, prices, purchase dates or comments.

Volume Table
Each row represents a unique item in the world, such as an audio CD released by a specific artist and company, or a specific publication of a book. One Volume may contain multiple physical parts, such as a boxed-set of CDs for example, but it is still regarded as a single Volume. A row in this table doesn't mean that the item physically exists in the collection, it only describes the item.
Instance Table
Each row represents a physical copy of a Volume. If for example there are two copies of the same video DVD in the collection purchased at different times with different prices, then the DVD is described by a single Volume row, but the two physical copies are described by two corresponding Instance rows.

Modern Database

This simple structure of 5 basic tables used in the original 1993 vintage database has persisted through to the present day. Extra tables have been added over the years to support audio tracks, files, genres, pictures, names and tags, but the basic concept and structure is unchanged. The following diagram was exported from the Entity Framework 6 designer and colour annotated to show the important groups of joined tables in the database.

Click the thumbnail to open the full-size database diagram (1723 x 1853, 212 KB).

The yellow tables are the Instance, Owner and Location as they existed in the vintage database.

The green tables are the Volume and Media of the vintage database, but two new tables File and Track have been joined to the Volume to describe the contents of data and audio discs respectively. Because a Volume like a boxed-set may contain multiple parts, the File and Track rows have fields that can identify track 3 on disc 2 of a set of audio discs for example.

The orange tables were added in the late 90s to associate extra important information with Volumes, such as names, genres, tags and pictures. The Volume and the orange table pairs all participate in classical one-many-one joins that allow, for example, one Volume to have many genres joined to it. The following diagram extract shows how a VolumeGenre join table is used to implement this type of one-many-one join.

One-Many-One

The same one-many-one join technique is repeated for 4 tables. The Name table join is a slightly more complicated because it joins the Volume table to both the Name and Role tables, which joins {Name,Role} pairs to a Volume. This forces each Name to have a Role and it allows, for example, Alfred Hitchcock to be joined twice to the same movie, first in the director role, second as an actor role.

Notice that the middle join tables have an IsPrimary bit column which can specify which Name is the 'primary' one. This allows, for example, a movie to have dozens of people joined to it in different roles, but you may want to specify that the director's name is the primary one.

The pink tables were added in the mid 2000s to associate extra information specifically with the Picture table. They are independent of the other tables and are used to assign "types" to pictures (front cover, performer, etc) and another one-many-one join is used to describe the subjects in the pictures.