Orthogonal.ObjectDb

About this project
ESENT backed high-performance local object database.

Historical Note

The rising popularity of cloud storage in recent years has made the ObjectDb project redundant for a lot of common usage scenarios. It is now easier to store and retrieve arbitrary serialized objects using a variety of services available from major cloud provider companies. Microsoft provides the Azure Table API for simple scenarios, which is dirt cheap and fast with a huge capacity. An even better offering is Cosmos DB which stores objects as fully indexed documents, and there is a free tier available for lower demand usage.

However, ObjectDb may be preferable to cloud storage in situations where thousands of objects (or more) must be processed as rapidly as possible. In this situation, using an ObjectDb database on a local drive is likely to produce unbeatable performance.

Overview

The ObjectDb library implements an in-process database for storing and retrieving class objects and their property values. The library is designed to be as easy to use as possible, with a small footprint and trivial dependencies. The backing storage is the Microsoft Extensible Storage Engine (ESENT) high-performance, high-capacity, transactional database which is built into all modern versions of Windows. The ObjectDb runtime is distributable as two small managed libraries which can be added with the Orthogonal.ObjectDb.Esent NuGet package.

The following skeleton code shows how a database is opened (or created) in a folder, a new Employee object is inserted, all Employees are listed, and finally the database is disposed.

using Orthogonal.ObjectDb.Esent;
:
var db = ObjectDb.Create(@"C:\testing\Sample Database");
:
var emp = new Employee("Smith", "John", "London, UK");
db.Insert(emp);
:
foreach (var e in db.List<Employee>())
{
  Trace($"Employee Id {e.Id} {e.First} {e.Surname} lives in {e.Address}");
}
:
db.Dispose();

The library's API is small and consists of methods with obvious names like Insert, Get, Delete, List, Update, Save, etc. There are only 9 core methods and 5 utility helper methods.

When a class object is inserted into the database, the values of all public properties that are both readable and writable are extracted and stored. One or more class object properties must be specified as the primary key to uniquely identify database records (see the next section).

ObjectDb does not behave like a relational database and has no concept of a schema or relationships. It does have other interesting and useful features that make it more like a NoSQL database. Because there is no schema for the database contents, a heterogeneous set of class objects can be saved in the same database. It's also possible to efficiently seek and list on any stored property name, and an index over that property will be created on first use.

The underlying ESENT database can be up to 14TB in total size and rows can efficiently contain text or binary blobs up to 2GB in size. The ObjectDb library only uses the ESENT API functions related to indexing, effectively using the database in the style of a classical multiple-index ISAM file. The Wikipedia article on ESENT describes many of the more advanced features of the database which are not needed or used by the ObjectDb library.


Primary Keys

A class type can only be stored and retrieved from an object database if the type has one or more properties whose values can be used as a unique primary key. The primary key properties can be manually specified by applying the [DataObjectField] attribute; otherwise the library attempts to make good guesses about which property might be suitable to use as a primary key. There are four ways the primary key can be defined for a class type.

1. Default Identity Key Property

If a class type defines an Int32 property called Id and no other properties are annotated as being the primary key, then the Id property will automatically be used as the Identity auto-increasing primary key. The caller does not need to set values in an Identity primary key property as the database automatically assigns unique increasing numeric values each time a new row is inserted.

public class Customer
{
  public int Id { get; set; }
  :
}

2. Default Primary Key Property

If a class type defines a single Int32 property not called Id and no other properties are annotated as being the primary key, then it will automatically be used as the primary key. The property is not treated like an Identity key and the caller is responsible for managing the primary key property values.

public class Employee
{
  public int Empnum { get; set; }
  :
}

3. Specified Identity Key Property

A single Int32 property can be specified as the Identity auto-increasing primary key. Annotate the property with the [DataObjectField] Attribute and specify primaryKey:true and isIdentity:true. The caller does not need to set values in an Identity primary key property as the database automatically assigns unique increasing numeric values each time a new row is inserted.

public class Customer
{
  [DataObjectField(true, true)]
  public int CustomerId { get; set; }
  :
}

4. Specified Key Properties

One or more properties can be specified as primary keys by annotating them with the [DataObjectField] attribute and specifying primaryKey:true. If multiple properties are annotated as the primary key, then they form a "compound key" of property values in the order they are defined in the class type. The caller is responsible for managing the primary key property values.

// A class with a single string property primary key
public class Country
{
  [DataObjectField(true)]
  public string Key { get; set; }
  :
}

// A class with a compound primary key made from two properties
public class Order
{
  [DataObjectField(true)]
  public string ProductKey { get; set; }
  [DataObjectField(true)]
  public byte Category { get; set; }
  :
}

Supported Classes and Types

Class types being stored in the ObjectDb database must have an empty constructor (the compiler creates one by default if no constructors are present). When objects are retrieved from the database, the empty constructor is required so that an instance of a class can be created and then have its property values filled.
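
The reason for the empty constructor can be sketched with plain reflection. The following is only an illustration under stated assumptions, not the library's actual code: the Materializer class and its Materialize method are hypothetical names, and the dictionary stands in for a row read from storage.

```csharp
using System;
using System.Collections.Generic;

public class Employee
{
    public int Empnum { get; set; }
    public string Surname { get; set; }
}

// Hypothetical helper, not part of ObjectDb.
public static class Materializer
{
    // Creates an instance through the parameterless constructor, then
    // assigns each stored value to the writable property of the same name.
    public static T Materialize<T>(IDictionary<string, object> values) where T : new()
    {
        var instance = new T();
        foreach (var pair in values)
        {
            var property = typeof(T).GetProperty(pair.Key);
            if (property != null && property.CanWrite)
            {
                property.SetValue(instance, pair.Value);
            }
        }
        return instance;
    }

    public static void Main()
    {
        // The dictionary plays the role of a stored row.
        var row = new Dictionary<string, object>
        {
            ["Empnum"] = 1001,
            ["Surname"] = "Smith"
        };
        Employee emp = Materialize<Employee>(row);
        Console.WriteLine($"{emp.Empnum} {emp.Surname}"); // prints "1001 Smith"
    }
}
```

Without a parameterless constructor there would be no general way to create the instance before its property values are known.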

The database can round-trip property values of type string, byte[], all signed or unsigned primitive numeric types, DateTime, DateTimeOffset, Guid and their Nullable<> equivalents. A user defined class type can be round-tripped if it has a TypeConverter associated with it to convert it to and from a string representation.
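
As a hedged illustration of the TypeConverter requirement, the sketch below defines a made-up Size class with a converter that maps it to and from strings such as "3x4". The class, the converter and the string format are assumptions chosen for the example. Only the standard System.ComponentModel types are used, and how ObjectDb itself invokes the converter is not shown.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Hypothetical user-defined type that round-trips through a string like "3x4".
[TypeConverter(typeof(SizeConverter))]
public class Size
{
    public int Width { get; set; }
    public int Height { get; set; }
}

public class SizeConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType) =>
        sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

    // Parses "WIDTHxHEIGHT" back into a Size.
    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        if (value is string text)
        {
            string[] parts = text.Split('x');
            return new Size
            {
                Width = int.Parse(parts[0], CultureInfo.InvariantCulture),
                Height = int.Parse(parts[1], CultureInfo.InvariantCulture)
            };
        }
        return base.ConvertFrom(context, culture, value);
    }

    // Formats a Size as "WIDTHxHEIGHT".
    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType)
    {
        if (destinationType == typeof(string) && value is Size size)
        {
            return FormattableString.Invariant($"{size.Width}x{size.Height}");
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }
}

public static class ConverterDemo
{
    public static void Main()
    {
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(Size));
        string stored = converter.ConvertToInvariantString(new Size { Width = 3, Height = 4 });
        var restored = (Size)converter.ConvertFromInvariantString(stored);
        Console.WriteLine($"{stored} -> {restored.Width},{restored.Height}"); // prints "3x4 -> 3,4"
    }
}
```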

ObjectDb only processes an object's public properties that are both readable and writable. No recursion is made down through nested properties.
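
The selection rule above can be approximated with standard reflection. This is a minimal sketch that assumes "readable and writable" means a public getter and a public setter. The PropertyScan class is a hypothetical name and the library's real selection logic may differ in detail.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class Customer
{
    public int Id { get; set; }                // public get and set: selected
    public string Name { get; set; }           // public get and set: selected
    public string Display => $"{Id} {Name}";   // read-only: skipped
    internal string Notes { get; set; }        // not public: skipped
}

// Hypothetical helper, not part of ObjectDb.
public static class PropertyScan
{
    // Returns the names of public instance properties that have both a
    // public getter and a public setter, sorted for stable output.
    public static string[] StorableNames(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetIndexParameters().Length == 0)
            .Where(p => p.GetGetMethod() != null && p.GetSetMethod() != null)
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", StorableNames(typeof(Customer)))); // prints "Id, Name"
    }
}
```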

Versioning

Once a class type has been stored in the database, observe the following rules when changing the class definition, otherwise runtime errors will result.

  • Do not alter names or types of the properties used as the primary key.
  • Do not change the type of a property while leaving the name unchanged.

It is acceptable to add or remove properties so long as they are not part of the primary key.


Other Property Annotations

The [DataObjectField] attribute can be used to optionally provide extra useful information to the database about how to store property values. The attribute supports four values.

DataObjectField(primaryKey, isIdentity, isNullable, length)

  • primaryKey
    Specifies if a property is part of the primary key.
  • isIdentity
    Specifies if a property is an auto-increasing Int32 Identity primary key. When true, primaryKey must also be true.
  • isNullable
    Specifies if a property allows null values. This flag is only meaningful for the reference type properties string and byte[]. Primary key and identity properties are assumed to be not nullable.
  • length
    Specifies the expected maximum number of characters in a string property value or maximum number of bytes in a byte[] property value. The provided length value is a hint only and it does not set an actual limit in the underlying database column. The ESENT database column has unusual behaviour regarding maximum lengths, which is explained in the following section.
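
The four values map onto the constructor parameters and properties of the standard System.ComponentModel.DataObjectFieldAttribute class. The sketch below annotates a made-up Ticket class using named arguments and reads the values back with reflection. The Ticket and AnnotationDemo names are assumptions for illustration, and the code only inspects the attribute without involving ObjectDb.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Hypothetical class used only to show the annotation syntax.
public class Ticket
{
    [DataObjectField(primaryKey: true, isIdentity: true)]
    public int Id { get; set; }

    [DataObjectField(primaryKey: false, isIdentity: false, isNullable: true, length: 1000)]
    public string Comment { get; set; }
}

public static class AnnotationDemo
{
    // Formats the attribute values found on a property, if any.
    public static string Describe(PropertyInfo property)
    {
        var field = property.GetCustomAttribute<DataObjectFieldAttribute>();
        if (field == null)
        {
            return $"{property.Name}: no annotation";
        }
        return $"{property.Name}: primaryKey={field.PrimaryKey}, isIdentity={field.IsIdentity}, " +
               $"isNullable={field.IsNullable}, length={field.Length}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(Ticket).GetProperty(nameof(Ticket.Id))));
        Console.WriteLine(Describe(typeof(Ticket).GetProperty(nameof(Ticket.Comment))));
    }
}
```

Named arguments make the positional form DataObjectField(false, false, true, 1000) easier to read at the declaration site.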

Maximum Length

The ESENT database has peculiar behaviour regarding the maximum length of data that can be stored in string or byte[] columns. These column types are defined as being either "small" or "large". A small string can be up to 127 Unicode characters long and a small byte[] value can be up to 255 bytes long. A large string can be up to 1G (about one billion) Unicode characters long and a large byte[] value can be up to 2GB long.

If the maximum length of a string or byte[] column is not defined then the column is assumed to be "small". If the maximum length is defined to be any value larger than the 127 character or 255 byte capacity limit of a small column, then it will be created as a large column.

As an example, the following code defines a non-key, nullable string that is expected to have a maximum length of 1000 Unicode characters.

[DataObjectField(false, false, true, 1000)]
public string Comment { get; set; }

Since the length 1000 exceeds the "small" limit of 127 characters, the property will be backed by a "large" column that can actually hold up to 1G Unicode characters.

Don't cause large string or byte[] columns to be created unless you are sure the extra capacity is needed. Large columns are stored in a special internal format and they can't be indexed for use by the library's list and seek methods.
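
The small versus large rule described in this section can be written down as a short decision function. This is a sketch of the rule as stated above, not the library's code: the ColumnSizing class and IsLarge method are hypothetical names, and a null length stands for "no length specified".

```csharp
using System;

// Hypothetical helper that restates the documented rule.
public static class ColumnSizing
{
    public const int SmallStringChars = 127; // capacity of a small string column
    public const int SmallBinaryBytes = 255; // capacity of a small byte[] column

    // Returns true when a length hint would cause a large column to be created.
    // A null hint means no length was given, which is treated as small.
    public static bool IsLarge(Type propertyType, int? lengthHint)
    {
        if (propertyType != typeof(string) && propertyType != typeof(byte[]))
        {
            throw new ArgumentException(
                "Only string and byte[] columns have small and large forms.", nameof(propertyType));
        }
        if (lengthHint == null)
        {
            return false;
        }
        int limit = propertyType == typeof(string) ? SmallStringChars : SmallBinaryBytes;
        return lengthHint.Value > limit;
    }

    public static void Main()
    {
        Console.WriteLine(IsLarge(typeof(string), null)); // prints "False"
        Console.WriteLine(IsLarge(typeof(string), 127));  // prints "False"
        Console.WriteLine(IsLarge(typeof(string), 1000)); // prints "True"
        Console.WriteLine(IsLarge(typeof(byte[]), 255));  // prints "False"
        Console.WriteLine(IsLarge(typeof(byte[]), 256));  // prints "True"
    }
}
```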


Code Samples

Usage Pattern

An object database is a heavyweight object that will be opened or created on first use. The database should be opened early in application lifetime and kept open for as long as practically possible. Dispose of the database to immediately release native resources. A database is implemented as a group of files that are stored in a specified folder.

// There are 3 Create overloads
using (var db = ObjectDb.Create(@"C:\temp\myfolder"))
{
  // In the following sample code, assume that a Customer class has
  // been defined with an Id property which is the identity primary key.
}

👓 Insert

var cust = new Customer() { Name = "ACME", Phone="555-1234", Rating = 3 };
db.Insert(cust);
Trace("New customer Id = {0}", cust.Id);

👓 Insert Many

Due to transaction buffer limits in the underlying ESENT database there is an unpredictable limit on how many records can be inserted in one InsertMany method call. Tests indicate that 100 records of small to moderate size can be inserted in one transaction. If the limit is reached then calling code should "batch" the inserts into smaller groups.

IEnumerable<Customer> custs = GetManyCustomers();
db.InsertMany(custs);
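
One way to batch the inserts is sketched below. The Batching class, its method names and the batch size of 100 are assumptions for illustration. The insert call is passed in as a delegate so that the sketch runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ObjectDb.
public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return BatchIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final partial batch
        }
    }

    // Calls insertMany once per batch and returns the number of calls made.
    public static int InsertInBatches<T>(IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> insertMany)
    {
        int calls = 0;
        foreach (List<T> batch in Batch(items, batchSize))
        {
            insertMany(batch);
            calls++;
        }
        return calls;
    }

    public static void Main()
    {
        // The lambda stands in for a real insert call.
        int calls = InsertInBatches(Enumerable.Range(1, 250), 100,
            batch => Console.WriteLine($"Inserting {batch.Count()} records"));
        Console.WriteLine($"{calls} calls"); // 250 items in batches of 100 gives 3 calls
    }
}
```

With a real database the delegate would presumably be something like batch => db.InsertMany(batch), with the batch size tuned to the record size.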

👓 Get by Primary Key

The following sample reads a Customer by a unique Int32 property value.

var cust = db.Get<Customer>(418);
Trace("Customer Id 418 = {0}", cust?.Name ?? "Not Found");

👓 Get by Compound Primary Key

The following sample reads an imaginary record type by a compound primary key which is a string and Int32 pair. The parameters for the Get method are a variable length array of object values which must match the order and types of the primary key properties defined on the object type being retrieved.

var fin = db.Get<FooClass>("PROD", 521);
// Parentheses are required because + binds more tightly than ??
Trace("Read Foo = " + (fin?.ToString() ?? "Not Found"));

👓 Get by Index

Indexes over non-primary-key properties are not unique, so the GetByIndex method returns an enumerable collection. Note that GetByIndex only returns objects whose property value exactly matches the value argument, and it uses the index to do so efficiently.

var rating4s = db.GetByIndex<Customer>(nameof(Customer.Rating), 4);
foreach (var cust in rating4s)
{
  Trace("{0} {1} {2}", cust.Id, cust.Name, cust.Phone);
}

👓 Find First by Index

Find the first customer with Name "Jones". If none exist then null is returned. If more than one exist then only the first one is returned.

var cust = db.FindFirstByIndex<Customer>(nameof(Customer.Name), "Jones");
if (cust != null)
{
  Trace("The first Jones found has Id {0}", cust.Id);
}

👓 List by Primary Key

// Listing starts at the first record (if it exists) in primary key order
foreach (var cust in db.List<Customer>())
{
  Trace("{0} {1} {2}", cust.Id, cust.Name, cust.Phone);
}

👓 Seek and List by Primary Key

// Seeks to primary key 12 or higher and lists customers
// from that position to the end.
foreach (var cust in db.List<Customer>(SeekType.GE, 12))
{
  Trace("{0} {1} {2}", cust.Id, cust.Name, cust.Phone);
}

👓 List by Index

// Lists all customers in Name index sequence
foreach (var cust in db.ListByIndex<Customer>(nameof(Customer.Name)))
{
  Trace("{0} {1} {2}", cust.Id, cust.Name, cust.Phone);
}

👓 Seek and List by Index

// Seeks to the first customer with a Name starting with M, then lists
// customers starting with M. Note that TakeWhile is more efficient
// than Where because it will stop on the first condition failure and
// not uselessly read to the end as a Where would do.
var query = db.ListByIndex<Customer>(nameof(Customer.Name), SeekType.GE, "M");
foreach (var cust in query.TakeWhile(c => c.Name.StartsWith("M")))
{
  Trace("{0} {1} {2}", cust.Id, cust.Name, cust.Phone);
}
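
The difference between TakeWhile and Where noted in the comment above can be demonstrated without a database. In this sketch an in-memory sorted array stands in for the Name index and a counter records how many rows are pulled from it. The SeekDemo class and the sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeekDemo
{
    // Stand-in for customers listed in Name index order.
    private static readonly string[] SortedNames =
        { "Adams", "Lee", "Mann", "Moore", "Nash", "Owen", "Young" };

    // Applies the given filter to a listing that starts at "M" and reports
    // how many rows matched and how many rows were read from the listing.
    public static (int Matches, int Reads) Measure(
        Func<IEnumerable<string>, Func<string, bool>, IEnumerable<string>> filter)
    {
        int reads = 0;

        IEnumerable<string> ListFromM()
        {
            foreach (string name in SortedNames)
            {
                // Simulated seek: rows before "M" are skipped without being
                // counted, as a real index seek would not read them.
                if (string.CompareOrdinal(name, "M") < 0) continue;
                reads++;
                yield return name;
            }
        }

        int matches = filter(ListFromM(), n => n.StartsWith("M", StringComparison.Ordinal)).Count();
        return (matches, reads);
    }

    public static void Main()
    {
        var takeWhile = Measure((rows, test) => rows.TakeWhile(test));
        var where = Measure((rows, test) => rows.Where(test));
        Console.WriteLine($"TakeWhile: {takeWhile.Matches} matches, {takeWhile.Reads} reads"); // 2 matches, 3 reads
        Console.WriteLine($"Where: {where.Matches} matches, {where.Reads} reads");             // 2 matches, 5 reads
    }
}
```

Both queries return the same two names, but TakeWhile stops at the first name that fails the test while Where reads every remaining row.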

👓 Update

The Update method locates an existing record by its primary key and updates it. It returns true if the record was found and updated, or false if the primary key was not found.

var cust = db.Get<Customer>(1072);
cust.Name = "Very Big Company";
bool updated = db.Update(cust);
Assert.IsTrue(updated);

👓 Save

The Save method either inserts or updates a record depending upon the existence of the primary key in the database.

If the primary key is found then the existing row is updated with the properties of the object. In this case processing is identical to calling the Update method.

If the primary key is not found then the object is inserted as a new row. In this case processing is identical to calling the Insert method. Note that if the primary key is an auto-increment IDENTITY column then any provided value is ignored and an incrementing key value is automatically assigned, just as it would be by the Insert method.

var cust = new Customer { Id = 666, Name = "ACME Inc" };
bool inserted = db.Save(cust);
Trace("Customer Id {0} was {1}", cust.Id, inserted ? "inserted" : "updated");

👓 Delete

bool deleted = db.Delete<Customer>(123);
Trace("Customer Id 123 was deleted: {0}", deleted ? "Yes" : "No");

👓 DeleteAll

bool deletedSomething = db.DeleteAll<Customer>();
Trace("Something was deleted: {0}", deletedSomething ? "Yes" : "No");
Assert.AreEqual(0, db.List<Customer>().Count());

👓 Other

// Lists all object type names stored in the database.
string[] objnames = db.GetObjectNames();

// Lists all property names of an object type name stored in the database.
string[] propnames = db.GetPropertyNames("Customer");

// Gets the complete internal database schema as an XML element.
XElement schema = db.GetSchema();

// Export the entire database schema and data to an XML file stream.
using (var output = new FileStream("export.xml", FileMode.Create, FileAccess.Write))
{
  int count = db.Export(output);
  Log($"Exported {count} rows");
}

// Imports the entire database schema and data from an XML file stream.
using (var input = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
  int count = db.Import(input);
  Log($"Imported {count} rows");
}

Exception Handling

In normal use, catching exceptions from the ObjectDb library will not be necessary. An exception from the library indicates a severe external problem or incorrect use of the library. External errors include things like lack of file-system permissions or network errors, which are problems independent of the library. Incorrect use includes conditions like the following:

  • Processing a class Type without a primary key property.
  • Processing a class Type with an unsupported property Type.
  • Passing a null primary key property value to a Get or List method.
  • Listing using an alternate key property name that doesn't exist.
  • Attempting to Insert an object with a primary key that already exists.

Errors of these types indicate that the library is being used incorrectly or there are serious errors in the application logic. In all cases, a detailed explanatory message will be present in the exception message.


Performance

Because ObjectDb is backed by an ESENT database, seeks go through ESENT's B-tree indexes, so seek times stay very fast and grow only slowly as the database gets larger.

Unit tests show that reading a few thousand records over the primary key takes a fraction of a second. Reading a few thousand records over a property index takes about one second.

One million records can be inserted in about 20 seconds. Reading one million records over the primary key takes about 15 seconds.

Two hundred records containing binary blobs of random size 4K to 100K can be inserted in about 4 seconds. Reading all the blobs back over the primary key takes less than a second.


Notes

👉 Thread Safety

An object database instance is not thread safe. An exception may be thrown if multiple threads make overlapping calls to instance methods. This restriction is by design, as incorporating thread safety would unnecessarily complicate the library. Calling applications can simply use the lock statement or other synchronization techniques to ensure that only one thread is using an object database instance at any time. This library is not intended for use in high-performance multi-threaded scenarios.
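
A simple way to apply the lock statement consistently is to route every call through a small wrapper. The sketch below is one possible approach, not something the library provides: the Synchronized class and its Do and Get methods are hypothetical names, and a List<int> stands in for the database instance so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical wrapper that serializes all access to a non-thread-safe object.
public sealed class Synchronized<T> where T : class
{
    private readonly object _gate = new object();
    private readonly T _inner;

    public Synchronized(T inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Runs an action against the wrapped object while holding the lock.
    public void Do(Action<T> work)
    {
        lock (_gate) { work(_inner); }
    }

    // Runs a function against the wrapped object while holding the lock.
    public TResult Get<TResult>(Func<T, TResult> work)
    {
        lock (_gate) { return work(_inner); }
    }
}

public static class LockDemo
{
    // Adds 1000 items from parallel threads; List<int> is not thread safe,
    // so every call goes through the wrapper.
    public static int RunCounterDemo()
    {
        var shared = new Synchronized<List<int>>(new List<int>());
        Parallel.For(0, 1000, i => shared.Do(list => list.Add(i)));
        return shared.Get(list => list.Count);
    }

    public static void Main()
    {
        Console.WriteLine(RunCounterDemo()); // prints "1000"
    }
}
```

The same pattern could wrap whatever object ObjectDb.Create returns. Query results would need to be fully enumerated (for example with ToList) inside the locked delegate, because a lazily evaluated listing would otherwise be read after the lock is released.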

👉 .NET Standard Library

The 4.0.0 major release of the library now references the new Microsoft.Database.ManagedEsent package which targets .NET Standard 2.0. Since Standard libraries can be consumed by all modern application types (including UWP), there is no need to maintain one ObjectDb project for the full .NET Framework and one for UWP. There is now only a single ObjectDb project that targets .NET Standard 2.0. This change has resulted in a sweeping simplification of the code and projects, and there is only the single Orthogonal.ObjectDb.Esent NuGet package.


Documentation

Reasonably complete library source code documentation generated by Sandcastle Help File Builder can be found here:

📘 https://orthoprog.blob.core.windows.net/dochelp/objectdb/index.html


History

Author's Comments…

Many times over the last several years I wished I had a simple database for storing objects and their property values. My wish list included the following conditions for such a database: all managed code; in-process; simple public API; no installation; add-a-reference and go; no complicated dependencies; transactions not needed; no configuration.

In June 2013 I asked in the ozdotnet mailing list if anyone knew of such a database. I examined over a dozen resulting suggestions but found them all to be licensed, over-engineered, or incomprehensible. I therefore reluctantly decided to create my own simple object database. Around this time I had discovered the ESENT database that is built into Windows. It provided a convenient backing storage for the database and satisfied all of my wish list.

ObjectDb does not attempt to compete with the features of sophisticated NoSQL or document databases like MongoDB, RavenDB, Cosmos DB and many other similar products, but it does share a lightweight and useful subset of their behaviour. It's interesting to note that RavenDB used an ESENT database as its backing storage for many years. I believe they recently replaced ESENT with their own low-level platform-independent storage engine called Voron. Blog posts by Oren Eini suggest that it took a lot of research to replace ESENT and then reach or pass its performance level.
