Modelling dynamic attributes - Blueprints for the Entity-attribute-value model (EAV)

With a growing customer base, your customers often want to have additional input fields in your application. This might be an additional field for an employee number in an HR application, customer number in an CRM app or just a simple Notes field. From a data model perspective, your employee or customer entity would be extended with an additional attribute like employee_number or customer_number of a given data type.

The obvious solution for this would be simply adding a new column to your database table and extending the frontend with a new input field. By using this approach you will be confronted with the following issues:

  1. Depending upon your application's distribution method (on-premises with or without multitenancy, SaaS) each of your customers sees automatically the new attribute in the user interface. If you decide that the new attribute is relevant for all of your customers, this is fine. But if only a special customer needs this attribute, other customers might get confused over this new attribute. If you are familiar with https://www.atlassian.com/software/jira, you might have already experienced wrongly configured screens with input fields which are not relevant for your use case.
  2. Introducing a new column means, you have to create and apply a database migration and rollout a new version of the frontend and backend. Doing so is totally fine. But after adding the second additional attribute you might think that the effort for implementing a new field without any logic is not worth the time.
  3. That being said, a customer might want to add a new attribute or a number of attributes. He or she might not be willed to pay for just a new field without any application logic behind it.

Plug-ins and additional fields

If your application provides a plug-in architecture or a public API through e.g. HTTP(S), you have to think about third party vendors. Often, vendors want to keep track of additional information. This information can be either stored in an external database of the vendor or directly in your application's database.

Depending on your applications architecture, you can track those additional information in three ways:

  1. If your application provides a plug-in architecture with direct database access, vendors must store their information in additional database tables. Vendors should never change the core database tables of your application, because of security, reporting and other reasons. Without investing heavily, like relying completely on ORMs and/or inspecting every SQL statement, you can't prevent those changes.
  2. Each of database tables has an additional column in which structured information can be stored in a flattened way. This could be a column of type json or jsonb or a text type, holding an XML structure.
  3. Your application already provides a generic database model for storing additional information. You are doing this by adding the Entity-attribute-value model to your database schema.

Meet the Entity-attribute-value (EAV) model

The question is now: How can you easily store values of additional attributes in your database without modifying the database schema? You won't be surprised, because the answer is: Use the Entity-attribute-value (EAV) model.

The EAV model is a data modelling blueprint. It allows the implemented application or database to easily store additional information. Those additional information are also known as metadata.

Keyword Description Example
Entity The object (or row) you want to enrich with metadata Employee 21, Customer 55
Attribute The additional field you want to attach to the data model of the entity employee_number, customer_number
Value The value your additional field contains emp_id:55, 555

Wikipedia has an [https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model](excellent article) with in-depth information about the Entity-attribute-value model, which I like to recommend. But for this guideline, we take a look at practical examples.

What information must be stored?

Before implementing the EAV model, you should answer yourself the following questions:

  • Do I want to enforce the datatype of the attribute in the database model itself or should my application validate the data and its type?
  • Do I want to allow nullable attributes or ranges of values?
  • Do I need a grouping of attributes? This might be something like a group Additional information with the attributes Note and Description
  • Do I need sharing of groups between entity types? This might be automatically sharing the group Additional information between Employees and Customers.
  • Do I have a multitenancy application in which my users are able to define their own attributes, visible only in their own context?
  • Do I want to allow additional entity types defined? There might be your entity types Employee and Customer but also an entity type Project, created by one of your third party vendors.
  • Do I need to store the order of additional fields of an entity?

Simple EAV model with referential integrity

If you need to store only metadata for one database table, you want to go with this schema:

Simple EAV model with referential integrity

Table entity {
  id integer [not null, unique]
  name varchar(255) [not null]
}

Table entity_meta {
  id integer [not null, unique]
  entity_id integer [not null, unique, ref: > entity.id]
  key varchar(255) [not null]
  value text
}

The table name entity could be your customer or employee table.

Pros:

  • Easy to implement
  • Referential integrity is available

Cons:

  • Duplicate attribute keys
  • Only one entity type is attachable with metadata. Other entity types need additional _meta tables.

The schema itself is probably the most deployed in the world on webservers, looking at the distribution of WordPress. Because WordPress uses the very approach in their database schema:

Original database model used in WordPress

Table wp_posts {
  id integer [primary key, not null, unique]
  // foreign keys ...
  post_content text
  post_title text
  // additional attributes
}

Table wp_postmeta {
  meta_id integer [primary key, not null, unique]
  post_id integer [not null, unique, ref: > wp_posts.id]
  meta_key varchar(255) [not null]
  meta_value text
}

EAV for all entity types

With larger applications, you also want to attach metadata to more than one entity type.

EAV model with one table for all entity types

Table entity_meta {
  id integer [not null, unique]
  // this could also be an enum
  entity_type varchar(255) [not null] 
  entity_id integer [not null]
  key varchar(255) [not null]
  value text
}

Depending upon your requirements, the column entity_type could be an enum, containing only types of your core application. Vendors would not be able to use this approach for their own metadata.

Pros:

  • Easy to implement
  • One table to rule them all

Cons:

  • Duplicate attribute keys
  • No referential integrity. You have to enforce it either in your application logic or through CHECK CONSTRAINTs

Restricting attributes

One important aspect is, which attributes can be added and who is allowed to do so.

EAV with defined attributes

For restricting the usage of only specific attributes, we can introduce an attribute table. Users can only apply additional fields of the given attributes. This approach is nice if you want to introduce new fields during runtime for all users:

EAV model with dedicated attribute table

Table attribute {
  id integer [not null, unique]
  // for internal usage
  key varchar(255) [not null, unique]
  // public attribute name
  name varchar(255) [not null]
}

Table attribute_in_entity {
  id integer [not null, unique]
  attribute_id integer [not null, unique, ref: > attribute.id]
  // no foreign key possible
  entity_id integer [not null]
  // this could also be an enum
  entity_type varchar(255) [not null] 
  value text
}

Pros:

  • No duplicate attribute keys
  • Performant grouping of the same attribute over different entity types is possible

Cons:

  • No referential integrity. You have to enforce it either in your application logic or through CHECK CONSTRAINTs

EAV with assigned attributes

Let's assume you want to allow specific attributes only for a given set of entity types and users can also make their own attribute fields.

EAV model with attributes assigned to specific entity types

// this table stores your customer/tenant
Table tenant {
  id integer [not null, unique]
  name varchar(255) [not null]
}

Table attribute {
  id integer [not null, unique]
  // for internal usage
  key varchar(255) [not null]
  // public attribute name
  name varchar(255) [not null]
  // if this foreign key is NULL than it is applied for all tenants, otherwise only for the tenant with the given ID
  tenant_id integer [null, ref: > tenant.id]
  // add a UNIQUE CONSTRAINT (key, tenant_id) to make sure that the key is only allowed once per customer and global; an additional CONSTRAINT has to be a added if a global key does already exist with this name
}

// assign attributes for a given entity type
Table attribute_in_entity_type {
  id integer [not null, unique]
  // this could also be an enum
  entity_type varchar(255) [not null] 
  attribute_id integer [not null, unique, ref: > attribute.id]
  // add a UNIQUE CONSTRAINT(entity_type, attribute_id)
}

Table metadata {
  id integer [not null, unique]
  value text
  // no foreign key possible
  attribute_in_entity_type_id integer [not null, ref: > attribute_in_entity_type.id]
  // weak reference to the  
  entity_id integer [not null]
}

Pros:

  • No duplicate attribute keys
  • Performant grouping of the same attribute over different entity types is possible
  • Assignment of attributes only allowed for a given type
  • Customers can add their own attributes

Cons:

  • Higher complexity
  • Lookup for allowed attributes required
  • No referential integrity. You have to enforce it either in your application logic or through CHECK CONSTRAINTs

Type safety of metadata

The previous examples allowed you to store any type of data in the value column. The upper layers of your application had to deal with type safety.

Ensure type safety with multiple value columns

One way to ensure some sort of type safety in your database can be accomplished by adding an additional column for each type you want to handle.

EAV model with additional data type columns

enum data_type {
  string
  number
  text
  datetime
  boolean
}

Table entity_meta {
  id integer [not null, unique]
  // this could also be an enum
  entity_type varchar(255) [not null] 
  entity_id integer [not null]
  key varchar(255) [not null]
  // data type for this metadata
  type data_type [not null]
  value_string varchar(255) [null]
  value_number decimal(10,2) [null]
  value_text text [null]
  value_datetime datetime [null]
  value_boolean boolean [null]
  // add additional holder columns as you like
  // specify, if the designated value_* column may be null; either your upper layers have to check this or you have to enforce it with a CHECK CONSTRAINT
  is_nullable boolean [not null, default:`true`]
}

The column data_type decides, of which data type the metadata is. This is required, because we can't derive the datatype if is_nullable is true and each of the value_* columns contain a NULL value.

[https://developer.atlassian.com/server/jira/platform/database-custom-fields/](Jira uses this very model) to store the value of custom fields in their database. This is the database schema for the Jira table https://confluence.atlassian.com/display/JIRA041/Database+Schema:

Diagram

Table issue {
  ID decimal(18,0) [not null, primary key]
  // ...
}

Table customfield {
  ID decimal(18,0) [not null, primary key]
  // ...
}

Table customfieldvalue {
  ID decimal(18,0) [not null, primary key]
  ISSUE decimal(18,0) [null, ref: > issue.ID]
  CUSTOMFIELD decimal(18,0) [null, ref: > customfield.ID]
  PARENTKEY varchar(255) [null]
  STRINGVALUE varchar(255) [null]
  NUMBERVALUE decimal(18,6) [null]
  TEXTVALUE longtext [null]
  DATEVALUE datetime [null]
  VALUETYPE varchar(255) [null]
}

Ensure type safety with dedicated tables

Another option to ensure type safety is to add a table for each data type you want to deal with.

EAV model with dedicated data type tables like Magento

Table attribute {
  id integer [not null, unique]
  // for internal usage
  key varchar(255) [not null, unique]
  // public attribute name
  name varchar(255) [not null]
}

Table customer {
  id integer [not null, unique]
  // for convenience
}

Table eav_entity {
  id integer [not null, unique]
  // no foreign key possible
  entity_id integer [not null]
  // this could also be an enum
  entity_type varchar(255) [not null] 
}

Table eav_value_datetime {
  attribute_id integer [not null, unique, ref: > attribute.id]
  eav_entity_id integer [not null, ref: > eav_entity.id]
  // a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
  value datetime [not null]
}

Table eav_value_decimal {
  attribute_id integer [not null, unique, ref: > attribute.id]
  eav_entity_id integer [not null, ref: > eav_entity.id]
  // a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
  value decimal(12,4) [not null]
}

Table eav_value_int {
  attribute_id integer [not null, unique, ref: > attribute.id]
  eav_entity_id integer [not null, ref: > eav_entity.id]
  // a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
  value int(11) [not null]
}

Table eav_value_text {
  attribute_id integer [not null, unique, ref: > attribute.id]
  eav_entity_id integer [not null, ref: > eav_entity.id]
  // a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
  value text [not null]
}

Table eav_value_varchar {
  attribute_id integer [not null, unique, ref: > attribute.id]
  eav_entity_id integer [not null, ref: > eav_entity.id]
  // a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
  value varchar(255) [not null]
}

The popular e-commerce platform https://magento.com/ uses such a model for storing metadata with type safety. You can take a look at into in [https://inchoo.net/wp-content/uploads/2010/09/MAGENTO_v1.3.2.4-Database_Diagram.pdf](this PDF)

Disadvantages of the EAV model

Apart from the Simple EAV model with referential integrity blueprint, each of the models has the fundamental problem of missing referential integrity. Your other application layers have to make sure that the referential integrity can be assured. This should be done

  • by adding additional CHECK CONSTRAINTs when adding and deleting rows
  • and in your application's service layer.

In addition to that, all metadata values have to be loaded by using JOIN clauses. You should carefully look when you return the value of the metadata.

Wrapping it up

With the help of the Entity-attribute-value model you can provide a way to store additional metadata. It makes it much easier for customers and vendors to extend your application with own attributes. You can also reduce the time to add attributes for all of your entities.

After you have read this guideline, you should be able to pick a blueprint which fits your current requirements the best.