With a growing customer base, your customers often want to have additional input fields in your application. This might be an additional field for an employee number in an HR application, customer number in an CRM app or just a simple Notes field. From a data model perspective, your employee or customer entity would be extended with an additional attribute like employee_number or customer_number of a given data type.
The obvious solution for this would be simply adding a new column to your database table and extending the frontend with a new input field. By using this approach you will be confronted with the following issues:
If your application provides a plug-in architecture or a public API through e.g. HTTP(S), you have to think about third party vendors. Often, vendors want to keep track of additional information. This information can be either stored in an external database of the vendor or directly in your application's database.
Depending on your applications architecture, you can track those additional information in three ways:
The question is now: How can you easily store values of additional attributes in your database without modifying the database schema? You won't be surprised, because the answer is: Use the Entity-attribute-value (EAV) model.
The EAV model is a data modelling blueprint. It allows the implemented application or database to easily store additional information. Those additional information are also known as metadata.
Keyword | Description | Example |
---|---|---|
Entity | The object (or row) you want to enrich with metadata | Employee 21, Customer 55 |
Attribute | The additional field you want to attach to the data model of the entity | employee_number, customer_number |
Value | The value your additional field contains | emp_id:55, 555 |
Wikipedia has an [https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model](excellent article) with in-depth information about the Entity-attribute-value model, which I like to recommend. But for this guideline, we take a look at practical examples.
Before implementing the EAV model, you should answer yourself the following questions:
If you need to store only metadata for one database table, you want to go with this schema:
Table entity {
id integer [not null, unique]
name varchar(255) [not null]
}
Table entity_meta {
id integer [not null, unique]
entity_id integer [not null, unique, ref: > entity.id]
key varchar(255) [not null]
value text
}
The table name entity
could be your customer
or employee
table.
Pros:
Cons:
The schema itself is probably the most deployed in the world on webservers, looking at the distribution of WordPress. Because WordPress uses the very approach in their database schema:
Table wp_posts {
id integer [primary key, not null, unique]
// foreign keys ...
post_content text
post_title text
// additional attributes
}
Table wp_postmeta {
meta_id integer [primary key, not null, unique]
post_id integer [not null, unique, ref: > wp_posts.id]
meta_key varchar(255) [not null]
meta_value text
}
With larger applications, you also want to attach metadata to more than one entity type.
Table entity_meta {
id integer [not null, unique]
// this could also be an enum
entity_type varchar(255) [not null]
entity_id integer [not null]
key varchar(255) [not null]
value text
}
Depending upon your requirements, the column entity_type
could be an enum, containing only types of your core application. Vendors would not be able to use this approach for their own metadata.
Pros:
Cons:
One important aspect is, which attributes can be added and who is allowed to do so.
For restricting the usage of only specific attributes, we can introduce an attribute
table. Users can only apply additional fields of the given attributes. This approach is nice if you want to introduce new fields during runtime for all users:
Table attribute {
id integer [not null, unique]
// for internal usage
key varchar(255) [not null, unique]
// public attribute name
name varchar(255) [not null]
}
Table attribute_in_entity {
id integer [not null, unique]
attribute_id integer [not null, unique, ref: > attribute.id]
// no foreign key possible
entity_id integer [not null]
// this could also be an enum
entity_type varchar(255) [not null]
value text
}
Pros:
Cons:
Let's assume you want to allow specific attributes only for a given set of entity types and users can also make their own attribute fields.
// this table stores your customer/tenant
Table tenant {
id integer [not null, unique]
name varchar(255) [not null]
}
Table attribute {
id integer [not null, unique]
// for internal usage
key varchar(255) [not null]
// public attribute name
name varchar(255) [not null]
// if this foreign key is NULL than it is applied for all tenants, otherwise only for the tenant with the given ID
tenant_id integer [null, ref: > tenant.id]
// add a UNIQUE CONSTRAINT (key, tenant_id) to make sure that the key is only allowed once per customer and global; an additional CONSTRAINT has to be a added if a global key does already exist with this name
}
// assign attributes for a given entity type
Table attribute_in_entity_type {
id integer [not null, unique]
// this could also be an enum
entity_type varchar(255) [not null]
attribute_id integer [not null, unique, ref: > attribute.id]
// add a UNIQUE CONSTRAINT(entity_type, attribute_id)
}
Table metadata {
id integer [not null, unique]
value text
// no foreign key possible
attribute_in_entity_type_id integer [not null, ref: > attribute_in_entity_type.id]
// weak reference to the
entity_id integer [not null]
}
Pros:
Cons:
The previous examples allowed you to store any type of data in the value
column. The upper layers of your application had to deal with type safety.
One way to ensure some sort of type safety in your database can be accomplished by adding an additional column for each type you want to handle.
enum data_type {
string
number
text
datetime
boolean
}
Table entity_meta {
id integer [not null, unique]
// this could also be an enum
entity_type varchar(255) [not null]
entity_id integer [not null]
key varchar(255) [not null]
// data type for this metadata
type data_type [not null]
value_string varchar(255) [null]
value_number decimal(10,2) [null]
value_text text [null]
value_datetime datetime [null]
value_boolean boolean [null]
// add additional holder columns as you like
// specify, if the designated value_* column may be null; either your upper layers have to check this or you have to enforce it with a CHECK CONSTRAINT
is_nullable boolean [not null, default:`true`]
}
The column data_type
decides, of which data type the metadata is. This is required, because we can't derive the datatype if is_nullable
is true
and each of the value_*
columns contain a NULL value.
[https://developer.atlassian.com/server/jira/platform/database-custom-fields/](Jira uses this very model) to store the value of custom fields in their database. This is the database schema for the Jira table https://confluence.atlassian.com/display/JIRA041/Database+Schema:
Table issue {
ID decimal(18,0) [not null, primary key]
// ...
}
Table customfield {
ID decimal(18,0) [not null, primary key]
// ...
}
Table customfieldvalue {
ID decimal(18,0) [not null, primary key]
ISSUE decimal(18,0) [null, ref: > issue.ID]
CUSTOMFIELD decimal(18,0) [null, ref: > customfield.ID]
PARENTKEY varchar(255) [null]
STRINGVALUE varchar(255) [null]
NUMBERVALUE decimal(18,6) [null]
TEXTVALUE longtext [null]
DATEVALUE datetime [null]
VALUETYPE varchar(255) [null]
}
Another option to ensure type safety is to add a table for each data type you want to deal with.
Table attribute {
id integer [not null, unique]
// for internal usage
key varchar(255) [not null, unique]
// public attribute name
name varchar(255) [not null]
}
Table customer {
id integer [not null, unique]
// for convenience
}
Table eav_entity {
id integer [not null, unique]
// no foreign key possible
entity_id integer [not null]
// this could also be an enum
entity_type varchar(255) [not null]
}
Table eav_value_datetime {
attribute_id integer [not null, unique, ref: > attribute.id]
eav_entity_id integer [not null, ref: > eav_entity.id]
// a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
value datetime [not null]
}
Table eav_value_decimal {
attribute_id integer [not null, unique, ref: > attribute.id]
eav_entity_id integer [not null, ref: > eav_entity.id]
// a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
value decimal(12,4) [not null]
}
Table eav_value_int {
attribute_id integer [not null, unique, ref: > attribute.id]
eav_entity_id integer [not null, ref: > eav_entity.id]
// a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
value int(11) [not null]
}
Table eav_value_text {
attribute_id integer [not null, unique, ref: > attribute.id]
eav_entity_id integer [not null, ref: > eav_entity.id]
// a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
value text [not null]
}
Table eav_value_varchar {
attribute_id integer [not null, unique, ref: > attribute.id]
eav_entity_id integer [not null, ref: > eav_entity.id]
// a null value would not make sense; if for an entity a row of an attribute does not exist, it is assumed implicitly as NULL
value varchar(255) [not null]
}
The popular e-commerce platform https://magento.com/ uses such a model for storing metadata with type safety. You can take a look at into in [https://inchoo.net/wp-content/uploads/2010/09/MAGENTO_v1.3.2.4-Database_Diagram.pdf](this PDF)
Apart from the Simple EAV model with referential integrity blueprint, each of the models has the fundamental problem of missing referential integrity. Your other application layers have to make sure that the referential integrity can be assured. This should be done
In addition to that, all metadata values have to be loaded by using JOIN clauses. You should carefully look when you return the value of the metadata.
With the help of the Entity-attribute-value model you can provide a way to store additional metadata. It makes it much easier for customers and vendors to extend your application with own attributes. You can also reduce the time to add attributes for all of your entities.
After you have read this guideline, you should be able to pick a blueprint which fits your current requirements the best.