Python Django EAV Model for Users to Easily Add Variables

Django ORM and our Motivation for EAV

Basic object-relational models in Python Django do not allow users to modify the data model within the application. For example, personal health assessments involve variables such as blood pressure, height, weight, waist size, blood glucose level. The basic model for collecting that data in Django would define every one of those variables in advance in our Python code, from which Django creates an object-relational mapping that amounts to a database table with one column for each variable.

If there are other variables we want to collect later, such as smoking status or cholesterol level, then we need to update the Python code for the model and perform Django migration to add new columns to our database table. Then we also need to update templates, forms, and reports to include the new variables. Doing this frequently could require and needlessly occupy development resources. So, what if we want the users of our Django application to be able to add variables themselves?

This is a particularly common request for applications involving collection of data in the field, in our case interview surveys and health assessments of voluntary participants. From this demand arises a desire to use Entity-Attribute-Value (EAV) models. The promise of an infinite number of user-defined variables combined in infinite ways tempts users seeking to maximize the value and longevity of an application. Real-life implementations of such models are non-trivial due to the administrative needs, domain-specific business logic and validation requirements, data entry interface, and reporting functionality.

Django Model Incorporating EAV

An EAV or sparse matrix model can make our implementation of data entry and reporting far more difficult because we are shifting from operating on a single record for all of the collected information for a participant to creating and maintaining a dynamic number of equivalent value records, each value record then mapped to the attribute or variable collected and a common parent entity reflecting our participant (or in a more complex case one instance of our participant completing a survey). Fortunately, Django is able to abstract some of this implementation for us.

See below for a representation of the sparse matrix as it applies to our Django data model. In practice there can be many empty or zero values because only a small subset of the potential questions for a survey are asked of and answered by a particular participant at an event.

Sparse Matrix for our Python Django EAV Model

See below for a conceptual entity relationship diagram of the EAV model.

Conceptual Entity Relationship for our Python Django EAV Model

Django data models to match our conceptual ERD look something like the following listings.

QUESTION_TYPES_TUPLE = (
    ('StringValue', 'String'),
    ('IntegerValue', 'Integer'),
)


class Question(models.Model):
    question_text = models.CharField(max_length=250)
    question_type = models.CharField(max_length=50, choices=QUESTION_TYPES_TUPLE)

    def __str__(self):
        return self.question_text

The Question model represents a question or measurement. Each Question is a variable for which users can collect data. Each question has question_text such as “What is your weight in pounds?” We define two types of questions - those where the answer is a number and those where the answer is text. In a real application you can code question types beyond integers and strings such as dates or digital signatures.

class Survey(models.Model):
    survey_name = models.CharField(max_length=250)
    questions = models.ManyToManyField(Question)

    def __str__(self):
        return self.survey_name
    
    def get_sorted_questions(self):
        return self.questions.all().order_by('id')

The Survey is a group of Question. Users can pick and choose from the all questions to construct a Survey. The ManyToManyField method tells Django that each Question our users create can be used in any Survey.

class Event(models.Model):
    survey = models.ForeignKey('Survey', null=True)
    event_date = models.DateTimeField(null=True)

    def __str__(self):
        return "Event %s %s:%s" % (self.id, self.event_date, self.survey)
        
    def eav_report_data(self):
        report_data = []
        participants = self.participant_set.order_by('name').prefetch_related('questionresponse_set')
        questions = self.survey.get_sorted_questions()
        for participant in participants:
            participant_data = [ participant.id, participant.name ]
            participant_responses = participant.get_response_dict()
            for question in questions:
                if question.id in participant_responses:
                    participant_data.append(participant_responses[question.id])
                else:
                    participant_data.append(None)
            report_data.append(participant_data)
        return questions, report_data

The Event is an occassion where users collect data for each Question in a Survey such as a company health fair. The ForeignKey method tells Django that every Event can have only one Survey or group of Question to ask. Every event also has an event_date on which the event is held.

class Participant(models.Model):
    name = models.CharField(max_length=250, null=True)
    event = models.ForeignKey(Event, null=True)
    date_completed = models.DateTimeField(null=True)
    notes = models.TextField(blank=True, null=True)

    def __str__(self):
        return "%s at %s:%s" % (self.name, self.event, self.date_completed)
    
    def get_response_dict(self):
        response_dict = dict()
        for question_response in self.questionresponse_set.all():
            response_dict[question_response.question_id] = question_response.value
        return response_dict

The Participant is a person who answers the questions asked by a user at an Event. Each Participant always has a name, the Event where he/she was given the Survey, the date and time he/she answered the Survey, and any notes the user added about the Participant while entering his/her answers.

class QuestionResponse(models.Model):
    participant = models.ForeignKey(Participant)
    question = models.ForeignKey(Question)
    value = models.TextField()

    def __str__(self):
        return "%s: %s" % (self.question, self.value)

    class Meta:
        unique_together = ('participant', 'question')

    def get_input_obj(self):
        return self.question.get_input_obj(self.participant)

The QuestionResponse is one Participant’s answer to a Question. We expect to have one QuestionResponse per Participant for every Question in the Survey for an Event. We use ForeignKey to tell Django that every QuestionResponse has one Participant and one Question. Django can figure out the rest from our previous models.

We need to allow for a means to store and retrieve values of multiple types (numeric and textual being the most common, but dates, times, and constrained choice selections are also desirable). Our QuestionResponse model uses a single value field defined as a large text object. This has the trade-off of the simplicity of using one underlying field for storage of all values versus the lack of explicit specification and enforcement of a particular data type at the model level for each of our questions. Alternatives include defining additional fields for storing values of other data types (in which case we must have logic that determines which field to retrieve when we want to display the collected data), and defining logically-separate relations for values of different data types (this technique is used for EAV product attributes in the Magento e-commerce platform, for example).

We also need to allow for varying the controls used by our Django form to match the type of data expected for a question. An additional method defined on our Question class allows us to define a separate collection of classes with form control specifications to match each question type we define within the attribute portion of our EAV model. In this way we can generate views that utilize the best UI and client-side validation for a user-defined question. Packages such as Floppy Forms or Crispy Forms may be used in place of or as an additional layer on top of our model to further customize input widgets. For the sake of simplicity, we will inline our widgets in our single data collection template in this example.

Reports with Dynamic EAV Columns in Django

Django templates do not allow passing arguments to functions and do not allow accessing dictionary values by a variable key (see https://code.djangoproject.com/ticket/12486). The Django philosophy is do all the data retrieval and manipulation within the Python code for models, forms, and views, and pass the data within a context for the template to display. (The “view” in Django then is essentially an MVC controller and the template is an MVC view). Obviously, this prevents a straightforward attempt to report our EAV survey responses in tabular format within a Django template.

One solution is to provide a function within our Event model to produce the virtual tabular view on the EAV survey responses. Using the prefetch_related Django queryset function on the question responses for the set of all event participants ensures that we will query our many-to-many relationship between participants and questions answered a single time when generating the report, rather than round-trip to our database multiple times for every participant or every collected data point. It is important that we define a consistent logical sort order for the questions within our Survey so we can efficiently iterate over every possible question and find the answer if present. Also note we are only iterating over the questions assigned to the Survey for the selected Event. The database may contain any number of other questions with data collected from other surveys without hindering our efforts.

Our Event method or member function eav_report_data builds an ordered list for each participant of the event such that the list contains one value that corresponds to each question on the Survey for the selected Event. We also pad the front of the list with two additional pieces of data, namely the participant identification key and participant name, for use in our report. The get_sorted_questions function added to our Survey model class ensures a consistent sort order is referenced across our application when compiling and displaying these reports.

Our view retrieves the requested report information from the Event using our eav_report_data function and passes it in context to our template for rendering. Our template then only needs to iterate once over the ordered list of questions to define the columns of our report, and then iterate once over the set of ordered lists of participant responses, rendering each response value in order corresponding to the list of questions above.

Simple refinements to such a reporting mechanism may include defining additional properties for specifying a custom sort order for our questions for collection or reporting, or a short name or unique textual key for each of our questions. Using Django templates it is also trivial to produce the same report output in a CSV or other delimited format.

Data Entry via Django Forms

For data collection, we create a very simple view for adding a participant to an event, a view for managing that participant, and a view for answering our questions. Utilizing our get_sorted_questions function of Survey allows us to provide a user-interface to walk a participant through the questions forwards and backwards with an indication of progress. Every response is saved to our QuestionResponse model which is in effect our sparse matrix, representing the many-to-many relationship between a participant at an event and an answered question. This example demonstrates a fully-functional though minimal interface for data collection, using only built-in Django packages and features. In practice an Ajax-enabled responsive front-end provides a more impressive-looking user interface and is able to leverage the same back-end code.

Some of the data we collect is not EAV-appropriate, the participant name in this example (typically other demographic data such as address as well), because we intend to always collect it for every subject participant. There is a disadvantage to storing this data in key-value format in our QuestionResponse model since it will make it less apparent how to access the data in our views (especially important in more advanced functionality such as automated email messages to our participants) and increase our storage and processing overhead needlessly. This data is modeled as fields of our Participant model, and added specifically by field name to our report and other views where needed. A proper EAV implementation in practice will nearly always work alongside a traditional non-EAV model to balance dynamic capability with performance and ease-of-development.

User Updates through Django Admin

For configuration of our Event, Survey, and Question objects we could define custom views with elaborate authentication and process logic depending on the project requirements. In this example, we enable the Django admin templates for these models. The built-in admin functionality allows an authenticated user a convenient means of defining any number of questions dynamically, grouping them arbitrarily into surveys, and scheduling events using those surveys. The data collection and reporting views continue to work with any new questions we define, fitting themselves to the surveys we configure.

Project source code at Github


James Walters

Originally Published July 8, 2016


Also published on Medium.

Leave a Reply:

  1. Alban says:

    Thanks for this, I’ve been looking for ways to achieve exactly that. However until I saw your article I was considering using Generic Relations (https://docs.djangoproject.com/en/1.10/ref/contrib/contenttypes/) and so just add any number of variables of any type.
    What would be, in your opinion, the pros and cons of each method ?
    Thanks!

    • Equastat says:

      If I understand correctly, the strength of Generic Relations is that they allow creation of one model that has a foreign key relationship to any number of other models. That would presume that you have already defined all of the models which you want to use. I could see you defining different types of variables (attributes in EAV) as different models and then using one generic relation model to store the values of every instance of every variable, in which case you could define multiple more elaborate attribute models for your users to utilize.

      If you are just creating a tag sort of model as in the Django documentation TaggedItem example to map values to entities then you do not need to use generic relations unless you are using more than one model for your entities. If you don’t care about having a central list of user-defined attributes to choose from but just want users to add them per value (i.e. add arbitrary “name=value” items to any instance of your entity model) then you can just use a single value model with a foreign key relation to your entity model (or a generic relation to all of your entity models). This is easier to implement than a full EAV model but not as convenient for implementing the tabular reporting in my example app.

      You could also use generic relations to define attributes you could associate to several different model entities. For example, to create user-defined variables that you can attach to all of the other models in your application. You will have the possible inconvenience of not being able to directly filter on the content_object field. It could be awkward and difficult performance-wise to extract all of the values for a particular attribute across entities this way. Generic relations are better for retrieving all the values for one particular entity instance at a time.

      An EAV model has a distinct attribute layer that defines your set of variables, along with possible constraints on each variable, and can more efficiently work with all data for one particular variable by maintaining a single sparse matrix-like table of values. You could change the foreign key to Participant in the QuestionResponse model in my example to a generic relation and have it associate attributes and their values to any other model you create. Then you are combining the strengths of both methods. Your reporting will have to filter by the generic relation content_type first to avoid mixing different entity models together in a report.

  2. Mr. Name says:

    Excellent writeup, thank you so much!

Leave a Comment:

Your email address will not be published. Required fields are marked *