Content modelling in Drupal: Entity fever

When you first experiment with Drupal, it's easy to get excited about the flexibility of the entity system. But there are traps for young players. In particular, if you're planning to use "secondary" entity types like taxonomy terms or users to store complex content, it's important to know what you're giving up.

If you haven't read my previous post about content modelling in Drupal, here's a quick summary of the entity system. In Drupal, from version 7 onwards, almost any piece of information becomes an "entity". If something is an entity, you can add structure to it via a standard set of fields. This means you can add custom fields not just to standard content items (which Drupal calls nodes), but also to users, taxonomy terms and more. Developers get a toolkit (the Field API and Entity API) for creating new kinds of field, and new kinds of entity.

The entity system has big benefits for developers taking on complex custom Drupal builds. But it also adds a lot of flexibility and power for everyone building or planning a Drupal site. If you're new to Drupal, it can be hard to appreciate why this is. So let me illustrate with a hypothetical example.

Hypothetical: A tourism information site

Imagine you want to create a website that contains tourism information for a particular region. The site will have information about tourist attractions, places to stay, places to eat, local news stories and events. Because each of these represents a specific kind of information, structured in a particular way (events have dates, hotels have star ratings, and so on), you'll almost certainly create a content type in Drupal for each of them.

But say you want to give users some flexibility about how they view this information. You don't want your site to have a section called "accommodation", a section called "news", and so on, with no way of filtering the information. What if I know I'm going to Bendigo, for instance, and I want to be able to go to a "Bendigo" page on the website and find places to stay and eat while I'm there?

The obvious way to do this in Drupal is using the taxonomy system. Taxonomy is what we typically use for classifying content, especially when our classifications extend across several content types. So we can create a taxonomy vocabulary (that is, a custom set of terms) for towns within our region. We can then add a term reference field to each of our content types, so we can associate each piece of content with a town or towns.

Having classified our content in this way, we can go ahead and create a page type on our Drupal site that displays a dynamic listing of all the content associated with a given town. Drupal provides a simple implementation of this taxonomy term page out of the box, but most sophisticated Drupal sites will use Views to "take over" the page and customise it — for example, by creating differently formatted sections for each of the content types, with different filtering and sorting rules ("let's show all the hotels in Bendigo in alphabetical order, but only the three most recent Bendigo news stories").
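The per-type rules in that example are easier to see written out. Here is a minimal Python sketch of the selection logic only, not of Views itself. The `Node` class, the `town_page` function and all the sample titles are invented for illustration; on a real site you would set this up in the Views interface rather than write code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Node:
    """Hypothetical stand-in for a Drupal node; not a Drupal API."""
    content_type: str    # e.g. "hotel" or "news"
    title: str
    towns: frozenset     # plays the part of the term reference field
    published: date


def town_page(nodes, town, news_limit=3):
    """Collect the content tagged with one town, one section per type."""
    tagged = [n for n in nodes if town in n.towns]
    # Hotels: everything tagged with the town, in alphabetical order.
    hotels = sorted(
        (n for n in tagged if n.content_type == "hotel"),
        key=lambda n: n.title,
    )
    # News: newest first, cut off at the limit.
    news = sorted(
        (n for n in tagged if n.content_type == "news"),
        key=lambda n: n.published,
        reverse=True,
    )[:news_limit]
    return {"hotels": hotels, "news": news}


# Invented sample content.
nodes = [
    Node("hotel", "Wattle Inn", frozenset({"Bendigo"}), date(2013, 1, 5)),
    Node("hotel", "Acacia Lodge", frozenset({"Bendigo"}), date(2013, 2, 1)),
    Node("hotel", "Harbour View", frozenset({"Echuca"}), date(2013, 2, 9)),
    Node("news", "Story A", frozenset({"Bendigo"}), date(2013, 3, 1)),
    Node("news", "Story B", frozenset({"Bendigo", "Echuca"}), date(2013, 3, 8)),
    Node("news", "Story C", frozenset({"Bendigo"}), date(2013, 3, 15)),
    Node("news", "Story D", frozenset({"Bendigo"}), date(2013, 3, 22)),
]

page = town_page(nodes, "Bendigo")
print([n.title for n in page["hotels"]])  # ['Acacia Lodge', 'Wattle Inn']
print([n.title for n in page["news"]])    # ['Story D', 'Story C', 'Story B']
```

Note that "Story B" is tagged with two towns and still turns up on the Bendigo page, which is exactly what a multi-value term reference field buys us.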

Static content in taxonomy terms?

So far, so good. As long as we have properly structured content, Drupal makes it easy to create dynamic ways of displaying it. But let's say that, in addition to listing all the content that's been classified with each of our towns, we also want those town pages to display some "one-off" content: say, a map of the town and a "hero" image of the main street. In other words, we don't just want to display information about things related to the town, we also want to display some information about the town itself.

Well, starting with Drupal 7, we can store this kind of information directly in the taxonomy term. This is the power of the entity system at work: Drupal allows us to add custom fields to taxonomy terms, in exactly the same way that we add them to content types. So in our case, we would simply need to add two extra fields to our "towns" vocabulary. To display a map, we might use a Geofield to store geographical coordinates (if we wanted to generate our maps within Drupal), or a text field to store a Google Maps embed code (if we were happy to use what Google provides). For our hero image, a standard Drupal image field would do the trick.

We now have what we need to create a page that's a mixture of static content directly about the towns, and dynamic content related to the towns, all using the flexibility of Drupal's taxonomy system.
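To make the static/dynamic split concrete, here is a rough Python sketch of that page. Everything in it is an assumption made for illustration: `TownTerm`, `Node` and `build_town_page` are made-up names rather than Drupal APIs, the coordinates are approximate, and the image filename and content titles are invented.

```python
from dataclasses import dataclass, field


@dataclass
class TownTerm:
    """Hypothetical term in the "towns" vocabulary, with two extra fields."""
    name: str
    lat: float          # stands in for a Geofield value
    lon: float
    hero_image: str     # stands in for an image field


@dataclass
class Node:
    """Hypothetical content item carrying a term reference field."""
    content_type: str
    title: str
    town_names: list = field(default_factory=list)


def build_town_page(term, nodes):
    """Static parts come from the term itself, dynamic parts from tagged nodes."""
    related = [n.title for n in nodes if term.name in n.town_names]
    return {
        "heading": term.name,
        "map_centre": (term.lat, term.lon),
        "hero_image": term.hero_image,
        "related": related,
    }


# Approximate coordinates and invented content, for illustration only.
bendigo = TownTerm("Bendigo", -36.76, 144.28, "bendigo-main-street.jpg")
nodes = [
    Node("hotel", "Wattle Inn", ["Bendigo"]),
    Node("event", "Easter Fair", ["Bendigo", "Echuca"]),
    Node("restaurant", "River Cafe", ["Echuca"]),
]

page = build_town_page(bendigo, nodes)
print(page["heading"], page["map_centre"], page["related"])
```

The map and hero image never depend on the nodes, and the listing never depends on the term's extra fields. The term's name is the only thing joining the two halves.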

When I started creating content models for Drupal sites, I got very excited about the potential of taxonomy terms to store "real" content in the same way that nodes do. I was tempted to use taxonomy to do more and more stuff. Why not use taxonomy for any content that could potentially be used to classify other content? If a business had a standard list of services, for instance, why not build that list as a taxonomy vocabulary rather than a content type and just add a whole heap of fields?

What you don't get in Drupal's Taxonomy

That sounds great in theory. But in practice, it might not be such a good idea. Here's why: although taxonomy terms are entities in Drupal, just like nodes, their base properties (the things every term has in common) are quite different. While all entity types have access to the same set of fields, nodes were designed from the ground up to store content, including complex content, while taxonomy terms were designed around the purpose of classifying other content. Because of that, there are things we get with nodes that we don't get with taxonomy terms. These include:

  • The ability to create approval and publishing workflows
  • Granular permissions about who can do what with which content type (permissions around taxonomy are much cruder; a user can either do everything or nothing)
  • Storing things like author, date published, and date modified as system-generated attributes
  • Version tracking, including the ability to revert to an earlier version of a piece of content
  • Searchability: in default Drupal search, only content within nodes appears in search results

Now, all this is fine when the information we're storing in our taxonomy terms is simple, static, and uncontroversial. The geographical coordinates of a place never change (except in Lost), so there's no need to track revisions, we don't particularly care who entered them and when, and they don't contain any text that needs to be searchable.

But once we get into more substantial textual content, the lack of these standard node features can be a dealbreaker. To return to our example site, say we wanted to add a substantial written introduction to each town alongside our location info and photo. We would need to think hard about questions like: do we need to be able to have one staff member create a new draft of this content, and another approve it? Do we need to be able to switch easily between two different versions of the content, depending on the season? Do we need the content to be searchable using default Drupal search?

If the answer to any of these questions is yes, then you won't be able to use taxonomy terms to store this written content. Instead, your best bet will be to create another content type (called something like "Town information"), and once again use a term reference field to create the relationship with the relevant taxonomy term.
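If it helps, the three questions above boil down to a very small checklist. This is only a sketch of my own reasoning, written in Python for clarity; the function name and its parameters are invented, and a real project will have more to weigh up than three booleans.

```python
def storage_for(needs_approval_workflow, needs_revisions, needs_default_search):
    """Suggest where to keep a piece of content, following the checklist above."""
    if needs_approval_workflow or needs_revisions or needs_default_search:
        # Any one "yes" is enough to rule out fields on the term.
        return "node"
    return "taxonomy term field"


# Coordinates and a hero image: no workflow, no revisions, nothing to search.
print(storage_for(False, False, False))  # taxonomy term field

# A written introduction that is drafted, approved and searchable.
print(storage_for(True, False, True))    # node
```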

Rule of thumb: Use taxonomy terms to store information that's simple, static and uncontroversial. For everything else, use nodes.

What about users?

A user is another piece of information that becomes a type of entity from Drupal 7 onwards. In Drupal, a user can be someone internal to your organisation (like an admin or content editor), a member of the public with a username and password to your website, or even an ordinary, non-logged in visitor (known as an "anonymous user"). Access to the system for each of these types of person is controlled in exactly the same way. And because users are entities, we can also use user accounts to store any kind of information about these people: a photo, date of birth, address, Twitter username, biography, favourite band…anything we like. We can also make any user appear as the author of any node. Different kinds of user (known as user roles) have different sets of permissions. A user can occupy more than one role, or can change roles.1
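One way to picture roles: a user holds a set of them, and what the user can do is the combined permissions of every role in that set. The Python sketch below illustrates that idea under stated assumptions. The role-to-permission table is an invented example, and `permissions_for` is a made-up helper, not something Drupal provides.

```python
# Invented role-to-permission table, for illustration only.
ROLE_PERMISSIONS = {
    "anonymous user": {"access content"},
    "authenticated user": {"access content", "post comments"},
    "editor": {"create article content", "edit any article content"},
    "administrator": {"administer users", "administer nodes"},
}


def permissions_for(roles):
    """Combine the permissions of every role the user currently holds."""
    combined = set()
    for role in roles:
        combined |= ROLE_PERMISSIONS.get(role, set())
    return combined


# A staff member holds two roles at once.
staff_roles = {"authenticated user", "editor"}
print(sorted(permissions_for(staff_roles)))

# Later the same account keeps its identity but loses the editor role.
former_staff_roles = staff_roles - {"editor"}
print(sorted(permissions_for(former_staff_roles)))
```

Because roles are just a set attached to the account, adding or removing one changes what the person can do without touching anything else about them.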

It can be tempting to regard "users" as the correct place to store any information about people. User roles make it easy to cope, for example, with someone who starts off as an active user of the website, then leaves the organisation, but still needs to have their name preserved as an author of past content. No problem, we just change their user role! We can even create roles for people who will never be users of the website; it wouldn't be out of the ordinary to find a Drupal site with user accounts for Beyoncé, Napoleon or Jesus Christ.

So why shouldn't we regard "users" as the appropriate entity type for any and all information about people?

To begin with, you have the same set of issues that you do with taxonomy terms around things like revisions and searchability. Because nodes are the entity type that's specifically designed to store content, a lot of content-related features are only available in nodes.

There's also the fact that every user in Drupal — even Shakespeare — must have an email address and password in the system. It's easy enough to enter dummy values for these things, but it's messy and feels somehow wrong. This, together with the fact that Drupal calls people users, can be a real psychological barrier for site owners.

More importantly, there might be organisational barriers. In large organisations, it's often the IT Department that controls user accounts for any and all systems — including the website. If you're running a website where you frequently need to create an entity for a type of person who isn't a "real" user (external authors, for instance), do you really want to lodge a ticket with IT every single time?

Once again, in cases like these, we have to accept that users won't always do the job, and live with the extra complication of creating another content type (or even a custom entity type). If we need to, we can still create relationships between nodes and users using an Entity Reference field, so if we do end up having to create dual entities for a single person, at least our database "knows" they're the same person.
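As a rough picture of that dual-entity setup, here is a small Python sketch. The `User` and `PersonNode` classes, the `account_uid` field and the sample people are all assumptions made up for this example; in Drupal the link would be an Entity Reference field configured on the content type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical user account; every account needs an email address."""
    uid: int
    name: str
    email: str


@dataclass
class PersonNode:
    """Hypothetical "Person" content type holding the real content."""
    title: str
    biography: str
    account_uid: Optional[int] = None   # reference to a user, if one exists


def same_person(person, user):
    """True when the node's reference points at this user account."""
    return person.account_uid is not None and person.account_uid == user.uid


# A staff member has both an account and a Person node, linked by uid.
jo_account = User(uid=7, name="jo", email="jo@example.com")
jo_profile = PersonNode("Jo Editor", "Writes about regional travel.", account_uid=7)

# An external author gets a Person node only: no dummy email, no IT ticket.
shakespeare = PersonNode("William Shakespeare", "Playwright and poet.")

print(same_person(jo_profile, jo_account))   # True
print(same_person(shakespeare, jo_account))  # False
```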

Content models in the real world

These examples might be specific to Drupal — and I guess that if you're not into Drupal you've stopped reading long ago — but they show what happens when a content model meets reality.2 We can go ahead and design a perfectly logical hypothetical system with the fewest possible moving parts. This takes work, time and careful thought, but in a way it's the easy part. To make our content model work in a real CMS, we have to think about the limitations of the CMS itself, and about how authors and the organisation are going to be creating and managing content, one day after another.

  1. This, incidentally, is why user roles are not set up as standard entity bundles — something that puzzled me when I first learned about Drupal entities. Because users represent people in all their messiness, the boundaries between roles need to allow for overlap or alteration. But this isn't how bundles work in Drupal. So, for example, a node can belong to one and only one content type, and (as all Drupal site owners have discovered to our pain) a node can never be converted from one content type to another. 

  2. Other CMSs have their issues too, just different ones. In WordPress, as Stephanie Leary explains, you can't add extra fields to taxonomy terms at all, so in my example above, you would have to create a content type (known as a "custom post type" in WordPress) even to store location information and a hero image for a taxonomy term. 
