Introducing Content Workflow
Facilitate teams and collaboration with versioned content in custom stages.
This is the fourth posting in the series Implementing a Content Database In 10 Steps. Please see previous posts for more background on our working example here.
Why Content Needs Workflow
One significant feature that differentiates content from data is its ability to evolve. Data in its simplest form is a particular record that represents information at a particular point in time. Content-driven applications however need data to change over time, whether that data is pages on a website, profile information for users, or products in an online catalog.
These changes are often the work of multiple team members across one or more organizations, and changes are not made overnight; for example, content creators may collaborate over days to generate text or other assets, which may then be passed along to editors, and may even require review by a legal team before approval and publication to live users.
These rounds of revisions can revolve around the same piece of content, and representing these revisions to the right users at the right time is a key challenge for a content database.
A Simple Workflow Model
Earlier posts in this series introduced a simple e-commerce product catalog. We introduced abstraction techniques and the simplified query language cQL that accesses content and exposes content-oriented features like inheritance.
Now, our team wants to add a new product to the clothing catalog. Their typical team workflow is as follows:
- One or more members of the team take the lead in creating the new product record and writing the initial title and description. The product isn't available for general review until this step is completed.
- Other members of the team make edits to the product description as needed, and contribute product images to go along side. The product is visible to other users in the Content Database and can be previewed, but still isn't visible to the public.
- The product's text, images, and price are finalized by management and approved to go live. The product becomes available to the public through the website and mobile app.
This workflow can be represented in three stages:
- Working Copy: Users of the Content Database can collaborate on content that is only accessible through a private user interface. For webCOMAND, the Bases and Content Manager apps display all content in this state to facilitate team collaboration in real time.
- Draft: Content is ready to be reviewed and incremental edits can be made. Multiple drafts can be created and even rolled back to earlier edits. Application logic can be configured to query for draft content as a 'preview'.
- Approved: Content is approved and visible in all queries against the Content Database. Consumers of the application (i.e. the public) will see approved content.
Adding Workflow to Content
Using our abstracted content type example from this post, we have a product catalog with a content type for Clothing. We are using a traditional relational database and implementing our Content Database within it, and so our clothing content type is a table in our database example. Here clothing has a Title, Price, Description, and Image. It also has a primary key ID for each row, and an object OID that uniquely identifies the object within our content repository.
ContentType_101 (Clothing) | ||||
---|---|---|---|---|
ID | OID | Title | Price | ... |
1 | 101 | Tank with logo graphic | 15.00 | |
2 | 102 | T-shirt with logo graphic | 20.00 | |
3 | 103 | Polo shirt with logo graphic | 25.00 |
We can access items in our product catalog using cQL, where this simple query for items under $100:
SELECT Title, Price FROM Clothing WHERE Price<100
is translated through the webCOMAND API into the more complex SQL that hides the abstraction from our application logic:
SELECT Title, Price FROM ContentType_101 WHERE Price<100
The first thing we can do is add a new State column to the table, which signifies the workflow state that the content is in. 'A' will indicate Approved content, whereas 'D' will indicate a Draft state and a 'W' will indicate a working copy state.1 We can then add a new working copy row for the new product being added to the catalog for our example.
ContentType_101 (Clothing) | |||||
---|---|---|---|---|---|
ID | OID | State | Title | Price | ... |
1 | 101 | A | Tank with logo graphic | 15.00 | |
2 | 102 | A | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | Zipper hoodie with logo hoodie | 35.00 |
We can then add a directive to cQL2 that will query for Approved content only, which application logic would use when presenting products to the public. This would guarantee that only rows that are in the 'Approved' state would be returned. Therefore the following cQL:
USE State=Approved;
SELECT Title, Price FROM Clothing WHERE Price<100;
could become the following SQL:
SELECT Title, Price
FROM ContentType_101
WHERE Price<100 AND Status='A';
Using Multiple Rows for Content
The abstraction techniques discussed in this and previous posts allow us to make an important evolution in our content database schema in order to support workflow, version control, and content variants (discussed in later posts). That is, while simple data is represented in a database as a single row in a table, representing content in multiple states requires that we support storing a single piece of content in multiple rows.
Our above example adding the Status column does not go quite far enough to support workflows, for example, because it does not support editing content that has already been approved. For example, if the description and price on "Tank logo with graphic" needs to be modified, we need to support serving the old version to the public while the new version is being reviewed in-house.
When the product editing begins, the content database API must replicate the approved record as a working copy. This working copy will be what users collaborate on through the webCOMAND Apps or through their own custom interfaces, while public-facing application logic remains unchanged. Here we show the state of the clothing table once the tank product is being updated:
ContentType_101 (Clothing) | |||||
---|---|---|---|---|---|
ID | OID | State | Title | Price | ... |
1 | 101 | A | Tank with logo graphic | 15.00 | |
5 | 101 | W | IMPROVED tank with logo graphic | 18.00 | |
2 | 102 | A | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | Zipper hoodie with logo hoodie | 35.00 |
Our above cQL/SQL query, which serves approved content to the public, will correctly show the old product at $15 because it filters down to rows with a Status of 'A' only, whereas users accessing working copies will see the new description and $18 price. Note that the ID of the product has changed, because the ID column references the specific row in our relational database table. The OID however is the same, indicating that these two rows reference the same content in the repository.
Content 'Fallback' with Multiple Rows
We also need to consider content workflow "fallback", where depending on the workflow state we are interested in, we retrieve the closest match. For example, the following cQL to fetch clothing in the 'Working' state:
USE State=Working;
SELECT Title, Price FROM Clothing WHERE Price<100;
which evaluates to the following SQL:
SELECT Title, Price
FROM ContentType_101
WHERE Price<100 AND Status='W';
would not retrieve the T-shirt or Polo because these are currently approved. This would present problems in the user interface, and prevent users from querying for the number of products in the entire catalog. We also have to be careful about simply amending the above SQL to include both approved OR working copy rows; it is critical that we receive at most one row for each content record in the repository, otherwise queries involving groupings or aggregations would return inconsistent results.3
We therefore add another column to indicate that a content record is the latest working row:
ContentType_101 (Clothing) | ||||||
---|---|---|---|---|---|---|
ID | OID | State | Working | Title | Price | ... |
1 | 101 | A | 0 | Tank with logo graphic | 15.00 | |
5 | 101 | W | 1 | IMPROVED tank with logo graphic | 18.00 | |
2 | 102 | A | 1 | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | 1 | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | 1 | Zipper hoodie with logo hoodie | 35.00 |
The webCOMAND API would also manage updating the Working column as working copies are created. Our same cQL from above:
USE State=Working;
SELECT Title, Price FROM Clothing WHERE Price<100;
can now use the additional Working column to fetch the latest product records, regardless of workflow status:
SELECT Title, Price
FROM ContentType_101
WHERE Price<100 AND Status IN ('W','A') AND Working=1;
Supporting Multiple Drafts
We have reviewed how we track and query Approved and Working Copy states for content in the repository. The model above works well for these states because there are at most one of these rows per state in the table. However, we also need to support Drafts, and in our use-case above we showed that we need to support potentially multiple drafts for one item of content. If there are multiple drafts for a single product, how will we know which one to select when it is queried?
To solve this, we introduce timestamp metadata columns to our Clothing table. These timestamps, managed by the API when content is edited, track the Start and End times of a particular rows in the table. Below we show our modified table. Working copy records, and Approved records, all have End times of 0 because they are still the "latest".
ContentType_101 (Clothing) | ||||||||
---|---|---|---|---|---|---|---|---|
ID | OID | State | Working | DraftStart | DraftEnd | Title | Price | ... |
1 | 101 | A | 0 | 2019-02-01 10:54:33 | 0000-00-00 00:00:00 | Tank with logo graphic | 15.00 | |
5 | 101 | W | 1 | 2019-02-03 11:04:27 | 0000-00-00 00:00:00 | IMPROVED tank with logo graphic | 18.00 | |
2 | 102 | A | 1 | 2019-01-21 09:51:42 | 0000-00-00 00:00:00 | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | 1 | 2019-01-28 16:27:51 | 0000-00-00 00:00:00 | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | 1 | 2019-02-04 16:51:21 | 0000-00-00 00:00:00 | Zipper hoodie with logo hoodie | 35.00 |
Now, suppose that the users collaborating over OID 101 save a draft of their "Improved tank" product. They then continue editing:
ContentType_101 (Clothing) | ||||||||
---|---|---|---|---|---|---|---|---|
ID | OID | State | Working | DraftStart | DraftEnd | Title | Price | ... |
1 | 101 | A | 0 | 2019-02-01 10:54:33 | 2019-02-05 11:55:27 | Tank with logo graphic | 15.00 | |
5 | 101 | W | 1 | 2019-02-05 12:02:01 | 0000-00-00 00:00:00 | NEW IMPROVED tank with logo graphic | 18.00 | |
6 | 101 | D | 0 | 2019-02-05 11:55:27 | 0000-00-00 00:00:00 | IMPROVED tank with logo graphic | 18.00 | |
2 | 102 | A | 1 | 2019-01-21 09:51:42 | 0000-00-00 00:00:00 | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | 1 | 2019-01-28 16:27:51 | 0000-00-00 00:00:00 | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | 1 | 2019-02-04 16:51:21 | 0000-00-00 00:00:00 | Zipper hoodie with logo hoodie | 35.00 |
A number of things changed for this content record, which were automatically handled by the API:
- When the new Draft was created, a new record was created in the table with the latest values. It shares OID 101 with the other content records, relating it to the same piece of content.
- The Draft's DraftStart time was set to the time when it was created. It's Approved row's DraftEnd time was set to that start time as well. This is because the Draft record is now the "latest" record, replacing the Approved record.
- The Working Copy for OID 101 was edited after the draft was created, and its DraftStart time was incremented accordingly. It contains the latest values that the team is collaborating over, but the Draft reflects the version under review.
Now, the team saves another draft. The original draft is retained and can be rolled back if need be, and the new draft replaces the old draft as the "latest" record. The team keeps collaborating on the content after this second draft is created:
ContentType_101 (Clothing) | ||||||||
---|---|---|---|---|---|---|---|---|
ID | OID | State | Working | DraftStart | DraftEnd | Title | Price | ... |
1 | 101 | A | 0 | 2019-02-01 10:54:33 | 2019-02-05 11:55:27 | Tank with logo graphic | 15.00 | |
5 | 101 | W | 1 | 2019-02-05 16:45:02 | 0000-00-00 00:00:00 | *NEW* IMPROVED tank with logo graphic | 18.00 | |
6 | 101 | D | 0 | 2019-02-05 11:55:27 | 2019-02-05 15:33:20 | IMPROVED tank with logo graphic | 18.00 | |
7 | 101 | D | 0 | 2019-02-05 15:33:20 | 0000-00-00 00:00:00 | NEW IMPROVED tank with logo graphic | 18.00 | |
2 | 102 | A | 1 | 2019-01-21 09:51:42 | 0000-00-00 00:00:00 | T-shirt with logo graphic | 20.00 | |
3 | 103 | A | 1 | 2019-01-28 16:27:51 | 0000-00-00 00:00:00 | Polo shirt with logo graphic | 25.00 | |
4 | 104 | W | 1 | 2019-02-04 16:51:21 | 0000-00-00 00:00:00 | Zipper hoodie with logo hoodie | 35.00 |
These added metadata columns allow us to make queries against this content for all three workflow states and return consistent results. We can also ensure that at most one database row per content record is returned, even though internally we store several to maintain these workflow states.
Querying for Approved records in cQL:
USE State=Approved;
SELECT OID, Title, Price FROM Clothing WHERE Price<100;
results in this SQL:
SELECT OID, Title, Price
FROM ContentType_101
WHERE Price<100 AND Status='A';
which returns these rows:
OID | Title | Price |
---|---|---|
101 | Tank with logo graphic | 15.00 |
102 | T-shirt with logo graphic | 20.00 |
103 | Polo shirt with logo graphic | 25.00 |
We can also query in Draft mode to return the latest saved draft records for review. Content with no drafts should defer to the latest Approved record, because Working Copy records are still in active collaboration. We can use a combination of the Status and DraftStart/DraftEnd columns to return the correct rows, because we always query based on the latest record with an end time of 0.
Querying for Draft records in cQL:
USE State=Draft;
SELECT OID, Title, Price FROM Clothing WHERE Price<100;
results in this SQL:
SELECT OID, Title, Price
FROM ContentType_101
WHERE Price<100 AND Status IN ('A','D')
AND DraftEnd='0000-00-00 00:00:00';
which returns these rows:
OID | Title | Price |
---|---|---|
101 | NEW IMPROVED tank with logo graphic | 18.00 |
102 | T-shirt with logo graphic | 20.00 |
103 | Polo shirt with logo graphic | 25.00 |
And finally, we can query for Working Copy records using the Status and Working columns, to get the latest copies for collaboration. We fall back to Approved records when there are no working copies present.
Querying for Draft records in cQL:
USE State=Working;
SELECT OID, Title, Price FROM Clothing WHERE Price<100;
results in this SQL:
SELECT OID, Title, Price
FROM ContentType_101
WHERE Price<100 AND Status IN ('A','W') AND Working=1
which returns these rows:
OID | Title | Price |
---|---|---|
101 | *NEW* IMPROVED tank with logo graphic | 18.00 |
102 | T-shirt with logo graphic | 20.00 |
103 | Polo shirt with logo graphic | 25.00 |
104 | Zipper hoodie with logo hoodie | 35.00 |
In Conclusion
We've shown here that we can use the simplified cQL query language to abstract the complexity of managing multiple workflow states for content. Internally, by building on the abstraction techniques we showed in earlier posts, we are able to introduce multiple table rows to support more context around the content in the repository. See our next post for more ways that we can leverage these techniques for other context features.
1 For simplicity in these examples we represent these states as characters. In webCOMAND these are actually numeric values for performance reasons.
2 This is a future feature that is not in the current release of webCOMAND, at the time of this writing, but serves an illustrative purpose for simplicity here. This functionality is accomplished through a call to the webCOMAND API when connecting to the repository today.
3 Consider a cQL query like SELECT COUNT(*) FROM Clothing. If the SQL could return potentially multiple rows for the same content then we would receive inconsistent counts depending on the workflow states of the content. We must design the content database schema in a way that prevents this scenario from occurring.
Cover Photo by Linda Wilson on Unsplash