Audit Fields in Google AppEngine

Executive summary: Here’s how we’re implementing audit fields in AppEngine. IT’S BETTER THAN THE WAY YOU’RE DOING IT!

I considered saying “I hope there’s a better way of doing it” but I believe I’ll get more responses if I frame it in the form of a challenge.

For all entities in our datastore, we want to store:

  • dateCreated
  • dateModified
  • dateDeleted
  • createdByUser
  • modifiedByUser
  • deletedByUser

Here are the options we’ve considered:

Datastore callbacks/Lifecycle callbacks

AppEngine supports datastore callbacks natively. If you use Objectify, it has lifecycle callbacks for @PrePersist and @PostLoad. Datastore callbacks work fantastically for dateCreated, dateModified, and dateDeleted. Objectify can handle all three easily as well, provided you use soft deletes, which we do. (And they aren’t as bad as people would have you believe, especially in AppEngine. You’d be surprised how many user experience problems you discover strolling through deleted data.)
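As a rough illustration of the date handling, here is a minimal sketch of what such a pre-save hook might do. It is plain Java so it compiles without the Objectify jar; with Objectify, the method would carry the @PrePersist annotation. The class name, the `deleted` flag, and the method names are assumptions made for the example.

```java
import java.util.Date;

// Hypothetical entity used only for illustration.
class AuditedEntity {
    private Date dateCreated;
    private Date dateModified;
    private Date dateDeleted;
    private boolean deleted; // soft-delete flag (assumed)

    // With Objectify this method would be annotated with @PrePersist so the
    // framework calls it before every save. Here it is called by hand.
    void stampDates() {
        Date now = new Date();
        if (dateCreated == null) {
            dateCreated = now;   // first save only
        }
        dateModified = now;      // every save
        if (deleted && dateDeleted == null) {
            dateDeleted = now;   // first save after a soft delete
        }
    }

    void markDeleted() { deleted = true; }

    Date getDateCreated() { return dateCreated; }
    Date getDateModified() { return dateModified; }
    Date getDateDeleted() { return dateDeleted; }
}
```

Nothing here needs the current user, which is why the date fields are the easy half of the problem.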

Both of these led to problems for us when we tried to use them for the createdByUser et al. fields. We store the current user in the session and access it through a UserRetrievalService (which, at its core, just retrieves the current HttpSession via a Guice provider).

If we want to use this with the Objectify lifecycle callbacks, we would need to inject either our UserRetrievalService or a Provider<HttpSession> into our domain entities. This isn’t something I’m keen on doing so we didn’t pursue this too rigorously.

The datastore callbacks have an advantage in that they can be stored completely separately from the entities and the repositories. But we ran into two issues.

First, we couldn’t inject anything into them, either via constructor injection or static injection. It looks like there’s something funky about how they hook into the process that I don’t understand and my guess is that they are instantiated explicitly somewhere along the line. Regardless, it meant we couldn’t inject our UserRetrievalService or a Provider<HttpSession> into the class.

The next issue was automating the build. When I tried to compile the project with a callback in it, the javac task complained about a missing datastorecallbacks.xml file. This file gets created when you build the project in Eclipse but something about how I was doing it via Ant obviously wasn’t right. This also leads me to believe there’s something going on behind the scenes.

Neither of these problems is insurmountable, I don’t think. There is obviously some way of accessing the current HttpSession because Guice is doing it. And clearly you can compile the application when there’s a callback because Eclipse does it. All the same, both issues remain unsolved by us, which is a shame because I kind of like this option.

Pass the User to Repository

This is what was suggested in the StackOverflow question I posed on the topic. We have repositories for most of our entities so instead of calling put( appointment ), we’d call put( appointment, userWhoPerformedTheAction ).
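A minimal sketch of what that repository signature could look like, assuming an in-memory map in place of the datastore and a plain String for the user. The field layout and the id handling are hypothetical and only there to make the example compile.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity with a subset of the audit fields.
class Appointment {
    Long id;
    Date dateCreated;
    Date dateModified;
    String createdByUser;
    String modifiedByUser;
}

// In-memory stand-in for a repository; a real one would write to the
// datastore where the map is updated.
class AppointmentRepository {
    private final Map<Long, Appointment> store = new HashMap<>();
    private long nextId = 1;

    // The caller has to supply the user on every call.
    Appointment put(Appointment appointment, String userWhoPerformedTheAction) {
        Date now = new Date();
        if (appointment.id == null) {
            appointment.id = nextId++;
            appointment.dateCreated = now;
            appointment.createdByUser = userWhoPerformedTheAction;
        }
        appointment.dateModified = now;
        appointment.modifiedByUser = userWhoPerformedTheAction;
        store.put(appointment.id, appointment);
        return appointment;
    }
}
```

Every call site ends up looking like `repository.put(appointment, currentUser)`, so the audit concern leaks into each caller.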

 

I don’t know that I like this solution (as indicated in my comments). To me, passing the current user into the DAO/Repository layer isn’t something the caller should have to worry about. But that’s because in my .NET/NHibernate/SQL Server experience, you can set things up so you don’t have to. Maybe it’s common practice in AppEngine because it’s still relatively new.

(Side note: This question illustrates a number of reasons why I don’t like asking questions on StackOverflow. I usually put a lot of effort into phrasing the question and people often still end up misunderstanding the goal I’m trying to achieve. Which is my fault more than theirs but still means I tend to shy away from SO as a result.)

Add a User property to each Entity

I can’t remember where I saw this suggestion. It’s kind of the opposite of the previous one. Each entity would have a User property (marked as @Transient) and when the object is loaded, this is set to the current user. Then in your repositories, it’s trivial to set the user who modified or deleted. This has the same issue I brought up with the last one in that the caller is responsible for setting the User object.

Also, when new objects are created, we’d need to set the property there as well. If you’re doing this on the client, you may have some issues there since you won’t have access to the HttpSession until you send it off to the server.

Do it yourself

This is our current implementation. In our repositories, we have a prePersist method that is called before the actual “save this to the datastore” method. Each individual repository can override this as necessary. The UserRetrievalService is injected in and we can use it to set the relevant audit fields before saving to the repository.
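The shape of this can be sketched roughly as follows. This is an illustration under assumptions rather than the real code: the `UserRetrievalService` interface is reduced to a single hypothetical `getCurrentUser()` method returning a String, the entity and repository names are made up, and the "datastore" is just a field.

```java
import java.util.Date;

// Assumed shape of the service: it hands back the current user.
interface UserRetrievalService {
    String getCurrentUser();
}

// Hypothetical base class carrying a subset of the audit fields.
abstract class AuditedEntity {
    Date dateCreated;
    Date dateModified;
    String createdByUser;
    String modifiedByUser;
}

abstract class BaseRepository<T extends AuditedEntity> {
    private final UserRetrievalService users;

    // With Guice this constructor would carry @Inject.
    protected BaseRepository(UserRetrievalService users) {
        this.users = users;
    }

    public final void put(T entity) {
        prePersist(entity); // hook runs before the actual save
        save(entity);
    }

    // Subclasses may override this, but must call super.prePersist(entity).
    protected void prePersist(T entity) {
        Date now = new Date();
        String user = users.getCurrentUser();
        if (entity.dateCreated == null) {
            entity.dateCreated = now;
            entity.createdByUser = user;
        }
        entity.dateModified = now;
        entity.modifiedByUser = user;
    }

    // The actual "save this to the datastore" step.
    protected abstract void save(T entity);
}

class Appointment extends AuditedEntity {
    String title;
}

class AppointmentRepository extends BaseRepository<Appointment> {
    Appointment lastSaved; // stand-in for the datastore

    AppointmentRepository(UserRetrievalService users) {
        super(users);
    }

    @Override
    protected void prePersist(Appointment a) {
        super.prePersist(a); // leave this out and the audit fields stay empty
        if (a.title == null) {
            a.title = "(untitled)"; // domain-specific pre-save step
        }
    }

    @Override
    protected void save(Appointment a) {
        lastSaved = a;
    }
}
```

Callers go back to a plain `repository.put(appointment)`, at the cost of every override having to remember the super call.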

This works just fine for us and we’ve extended it to perform other domain-specific prePersist actions for certain entities. I’m not entirely happy with it though. Our repositories tend not to favour composition over inheritance and as such, it is easy to forget to make a call to super.prePersist somewhere along the way. Plus there’s the nagging feeling that it should be cleaner and more testable than this.

Related to this is the underlying problem we’re trying to solve: retrieve the user from the session. In AppEngine, the session is really just the datastore (and memcache) with a fancy HttpSession wrapper around it. So when you get the current user from the session, you’re really just getting it from the datastore anyway using a session ID that is passed back and forth from the client. So if we *really* wanted to roll our own here, we’d implement our own session management which would be more easily accessible from our repositories.

So if you’re an AppEngine user, now’s where you speak up and describe if you went with one of these options or something else. Because this is one of the few areas of our app that fall under the category of “It works but…” And I don’t think it should be.

Kyle the Pre-persistent

  • http://kyle.baley.org Kyle Baley

    Folks, when I said “IT’S BETTER THAN THE WAY YOU’RE DOING IT!” that was meant to inspire you to show me up by describing a better way of doing it, not discuss the semantics of the field names I used.

    “Audit fields” was clearly the wrong term to use. Please disregard it and substitute “stuff we care about”. Of the six fields, the dateModified is actually the one we use most often, primarily in administrative functions and reports.

  • Damien

    I agree – it’s fine to record “createdAt/By” and “deletedAt/By” because, *in those instances*, these are events that occur only once, and by a single user. But Modification is something that can/will occur multiple times.

    I had intended to post exactly this point, but found that Michael had already beaten me to it.

  • Wilmer Comeaux

    I have always tried to approach auditing and logging as a cross-cutting-concern that is attachable/detachable via configuration. My business process usually doesn’t care who updated an entity, or when. But, someone will always need a report that shows the information. As Michael said, append-only is the only way to get a true log or audit of what has happened. And there are plenty of good ways to accomplish that and not have your repository or entities even know it is happening.
    I’ve been using frameworks like Spring for years to handle this – it is one of the textbook examples of a cross-cutting concern, even if your business process makes use of the audit-information.

    Also, as noted by Michael, if your entities have dateModified, modifiedBy, etc then there are better ways. I won’t say you are “doing it wrong” because, occasionally there may be a good reason for doing it. :) However, I have always found a more generic audit system is far more useful. I don’t often want to know who all has “updated a person”. What I find more valuable is knowing, “what was happening in the system when this person was updated”. A general auditing system will, however, provide both.

    Plus, you never have to write it more than once. :)

  • Anonymous

    No more bold than “IT’S BETTER THAN THE WAY YOU’RE DOING IT!” :)

    My point is that you capture only the date of the most recent modification. You don’t capture the history of changes to the entity. To me, “audit” implies that I can see the full history, hence the need for an append-only model.

  • http://kyle.baley.org Kyle Baley

    I don’t know about that. Without knowing what we’re using the fields for, it’s kind of bold to assume we’re doing it wrong, yesno? I would think most people would look at dateModified and conclude that it means “date any fields other than the audit fields changed”. I could have called the fields “meta info we care about” but “audit fields” already brings up useful connotations that I didn’t want to have to explain.

  • Anonymous

    If you have an audit field called dateModified, then you are doing it wrong. How can you audit something that you modify? That destroys the prior state, as well as the prior modification audit record.

    Any auditable system has to be append-only.

  • http://kyle.baley.org Kyle Baley

    Yes, I probably could.

  • Ayende Rahien

    Can’t you create a global ThreadLocal variable, set it at the beginning of the request and then read it from your callback?
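A minimal sketch of the ThreadLocal idea, assuming the user can be represented as a String and that something at the edge of the request (a servlet filter, say) wraps the handling code. `CurrentUser` and `RequestScope` are hypothetical names, and `RequestScope.runAs` stands in for the filter so the example runs without the servlet API.

```java
// Hypothetical holder for the user attached to the current request thread.
final class CurrentUser {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    private CurrentUser() {}

    static void set(String user) { USER.set(user); }
    static String get() { return USER.get(); }
    static void clear() { USER.remove(); }
}

// Stand-in for a servlet filter: set the user, run the request, clean up.
final class RequestScope {
    private RequestScope() {}

    static void runAs(String user, Runnable request) {
        CurrentUser.set(user);
        try {
            request.run();
        } finally {
            // Request threads are typically reused, so always clear the value.
            CurrentUser.clear();
        }
    }
}
```

A callback or repository running on the same thread could then call `CurrentUser.get()` without anything being injected into it.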