Poplux

This is the slightly longer version of a lightning talk I gave at PoplusCon. It still needs a lot more work, but in the spirit of “Post Early, Post Often”…

As discussed yesterday, I’m working on an application relating to legislative voting records (how your Members of Parliament are voting, ostensibly on your behalf).

So as to reinvent a minimum number of wheels, I’m using several Poplus Components underneath — each vote is by a Person (stored in PopIt), on a Motion relating to a Bill (stored in BillIt). (Future expansions could also store anything they’ve said explaining why they voted a certain way (in SayIt), and let you email them publicly to ask about it (using WriteIt), but let’s keep the discussion simple for now.)

I also want to expose all this information over an API.

But although Poplus is starting to talk about standards around data, there’s been very little talk about standards around APIs. This causes me two problems:

Firstly, I need to make decisions on the hundreds of little things that go into every API. Everyone else building a Poplus Component will also be making the same decisions, and even where answers aren’t specifically right/wrong or good/bad, everyone will end up making a different set of choices, thus making life harder for the poor developer who needs to learn how to use them all later.

So, for a while I was thinking that Poplus should define a common approach that Components should (not must, but preferably) follow. But even if that were so, it wouldn’t be enough, for reason #2:

Users of my API shouldn’t need to know or care what’s lower in the stack — i.e. for most simple usage, they shouldn’t also need to learn the APIs for PopIt and BillIt. They should be able to issue a query that says something like “For each bill introduced in 2009, show me all Green Party abstentions”, and have everything Just Work™.

And of course it’s not simply a matter of hiding what I’m doing from the API consumer — I also need to work out how on earth I even run a query like that against multiple distinct data-sources!

One way would be to install PopIt and BillIt locally, and then break their encapsulation: querying their underlying data stores directly, rather than through their APIs. This, of course, is incredibly brittle. APIs are stable — implementations aren’t. Any time either of these changed anything about how they store their data, I’d need to recode everything. It also rather defeats the purpose of using Components in the first place.

The current preferred route seems to be to use PopIt and BillIt as normal, but rather than querying their APIs, I regularly sync their data down into a local store. Unlike the prior version, this would be independent of their implementations (it would use a stable mechanism for getting all the data out), but it’s still far from unproblematic. Keeping in sync is non-trivial, and, again, if I’m going to have to create my own local data-stores for these, why bother using the Components in the first place?

No, what I want is the ability to keep everything well separated. My API will just have to know how to use the BillIt and PopIt APIs and effectively proxy them.

I could write that as a one-off thing relatively easily, but that doesn’t address the ‘common approach’ issue, and it isn’t really much help for anyone who wants to build other Components on top of VoteIt (my voting-records application), either. For Components to be most useful, they should be as simple as possible to combine. So, can we come up with a standard way to query across multiple APIs simultaneously? I spent a while looking around for common approaches to this, but wasn’t able to find anything useful.

After a few weeks of beating my head against this problem, it finally dawned on me that I’ve not only seen a solution to this before, I actually use it regularly. Estonia faced exactly the same issue when it came to e-government. They needed to find a way to allow all government databases to communicate with each other, and be queried in a consistent manner (including complex queries across multiple services), without having to rewrite them all and standardise them. So they came up with X-Road: a data exchange layer that any service can join by providing a simple adapter.

The implementation details of this are surprisingly difficult to find and/or understand (my Estonian and my SOAP/WSDL are both quite basic), but we don’t really need anything quite so complex anyway: it should be enough to just steal the basic idea, transform it into a much simpler REST/JSON version, and iterate from there. For now I’m mostly ignoring the specifics of how X-Road actually works, and using the fact that it does work to imagine how it must work [1].

So, with suitable handwaving over the implementation details, in my imagination, a basic Poplus Data Interchange System [2] could look something like this:

We need to be able to respond to an API query something like this [3]:

  /api/search/ballots?q=value:abstain+AND+motion.bill.introduced.year:2009+AND+voter.memberships.organization.name:Green+Party
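To make the shape of that query concrete, here is a minimal Python sketch of pulling it apart into a resource name and a list of (dotted path, value) clauses. The function name `parse_search` and the rule that clauses are joined only by `AND` are my assumptions for illustration; nothing in Poplus defines this syntax yet.

```python
from urllib.parse import urlsplit, parse_qs

# The example query from above, unchanged.
URL = ("/api/search/ballots?q=value:abstain"
       "+AND+motion.bill.introduced.year:2009"
       "+AND+voter.memberships.organization.name:Green+Party")


def parse_search(url):
    """Return (resource, clauses); each clause is (path_segments, value)."""
    parts = urlsplit(url)
    # The last path segment names the thing being searched ("ballots").
    resource = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # parse_qs decodes each '+' to a space, so "Green+Party" -> "Green Party".
    query = parse_qs(parts.query)["q"][0]
    clauses = []
    for clause in query.split(" AND "):
        # Split on the first ':' only: the left side is the field path.
        field, _, value = clause.partition(":")
        clauses.append((field.split("."), value))
    return resource, clauses


resource, clauses = parse_search(URL)
for path, value in clauses:
    print(resource, ".".join(path), "=", repr(value))
```

This deliberately ignores OR, quoting, and ranges. The point is only that each clause is a path that starts at a ballot and may wander into data owned by another Component.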

Answering a query like that requires an adaptor for each Component, explaining, in a standardised machine-readable format, what data it can provide, and where related data lives.

So, the PopIt Adaptor, for example, would explain that it provides:

person: /api/v0.1/persons:result[]
  person.id
  person.name
  person.memberships[]
    person.membership.id
    person.membership.organization_id = organization
organization: /api/v0.1/organizations:result[]
  organization.id
  organization.name
  organization.classification
  organization.memberships[]
    organization.membership.id
    organization.membership.person_id = person

(with something similar for BillIt, which would explicitly note that bill.introduced is a Date, so we can provide a virtual .year on it)

and the VoteIt Adaptor would note the foreign lookups:

ballot: /api/ballots:results[]
  ballot.value
  ballot.voter_id = person@<INSTANCE_NAME>.popit.mysociety.org
  ballot.motion.bill_id = bill@billit.ciudadanointeligente.org/<INSTANCE_NAME>
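As a rough illustration that descriptions like these really are machine readable, here is a small Python sketch that parses the notation used above. The notation itself is only my handwaved format, and the parser's name (`parse_adaptor`) and output layout are assumptions made up for this example.

```python
POPIT = """\
person: /api/v0.1/persons:result[]
  person.id
  person.name
  person.memberships[]
    person.membership.id
    person.membership.organization_id = organization
"""

VOTEIT = """\
ballot: /api/ballots:results[]
  ballot.value
  ballot.voter_id = person@<INSTANCE_NAME>.popit.mysociety.org
  ballot.motion.bill_id = bill@billit.ciudadanointeligente.org/<INSTANCE_NAME>
"""


def parse_adaptor(text):
    """Turn an adaptor description into {type_name: details}."""
    types, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Header line: "<type>: <endpoint>:<result key>[]"
            name, _, location = line.partition(": ")
            endpoint, _, key = location.rpartition(":")
            if key.endswith("[]"):
                key = key[:-2]
            current = types[name] = {
                "endpoint": endpoint,
                "result_key": key,
                "fields": [],
                "references": {},
            }
        else:
            # Field line, optionally "<field> = <type>[@<where it lives>]"
            field, _, target = line.strip().partition(" = ")
            current["fields"].append(field)
            if target:
                # "type@host" is a foreign lookup; a bare "type" is local.
                ref_type, _, host = target.partition("@")
                current["references"][field] = (ref_type, host or None)
    return types


popit = parse_adaptor(POPIT)
voteit = parse_adaptor(VOTEIT)
print(voteit["ballot"]["references"])
```

The `references` map is the interesting part: it tells a query engine that `ballot.voter_id` must be resolved against a `person` in some PopIt instance, without VoteIt knowing anything else about PopIt.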

And from there it’s just a SMOP to transform the query into the remote calls needed to return a sensibly nested JSON list…
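To give a feel for what that transformation involves, here is a deliberately naive Python sketch. Everything in it is hypothetical: the three dictionaries stand in for the remote PopIt, BillIt and VoteIt APIs, the `REFERENCES` table stands in for what the adaptors would declare, and a real version would make HTTP calls and push filters down to each Component rather than expanding every ballot in memory.

```python
from datetime import date

# In-memory stand-ins for the remote APIs (all data invented for the example).
ORGANIZATIONS = {"o1": {"id": "o1", "name": "Green Party"},
                 "o2": {"id": "o2", "name": "Other Party"}}
PERSONS = {"p1": {"id": "p1", "name": "Alice",
                  "memberships": [{"organization_id": "o1"}]},
           "p2": {"id": "p2", "name": "Bob",
                  "memberships": [{"organization_id": "o2"}]}}
BILLS = {"b1": {"id": "b1", "introduced": date(2009, 3, 1)},
         "b2": {"id": "b2", "introduced": date(2010, 6, 15)}}
BALLOTS = [
    {"value": "abstain", "voter_id": "p1", "motion": {"bill_id": "b1"}},
    {"value": "abstain", "voter_id": "p2", "motion": {"bill_id": "b1"}},
    {"value": "abstain", "voter_id": "p1", "motion": {"bill_id": "b2"}},
    {"value": "yes", "voter_id": "p1", "motion": {"bill_id": "b1"}},
]

# Key field -> (name of the expanded object, lookup standing in for a remote call).
REFERENCES = {"voter_id": ("voter", PERSONS.get),
              "bill_id": ("bill", BILLS.get),
              "organization_id": ("organization", ORGANIZATIONS.get)}


def expand(record):
    """Copy a record, nesting the referenced object beside each *_id field."""
    if isinstance(record, list):
        return [expand(item) for item in record]
    if not isinstance(record, dict):
        return record
    out = {}
    for key, value in record.items():
        out[key] = expand(value)
        if key in REFERENCES:
            name, lookup = REFERENCES[key]
            out[name] = expand(lookup(value))
    return out


def values_at(node, path):
    """Yield every value reached by following path, fanning out over lists."""
    if isinstance(node, list):
        for item in node:
            yield from values_at(item, path)
    elif not path:
        yield node
    elif isinstance(node, date) and path == ["year"]:
        yield node.year  # the virtual .year on a Date field
    elif isinstance(node, dict) and path[0] in node:
        yield from values_at(node[path[0]], path[1:])


def search(ballots, clauses):
    """Return the expanded ballots for which every clause has a matching value."""
    results = []
    for ballot in ballots:
        full = expand(ballot)
        if all(any(str(found) == wanted for found in values_at(full, path))
               for path, wanted in clauses):
            results.append(full)
    return results


CLAUSES = [(["value"], "abstain"),
           (["motion", "bill", "introduced", "year"], "2009"),
           (["voter", "memberships", "organization", "name"], "Green Party")]
matches = search(BALLOTS, CLAUSES)
print([(m["voter"]["name"], m["motion"]["bill"]["id"]) for m in matches])
```

The sketch leaves back-references such as `organization.membership.person_id` out of `REFERENCES`, because expanding them as well would recurse forever. A real engine would only follow the references that the query's paths actually mention.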

Interestingly, this means that not only does a Component author not need to adhere to a consistent standard, it can be someone else entirely who writes the Adaptor for it, which has the useful side-effect of allowing us to embrace non-Poplus APIs too. So, for example, you could store all your MP photos in Flickr and simply start referring to anything that their API gives, e.g. person.image.exif.aperture.

Thoughts, suggestions, criticisms, etc very, very welcome!

  1. I’ve found several times in the past that making this assumption can be a good way of coming up with a better solution, or even a previously undiscovered solution, à la George Dantzig.
  2. Which I’m going to call Poplux, for now, even though that’s almost certainly a bad idea.
  3. This is, of course, a simplified version that asks how anyone who has ever been a member of the Green Party voted on those bills. The real version would need to check that they were a member at the time of the vote, to cope with people who change party. The syntax for that will be rather more complex, however, and would be distracting here. This was also written before James released his draft Voting spec, and I should really rewrite the example to use the new terminology and vote_event etc., but later.
