I’m gonna talk a bit about clustering: the issues we are facing now and the questions they raise.
Clusters are nodes, too?
Are we going to treat clusters as individual nodes or as groups of nodes? This has some impact on the next points.
Or, to put it differently, does it matter from a user’s point of view if we treat the clusters as entities or groups of entities?
If we have a cluster, it should be listed as an individual entity in the sidebar. How are we going to treat it, both visually and from a backend point of view? We can, of course, mark it as a cluster and be completely opaque about its members, so you’d have to open the clustering UI to check them. Or we can list the members and group the cluster details by member nodes.
We can’t just add up (+) the figures from all members of a cluster to come up with the cluster details. While this works for amounts and counts, it won’t be reliable for averages, for example.
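To make the two sidebar options a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `Node` and `Cluster` shapes, the `sidebar_entry` helper, the `kind` and `member_count` keys); it is not how the current backend models things, just one way to show that the same cluster record could serve both the opaque and the expanded view.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    # Hypothetical minimal node: just an id and a display label.
    id: str
    label: str


@dataclass
class Cluster:
    # Hypothetical cluster: an entity of its own that also knows its members.
    id: str
    label: str
    members: List[Node] = field(default_factory=list)


def sidebar_entry(cluster: Cluster, expand_members: bool = False) -> Dict[str, Any]:
    """Build the sidebar payload for a cluster.

    expand_members=False -> opaque: the cluster is marked as such, members hidden.
    expand_members=True  -> the member nodes are listed under the cluster.
    """
    entry: Dict[str, Any] = {
        "id": cluster.id,
        "label": cluster.label,
        "kind": "cluster",
        "member_count": len(cluster.members),
    }
    if expand_members:
        entry["members"] = [{"id": m.id, "label": m.label} for m in cluster.members]
    return entry


cluster = Cluster("c1", "ACME", [Node("n1", "ACME Ltd"), Node("n2", "Acme Limited")])
opaque = sidebar_entry(cluster)
expanded = sidebar_entry(cluster, expand_members=True)
```

If something like this holds, the opaque vs. expanded choice becomes mostly a frontend decision, as long as the backend stores the member list either way.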
We probably need some system of equations in which node + node = cluster would reveal the operations we should apply to specific properties, or which fields to ignore completely in a cluster context (or add?).
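A rough sketch of what such per-property rules could look like, in Python. The field names (`total_amount`, `transaction_count`, `average_amount`, `internal_id`) and the `RULES` table are made up for illustration, and the sketch assumes every node carries every listed field. The point is only that sums work for amounts and counts, while an average has to be recomputed as a weighted mean, and some fields are simply dropped.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical node shape: a flat dict of numeric properties.
Node = Dict[str, float]
Rule = Callable[[List[Node], str], Optional[float]]


def merge_sum(nodes: List[Node], field: str) -> Optional[float]:
    # Amounts and counts can simply be added up.
    return sum(n[field] for n in nodes)


def merge_weighted_mean(weight_field: str) -> Rule:
    # Averages must be weighted by the count they were computed over.
    def rule(nodes: List[Node], field: str) -> Optional[float]:
        total_weight = sum(n[weight_field] for n in nodes)
        if total_weight == 0:
            return None
        return sum(n[field] * n[weight_field] for n in nodes) / total_weight
    return rule


def ignore(nodes: List[Node], field: str) -> Optional[float]:
    # Fields that make no sense in a cluster context are left out.
    return None


# Hypothetical rule table: one merge rule per property.
RULES: Dict[str, Rule] = {
    "total_amount": merge_sum,
    "transaction_count": merge_sum,
    "average_amount": merge_weighted_mean("transaction_count"),
    "internal_id": ignore,
}


def merge_cluster(nodes: List[Node]) -> Dict[str, float]:
    """Apply each field's rule to all member nodes; skip ignored fields."""
    details: Dict[str, float] = {}
    for field, rule in RULES.items():
        value = rule(nodes, field)
        if value is not None:
            details[field] = value
    return details


a = {"total_amount": 100.0, "transaction_count": 4, "average_amount": 25.0, "internal_id": 1}
b = {"total_amount": 20.0, "transaction_count": 1, "average_amount": 20.0, "internal_id": 2}
cluster_details = merge_cluster([a, b])
# average_amount comes out as 24.0, not 45.0 (plain sum) or 22.5 (mean of means).
```

A table like this would also give us a single place to argue about each field, instead of hard-coding the behaviour per view.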
Our long-term goal is to use the user-provided information on how they clustered things to improve our own deduplication mechanism, and maybe train some part of the processing pipeline to analyze data better.
However, if we build this now (i.e. on the current Rails/Mongo-based backend), we’d likely only be able to persist clusters and read them back, as any cluster-specific treatment is made very hard by the way Mongo works. That’s why we’re going for graph DBs. I think @georgiana_b might be able to warn us about more specific issues with the Rails backend and clustering.
These are some points collected following calls with both @ca1yps0 and @georgiana_b. We should talk more about them before reaching a dead end, while taking care not to fall down rabbit holes. What are the minimal use cases we can afford to build now that would leave enough flexibility to build on afterwards? What’s our maximum implementation scenario? We can go either very basic or very advanced with this feature.
And, more importantly: let’s make it doable for parallel backend/frontend work in the following week(s). Minimal changes on the current backend, if possible, but forward-compatible with the new one. That’s why I’m asking so many things in advance.
So @elvis people, let’s exchange some ideas!
P.S.: This post is pieced together from phone calls and one-to-one interactions. Hence putting it here, so I can stop acting as a (crappy) proxy.