Data Identity: Context


#1

Discussion of how to define the context of the data collected. This will likely decompose into two sub-parts:
Experiment - what is being done
Environment - the location and features


#2

Besides ownership, each data record needs to be stamped with context information: things that are assumed for each document/record.
Experiment is bigger than a recipe, though the recipe being used is one attribute of the experiment. It is possible that an experiment involves multiple recipes (e.g. seeing if changes of photoperiod produce the same response in two different species). This is a description of the research project, with goals and methodologies.
I have not thought this through too much, other than knowing it is needed, especially for queries looking for all the data associated with a particular experiment, for running analytics.
One difficulty is assigning a unique identifier to an experiment. It would be most helpful if the experiment were created on the cloud database/UI (similar to recipes), then downloaded to the boxes that were going to run the experiment.
The experiment may also contain operational aspects of a recipe. For example, the water level of a reservoir may get lowered as plants develop roots and the roots get longer. The experiment may be where this information is stored as it applies to the box - ie the recipe may call for the water to be 2 inches below the bottom of the plant, and this information may need to be translated to the particular reservoir depth.
exp={'recipe':{},
     'start_date':'2017-11-10:14:13:22',
     'setup_date':'2017-06-10:14:13:22',
     'reservoir':{'empty':220, 'full':200},
     'lights':{'on':'06:30', 'off':'10:00', 'level':100}
    }
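
Something like this is what I have in mind for the identifier piece. A rough sketch only, in Python; the helper names and document shapes are made up, not an existing API:

import uuid
from datetime import datetime, timezone

def create_experiment(recipe, reservoir, lights):
    # cloud-side: build the experiment document and assign its unique id
    return {'_id': str(uuid.uuid4()),
            'recipe': recipe,
            'start_date': datetime.now(timezone.utc).isoformat(),
            'reservoir': reservoir,
            'lights': lights}

def stamp_record(record, experiment_id, mac):
    # box-side: stamp every data record so queries can pull all data for one experiment
    record['experiment_id'] = experiment_id
    record['mac'] = mac
    return record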

Environment describes the physical features of the box. At a summary level it may just be ‘PFC v2.0’; at a detail level it describes the model and location of each sensor. If programming becomes more abstract, this might include the pin assignment (or I2C address) of an individual sensor or actuator. This is where attributes like the MAC address and possibly the geographic location are stored.
env={'mac':'xxxxxxxxx',
     'location':{'city':'Bloomington', 'state':'IN'},
     'equipment':{'model':'MVP', 'version':'1.0',
                  'sensors':{'temperature':[{'name':'temp_1', 'type':'si7021', 'location':'top'},
                                            {'name':'temp_2', 'type':'mcp9808', 'location':'reservoir', 'address':'9999'}],
                             'humidity':[{'name':'humidity_1', 'type':'si7021', 'location':'top'}]},
                  'reservoir':{'capacity':20, 'empty_distance':230, 'full_distance':180, 'fill_cycle_time':45},
                  'lights':{'type':'led', 'brand':'GE', 'product':'GE Bright Stick', 'equiv':'100'}}
    }
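
Picking up the reservoir example from above (the recipe says ‘water 2 inches below the bottom of the plant’, but the box only knows distances from its level sensor), a rough sketch of the translation. All names are made up, and it assumes the sensor measures distance down to the water surface (larger reading = less water):

def target_sensor_distance(env, depth_below_plant_mm, plant_bottom_distance_mm):
    # translate a recipe water level into a target reading for this box's level sensor
    res = env['equipment']['reservoir']
    target = plant_bottom_distance_mm + depth_below_plant_mm
    # clamp to what this reservoir can actually reach
    return max(res['full_distance'], min(res['empty_distance'], target))

# e.g. plant bottom 150 mm below the sensor, recipe wants the water ~50 mm (2 in) lower:
# target_sensor_distance(env, 50, 150) -> 200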


#3

The relatively new ‘Minimum Information About a Plant Phenotyping Experiment’ standard might be worth considering: http://www.miappe.org/
See their PDF table (first bullet point). Pretty thorough (I think) about growth conditions and measurements, but not as oriented towards hardware.
I’ve looked at their software tools, but not successfully used any of them. We’d like PlantCV to be able to save metadata in their format, presumably using the isatools python package.
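
Roughly what I have in mind, untested and just from skimming the isatools docs, so the exact calls and arguments may differ:

# untested sketch of writing ISA-Tab metadata with isatools
from isatools.model import Investigation, Study
from isatools import isatab

investigation = Investigation(identifier='mvp-photoperiod-01')
study = Study(filename='s_mvp_photoperiod.txt', title='Photoperiod response, MVP box')
investigation.studies.append(study)

isatab.dump(investigation, output_path='.')  # writes the ISA-Tab files that MIAPPE tools expect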


#4

@jshoyer
It looks good for a quick read. I am always for using an existing standard rather than trying to create a new one. Any idea on what level of buy-in it has received?
That they don’t cover hardware doesn’t bother me; this is a focus on phenotype data, and is a ‘minimum’ standard. I envision hardware being in a different section. Besides, field experiments don’t need much hardware, so it shouldn’t be required in a minimal standard.
I need to look at isatools. Why write code if someone else already has?
I used to work with Claire Augustine at Monsanto, who wrote the original PATO plant ontology. We had many discussions about it, and about ontology in general. I am a strong supporter of the OBO ontology, but while included in the OBO, PATO was never accepted as one of the standard ontologies. It is good in many places (and much better than nothing!!), but it has some significant weaknesses, most of which I don’t think we will deal with for a while.
Thanks for finding this.
The idea of trying to combine a triple-store with CouchDB leaves me a bit light-headed.


#5

@jshoyer, @Webb.Peter
Taking a second look at this, it looks like a document to be attached to a data package at the end of an experiment. It contains both contextual information (environment) as well as a summary of the sensor information (averages of environmental measurements).
I need to think on this, as I have been focused on what data might be needed during an experiment, or as meta-data about the context (true for any experiment), and not on summaries for post-experiment use.
That is not necessarily a conflict, as what I have envisioned could be the ‘source’ from which this could be generated, and the more the two look the same to start, the simpler things can be.
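
Something like this rough roll-up is what I mean by the per-record data being the ‘source’ for the end-of-experiment document (record shape and names are assumed):

from statistics import mean

def summarize(records, experiment, env):
    # group the raw records by attribute, e.g. {'attribute': 'temperature', 'value': 21.4}
    by_attr = {}
    for r in records:
        by_attr.setdefault(r['attribute'], []).append(r['value'])
    # attach the context plus the averages as the post-experiment summary document
    return {'experiment': experiment,
            'environment': env,
            'summary': {attr: mean(vals) for attr, vals in by_attr.items()}}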


#6

I do not think MIAPPE has gotten all that much buy-in in the US yet, but it does have a fair amount of mind-share, partly because there are already multiple peer-reviewed publications associated with it. An Australian phenometrics group is also developing a standard called the Plant Ontology-Driven Data (PODD) repository project.

A consortium of European groups developed the MIAPPE standard, so I think it has more uptake in their software tools, especially at the IPK in Germany. I think you are quite right that the MIAPPE standard is geared towards post-experiment reports rather than real-time analysis. The IPK online database seems to be designed for people to download tarballs (100s of GB) with metadata included.

The TerraRef groups have thought quite a bit about metadata for many types of sensor in field experiments, but I have never wrapped my head around all their documentation. https://terraref.gitbooks.io/terraref-documentation/content/data-standards.html

(edited 7 hours later to link to PODD)