Data Collection & Sharing: Categories
Data tends to cluster in groups, with relationships (often activities) that connect the different groups. There are two main clusters:
- Plan: what you intend to do. This is documented before you start growing (i.e. the recipe)
- Actual: data collected in the process of growing
The main categories are:
1. Context
2. Genetics
3. Agronomics
3a. Planting, Transplanting, Pruning, Harvesting, …
4. Treatment
5. Observations
5a. Environment
5b. Phenotype
Context: these are the things that are assumed and often overlooked when thinking about data. For OpenAg this is relatively simple, as the primary context is the Food Computer itself - what sensors are installed, where it is located, and the type of growing (agroponic, hydroponic, raft, deep water, …). This can be simple, or go off into some complex detail (i.e. sensors may have calibration requirements, drift rates, maintenance history, certification, …). This data is easy to skip, but it becomes critical when discrepancies are found (“but I thought you …”). The hardware_fixture and hardware_fixture_type would be pieces of information in this category.
Genetics: this is a part of context, but a significant one. It can be as simple as Bently Buttercrunch Bibb lettuce, or as complex as breeding history and DNA sequencing. I would recommend including a Latin name (though this has its own problems). Where you got the seed from (Bently, Burpee, …) is provenance, not a direct attribute of the seed. I would suggest starting simple (Latin name, common name, supplier) for most work, unless you are getting into seed breeding or DNA/phenotype relationships. There are interesting things in the seed business, where a company will sell different genotypes in different markets under the same name; and with heirloom, open-pollinated seeds there is no assurance that the genetics are consistent between seeds.
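As a rough illustration of the "start simple" suggestion, here is a minimal Python sketch of a genetics record that keeps provenance separate from the attributes of the seed itself. The class and field names (Genetics, Provenance, lot) are hypothetical and not part of any OpenAg schema; the Latin name for lettuce is filled in only as an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Where the seed came from (hypothetical structure, not an OpenAg schema)."""
    supplier: str
    lot: Optional[str] = None  # assumed optional detail; leave empty if unknown


@dataclass(frozen=True)
class Genetics:
    """The 'start simple' record: Latin name, common name, and supplier."""
    latin_name: str
    common_name: str
    provenance: Provenance  # kept apart because it is not an attribute of the seed


# Example values only.
seed = Genetics(
    latin_name="Lactuca sativa",
    common_name="Buttercrunch Bibb lettuce",
    provenance=Provenance(supplier="Burpee"),
)
```

Keeping the supplier in its own record means the same genetics entry could later be linked to a different source without changing what is said about the seed.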
Agronomics: these are the farming activities - actions performed on the plants as a part of their life-cycle, or on their context (i.e. plowing in traditional farming). These things are usually part of a ‘handbook’ of standard practices. Agronomic data comes in two parts: the plan of when and how to do something, and the actual record of who did what, and when. A third part is derived data, comparing the actual to the plan. These comparisons are the bread-and-butter of administrative analysis, but irrelevant to research (assuming the plan was followed).
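The plan / actual / derived split can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the "days after planting" convention for the plan, and the dates are all hypothetical, not taken from the OpenAg recipe format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedActivity:
    """Plan: what to do and when (assumed here to be days after planting)."""
    activity: str
    planned_day: int


@dataclass(frozen=True)
class ActualActivity:
    """Actual: who did what, and when."""
    activity: str
    performed_on: date
    performed_by: str


def days_late(plan: PlannedActivity, actual: ActualActivity, planting_date: date) -> int:
    """Derived data: positive if the activity happened later than planned."""
    actual_day = (actual.performed_on - planting_date).days
    return actual_day - plan.planned_day


# Hypothetical example: harvest planned for day 30, performed on day 32.
planting = date(2017, 3, 1)
plan = PlannedActivity(activity="Harvesting", planned_day=30)
actual = ActualActivity(activity="Harvesting", performed_on=date(2017, 4, 2), performed_by="grower-1")
delay = days_late(plan, actual, planting)  # 2
```

The derived value is computed on demand rather than stored, so it cannot drift out of step with the plan and actual records it comes from.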
Treatment: applications of fertilizer, pesticides and probiotics. Where there is irrigation or ‘ebb and flood’ watering, this would also be considered a treatment. A treatment may be a regularly scheduled part of a recipe, or it may be an interrupt/exception such as a fungicide or insecticide treatment. pH up and down dosing is a treatment activity.
Observations: these will become the real ‘meat’ of OpenAg, as these are the variables that get measured.
Environment: this is what most of the sensors are picking up: temperature, humidity, light, conductivity, pH. This is the easiest data to collect, the most abundant, and will likely end up having the least value (to be explained later). Unlike field agriculture, climate for the food computer should be a controlled variable.
Phenotype: this is what Caleb and others see as the future for OpenAg. It is also the category where nobody has yet defined how the data will be collected and recorded. Phenotype data is the big variable that we want to watch as it responds to different controlled variables. In the phrasing of OBO, these are measures of the quality of a substance (leaf length). It is a personal bias, but I am going to push hard to use the OBO ontologies for this data.
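To make the "quality of a substance" phrasing concrete, here is a minimal Python sketch of one phenotype observation recorded as an entity plus a quality. It is an assumption about how such a record might look, not a defined OpenAg or OBO format; the field names are hypothetical, and the ontology term fields are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhenotypeObservation:
    """One measured quality of one entity (hypothetical record layout)."""
    entity: str                         # what was observed, e.g. "leaf"
    quality: str                        # which quality was measured, e.g. "length"
    value: float
    unit: str
    entity_term: Optional[str] = None   # ontology term ID, if one has been chosen
    quality_term: Optional[str] = None  # ontology term ID, if one has been chosen

    def describe(self) -> str:
        """Human-readable summary of the observation."""
        return f"{self.entity} {self.quality}: {self.value} {self.unit}"


# Example values only.
obs = PhenotypeObservation(entity="leaf", quality="length", value=12.5, unit="cm")
summary = obs.describe()  # "leaf length: 12.5 cm"
```

Separating the entity from the quality is what would let the labels be swapped for ontology terms later without changing the shape of the record.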
Until we pin down how to collect phenotype observations, there will be little significant data to share.
Next Up: Data Levels