Example dataset with 7 days of time lapse + temperature log


#1

Hi folks,

One of the open questions about food computers is how to go about sharing recipes and data on what happens when you follow the recipe. As far as I’m aware, while there’s work in progress, the current software doesn’t have a way to do this other than replicating a CouchDB database, which is an awkward way to share information. There’s been discussion here on the forums about data formats and what sort of data is important, but, so far, I haven’t seen an example of people sharing data in a way that could scale. Granted, there are lots of good pictures and a few recipe links. That’s good, but forum posts don’t have much potential as a scalable data sharing architecture. So, as a step toward addressing those issues, I’m offering an example of what could be done with GitHub.

Don’t worry about whether this data was collected on an official PFC, whether I was doing automated nutrient dosing, or which sensors I was using; none of that matters. The point here is to give an example of how we could use a GitHub repo to share information about our equipment, procedures, and results in a way that’s easy for people to comprehend and to process with any software they like. If you want to feed my jpeg time lapse photos to OpenCV, you can do that. If you want to chart my TSV sensor logs, that’s easy too. I’d like to see what people can do with this.
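
For example, charting the temperature log takes only a few lines of Python. Here’s a rough sketch; the file name and column layout below are placeholders, so adapt them to whatever is actually in the repo:

```python
# Rough sketch: plot a temperature log like the one in this dataset.
# Assumes a headerless two-column TSV of ISO timestamps and degrees C;
# the file name and column names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv(
    "temperature.tsv",
    sep="\t",
    header=None,
    names=["timestamp", "temp_c"],
    parse_dates=["timestamp"],
)
log.plot(x="timestamp", y="temp_c", legend=False)
plt.ylabel("Temperature (°C)")
plt.savefig("temperature.png")
```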

cc @gordonb @rbaynes @webbhm @Caleb @will_codd @_ic @spaghet


#2

Using GitHub is a great way to manually share recipes, data, and images.

I am working on a backend that shares the CouchDB data plus the images. I will share the plans and a first pass when they are more complete (all I have right now is a demo).


#3

Awesome! I think it’s important that we set up a good way to share recipes/data.

I don’t think DB replication is fundamentally awkward, and it can definitely scale.
Is there a particular reason it can’t, aside from the human interface aspect? We also don’t have a good schema for the data yet, which might also be problematic.
Having a frontend layer wrapped over the DB replication, with CSV/TSV/Excel output, could be scalable.
I think we already have CSV export in the frontend, but a good schema would help quite a bit.
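
To make that concrete, here’s a rough sketch of the kind of TSV export I have in mind, pulled straight from CouchDB’s HTTP API. The database name and document fields below are placeholders; settling on a real schema is the part we still need to do:

```python
# Rough sketch: dump sensor readings from a local CouchDB into a TSV file.
# The database name ("environmental_data_point") and the document fields
# ("timestamp", "variable", "value") are placeholders for illustration.
import csv
import requests

ALL_DOCS = "http://localhost:5984/environmental_data_point/_all_docs"

resp = requests.get(ALL_DOCS, params={"include_docs": "true"})
resp.raise_for_status()

with open("sensor_log.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["timestamp", "variable", "value"])
    for row in resp.json()["rows"]:
        doc = row["doc"]
        # Skip design documents and anything that doesn't look like a data point.
        if "timestamp" in doc and "variable" in doc:
            writer.writerow([doc["timestamp"], doc["variable"], doc.get("value")])
```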


#4

To understand my perspective, it might help to think about the economic concept of opportunity cost. Why spend precious developer hours on building custom systems to do things that could be done cheaply and well with existing tools?

It took me about one afternoon to prepare this dataset using rsync, a text editor, a little bash scripting, and GitHub. I didn’t know up front what interval I should use for the time lapse and sensor logging, so I logged more than I needed. I had to manually thin the images, but, in doing that, I learned what interval I want to use (1 hour). Now, I’ll put that interval in my code, and my next data export will be even easier.
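
For what it’s worth, the thinning step boils down to a few lines. Here’s roughly what I’ll fold into my export script; the directory names and filename pattern are made up, so adjust them to whatever your camera writes:

```python
# Rough sketch of the thinning step: keep one jpeg per hour, copy it to the
# directory that gets committed, and leave the rest behind. Assumes filenames
# sort chronologically and embed a timestamp like 2017-03-14T09-05-00.jpg.
import os
import shutil
from datetime import datetime

SRC = "timelapse_raw"      # everything the camera captured (hypothetical path)
DST = "timelapse_hourly"   # the frames that go into the repo (hypothetical path)

os.makedirs(DST, exist_ok=True)
last_kept = None

for name in sorted(os.listdir(SRC)):
    if not name.endswith(".jpg"):
        continue
    stamp = datetime.strptime(name[:-4], "%Y-%m-%dT%H-%M-%S")
    # Keep a frame only if at least an hour has passed since the last one kept.
    if last_kept is None or (stamp - last_kept).total_seconds() >= 3600:
        shutil.copy2(os.path.join(SRC, name), os.path.join(DST, name))
        last_kept = stamp
```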

The benefit of approaching things this way is that I get to iterate fast and learn quickly. Using simple, flexible tools lets me spend a high percentage of my time on the interesting stuff: observing plants and learning golang.

[edit: Assuming that your data is stored under the hood in CouchDB, providing CSV/TSV export like you mentioned would be great. I’d encourage you to think about rsync too, because it works really well for moving lots of images. I was regularly syncing over 2GB of ~1MB jpegs without trouble; doing that through a browser would be a huge pain.]
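
And in case it’s useful, the sync step is nothing fancy. Something along these lines (wrapped in Python so it can live in the same export script) is all it takes; the host and paths are made up:

```python
# Rough sketch: pull the raw jpegs off the grow computer before thinning them.
# Host and paths are made up. -a preserves timestamps and permissions;
# --partial lets an interrupted transfer resume instead of starting over.
import subprocess

subprocess.run(
    ["rsync", "-a", "--partial",
     "pi@foodcomputer.local:/home/pi/timelapse/", "timelapse_raw/"],
    check=True,
)
```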