"Backend" openag-cloud-v1 Q&A for PFC 3.0


#1

This is a good place to ask us about the backend projects, data format, the web UI, etc.


#2

@rbaynes I haven’t looked at the code yet, but this looks really awesome!

I’m interested in the flow of data that keeps things ‘up to date’ in the UI. How do you plan to consistently get data from the food computer to the UI? i.e., will you use polling HTTP requests on an interval, websockets, etc.?

I see there is some sort of message queue, what triggers updates to this?

Hopefully these questions make sense; I’m mostly trying to get an overview of the communication between the major components.

Thanks for opening the discussion!


#3

For folks wanting to get up to speed on how this works, here are a few resources I’ve found. @jmitsch, perhaps this will help answer some of your questions.

  1. Architecture diagram (original is on github in OpenAgInitiative/openag-cloud-v1)

  2. “OpenAg Cloud Description” PDF. Also from GitHub (check out the PDF’s parent folder for additional high-level documentation).

  3. The Google Data Solutions for Change article about OpenAg from this past summer.


#4

Thanks for reading the docs @wsnook and sharing with the community.


#5

@rbaynes with this new model of communicating with a central database, could other machines communicate with the central server/engine? i.e., could something be made to support the MVP model or Raspberry Pi based food computers?


#6

Yes @jmitsch, any machine could use MQTT to do pub-sub with the backend service. Look in the brain device code for the registration script and the IoT classes.
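To make that concrete, here is a minimal sketch in Python of building one telemetry message. The `/devices/<id>/events` topic pattern follows the Google Cloud IoT Core convention; the device ID and the payload field names (`var`, `values`, `timestamp`) are illustrative assumptions, so check the IoT classes in the brain code for the real schema.

```python
import json

def make_event(device_id: str, var: str, value: float, timestamp: str):
    """Build an MQTT topic and JSON payload for one sensor reading.
    Field names here are illustrative, not the official OpenAg schema."""
    # Cloud IoT Core devices publish telemetry on /devices/<id>/events
    topic = f"/devices/{device_id}/events"
    payload = json.dumps({
        "var": var,            # e.g. "air_temperature_celsius"
        "values": [value],
        "timestamp": timestamp,
    })
    return topic, payload

topic, payload = make_event(
    "EDU-ABC123", "air_temperature_celsius", 22.5, "2019-01-15T12:00:00Z")
# With a configured client (e.g. paho-mqtt plus the JWT auth that the
# registration script sets up) you would then publish:
# client.publish(topic, payload, qos=1)
```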


#7

Looks like I have some investigating to do :slight_smile:


#8

@rbaynes is openag-cloud only for the PFC-EDU, or do you plan on it also being the central data hub for the food server? I have designed and built a food server that I would be interested in pushing data from.


#9

Hi @adam, the backend is generic and we already use it for our 4 food servers here and one in India. We also use it for our hazelnut tree computer.

So, yes, any device can use it. Look at the IoT code in the device code to see the format of the messages and how to register your device with the MQTT system.


#10

This might help:


#11

@rbaynes I see the “flavor_data” GitHub repo has been added into openag-cloud-v1. These data summaries of past experiments don’t appear to be in the format of executable sequences like “recipe_bag” and previous JSON-formatted files.

  1. Have any recipes been developed which are intended for running on the openag-cloud-v1?
  2. What phenotypical measurements are you collecting in the lab and field (schools)?

I would like to attempt to run a group experiment with a common “recipe” between members of the community using MVPs. Part of this would include manual phenotypical data collection and storage via a form we are developing. I’m considering:

Daily measurements:

  • Canopy Width (cm)
  • Canopy Length (cm)
  • Plant Height (cm)
  • Leaf Count (count)

Post Harvest:

Curious to hear your feedback.
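To pin down what one day’s record might look like for the shared form, here is a sketch of the daily measurements as a Python dataclass. The field names and units mirror the list above; the device ID and example values are made up, and nothing here is an official OpenAg schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class DailyMeasurement:
    """One day's manual phenotype record for the group experiment (sketch)."""
    date: str              # ISO date, e.g. "2019-03-01"
    device_id: str         # hypothetical MVP identifier
    canopy_width_cm: float
    canopy_length_cm: float
    plant_height_cm: float
    leaf_count: int

rec = DailyMeasurement("2019-03-01", "MVP-01", 12.5, 14.0, 9.0, 16)
row = asdict(rec)  # dict form, ready to append to a shared CSV or JSON log
```

Agreeing on field names and units up front would make it much easier to merge records from different builds later.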


#12

@Webb.Peter flavor_data and recipe_bag are old. I marked them as deprecated.

See the latest test recipes in the openag-device-software/data/recipes dir: https://github.com/OpenAgInitiative/openag-device-software/tree/master/data/recipes
These recipes are part of the device side code that we use mainly for testing. The real recipe is sent from the web UI on the backend to the device.
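For readers who don’t want to dig through the repo, here is a trimmed sketch of the phased-environment recipe shape, written as a Python dict. The field names follow data/schemas/recipe.json as I understand it (format, environments, phases with cycles); treat the specific values and environment names as made-up examples, not a real recipe.

```python
# Sketch of a phased-environment recipe (field names per the recipe schema;
# values and environment names are illustrative only).
recipe = {
    "format": "openag-phased-environment-v1",
    "name": "Example Basil (sketch)",
    "environments": {
        # Named environment states the device can hold
        "day":   {"air_temperature_celsius": 24},
        "night": {"air_temperature_celsius": 18},
    },
    "phases": [
        {
            "name": "Growth",
            "repeat": 28,  # run this day/night cycle 28 times
            "cycles": [
                {"name": "Day",   "environment": "day",   "duration_hours": 18},
                {"name": "Night", "environment": "night", "duration_hours": 6},
            ],
        }
    ],
}

# Sanity check: every cycle must reference a defined environment.
for phase in recipe["phases"]:
    for cycle in phase["cycles"]:
        assert cycle["environment"] in recipe["environments"]
```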

All the schools are running the same “Get growing basil” recipe now. Since you probably didn’t make an account on our web UI and look at the recipe directly, here’s a screenshot:

These are the manual measurements the students take:

And here is the device configuration that all the PFC_EDUs in the pilot test are running: https://github.com/OpenAgInitiative/openag-device-software/blob/master/data/devices/edu-v0.3.0.json It tells you which sensors are active. All active sensors publish their values as they change.
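As a sketch of how you might use such a config, the snippet below filters a config-like dict down to its active sensors. The "peripherals" key mirrors the real file’s layout, but the "enabled" flag and these entries are a hypothetical simplification of how the real config marks sensors as active.

```python
# A simplified stand-in for a device config like edu-v0.3.0.json.
# Real peripheral entries carry more fields (drivers, parameters, etc.).
device_config = {
    "name": "edu-sketch",
    "peripherals": [
        {"name": "SHT25-Top",         "type": "SHT25",    "enabled": True},
        {"name": "AtlasEC-Reservoir", "type": "AtlasEC",  "enabled": True},
        {"name": "Camera-Top",        "type": "PiCamera", "enabled": False},
    ],
}

# Only active sensors publish values to the backend.
active = [p["name"] for p in device_config["peripherals"] if p["enabled"]]
```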

@paula will have to comment about the post harvest measurements, as she and @hildreth are running the pilot test. The answer may also be in the curriculum that was shared via our wiki if you care to dig it out.


#13

@rbaynes I appreciate the detailed reply.

I really like what you’ve done with the recipes to define “environments”, which, as I perceive it, are different physical configurations of the device that can be associated with various phases: https://github.com/OpenAgInitiative/openag-device-software/blob/master/data/schemas/recipe.json

I appreciate you posting pictures from the UI. I was not aware that I could create an account; I assumed it was not yet available to the public. When I tried to register a device, I received this error:

The “Horticultural Measurements” form is precisely what I was interested in seeing. This kind of approach (I love that the pictures are actually Basil) will definitely provide the most meaningful data. I look forward to hearing more about how the students are being instructed to enter this data (individual, teams, etc.) and at what cadence. I’ve debated whether or not this should incorporate some sort of workflow with prompts at different “phases”.


#14

Quick question for anyone who might know: Is there, or will there be, a public API created against the BigQuery data stores? I see “Data Analytics” in the Future Additions section of the cloud description doc. Will this be available to everyone? Perhaps a better question: can you tell us what the data-sharing policy is for this project? If it’s already defined, a link would be appreciated! Thanks.


#15

I’m also curious about progress. But, I’ve re-calibrated my thinking recently to look more toward where things might be in 2-5 years. If you look at the milestones that OpenAg will need to hit to achieve Caleb’s vision for Open Phenome, it becomes clear that there’s a lot left to be done. I say that not as a critique but rather as an observation about setting realistic expectations and cutting Caleb some slack. He’s trying to accomplish very ambitious goals. Also, there are opportunities for community contributions to help move things forward.

My equivalent of a crystal ball for guesstimating what’s going on with OpenAg projects is to watch the commit history, issues, and network graphs for their repositories related to whatever project I’m wanting to keep an eye on. Caleb’s Instagram is also useful for tracking the pulse of things because sometimes he posts great pictures of equipment they’re working on. Once in a while Caleb tweets links to interviews or presentations where he talks about progress. Those are good for keeping track of how he’s refining and improving his master plan.

Taking all that into account, my guess is that a public API against BigQuery will make sense only after another year or two of work refining the PFC designs, followed by a period of community adoption and growing. I’d say three years might be a realistic rough estimate.

Things to consider:

  1. A big challenge with the new PFC v3.0 is that, while it’s designed for manufacturing, it’s currently only available as plans rather than a kit that can be purchased. That means the only data being generated is whatever is coming from the pilot program that OpenAg is apparently running with some schools in Massachusetts. I think the OpenAg team made around 60 prototype devices for the pilot, but I can’t remember where I saw that number. In any case, a sample size of 60 crops from a hardware beta test isn’t much data for feeding to machine learning algorithms. I expect it would be better to have data from thousands of crops or perhaps tens of thousands. Also, lots would need to be figured out about data schemas, measurement standards and procedures, etc. And the hardware will probably need to be revised as issues are discovered during the initial field testing. All that will be slow.

  2. Progress on the https://github.com/OpenAgInitiative/openag-cloud-v1 repository where OpenAg has been working on their web console for the PFC v3.0 back-end infrastructure has been slow and sporadic in recent months. Based on the pattern of activity, I’d guess that there is no one free to work on it because the OpenAg developers are currently busy on other projects. It’s possible (likely?) that there is a separate private repository associated with a private web console specifically for the schools involved in the pilot program; maybe that’s why activity on the public repo has slowed. Or, perhaps the OpenAg developers are spending a fair amount of time on tech support to schools that are using the prototypes along with doing ongoing work on their Food Server projects (container farm and tree computer).

All that is to say that there’s an enormous amount of work needed to generate the sort of dataset that Caleb has described in his talks about where he wants to take OpenAg in the future. It’s easy to get excited about all the things that could be done with a treasure trove of data, but it’s hard to do the data collection.


#16

Hi @Drew,
What @wsnook said is correct. We are currently writing to a public dataset in BigQuery that anyone is welcome to look at. All the code is in the GitHub repo Will linked, specifically the MqttToBigQuery-AppEngineFlexVM folder.

The API we are using internally is the one described above, over the MQTT protocol. At some point I may create a public API on top of BigQuery just for data access.

Anyone can create a free Google Cloud Platform account (they give you $300 of free credit to play around and learn the platform). You can use it to run queries against our public dataset, which is view-only for any user.

https://bigquery.cloud.google.com/table/openag-v1:openag_public_user_data.vals
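For anyone who wants to try it, here is a sketch of querying that public table from Python. The table ID comes from the link above; the client setup is commented out because it needs the google-cloud-bigquery package and a GCP project of your own (the free tier is enough), and I haven’t listed column names since you should inspect the table’s schema in the console first.

```python
# Query the public OpenAg dataset (table ID from the link above).
QUERY = """
SELECT *
FROM `openag-v1.openag_public_user_data.vals`
LIMIT 10
"""

# Requires: pip install google-cloud-bigquery, plus your own GCP project
# (the dataset is view-only; query costs bill against your project).
# from google.cloud import bigquery
# client = bigquery.Client(project="your-project-id")  # hypothetical project ID
# for row in client.query(QUERY).result():
#     print(dict(row))
```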

As @wsnook also correctly stated, this is all an ongoing work in progress and R&D effort. So things will change, and data is constantly being added from our various ongoing experiments, running on everything from small PFC_EDUs to our shipping-container food servers in the US and India.


#17

Oh, and yes @wsnook we have some other private repos for internal web UI projects that we use for research and internal IT stuff. When they are ready they will be made public. The software team is still only 3 members and I’m about to lose someone to a start up, so as always progress is slow.


#18

Sorry for the way-late reply, @Webb.Peter; that UI was going through some rapid changes (and not much testing before deployment) prior to our Boston-area school and library pilot test.


#19

@rbaynes & @wsnook, Big thanks for the prompt replies and followup! It’s encouraging to see the data set up with public read permissions; this partially addresses my initial concerns. I’m not encouraged by the requirement that I give Google my credit card information to access this data in any meaningful way. I understand I can create an account, create a GS bucket, and export to it without spending any money… initially.

I know first-hand the costs and complexity of big data and would never expect the OpenAg team to shoulder the burden of providing this to the public. That said, data is arguably the most valuable part of this effort, and the absence of a formal contract between an open-source design and the data it produces is worrisome. Has anyone broached this subject on the inside? Although I see it as a major design consideration, others may not.

While this is an R&D effort, I hope we all see it as an effort with huge potential and one worth investing resources into. As both the formal OpenAg team and its ad-hoc public counterpart iterate on designs, I feel this formal data contract deserves definition. Without it, I’m much more likely to bifurcate data produced by my builds into something else. This is not advantageous to us as a group, and if that precedent is set now, I can see data hoarding becoming a thing. OpenAg has the potential to become PartiallyOpenAg, and the world will benefit less because of it, IMO.

That may be the case for the folks at the Media Lab, but it ignores the benefits of baking in data sharing expectations for the rest of us. My feedback is that it makes sense to have this defined now so as to unify a single repository both sides can contribute to with confidence.

In summary, I’m asking myself two things:

  1. Is there any way for me to take exported table data offline without setting up a billable account?

  2. Has anyone at OpenAg made a public statement concerning how data collected from these systems (as designed and published) will be shared with the public moving forward?

Again, thank you so much Rob and WS for your time and attention to this effort as a whole. Your work is inspiring and I truly believe it’s catalyzing something big for all of us!


#20

Here’s a great blog post that might give you a bit of perspective on how I’ve been thinking about this issue: https://embeddedartistry.com/blog/2018/11/12/a-look-at-ten-hardware-startup-blunders-part-1-process

In particular, I think the section on “Failure to Separate Research and Development Phases” might be helpful for understanding OpenAg’s current situation.

The key point is that sometimes you have a problem that truly requires research, where you have to invest a lot of open-ended time and effort to figure out how to do a thing that you don’t know how to do. In OpenAg’s case, there’s a lot of existing knowledge about controlled environment agriculture and a lot of existing knowledge about crowd-sourcing giant data sets. But much of that happens in a proprietary for-profit context where insights are treated as trade secrets. OpenAg is trying to figure out how to do an open version of that. One obvious problem is that hosting and processing large data sets is expensive.

It might be the case that the exploratory work OpenAg is doing now is actually the fastest path to a situation like what you are asking for. It’s just that since there are still important open questions, it’s not possible to set things firmly yet. For a related perspective on this issue, see point 3 in this comment from Howard: Do PFC's work as intended?