Units of measure (Celsius vs. Fahrenheit)


#1

Should we be standardizing on the units of measure of the data that we are logging?

I’m so accustomed to using Fahrenheit for temperature, but Celsius might make more sense.

  1. Practically the whole rest of the world uses Celsius
  2. Many libraries for temperature sensors default to Celsius

Is the resolution of a single Celsius degree enough, or should we record it to the tenth of a degree?

Reference
A temperature range of 50 to 80 degrees Fahrenheit is equivalent to 10 to 26.7 degrees Celsius.
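As a quick check on that range, here is a minimal sketch of the standard conversion formulas. The function names are made up for illustration.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def celsius_to_fahrenheit(deg_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32


# The reference range, rounded to a tenth of a degree.
low_c = round(fahrenheit_to_celsius(50), 1)
high_c = round(fahrenheit_to_celsius(80), 1)
print(low_c, high_c)
```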

Anyone have any thoughts on the matter?

We can discuss gallons/liters later :wink:


#2

Short answer: Record your measurements in tenths of a Celsius degree. But, set your code up so you can pick the format you are most intuitively familiar with for viewing charts. If using Celsius means you will be constantly doing conversions in your head to Fahrenheit, why go through that hassle? Granted, metric is better in so many ways. But, in the US, it’s not the standard.

Long answer: About this time last year, I made a data logging system that I used for microgreens and then lettuce. One of my goals was to make a working example of sharing datasets. At the time, people were talking on the forum about sharing data, but nothing much was happening.

Anyhow, I grew my stuff, took a bunch of pictures, logged temp & humidity, made charts, stuck the datasets up on github, and linked to it in a post here. As far as I can tell, about 5 people looked at my data, and maybe about 2 of them engaged me in conversations about it. Beyond that, nothing much happened. At that time, I was using Celsius, but I don’t remember anybody saying anything to indicate that they cared about my units.

The lesson I took from that experience was that I should focus on collecting data in a format that was most useful to me, because I was probably the only one who would ever be looking at it closely.

For the next generation of my data logging system, I switched to doing temperature charts in Fahrenheit because I’m used to it. Also, lots of the hydroponic and horticultural resources written in the US refer to temperatures in Fahrenheit.

I suggest that you log data according to good scientific practice and then present it in whatever format is most useful for your own growing. At this point, data sharing in the OpenAg community is mostly just hypothetical–an ambition for the future. So far, lack of compatibility isn’t a practical problem because there’s not much of anything to be compatible or incompatible with. For now, it’s kind of a blank slate where you can do what you want.

edit:

This is an excerpt of what my current logging looks like (Celsius to 0.1 degree, append records to a text file with “%%” as record separator, field separator is “|”):

%%
2018/04/14 23:21:00|
BME280|22.1C|30.2%RH|979.1hPa
%%
2018/04/14 23:24:00|
BME280|22.1C|30.6%RH|979.1hPa
%%
2018/04/14 23:27:00|
BME280|22.1C|30.6%RH|979.0hPa
%%
2018/04/14 23:30:00|
BME280|22.0C|30.6%RH|979.0hPa

And, here’s an example of one of my charts. When I make a chart, I do the Celsius to Fahrenheit conversion in my code.

[Chart image: 2018-04-14_charts]
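The code behind that chart isn't shown in the thread. As a rough sketch only, records in the format above could be read back and converted for display like this. `parse_records` and `SAMPLE_LOG` are hypothetical names, and the parsing assumes every record has exactly the five fields seen in the excerpt.

```python
from datetime import datetime

SAMPLE_LOG = """%%
2018/04/14 23:21:00|
BME280|22.1C|30.2%RH|979.1hPa
%%
2018/04/14 23:30:00|
BME280|22.0C|30.6%RH|979.0hPa
"""


def parse_records(text):
    """Yield (timestamp, sensor, deg_c, rh_percent, hpa) tuples."""
    for chunk in text.split("%%"):
        # A record spans two lines, so join them before splitting on "|".
        fields = "".join(chunk.split("\n")).split("|")
        if len(fields) != 5:
            continue  # skips the empty chunk before the first separator
        stamp, sensor, temp, rh, pressure = fields
        yield (
            datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
            sensor,
            float(temp.rstrip("C")),
            float(rh[: -len("%RH")]),
            float(pressure[: -len("hPa")]),
        )


records = list(parse_records(SAMPLE_LOG))
for stamp, sensor, deg_c, rh, hpa in records:
    deg_f = deg_c * 9 / 5 + 32  # convert only when presenting
    print("{} {:.1f} F".format(stamp.isoformat(), deg_f))
```

The stored values stay in Celsius. Fahrenheit only appears at the point where something is printed or charted.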

In my experience, rounding to integer degrees Celsius is not good. Changes of less than one degree are important. A good way to think about it is like using significant figures for chemistry or physics calculations. If the sensor datasheet says it can give you something like 0.04 °C repeatability for a 12-bit measurement (see Si7021 datasheet), then you could make good arguments for recording measurements with either 0.1 °C or 0.01 °C precision. On the other hand, using 0.001 °C would be claiming precision you didn’t measure, and 1 °C would be throwing a lot of your measured precision away.
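To make the rounding point concrete, here is a small sketch with two made-up readings (not measured data). `log_value` is a hypothetical helper that formats a reading with a chosen number of decimals.

```python
def log_value(deg_c, decimals):
    """Format a Celsius reading with a fixed number of decimal places."""
    return "{:.{}f}".format(deg_c, decimals)


before, after = 22.12, 22.44  # hypothetical readings a few minutes apart

# At integer precision the change disappears entirely.
print(log_value(before, 0), log_value(after, 0))

# At 0.1 degree precision the change is preserved.
print(log_value(before, 1), log_value(after, 1))
```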


#3

@wsnook Celsius to the tenth it is then (for the logged data).

I’m also going to switch from gallons to liters. I think the reason I used gallons in the first place was that the dosing instructions for many nutrients are based on gallons/teaspoon/tablespoon.

I am surprised that we still don’t have even a general schema for data logging. I think everyone seems to be in agreement that recipes and logged data should be stored in couchdb. I will share my thoughts on the structure of this data shortly.


#4

FWIW, I’m not much of a fan of CouchDB. Also, reading between the lines from what little I’ve seen from OpenAg–github commits, software developer job postings, etc.–it sounds like Rob is working on a ground-up overhaul of their approach to data collection–seems like CouchDB is out, and Google Cloud Platform is in. I know Howard and Rob have exchanged schema ideas here on the forum some. If you haven’t seen those posts, it might be worth a look (see the Sensor Data Modeling thread).

Another bit of unsolicited but hopefully useful advice… People don’t seem to appreciate how far you can get with just appending log lines to a text file. For some reason, it’s a popular notion these days that you can’t build anything without a database, but that’s not true. Particularly if all you’re trying to do is record time series sensor measurements, appending lines to a text file works great. If you start each line or record with a timestamp, it’s easy to make filters to pull out data for time spans that you care about for charting or the like. When your data is text, you can use all the normal Unix tools to great effect–rsync for replicating data, grep for filtering and searching, gnuplot or python and matplotlib for making charts, etc.
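A minimal sketch of that kind of filter, assuming each line starts with an ISO 8601 timestamp followed by a space. The line layout, `LOG_LINES`, and `lines_between` are assumptions for illustration, not a format anyone in this thread has committed to.

```python
from datetime import datetime

LOG_LINES = [
    "2018-04-14T23:21:00 22.1",
    "2018-04-14T23:24:00 22.1",
    "2018-04-15T00:03:00 21.8",
]


def lines_between(lines, start, end):
    """Yield lines whose leading timestamp falls in [start, end)."""
    for line in lines:
        stamp = datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S")
        if start <= stamp < end:
            yield line


selected = list(lines_between(
    LOG_LINES, datetime(2018, 4, 14), datetime(2018, 4, 15)))
print(selected)
```

For a whole-day span like this one, `grep '^2018-04-14' log.txt` would do the same job from the shell.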

That said, there are times when a relational database is a good choice for recording data.


#5

I agree with you that text files meet a lot of needs, and for a minimal project should be the starting point. The first version of the MVP logged to a text file, and I only recently dropped it. The issue I find is that while I can easily build tools for working with text, there comes a point where it is easier to get a package where others have already written and standardized the tools and I can get back to focusing on the business issues. CouchDB has the tools for query, replication and web access; otherwise I regard NoSQL databases as basically flat file managers (leaning toward JSON format).
There is always the danger of letting technology glitz get in the way of finding the best solution to the problem.


#6

I believe you are just recommending a ‘model-view-controller’ pattern. Store the data in scientific standards (Celsius for the model), but let the view translate to whatever makes the user comfortable.
Having worked with international systems, it was a pain to figure out who was using what ‘standard’ (bushels, tons, metric tons, and multiple Indian standards) and trying to convert everything on-the-fly for analytics and reporting. Not only difficult, but also hard on performance. Converting at the interface is a lot easier.

As to how many decimal places… A big issue is the precision of the instrument. I always loved latitude & longitude measurements stored with 14 decimals (because that is what the GPS NMEA sentence contains), when the precision is being reported as ±3 meters!! At least the NMEA sentence contains a DOP (dilution of precision), though few applications show it, and fewer people know what it means. We should have something similar, but that gets into calibration certification and maintenance.

Know your data, and know your business needs.


#7

No, that term has a lot of baggage that I’d rather not invoke. But, yeah–other than the MVC terminology–I think we’re more or less on the same page… Using SI units saves a lot of trouble, pay attention to the precision & calibration of your instruments, etc.

To expand on why I don’t mean MVC, what I had in mind is basically this:

from __future__ import print_function, division

# Log some data
with open("log.txt", "w") as f:
    f.write("timestamp,degree-C\n")
    f.write("noon yesterday,23.2\n")
    f.write("1 pm yesterday,23.5\n")

# Reformat the log
with open("log.txt", "r") as f:
    skip_header = f.readline()
    for line in f:
        t, c = line.split(",")
        print("At {}, it was {:.1f} F.".format(
            t, (float(c) / 5 * 9) + 32))

What I’m getting at is that this situation doesn’t require a framework like Django or Rails–the typical context for conversations about MVC. Rather, here we only need a few simple building blocks:

  1. File I/O
  2. The formula for Celsius to Fahrenheit conversion
  3. Print statements with decimal format specifiers

On its own, that code wouldn’t be sufficient for making charts like the one I included in my earlier comment. But, with a little bit more effort, it could be extended to format a CSV file for feeding to gnuplot or to make a suitable python data structure for matplotlib. If you add a web server like apache, you can generate the chart images into your web server’s root directory. If you do that after each new round of measurements (maybe every 2-5 minutes), the result will look like real-time charting of your sensor data. But, the mechanism behind it can still be very lightweight and simple.
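As a sketch of the first extension mentioned above, the same log could be rewritten as a chart-ready CSV with a Fahrenheit column. `io.StringIO` stands in for real files here so the example is self-contained, and the `degree-F` column name is an assumption.

```python
import csv
import io

# Stand-in for the log.txt written by the snippet above.
log = io.StringIO(
    "timestamp,degree-C\n"
    "noon yesterday,23.2\n"
    "1 pm yesterday,23.5\n"
)

chart_csv = io.StringIO()
reader = csv.DictReader(log)
writer = csv.writer(chart_csv, lineterminator="\n")
writer.writerow(["timestamp", "degree-F"])
for row in reader:
    # Convert at presentation time; the log itself stays in Celsius.
    deg_f = float(row["degree-C"]) * 9 / 5 + 32
    writer.writerow([row["timestamp"], "{:.1f}".format(deg_f)])

print(chart_csv.getvalue())
```

To feed a comma-separated file like this to gnuplot, you would also need `set datafile separator ","` in the plot script, plus real timestamps rather than the placeholder strings used here.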

Yes. That I definitely agree with!


#8

For my system I am sticking with couchdb for all data storage. I find it much easier to log and retrieve data compared to messing with text files. All the data is at my fingertips, and I can use it dynamically for all my information gathering needs. My current couchdb has just over 250,000 documents (on average 9 “fields” per document) and my database is a mere 67 MB. It’s pretty amazing that a $35 RPi has no problems running couchdb as well as the many other processes that make up the system’s “brains”.

As for OpenAg, I would imagine that they will stick with couchdb locally and use a Google cloud database solution to aggregate every food computer’s data and apply some machine learning to it. I would be happy to contribute my data if the information gathered was shared with all contributors. There’s no way the OpenAg food computer would depend on an internet connection to be able to log data. There will be some sort of fallback.


#9

My background is SQL (Oracle), and while I like CouchDB in some respects, I still don’t have the hang of writing queries (though Mango is a big help). Do you have any experience with reduce functions to create level-break data? In particular creating daily and weekly summaries of min, max and average temperatures.
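For comparison, here is what that level-break summary looks like when done client-side in plain Python over already-fetched documents. This is not a CouchDB reduce function, and the `timestamp` and `temp_c` field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents with a timestamp and one temperature reading.
docs = [
    {"timestamp": "2018-01-28T06:00:00", "temp_c": 20.5},
    {"timestamp": "2018-01-28T12:00:00", "temp_c": 24.0},
    {"timestamp": "2018-01-29T06:00:00", "temp_c": 19.0},
    {"timestamp": "2018-01-29T12:00:00", "temp_c": 23.0},
]

by_day = defaultdict(list)
for doc in docs:
    day = doc["timestamp"][:10]  # the "YYYY-MM-DD" prefix is the level break
    by_day[day].append(doc["temp_c"])

summary = {
    day: {"min": min(v), "max": max(v), "avg": sum(v) / len(v)}
    for day, v in by_day.items()
}
for day in sorted(summary):
    print(day, summary[day])
```

On the CouchDB side, the built-in `_stats` reduce returns sum, count, min, max, and sum of squares for numeric values, so a view keyed on date parts and queried with grouping may cover the same ground. I haven't tested that against this data.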

Yeah, I am amazed at how small the footprint is on the Raspberry Pi. I also find the security and authentication very simple.


#10

Hi @webbhm,

I run two RPis: one monitors and controls the reservoir and the other monitors the environment. Sensor readings and actuator states are logged every minute to a single couch database. Each document has a type with a value of “resData” or “envData”, the associated data, and the date/time it was created. Both RPis synchronize their clocks to the same time server, which keeps them in sync (this is how I can combine data from multiple computers).

{
  "_id": "5068e51970e1f1892cf9c7b46eb2e77c",
  "_rev": "2-dbd1096a74d09e92877993b6944db538",
  "airHumidity": 71.2,
  "airTemp": 74.7,
  "co2": 200,
  "dehumidifier": 0,
  "humidifier": 0,
  "co2Active": 0,
  "type": "envData",
  "updateAt": "2018-01-28T12:39:03"
}

{
  "_id": "0086552aa764a31a3fe8562fb2000066",
  "_rev": "1-3a3141c89e09ec080cfc565a3aa4e0f7",
  "resTemp": 71.37,
  "outlet4": 0,
  "outlet2": 0,
  "outlet3": 1,
  "ec": -0.06,
  "outlet1": 0,
  "light": 0,
  "resDepth": 7.2,
  "ph": 6.1,
  "type": "resData",
  "updateAt": "2018-01-28T12:39:03"
}
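A sketch of how two documents like these could be combined on their shared timestamp once they have been fetched. The dictionaries are trimmed copies of the examples above, and the merge logic is an illustration, not the code running on either Pi.

```python
from collections import defaultdict

env_doc = {"type": "envData", "airTemp": 74.7, "airHumidity": 71.2,
           "updateAt": "2018-01-28T12:39:03"}
res_doc = {"type": "resData", "resTemp": 71.37, "ph": 6.1,
           "updateAt": "2018-01-28T12:39:03"}

merged = defaultdict(dict)
for doc in (env_doc, res_doc):
    # Drop bookkeeping keys and keep only the readings.
    readings = {k: v for k, v in doc.items() if k not in ("type", "updateAt")}
    merged[doc["updateAt"]].update(readings)

print(dict(merged))
```

This only works because both timestamps match to the second, which is where the synchronized clocks come in.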

My views are primarily indexed on the date/time because most data being pulled into a graph will be date based (I pull every x-th record if too many data points are returned). The initial build of the view can take a little time to run, but querying the data is very fast. I’m not currently doing any reducing… just mapping. I’m also planning on using couch for system settings but will use a separate database.
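The "every x-th record" thinning can be sketched like this. `thin` and the 300-point cap are hypothetical; the real threshold isn't stated in the thread.

```python
def thin(records, max_points):
    """Keep every n-th record so that at most max_points remain."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    # Ceiling division using integers only.
    step = max(1, -(-len(records) // max_points))
    return records[::step]


day_of_minutes = list(range(1440))  # stand-in for one day of per-minute docs
thinned = thin(day_of_minutes, 300)
print(len(thinned))
```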


#11

I agree that we are capturing time series data, with the date/time being a key attribute. A trick I found is that rather than worrying about tightly synchronizing the time, I bin the data. I am capturing data about every 20 minutes (though checking temperature more frequently for regulating the fan). When I go to run a chart I run: minutes=(int(math.floor(ts.minute/20))) and this lines things up.

If you have not already, I suggest you read the thread on data modeling. As I have explored broader uses of the data and diverse (and changing) environments, I have found a need for a more robust and flexible data structure; and this is what I am currently recommending for exchanging operational data.
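A small sketch of that binning. The expression above yields the bin index within the hour (0, 1 or 2). This version multiplies it back out so each reading gets the start time of its 20-minute bin. `bin_start` is a hypothetical helper name.

```python
from datetime import datetime

BIN_MINUTES = 20


def bin_start(ts, bin_minutes=BIN_MINUTES):
    """Round a timestamp down to the start of its bin within the hour."""
    minute = (ts.minute // bin_minutes) * bin_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


# Two readings taken at slightly different times land in the same bin.
a = bin_start(datetime(2018, 1, 28, 12, 39, 3))
b = bin_start(datetime(2018, 1, 28, 12, 21, 47))
print(a, b)
```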