Representing Decisions with Graphviz
[Decision Trees]
Posted on July 29, 2013 @ 07:14:00 AM by Paul Meagher
Previously I discussed a simple risk management framework that involves precisely specifying the actions, events, and outcomes involved in a decision problem.
One aspect of "precisely specifying" a decision is representing the overall decision making problem in the form of a graph. We can use a pad of paper to draw lines representing possible actions, which lead to events, which lead to outcomes. Or, we can use a tool like Graphviz to construct much prettier graphs and make us feel more professional.
Graphviz was developed by AT&T Research and is considered a top-tier tool for creating and visualizing graph structures.
In today's blog, I want to give you a basic idea of how Graphviz works so you can judge for yourself whether you want to invest time into learning it.
To generate a graph using Graphviz you need to write some commands into a "dot" file that ends with the extension ".dot". The term "dot" also denotes one of the main Graphviz programs used to generate graphs from dot files, and the term "DOT" refers to the command language you enter into your dot files.
Without further ado, here is a dot file called DecisionTree.dot that depicts a decision in terms of the action, event, outcome framework for managing risk.
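A minimal sketch of such a file is shown below. It follows the nitrogen application example discussed at the end of this post; the node names and labels are illustrative, and the outcome labels o1 through o4 stand in for whatever bushel estimates you arrive at.

digraph DecisionTree {
    rankdir=LR;
    // Action fork: the nitrogen application decision.
    node [shape=box];
    decision [label="Nitrogen per acre?"];
    a1 [label="Apply 90 lbs"];
    a2 [label="Apply 110 lbs"];
    // Event forks: rainfall over the growing season.
    node [shape=ellipse];
    e1 [label="Low rainfall"];
    e2 [label="High rainfall"];
    e3 [label="Low rainfall"];
    e4 [label="High rainfall"];
    // Terminal outcomes: expected number of bushels.
    node [shape=plaintext];
    o1 [label="o1 bushels"];
    o2 [label="o2 bushels"];
    o3 [label="o3 bushels"];
    o4 [label="o4 bushels"];
    decision -> a1;
    decision -> a2;
    a1 -> e1;
    a1 -> e2;
    a2 -> e3;
    a2 -> e4;
    e1 -> o1;
    e2 -> o2;
    e3 -> o3;
    e4 -> o4;
}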
If I run this command from my Linux command prompt:
dot -Tpng DecisionTree.dot > DecisionTree.png
the dot interpreter will read the file and generate the graph in PNG format. This is what the output looks like.
As you can see, it is not difficult to go from entering commands into a dot file to generating a decent-looking graph. The DOT language has much more powerful features for drawing graphs than what I am showing you here; however, in the initial stages of sketching out the actions, events, and outcomes involved in your decision problem, you may want to keep things simple and just focus on drawing out all the nodes and the lines between them.
The graph below visualizes a nitrogen application decision. Do I apply 90 lbs of nitrogen per acre or 110 lbs? The effect of each action on the crop I want to grow is jointly determined by the amount of rainfall I'm likely to receive over the growing season. The action forks (i.e., application amounts) lead to the event forks (i.e., rainfall amounts), which lead to the terminal outcomes (i.e., expected number of bushels).
Permalink
Bird Nesting Site
[Nature]
Posted on July 27, 2013 @ 10:05:00 AM by Paul Meagher
I was taking junk to the dump when I noticed a very active bird nesting site on the property. The soil is a sandy clay which the birds appear to like for nesting. This is what an area of the nesting site wall looks like:
This is a closeup of the birds inhabiting one of the nesting dens.
Here is a short video (25 seconds) of the sights and sounds at the bird nesting site.
I thought it was a cool site and wanted to share it.
Permalink
A Framework for Managing Risk
[Decision Making]
Posted on July 24, 2013 @ 08:03:00 AM by Paul Meagher
What is risk and how do you manage it?
One aspect of the definition of risk is that it involves quantifying the probability of the relevant decision variables so that you can formally understand the probabilities associated with various possible outcomes. When I say "quantifying the probability" I generally mean specifying the probability distribution for that variable and using distribution statistics, such as the mean and standard deviation, to characterize the shape of the probability distribution.
When you decide to formally manage risk in your line of business you might consider using a decision making framework consisting of Actions, Events, and Outcomes.
A decision problem starts when you have the choice between multiple possible actions {a1, a2, etc...} and must make a decision as to which one to choose.
The effect of each action is not deterministic. If you decide to hire a new sales person, for example, you can't predict exactly what the effect of that decision will be other than that it will likely increase sales revenue by a certain amount, or better, by a range of sales revenue amounts with differing levels of probability. Various events affect the probability that you will achieve a certain level of sales - the economy, competitors, production capacity, etc... So, in addition to specifying the possible actions we can take, we must also identify the main events {e1, e2, etc....} that affect the outcomes we can expect.
The final component of a risk management framework involves specifying the outcomes {o1, o2, etc....} that are relevant to our decision making (e.g., o1 = increase sales by 25% to 50%, o2 = increase sales by 50% to 75%, o3 = increase sales by 75% to 100%).
We can now be very specific about what risk is: Risk = Actions {A} + Events {E} + Outcomes {O}, where Events and Outcomes are quantified as probability distributions. In a later blog, I'll discuss how to use this framework to make calculations, but I'll divulge the goal of these calculations now - to compute p(O | A & E), in other words, the full conditional probability distribution over outcomes given actions and events.
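To make that goal concrete, here is a minimal sketch, in PHP, of how such a full conditional distribution might be stored as a lookup table. The action, event, and outcome names and all of the probability values are invented for illustration.

<?php
// $p[action][event][outcome] = probability of the outcome given the
// action and event. Each innermost array must sum to 1.
$p = [
    'a1' => [
        'e1' => ['o1' => 0.6, 'o2' => 0.3, 'o3' => 0.1],
        'e2' => ['o1' => 0.2, 'o2' => 0.5, 'o3' => 0.3],
    ],
    'a2' => [
        'e1' => ['o1' => 0.4, 'o2' => 0.4, 'o3' => 0.2],
        'e2' => ['o1' => 0.1, 'o2' => 0.4, 'o3' => 0.5],
    ],
];

// p(O = o2 | A = a1 & E = e2)
echo $p['a1']['e2']['o2'], "\n"; // 0.5
?>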
Permalink
Estimating Probability Distributions: Part 2
[Statistics]
Posted on July 18, 2013 @ 04:17:00 PM by Paul Meagher
In my last blog, I discussed some useful probability distributions for representing our uncertainty about a parameter: the uniform and the triangular distributions.
Our uncertainty about a parameter θ such as "the price of gas next week" can be represented using a uniform distribution where the gas price could be anywhere between some low estimate and some high estimate of the price next week. If we also want to hazard a guess as to the most likely value, then we would use a triangular distribution to represent our uncertainty about the price of gas.
There are other simple techniques for eliciting a probability distribution to represent our uncertainty about a parameter. In today's blog I want to discuss a simple technique called "Merit Scoring".
The Future Price of Corn
The easiest way to explain this technique is to look at the table below.
Corn Price (per bushel) | Merit Score
$4.25 | ?
$4.50 | ?
$4.75 | ?
$5.00 | ?
$5.25 | ?
The table has future corn prices ranging from $4.25 to $5.25 per bushel (see quotecorn.com for the current price). Now, I might ask you to assign a relative merit score to each price point in this range. A merit score can range between, say, 1 and 10. If you assign a merit score of 1 to a price point, that means you think the future price will not be nearest to that price - the price estimate has low merit. Conversely, a merit score of 10 means that you think the future price will be nearest to that price - the price estimate has high merit. My merit scores for the price of corn on Sept 1, 2013, look like this.
Corn Price (per bushel) | Merit Score
$4.25 | 1
$4.50 | 5
$4.75 | 10
$5.00 | 8
$5.25 | 3
In this example, we are not directly assigning a probability to each possible price point. Instead, we are supplying a merit score for each possible price point. We can easily convert each merit score to a corresponding probability by summing all the merit scores and then dividing each merit score by this sum. The result is a probability assignment for each price point, with the probabilities summing to 1. This gives us a probability distribution for our parameter, which is the price of corn on Sept 1, 2013.
To demonstrate how merit scores can be converted to probabilities and how this forms a probability distribution I have devised a PHP-based script that shows how the calculation is done, what the calculated price probabilities are, and that these probabilities sum to 1.
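A minimal sketch of that calculation, using the merit scores from the table above, might look like this:

<?php
// Merit scores for the price of corn on Sept 1, 2013 (from the table above).
$scores = ['4.25' => 1, '4.50' => 5, '4.75' => 10, '5.00' => 8, '5.25' => 3];

// Normalize the merit scores into probabilities by dividing each
// score by the sum of all the scores.
$total = array_sum($scores); // 27
$probs = [];
foreach ($scores as $price => $score) {
    $probs[$price] = $score / $total;
    printf("p(\$%s) = %.3f\n", $price, $probs[$price]);
}

// The normalized scores form a probability distribution: they sum to 1.
printf("Sum: %.3f\n", array_sum($probs));
?>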
Conclusions
The merit scoring technique and script can be used to estimate a probability distribution for any parameter that interests you. One limitation of this technique is that it is discrete in nature, so it can't give you probabilities for prices that might fall between two price points (e.g., $4.85). This may be of concern if you think you should be trying to estimate the future price of corn with more resolution (e.g., 10 cent increments) and/or the daily variability in corn prices is not that high. The daily variability in corn prices is actually quite high, so being correct to within 25 cents might be a good goal for your predictions.
Permalink
Estimating Probability Distributions: Part 1
[Business Models]
Posted on July 11, 2013 @ 07:56:00 AM by Paul Meagher
In my previous blogs on modelling revenue for a season of lobster fishing I was fortunate in having data I could work with that allowed me to specify detailed probability distributions for the main revenue factors in my revenue model. I modelled the distribution of catch sizes with a normal distribution and the distribution of prices with a categorical distribution (I knew what the 4 price points were and roughly what their relative probabilities were). I was able to make some fairly strong assumptions about how the revenue factors (catch size in lbs and price per lb) in my revenue model were probabilistically distributed.
When a startup is trying to model their expected revenues for a forecast period, there is often more uncertainty regarding how the relevant factors in their revenue model might be distributed (these "relevant factors" can also be called "random variables"). In such cases, we may need to resort to modelling these random variables (e.g., monthly sales) with distributions that are easier to specify and take better account of our level of uncertainty.
In this blog I want to discuss two distributions that are useful in such situations: the uniform distribution and the triangular distribution.
If you need to forecast the level of sales over a forecast period but are new to the marketplace and are uncertain as to what the uptake of your product or service will be, or are uncertain about the level of production that you might be able to achieve (e.g., crop yield using a new growing technique), then you might want to consider using a uniform distribution to represent your level of sales. Why a uniform distribution? To specify a uniform distribution, all you need are the upper and lower bounds of that distribution (denoted a and b in the graph below). You assume that the actual level of sales can fall anywhere within that range with equal probability density (i.e., 1/(b-a)). Specifying the upper and lower bounds for your level of sales is significantly easier than specifying the expected mean and standard deviation for, say, your monthly sales figures. Also, it can be argued that a uniform distribution better reflects your more extreme state of uncertainty with respect to the variable you are trying to predict; namely, your level of sales for each month or quarter in your forecast period.
Figure 1: Uniform Probability Distribution
Source: http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)
If you have a bit more confidence about what your most likely level of sales might be, and also what the upper and lower bounds of your sales might be, then you should consider using a triangular distribution to represent your uncertainty about your level of sales. To specify a triangular distribution, all you need are three values: the lower bound, the upper bound, and the most likely value (or modal value). These values are denoted as a, b, and c respectively in the graph below. The probability density at your most likely value is given by the formula 2/(b-a).
Figure 2: Triangular Probability Distribution
Source: http://en.wikipedia.org/wiki/Triangular_distribution
My library of probability distribution functions includes a UniformDistribution.php object and a TriangleDistribution.php object that can be used to generate random values from these distributions once you specify the relevant parameters. This means that even under conditions of extreme uncertainty regarding expected sales, you may still be able to model expected revenue, and expected variance in revenue, if you opt to model your revenue factors using a uniform or a triangular probability distribution. In my lobster fishing example, if I were more uncertain about the catch size or catch price to expect, I might opt to use a uniform or a triangular distribution to model the possible catch sizes and catch prices rather than the normal and categorical distributions I chose because I had more data to go on.
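If you want to see the sampling logic itself, here is a minimal self-contained sketch in plain PHP (not the library code). The inverse-CDF formula for the triangular distribution follows from the definitions above, and the example parameter values are made up.

<?php
// Draw a value uniformly from [a, b].
function uniform_sample($a, $b) {
    return $a + ($b - $a) * mt_rand() / mt_getrandmax();
}

// Draw a value from a triangular distribution with lower bound $a,
// upper bound $b, and most likely value (mode) $c, by inverting the CDF.
function triangular_sample($a, $b, $c) {
    $u = mt_rand() / mt_getrandmax(); // uniform draw on [0, 1]
    $f = ($c - $a) / ($b - $a);       // CDF value at the mode
    if ($u < $f) {
        return $a + sqrt($u * ($b - $a) * ($c - $a));
    }
    return $b - sqrt((1 - $u) * ($b - $a) * ($b - $c));
}

// Example: monthly sales believed to fall between $5,000 and $20,000...
echo uniform_sample(5000, 20000), "\n";
// ...or, with a bit more confidence, most likely around $12,000.
echo triangular_sample(5000, 20000, 12000), "\n";
?>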
Permalink
Revenue Model Sensitivity
[Business Models]
Posted on July 9, 2013 @ 07:36:00 AM by Paul Meagher
Yesterday I posted a completed revenue model for a season of lobster fishing. The major outputs of that model, total catch and total revenue, are actually very accurate with respect to this season's catch and revenue totals for lobster fishing. I'll talk a bit about the issue of model fit in this blog, but I mostly want to focus on what we might do with our revenue model now that we have 1) identified the major revenue factors, 2) specified the appropriate probability distributions for them, and 3) performed the appropriate math to generate revenue per unit time (e.g., per catch) and total revenue amounts.

What we now need to do with our revenue model is try to understand its behavior better. When we execute the lobster fishing revenue model once it will generate one set of values for total catch and total revenue for a season. When we execute it again, it will generate a different set of values. This might leave you scratching your head as to what to make of this variability. Well, one thing we can do is rerun the model many times and use the computed season totals to generate some statistics that summarize what the mean season catch and season revenue is most likely to be. We can also compute how much variability there is in the season catch and season revenue totals by computing the standard deviation of these totals.
My latest script for revenue modelling is called lobster_fishing_sensitivity.php because its primary purpose is to do a sensitivity analysis of our lobster fishing revenue model. There is a section of the script below that contains various parameter settings. To do a sensitivity analysis, all you do is start playing around with these parameter settings and see how the model output changes. This "playing around" can offer "model-driven insight" into your business - how might my revenues be affected if this or that parameter value is changed (e.g., price of lobster, decay value, initial catch size, etc...)? Playing around with your revenue model can help you to better anticipate the future revenues of your business.
The lobster_fishing_sensitivity.php script is well commented so I'll just present it as is and then show what the output of the script looks like when I point my browser at it.
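The gist of the script can be sketched as follows; the simplified simulate_season() function stands in for the full revenue model from the previous post, and all parameter values here are illustrative.

<?php
// Simulate one season: exponentially decaying mean catch size, normally
// distributed catches, and a categorical price distribution.
function simulate_season($num_catches, $initial_catch, $decay, $catch_sd, array $prices) {
    $catch_total = 0;
    $revenue_total = 0;
    for ($c = 0; $c < $num_catches; $c++) {
        $mean = $initial_catch * exp(-$decay * $c);
        // Normal draw via the Box-Muller transform.
        $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1); // avoid log(0)
        $u2 = mt_rand() / mt_getrandmax();
        $catch = max(0, $mean + $catch_sd * sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2));
        // Categorical draw for the price per lb.
        $u = mt_rand() / mt_getrandmax();
        $cum = 0.0;
        $keys = array_keys($prices);
        $price = (float) end($keys); // fallback guards against rounding error
        foreach ($prices as $p => $prob) {
            $cum += $prob;
            if ($u <= $cum) { $price = (float) $p; break; }
        }
        $catch_total   += $catch;
        $revenue_total += $catch * $price;
    }
    return [$catch_total, $revenue_total];
}

function mean(array $xs) { return array_sum($xs) / count($xs); }

function stdev(array $xs) {
    $m = mean($xs);
    $ss = 0.0;
    foreach ($xs as $x) { $ss += ($x - $m) * ($x - $m); }
    return sqrt($ss / (count($xs) - 1));
}

// Parameter settings to play around with for the sensitivity analysis.
$runs   = 1000;
$prices = ['3.25' => 0.2, '3.50' => 0.4, '3.75' => 0.3, '4.00' => 0.1];

$catches  = [];
$revenues = [];
for ($i = 0; $i < $runs; $i++) {
    list($catch, $revenue) = simulate_season(60, 650, 0.02, 50, $prices);
    $catches[]  = $catch;
    $revenues[] = $revenue;
}

printf("Season Catch Mean: %.2f lbs\n", mean($catches));
printf("Season Catch Standard Deviation: %.2f lbs\n", stdev($catches));
printf("Season Revenue Mean: \$%.2f\n", mean($revenues));
printf("Season Revenue Standard Deviation: \$%.2f\n", stdev($revenues));
?>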
The output of the script looks like this:
Season Catch Mean: 23426.71 lbs
Season Catch Standard Deviation: 957.77 lbs
Season Revenue Mean: $81991.88
Season Revenue Standard Deviation: $3373.00
If we re-execute the lobster_fishing_sensitivity.php script many times, we will observe that the Season Revenue Mean value will change a bit on each execution. It will not change by that much, however, owing to some central limit stuff that is going on here. So what we are getting from this sensitivity script is a cleaner estimate of what predictions our revenue model is making. When we "fit" our revenue model to the data, it is these sensitivity stats that we should be evaluating our revenue model against and not the output of a single run of our revenue model. These sensitivity stats do in fact match up with the observed data for this season fairly well. This is a plausible revenue model for a season of lobster fishing and could be used to frame a revenue prediction (via parameter settings) for the next lobster fishing season.
Permalink
A Lobster Fishing Revenue Model
[Business Models]
Posted on July 8, 2013 @ 09:51:00 AM by Paul Meagher
In my last blog on the topic of revenue modelling (using a lobster fishing season as an example), I talked about categorical price distributions and how they would be a better representation of how lobster prices vary than using a normal distribution to model prices.
I thought I would finish off the exercise in modelling lobster fishing revenue by presenting the completed lobster fishing revenue model. There are two main differences between this version of the script and my last lobster fishing revenue script.
- This version represents the variation in prices using a categorical probability distribution rather than a normal distribution. I discussed how a categorical probability distribution works and why it is appropriate in my last blog.
- I simplified the looping structure for each catch. I didn't realize that e^0 = 1. I thought the natural exponent e raised to the power of 0 was 0, so I did some unnecessary coding for the first catch of the season. It is good to know that e^0 = 1 (see Exponential Functions at Wikipedia) as it helps to understand how the exponential decay function works when $c = 0 (i.e., the first catch of the season).
So without further ado here is the completed lobster fishing revenue model implemented as a PHP program:
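A sketch along those lines is shown below (not the original script): the mean catch size decays exponentially over the season, each catch is drawn from a normal distribution around that mean, and the price per lb is drawn from a categorical distribution over four price points. The helper functions and all parameter values are illustrative stand-ins.

<?php
// Draw from a normal distribution via the Box-Muller transform.
function normal_sample($mean, $sd) {
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1); // avoid log(0)
    $u2 = mt_rand() / mt_getrandmax();
    return $mean + $sd * sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2);
}

// Draw a price from a categorical distribution (price point => probability).
function categorical_sample(array $dist) {
    $u = mt_rand() / mt_getrandmax();
    $cum = 0.0;
    foreach ($dist as $value => $prob) {
        $cum += $prob;
        if ($u <= $cum) {
            return (float) $value;
        }
    }
    $keys = array_keys($dist);
    return (float) end($keys); // guard against rounding error
}

// Parameter settings (illustrative).
$num_catches   = 60;   // number of catches in the season
$initial_catch = 650;  // mean size of the first catch, in lbs
$decay         = 0.02; // exponential decay rate of the mean catch size
$catch_sd      = 50;   // spread of catch sizes around the mean, in lbs
$prices        = ['3.25' => 0.2, '3.50' => 0.4, '3.75' => 0.3, '4.00' => 0.1];

$season_catch   = 0;
$season_revenue = 0;
for ($c = 0; $c < $num_catches; $c++) {
    // Because e^0 = 1, the first catch ($c = 0) has mean $initial_catch.
    $mean_catch = $initial_catch * exp(-$decay * $c);
    $catch = max(0, normal_sample($mean_catch, $catch_sd));
    $price = categorical_sample($prices);
    $season_catch   += $catch;
    $season_revenue += $catch * $price;
}

printf("Season Catch: %.2f lbs\n", $season_catch);
printf("Season Revenue: \$%.2f\n", $season_revenue);
?>

Each run of a script like this produces different season totals because the catch sizes and prices are random draws.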
Using a categorical distribution to model prices instead of a normal distribution does not appear to make a big difference in the total seasonal revenue numbers produced by the model (around $80k), so we probably could have gotten away with using a normal distribution to model prices; but prices were not in fact normally distributed, and I preferred the model to be truer to the pricing facts.
In my next blog I will take this revenue model and re-run it many times in order to better understand the behavior of the model and come up with some estimates of total expected revenue and total expected catch for a lobster fishing season.
Addendum
A week ago I had the pleasure of having a feed of lobster at a family event that my father-in-law held. He has a fishing boat. His two sons use it to fish lobsters, crabs, and tuna. They held aside some lobster for the event. Below is the cooked "feed of lobster" we had for the event.
I am not that proficient in shelling a lobster so I asked for some tutoring. My sister-in-law Amanda demonstrated what is possibly the fastest and most efficient method possible for extracting meat from the tail of the lobster. The culinary schools do not even appear to teach this method. Amanda must have perfected this technique from having to shell so many lobsters for the menu at her new Backroads Bistro business. This is a 5-second video, so you may have to rewatch it to see how it's done.
Permalink
Farm Experiments
[Agriculture]
Posted on July 2, 2013 @ 10:27:00 PM by Paul Meagher
I haven't blogged much this week because I've been busy trying to get caught up with work on a second farmstead property we own. We have a couple of rental units on the farm that needed to be cleaned up for the upcoming tourist season. I also needed to deal with some land I plowed earlier in the spring - I rototilled it up and it is starting to look more like a field and less like a nuclear fallout site. We've had lots of rain over the last couple of weeks so I didn't get a chance to put in our gardens, so I'm working on that these days. Got my tomatoes, lettuce, spinach, corn, yellow beans, and bell peppers in the ground yesterday and will plant out some more today.

I also planted some Harrington Malting Barley that I am trying to build up a seed supply for. I started with 4 seed packets last year and hope to get a burlap bag or more of seed this year. I am going this route because it was next to impossible to get good malting barley seed in this part of the world, perhaps because the humidity is considered too high to grow it - it can cause fusarium blight, which is not good. We have good wind on the ridgetop farm so I'm thinking this might be less of an issue. Also, we grow a lot of feed barley so I don't see why it should be so difficult to grow malting barley. If my seed supply does well maybe I'll start selling malting barley seed that is adapted to this climate.
I blogged earlier about my experiment with growing potatoes in the hay rows I made last fall for planting into in the spring. I planted 800 lbs of potatoes so this was a fairly large experiment. It has been interesting so far. I didn't get the early season jump that I thought I would get from planting in the hay. I thought the decomposing hay might create heat and the hay would act as a nice insulating blanket over the potatoes to give them a head start. What happened is that the insulating blanket of hay made it harder to heat up the ground and the potatoes just didn't grow until the temperatures picked up in the middle part of June. Nevertheless, they did eventually start to take off and I'm expecting to know the final results by the middle of July when I will start removing hay and seeing how many potatoes I have per plant.
I planted 6 different varieties of potatoes so there will be varietal differences (different maturity dates). This was supposed to be a no-work form of gardening so I haven't done any work to maintain the potato rows except drive a lawn tractor next to the windrows a couple of times to blow in some additional hay. This does not add much new hay but it does keep the potato field looking a bit more like a potato patch and less like a hayfield. The hay is coming on strong and has grown through the hay mulch, but the potato is also a tough plant and my feeling at this point is that they will co-exist, with the potato plant possibly becoming a bit more dominant as it grows. There is no evidence of any Colorado potato beetle or other pests or deficiencies. Hopefully my next update on this experiment will be a photo of some nice small round red potatoes (I'll likely start harvesting my Norland early season reds first).
Permalink