Monday, June 15, 2009

Causality

When we evaluate something, we typically are trying to understand and make claims about causal relationships. When we create a system dynamics model, we are mapping and modeling causal relationships. But how do we tell which relationships are causal and which are merely correlational?

Thanks to a recent pointer on the evaltalk mailing list, here's Sir Austin Bradford Hill's “The Environment and Disease: Association or Causation?” Hill gives nine considerations to ponder.

For a rather shorter read, see xkcd's take on causality. Be sure to see the alt tag.


Tuesday, June 09, 2009

Creating sustainability in complex ecosystems

I recently had the privilege of teaching a course in system dynamics for Willamette University's Sustainable Enterprise certificate program. The course lasted two days, with a follow-up two-hour web seminar. We focused on qualitative system dynamics, but we treated it at a somewhat more rigorous level than many such courses, I think.

I'm writing because of one particular lesson I learned—we all learned. Early in the course, we used a simulation game to help people have a common, shared experience of interacting in a challenging system environment.

As with many such games, the expected result is that people fail in making the system work. Typically, the debrief is used to help people understand the ways of thinking that led them into trouble and to prepare them for the material that's to come.

Unexpectedly, this class managed their challenges quite sustainably. While their skill wrecked the planned flow of that part of the session, I was really pleased to see their skill in action. We spent some time talking about what made them successful and how that might carry over to real-world situations. Their insights were useful enough that I wanted to share them (with the students' permission) with a larger audience: you.

I first asked what made them succeed in the game and what provided the most challenges.

Goals were the first factor. While the game tells them the goal they should have, they rapidly realized that focusing on the stated goals would lead to ruin, and so they decided to set a much longer-term goal.

Communications was the second factor. After the first round, they began to spend most of their time huddled in the center of the room, talking animatedly through their decision-making processes instead of working in isolated teams.

They noted that delays provided a key challenge. As they worked to establish trust in the social system they had set up, they were both trusting other teams' commitments and verifying that they were indeed living up to their commitments. That takes time: commitments made today may not show up for quite a while.

Those delay effects were complicated by the natural delays in the system. Without revealing the game we used, I will say that the dynamics of the game included natural delays between actions and results that complicated decision making.

Some noted this seemed analogous to the situation OPEC finds itself in. They rely on mutual agreement to limit production as a way to manage prices. If anyone in OPEC breaks that agreement, the system can collapse. OPEC's problems are complicated by uncertain demand and uncertain prices, factors that had no analogy in our game.

Math skills created another success factor, which some may find surprising. A subset of the players rather immediately began developing quite a useful understanding of their system based on a mathematical model they developed. Once others saw that their results were accurate, everyone became driven by the data. Without some in the group being able to pull that off, they would likely not have succeeded.

Interestingly, trust and math worked together. At one point, the analyst team made a numerical error and then made an especial effort to communicate that they had made that error to others so that the others would be able to differentiate that error from a breaking of the trust relationship. Apologies were key. Information and the lack of information thus played a key role in the group's success. Even then, it took time for the others to regain their trust in the analysts' team.

Playing into this was the lack of external shareholders. Everyone on the teams had a serious stake in the workings of the game; no one was in it just for the "money." Similarly, there were no new entrants into the field who might have upset the cartel relationship they had crafted.

I then asked them what they'd advise people in the real world.

Collaboration was the first clear answer. Work together across groups to align goals and actions.

They then said, "knowledge is power." After a bit of reflection, they revised that to "timely, transferrable, actionable knowledge is power."

They felt it was important for everyone to be clear on a vision.

They would encourage people to watch their egos and to be visibly trustworthy.

At one point, in an attempt to test the strength of their commitment (okay, as an attempt to derail their commitment), I as facilitator announced I was the government and was giving them something they really didn't want. (To be accurate, that idea came from Anne Murray Allen, the executive director of the program, who was running the simulation computer.) For a while, I felt as if I were about to experience the French Revolution, as some rather emotionally argued for standing up to government and refusing my help, a bit of resistance I wasn't accepting.

As a result, their last bit of advice was "Don't trust the wisdom of government, of the private sector, ... of either." In other words, test the data and the reasoning yourselves instead of blindly accepting what others say is good for you.

This was an intense and very exciting two-day workshop. I think those in the class learned a lot; I know I learned as they taught themselves and me (and now perhaps you) how to make sustainability work.

Perhaps I'll see some of you there next year.


Sunday, April 05, 2009

The (un)Sustainable Commentator on growth

Just to keep the question series on growth going, here's what Wayne Maceyka is saying on The (un)Sustainable Commentator.

Check out Wayne's blog, too, and his extensive list of links in the right-hand column.


Monday, February 23, 2009

Cool tool

If you like making sense of (or with) numbers and use Linux, check out Qalculate!. The screenshots give you an idea of its power and ease of use.


Tuesday, February 17, 2009

Good graphs

Doing graphs well is important for communicating information (you do use graphs, don't you?). Rafe Donahue has published Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics. I think it's well worth our time to read and heed.

Thanks to Andrew Gelman for the tip.


Thursday, January 08, 2009

R in the news

If you're nervous about applying free software in your work, see what the New York Times had to say about R, the free statistics system. Thanks to Andrew Gelman for pointing out that article.

I'll highlight another free tool in my next posting.


Wednesday, December 31, 2008

Management Improvement Carnival: Annual Edition

John Hunter of Curious Cat has asked me to participate in the annual edition of the Management Improvement Carnival. I'm humbled to be invited and glad to participate.

The first station has to be Tom Peters' blog. I don't agree with everything he says, but I do find that he makes me think. Any of you who manage something are people, too. That's why my first link goes to his Christmas 2008. I share his sentiments, if not his bully pulpit. While I'm mentioning his blog, I'll also mention Repeat!.

The next station is MetaSD, the home of Tom Fiddaman and his Four Legs and a Tail. It's a good reminder of the leverage points we can seek in the systems in which we work as originally drafted by Dana Meadows, and it offers his notion that they don't necessarily compose an ordered list. Perhaps more importantly, it's a reminder that we need a mindset change to be successful in the world we're entering. Speaking of mindsets, Cynthia McEwen and John Schmidt of Avastone Consulting have published Leadership and the Corporate Sustainability Challenge: Mindsets in Action Report. While not a blog, that report does speak to mindset changes. Speaking of Tom Fiddaman, he has also posted My Bathtub is Nonlinear, an excellent reminder of the importance of grounding our assumptions in real data.

Times are tough, economically, and that's why I pick Paul Graham's Why to Start a Startup in a Bad Economy as the third stop. His message: don't do anything foolish, but don't think that the news from Wall Street necessarily predetermines your fate. He says it better than I. While I'm visiting non-conventional management sites, I'll stop at Elana Centor's Note to HR Folks: Hiring Over-Qualified People Is A Smart Strategy because you will need to hire again someday, if not today.

Speaking of saying things better, one of a manager's jobs is conveying information, and much of that information comes in the form of numbers and graphs. We do our organizations, our people, and ourselves a favor when we display such information clearly so others can make sense of it well. That's why the fourth stop in this carnival is at Andrew Gelman's An improved time-series graph instead of that notorious "spiraling down the drain" spiderweb. Follow the links, too, to see his earlier commentary. I'm a fan of Edward Tufte's approach to communicating information, and I'm a fan of the second graph in that posting. If you have to add drama, I like the third graph much better than the first, but I still think the second is the best of the three.

As important as data and statistics are, I'm reminded by xkcd's Decline that not everything we do, not even everything we do as managers, is best served by quantification and purely logical analysis. That brings me to Andrew Taylor's Not aloof and detached, but deeply, deeply human, a link to a Benjamin Zander TED presentation that, for me, brings together presentation skill and leadership in the service of his passion, music.

Finally, I'll take a view of another system we may not think of much, one that we very much need to be working well and one that may offer opportunities for some of us: food. Marilyn Holt's A Locavore Manifesto by Michael Pollan is a great education and reminder; click on the title of her post to get to the manifesto.

You may have thought I'd post about IT issues, about process improvement, or about systems or statistical analysis of management work. Those are indeed important, and I don't want to neglect them.

Yet I've found it helpful to start thinking at a high, systemic level to make sure I'm considering the important issues and to help me determine where I need more detailed information. While this, like most summaries of blog postings, can't claim to be as organized and logical as a book, I think it covers issues we need to concern ourselves about in business. From how we deal with people to the mindsets we bring to our work, from how to work in a tough economy to how to convey information, this covers a broad range. I closed with food systems because I wonder if we may be entering a period where the major systems we need for business—food, energy, the atmosphere and the overall environment—can no longer be safely taken for granted. That's why I think the desire and ability to view our challenges through a systems lens is particularly important as we enter 2009.

I'll conclude with Tom Asacker's Nine Predictions for 2009, thanks to Tom Peters' Must Reading.

Follow-up:

To find the rest of the Management Improvement Carnival, check out these links:


Thursday, December 04, 2008

Sustainable Energy without the hot air

You may have noticed it's sometimes hard to get good data on issues of current importance. We read and hear adjectives, but we too rarely hear numbers. When we do, they're often presented in ways that are not conducive to clear understanding. I've written about that from time to time, for grounding decisions in good data seems to be a fundamentally important skill.

I've also written about the environment, for I think we do and will face challenges of the sort our ancestors never had to address. (There: more descriptive phrases! Relief is on the way.)

Today Andrew Gelman pointed to David MacKay's free book Sustainable Energy -- without the hot air as an example of a book that brings data to the fore of the discussion about sustainable energy. In general, he likes the way the data is portrayed, although he doesn't attempt to vet the book for its content. While I haven't yet double-checked any of the numbers, I have begun to read the book, and I find the data clearly, cogently, and interestingly put (quite a change from William Farr's advice to statisticians). I like that he seems to use a significant number of clear time series ("behavior over time") graphs, and the time horizons are long enough to see useful patterns developing. So far, it appears as if this work will help me put events in perspective; I'll be curious to see what I learn and what reactions I have as I finish the book.

While I'm reading it, I encourage you to get a copy, too, and see what you think. Perhaps we'll all learn something important, both about living on the planet successfully and about presenting data effectively.


Thursday, November 13, 2008

Grounding beliefs in data

We all have beliefs about the way parts of the world work. Data is sometimes hard to come by; when we can find it, it's sometimes in a form that's hard to use.

Tom Fiddaman published State CO2 Emissions from Fossil Fuel Combustion on Many Eyes. Click through to the interactive version, and see if you find any surprises.

One set of lessons: it appears that industrial emissions have improved between 1990 and 2005 in about half the states of the USA. Emissions from electric power generation have increased in all but four states (all in the Northeast), and emissions from transportation have increased in all states but Kansas.

It would be interesting to see these results expressed in per capita terms, too.

While we're talking about data, see his experiments with his kids on how fast a bathtub drains. Before you do, though, what do you think a time series graph of the water left in a bathtub as it drains would look like? Then read the results.

To me, the bathtub experiment sounds a bit like Seth Roberts's self-experimentation, which leads to the obvious thought that we can all be amateur scientists, if we want.


Wednesday, August 20, 2008

Prediction, system dynamics, and Future-Fusion

Recently, I made the claim that we're better off focusing on adapting to the present than predicting the future. I've made similar claims in the past, too. I've even given one example in which predictions serve a useful purpose.

That's all a bit simplistic, of course. Even system dynamicists could be said to predict the future in a way: we show behavior over time we feel is more likely to occur (although we may warn people away from point predictions based on a behavior over time graph). In other words, I might suggest that your current policies could produce a boom and bust effect in your business, but I wouldn't want you to draw the conclusion that your business will grow another 172.3% by June 15, 2009 before taking a tumble that afternoon.

Because we all would like to know the future, I've experimented with blending system dynamics and Bayesian analysis to quantify the probability of a particular behavior pattern, for example. Of course, that probability is conditioned on both the historical data and the model being correct, which is a loophole big enough for a good-sized locomotive to run through: models are always incorrect. Still, I think this approach may give more useful insight in certain cases.

Now Kshanti Greene of Stottler Henke Associates, Inc. has shown me a Bayesian tool they've developed called Future-Fusion, and I've been exploring it a bit. They are using Bayesian networks and the power of groups to get a better handle on what the future holds. Much as Data360 looks at the past, Future-Fusion attempts to look at the future. As of this writing, they've created four test areas which you can explore: the 2008 US presidential election, the Iraq war, corporate strategy, and energy. Try it out: learn how to use the system, see current predictions, and add your own (I think you only have to create a free account if you want to add your own predictions). Perhaps you'll learn something, and perhaps they will, too.

Kshanti has pointed out a recent addition to Future-Fusion that may intrigue some of you: time. They've enhanced their technology to allow limited dynamic execution of a network model, which begins to narrow the gap between Bayesian networks and system dynamics from the Bayesian network side, much as what I've tried has narrowed it from the system dynamics side. To try that out, go to the energy model, select a prediction (e.g., "Reduced SUV sales"), click "view graph," note the numbers, and then click "Next Time Step."

I think this is all still experimental in many ways, but it's a good opportunity to learn a bit about this technology by trying it out on real-life issues. I'll be curious what you discover.


Thursday, December 27, 2007

Top postings of 2007

In the last 12 months (to be precise, from last December 28, the day after the Top postings of 2006 entry through December 26, 2007), you have chosen ten top postings on Making Sense With Facilitated Systems as ranked by unique pageviews in Google Analytics.

As I noted last year, there are potential statistical problems with this list. Those who read my blog every day using the main URL don't get counted; both last year's and this year's tallies were made from those who landed on specific URLs as reported by Google Analytics (but excluding visits I may have made). That may be okay; those who linked to specific pages may have cared more about them. Recent entries have a more difficult hurdle, as they haven't been around as long to be viewed. The dates don't quite line up with the calendar year, although I suspect that makes little difference in the results. If you know of a better way, let me know.





  1. For some time now, I've been using an open source simulator for my system dynamics work because it seems to help me think more effectively. That doesn't mean I've given up on commercial tools; I still use iThink for creating interactive environments, and I will be teaching IMT 586 at the University of Washington using Vensim PLE (and I may be using it in professional applications, as well). Last April, I combined my interest in the arts with my interest in this new approach to system dynamics in a public article about a marketing program for symphony orchestras. You selected TAFTO 2007, the pointer to that article, as number ten on the list.


  2. I've written several articles about data and numbers. Making more sense with numbers part 3 offered an easy process to plot data you receive in email or reports.


  3. The words we use can be vitally important in helping us think productively about key business, organizational, and social challenges. In A systems language for business, number eight on the list, I described one team's evolution towards a better language for discussing business issues, thanks to a course they took from me in system dynamics modeling and simulation.


  4. Good data helps us ground our thinking in reality. Still more on data, a pointer to several online sources of data, captured the number seven spot.


  5. Growth can create problems (witness any of the bubbles that have occurred over history), but where are good examples of successful companies that intentionally don't grow? Number six on the list is Small Giants: the American Mittelstand?, pointing to a book that answers that question.


  6. Sometimes old technology still has utility; sometimes it still attracts interest. At number five, Technology comes full circle, a description of my continuing use of a slide rule in my work, certainly fits that description. For those who are interested, it points to a source for new slide rules.


  7. When I first started work as an engineer, PERT charts were done using mainframe computers or hand-drawn charts. Today, project management has become a profession with a certification process, and automated tools with graphical user interfaces have long since replaced tables of numbers and dates. Your fourth-most-popular entry was Critical chains: a decade later, my revisiting of Eliyahu Goldratt's critical chain theory that linked to Tom von Alten's revisiting of his views on the approach.


  8. Productivity is obviously important to you. Your third most popular posting of the year was a surprise to me: If you can say it, it's done, an entry about the array programming language J.


  9. Barry Richmond has a deserved place as an educator and thinker on system dynamics and systems thinking. I posted a link to an article he wrote about systems thinking and followed up with "Scientific thinking" the modern way, a differing view on the application of modern scientific thinking in system dynamics. That was your second favorite posting from 2007.


  10. The 2007 posting you viewed the most was the series Making musical sense by email, showcasing a conversation between music critic, composer, author, professor, and consultant Greg Sandow and me that used a system dynamics model to explore the aging of audiences for symphony orchestra concerts in the USA. Now I'm curious: was its popularity because of the topic (music), the approach (a somewhat novel approach to using system dynamics), or the fact it was a real conversation between two people? Let me know.


All of those postings were made in 2007. It wouldn't be fair to finish this list without noting that some postings from prior years did rank higher than some of these. Here's the all-time top ten list of postings from Making Sense With Facilitated Systems as measured by your viewings in the last twelve months:



  1. TAFTO 2007 (2007)


  2. Making more sense with numbers part 3 (2007)


  3. A systems language for business (2007)


  4. Still more on data (2007)


  5. Small Giants: the American Mittelstand? (2007)


  6. Technology comes full circle (2007)


  7. System Dynamics for Cheapskates (November 2006)


  8. Critical chains: a decade later (2007)


  9. If you can say it, it's done (2007)


  10. "Scientific thinking" the modern way (2007)


  11. Making musical sense by email (2007)


  12. System dynamics with MCSim (November 2006)


  13. In praise of the lazy employee (April 2005)


  14. System dynamics and program evaluation (June 2005)


  15. Making sense with numbers (November 2006)


That list includes the top ten postings written in 2007 plus the five entries written in prior years that were at least as popular as the top ten 2007 postings.

As 2007 draws to a close, I want to thank you who read Making Sense With Facilitated Systems and to invite you to continue with me in 2008. If you have suggestions or feedback for this blog, contact me.

I would be honored to be of service to you or your organization in 2008. If you're trying to make sense of tough business or organizational challenges, curious how I might be able to help, or just want to talk about some of the issues you face or that I write about, get in touch.


Wednesday, November 28, 2007

Making more sense with numbers, part 4

In the spirit of helping us all make better sense of data we read, I encourage you to read Mark Liberman's Thou shalt not report odds ratios in his Language Log if you write about data. If you read reports containing data (including the newspaper), read it, too, to help decipher what you read.

It's a somewhat long article, but you'll probably get the message by the end of the first example. (There is a possibly useful pointer to odds ratios and risk ratios on Wikipedia at the end of the article.) If you want another view on the same subject, see Odds ratios should be avoided when events are common, a letter by Douglas Altman, Jonathon Deeks, and David Sackett in BMJ. For an opposing view, see Stephen Senn's response.
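To see why the distinction matters, here is a minimal Python sketch. The risks in it are made-up numbers chosen only for illustration, not figures from any of the articles above, and the function names are my own. It compares the risk ratio with the odds ratio for a rare event and for a common one.

```python
def risk_ratio(p_exposed, p_control):
    """Ratio of the two probabilities (risks)."""
    return p_exposed / p_control


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_control):
    """Ratio of the two odds."""
    return odds(p_exposed) / odds(p_control)


# Hypothetical risks, chosen only for illustration.
cases = {"rare event": (0.006, 0.004), "common event": (0.6, 0.4)}

for label, (p1, p0) in cases.items():
    print(f"{label}: risk ratio = {risk_ratio(p1, p0):.2f}, "
          f"odds ratio = {odds_ratio(p1, p0):.2f}")
```

With these made-up numbers, the two ratios nearly agree (about 1.5) for the rare event. For the common event, the risk ratio is still 1.5 while the odds ratio grows to 2.25, which is easy to misread as "more than twice as likely."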

Unless you're writing for a highly technical audience and making it clear (perhaps through context) what you mean, I agree with the first and second articles.

Thanks to Jeremy Miles for the pointer.

Those curious about the title of this posting can read part 3 and find earlier parts.


Wednesday, November 21, 2007

You've got to see this graph!

One of the ways we make sense of situations is in how we portray data. I'm a fan of carefully crafted graphics, often trying to follow the lead Edward Tufte sets in his books and workshops.

That's why you have to see this graph on Statistical Modeling, Causal Inference, and Social Science. It ... ah, Phil says it better than I could; go take a look.

When you come back, note that the message is not to copy this design into the next five graphs you do (or at least the ones you show) but to have the courage to show the data in creative ways, breaking a few rules along the way if that helps to convey your information with clarity and integrity.


Monday, September 17, 2007

A better way to show data?

We all know that measured data comes with some uncertainty. Perhaps it's measurement error; perhaps it's sampling error. We even expect to see it mentioned explicitly in political polls, but I rarely see it published in business and financial reports. There are likely many reasons for that omission, only one of which is the difficulty of presenting the uncertainty concisely and informatively.

Thomas Louis and Scott Zeger recently published Effective Communication of Standard Errors and Confidence Intervals with a proposed approach to indicating such uncertainties. On the one hand, I like it. It makes a nice, neat display of two, three, or five numbers that can describe a statistic nicely, and it's relatively easy to understand and to incorporate into your reports (the authors give three lines of LaTeX code you can use directly). On the other hand, as Andrew Gelman notes, graphs are still better (thanks to Andrew for the pointer).

What's a person to do? I have some suggestions:



  • If you've got data in tables, strongly consider figuring out a graphical approach that conveys your information clearly and effectively instead of using a table, as easy as that might seem. In addition to the ideas in the paper Andrew references, consider boxplots among the potential candidates.

    If you don't have time to create useful graphics, consider whether your audience has the time to make sense of your tables. There are usually more of them than there are of you; taking 10 extra minutes to save 20 other people 2 minutes each sounds like a good trade-off (plug in your own numbers).

    Of course, if you're doing a balance sheet or income statement, you probably need the numbers at least once, although graphics may still help to convey your message.

  • If you've got isolated bits of data that you're using in flowing text, consider using Louis and Zeger's approach.

    It's easy to do in LaTeX and pretty easy in OpenOffice.org's Write. It seems a bit harder in Word (I have Word 2000) because I don't see a way to have subscripts on subscripts, but you can select a smaller font on some numbers.

    Unless and until this becomes a standard idiom, you'll probably need an explanatory note somewhere in your report.

  • Consider sparklines as a way to convey graphical data (time series graphs or histograms) in flowing text. Sparklines can be generated with a number of different approaches for a number of different document formats; if you'd rather, you can generate them online.
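For the sparkline suggestion, here is a minimal sketch in Python of one way to generate a text sparkline from Unicode block characters. It is an illustration under my own assumptions, not any of the tools mentioned above: the function name `sparkline` and the sample figures are hypothetical, and the result only displays well where the font includes those block characters.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values equal: draw a flat line in the middle of the range.
        return BARS[3] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


# Hypothetical figures, for illustration only.
sales = [12, 15, 14, 18, 22, 19, 25]
print("Sales", sparkline(sales), "over seven periods")
```

Because the output is ordinary text, it can sit inside a sentence. The scaling here runs from the smallest to the largest value in the series, which exaggerates small changes; a fixed baseline may be more honest for some data.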



Perhaps the real answer is that we now have yet another way to portray data, from which we can pick and choose to fit our current needs.


Thursday, August 30, 2007

Visualizing data

Tuesday, August 14, 2007

Making more sense with numbers, part 4

Now that you've got an easy way to capture numbers out of emails and documents, how do you get numbers back into emails?

Graphs are great, but perhaps you don't want to use attachments? Check out Gnuplot's dumb terminal mode as a way to create plain text graphics. If you keep it simple, you can convey decent graphical information with plain text (as long as your recipients use a non-proportional font in their email client for plain text emails—a very good idea, anyway). I tested this approach in a public discussion and found some liked it and some didn't.
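If you don't have Gnuplot handy, the same idea can be sketched in a few lines of Python. This is a stand-in for illustration, not Gnuplot's dumb terminal and not the approach I tested: the function name `text_bars` and the data are hypothetical. It draws a horizontal bar chart using nothing but plain characters.

```python
def text_bars(labels, values, width=40, mark="#"):
    """Return a horizontal bar chart as plain text, one line per data point."""
    biggest = max(values)
    label_width = max(len(str(label)) for label in labels)
    lines = []
    for label, value in zip(labels, values):
        # Scale each bar so the largest value fills the full width.
        bar = mark * round(value / biggest * width) if biggest > 0 else ""
        lines.append(f"{str(label):>{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical data, for illustration only.
years = [2003, 2004, 2005, 2006]
units = [120, 180, 240, 210]
print(text_bars(years, units))
```

As with the Gnuplot output, the bars line up only if the recipient reads the message in a non-proportional font.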

Perhaps you really do want to include a table of numbers or numbers and words. If you're working in J, it's pretty straightforward to create the table you want and then use J's clipfmt and wdclipwrite verbs to create something you can simply paste into your email or other document.

If you're using J, you can create your (text or other) graphics in Gnuplot, if you prefer, or you can create them in J directly.

Incidentally, this note and its predecessor have addressed specific cases of the more general problem of getting data into and out of J, a problem I think lots of newcomers to J discover early on. It's easy to do powerful calculations in J, but manually transcribing the data from another window into J or from J into another document loses all the benefits. The J Wiki has a page called Interfaces that might help. I've found the Text Files page quite helpful in getting data out of plain text files. Any statisticians reading this might find the interface to R useful.


Tuesday, August 07, 2007

Making more sense with numbers, part 3

One of the early mantras one hears in statistics is "Plot the data." When I first heard it, it was followed by "by hand"; I suspect that part gets elided these days. Still, the advice is good. It's often easier to make sense of a list of numbers if you can visualize them.

Most of the time, that takes time we don't have. When we get an email or a report with a table of numbers, we know that plotting the numbers means grabbing a piece of graph paper (does your office supply cabinet even stock graph paper anymore?) or opening up your favorite spreadsheet, copying numbers, and drawing a graph. I rarely take the time.

Last week, I got yet another email with a table of numbers showing how something had changed over time. I was curious, so I wrote a short J script (now edited into a one-line script) to turn the clipboard into data and another to plot the data.

Voilà! Now I had an easy and quick way to grab and plot data. I tried grabbing data out of an OpenOffice.org Writer document, and it worked, too. Grabbing data out of a Writer table was almost as good; my script lost the shape of the table, but that's easy to fix.

What's more, when you've got it in J, you can also apply various J statistical routines to the data, or you can pass it to R for more advanced statistical processing.

Yet another simple productivity tool, yet another reason to learn J as a tool for thinking and doing, yet another way to make sense with numbers.

I don't really care if you use J or some other tool; just pay appropriate attention to the data you handle. I just happen to think J is a powerful tool for this task (and for many other tasks). If you're learning J, check out the J lab called "An Introductory Course in J" by Henry Rich (thanks to Kip Murray of the University of Houston for pointing that out recently on the J Programming forum). Kip notes that Henry's lab covers a lot of territory very clearly but with a steep learning curve. If you are just seeing J for the first time, check out the J Primer.

Readers might also be interested in tables2graphs.com and Using Graphs Instead of Tables.

So, if you have a table in email that looks like


Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254


and you'd like to graph it, one J program is


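NB. load the libraries used below
require 'format misc files plot'
NB. sd: read the clipboard, split it into cells, transpose, convert text to numbers, unbox
sd=: > @: (". each ) @: |: @: clipunfmt @: wdclipread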


Just copy the numbers, and type


plot ;/ sd''


to see your graph. I'll let you figure out how to add options and how to deal with multi-column data tables (it's easy).
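For readers who would rather not start with J, here is a rough Python sketch of the same idea. It is not a translation of the J script: it assumes the table has already been pasted into a string (there is no clipboard access here), it draws a crude text bar chart instead of calling a plotting library, and the names parse_table and text_plot are made up for illustration.

```python
def parse_table(text):
    """Turn pasted, whitespace-separated rows into a list of numeric columns."""
    rows = []
    for line in text.strip().splitlines():
        cells = line.split()
        if not cells:
            continue  # ignore blank lines
        try:
            rows.append([float(cell) for cell in cells])
        except ValueError:
            continue  # skip header rows or anything that is not all numbers
    # Transpose rows into columns, so each column can be handled on its own.
    return [list(column) for column in zip(*rows)]


def text_plot(xs, ys, width=40):
    """Draw a crude horizontal bar chart, one line per x value."""
    top = max(ys)
    lines = []
    for x, y in zip(xs, ys):
        bar = "#" * round(width * y / top)
        lines.append(f"{x:g} {bar} {y:g}")
    return "\n".join(lines)


# The example table from this post, pasted in as a string.
pasted = """Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254
"""

years, amounts = parse_table(pasted)
print(text_plot(years, amounts))
```

The header row is dropped simply because it does not parse as numbers; a table with more columns would come back as more lists from parse_table.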

Why is this part 3? Because there have already been a first and a second "making sense with numbers," of course.


Thursday, July 26, 2007

If you can say it, it's done

Even in this day and age, computing is a problem. How many of us take the time to do some of the calculations mentioned here when faced with business or economic data, and how many of us just read the analyst's summary and take the analyst's advice?

To some degree, that's because it takes time and effort to double-check such work, and that only gets worse if the subject is complex. It's also because the tools we have aren't always set up to help us do such things on the fly, and we're often on the fly (or in meetings, which can be as challenging).

That's one reason I've encouraged some of you who are interested to learn alternative approaches.

At least one APLer, Randy MacDonnell, has written about APL, "If you can say it, it's done." The same is true, of course, about J, its descendant. I had occasion recently to write a program to calculate whether a certain Monte Carlo simulation was done. I found a quotation by Andrew Gelman describing the Gelman-Rubin statistic:

For any given parameter, R-hat is the estimated posterior variance of the parameter, based on the mixture of all the simulated sequences, divided by the average of the variances within each sequence.


That looked easy enough, so I just wrote it down:


R=: var @: , % mean @: var


In English, that's "the variance of the entire set of data" (var @: ,) "divided by" (%) "the mean of the variance of each data sequence" (mean @: var).

"If you can say it, it's done."
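For comparison, here is a hedged Python sketch of the same sentence, for readers who don't read J. It follows only the one-sentence description quoted above (pooled variance divided by the average within-sequence variance), not the fuller formulas in the papers, and it assumes the sample variance (dividing by n - 1) is the one wanted. The name r_hat and the example chains are made up for illustration.

```python
from statistics import mean, variance


def r_hat(sequences):
    """Variance of all draws pooled together, divided by the mean of the
    variances computed within each sequence."""
    pooled = [draw for sequence in sequences for draw in sequence]
    within = [variance(sequence) for sequence in sequences]
    return variance(pooled) / mean(within)


# Two made-up sequences that overlap heavily: the ratio is close to 1.
overlapping = [[1, 2, 3, 4], [2, 3, 4, 5]]
print(r_hat(overlapping))

# Two made-up sequences that sit far apart: the ratio is much larger than 1.
separated = [[1, 2, 3, 4], [11, 12, 13, 14]]
print(r_hat(separated))
```

A ratio near 1 says the sequences look like draws from the same distribution; a large ratio says they have not yet mixed. It takes more words in Python, which is rather the point of the quotation.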

And you thought this was a blog about business, not programming, right? You were right. While J is a language that can be used by programmers, it's also a language that can be used by you and me to express quantitative ideas more powerfully and concisely than a spreadsheet. If you're ever interested in numerical answers from a spreadsheet, you could be interested in J. Perhaps, for some of you, it's worth downloading and trying out. Much as in learning a foreign (human) language, you won't be able to do much at first, but, eventually, you might be surprised what you can do. In a way, it's as much about thinking as about computing, and yet you can process some pretty large data sets with pretty concise "programs," too.

Thanks to Randy and Andrew for the quotations. For those of you interested in the Gelman-Rubin statistic, Andrew has pointed me to two papers giving more information: his Inference from Iterative Simulation Using Multiple Sequences with Donald Rubin and his General Methods for Monitoring Convergence of Iterative Simulations with Steve Brooks.


Monday, July 02, 2007

You have to pay attention to the data, too

I've written about data in the past. Now Wade Schuette has put me onto a blog by Stephanie Pearl-McPhee called The way things are that is required reading, both because it's a great reminder that sometimes you really do have to pay attention to the data and because it's a good source of a chuckle, too.


Yes, I know I haven't finished the series on decision making. I will do that, and I'll create a table of contents page when I'm done so you can find all the postings from one spot.


Friday, June 22, 2007

Debunking myths with data

Normally I don't recommend video blogs, because I know people's time is scarce (or I presume it is; mine is), and it's easier to control one's time by reading than by watching.

This morning, I found Hans Rosling's Debunking Myths about the World. It's a video of his talk at TED showing Trendalyzer's use in helping us think more productively about the world. It's worth its twenty minutes, both to learn a bit about the world and to learn a bit about another way to look at data.

Now look at Gapminder to learn more on your own.


Friday, June 01, 2007

Pie charts: the exception that proves the rule

Pie charts: don't use them. That's been my motto, and it's a (non-) feature of one of the graphics applications I use. Now Masanao at Statistical Modeling, Causal Inference, and Social Science posts the Color of Flags, possibly an interesting use for such a tool. See also Information Extraction from Different Data Representation Forms on a CRT: Charts and Tables by Janice M. Engberg and F. Layne Wallace.


Monday, May 21, 2007

A leisurely snapshot of the USA

How do we in the USA relax? How has that changed over the past few decades? Normally I try to write from a more global perspective, but today's link specifically refers to the USA. Perhaps those of you outside the USA will find it helpful (or amusing) to learn and ponder a bit more about us. Perhaps some of you will comment here, leaving similar information about the culture and nation in which you live.

David Touve and Steven Tepper of the Curb Center for Art, Enterprise and Public Policy at Vanderbilt University have put together "Leisure in America: Searching for the forest amongst the trees." It may seem out of date in some cases (it talks about MySpace and IM but doesn't mention Twitter; then again, it was published in April 2007 :-), and it may lack a bit of statistical rigor (I don't know if differences it cites are statistically significant), but it seems interesting if sometimes paradoxical (which may be an apt description of us as a culture).

If that's who we are, what does it mean for you and your enterprise (in all senses of the word), no matter the field?

Thanks to Andrew Taylor and The Artful Manager for the link and for more information he gives about the related conference.


Thursday, May 17, 2007

Are you good with data?

Do you pay attention to data? That's important when we work with organizations and when we look for patterns in data.

Check out the amazing colour changing card trick.

Do you pay attention to data? Really?

Thanks to Nancy White for the tip!


Wednesday, April 04, 2007

Exploratory data analysis

Most of you (well, I presume most of you; perhaps someday I should do a poll) are busy enough with management and business activities that you don't have time to become a statistician (or system dynamicist or soft systems expert or facilitator or ...). You rely on others, whether internal or external to your organization, to do the technical work in such areas.

Nonetheless, you see data all the time, and you may have need of simple tools to help make sense of what you're seeing, either before you can get to your statistician or to double-check what you're hearing from a statistician to see if it makes sense.

In the 1970s, statistician John Tukey assembled a body of techniques into a methodology he called "exploratory data analysis" (EDA), and some of its tools may be of use to any of us. While there is software available to perform these techniques, many of them can be done with paper and pencil, on the spot. That's when it likely becomes most useful for those of you managing operations or organizations.

Even that may be too much for the time some of you have. You may need something you can do without even paper and pencil, something you can do to evaluate the results you're hearing or reading.

For example, let's say you're presented the results of doing things two different ways, and the speaker or writer claims that one approach is obviously better than the other (or asks us which is the better approach). Tukey developed a so-called pocket test that you can likely do in your head. It's so easy to describe that the abstract gives almost the entire process.
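Here is a minimal Python sketch of how such a count might go, assuming the pocket test meant here is Tukey's quick two-sample test: one group must hold the overall largest value and the other the overall smallest; you count the values in the first group that exceed everything in the second, add the values in the second that fall below everything in the first, and treat a total of 7 or more as roughly significant at the 5% level. The function name and the numbers are made up for illustration, and ties are handled crudely (tied extremes are treated as "test does not apply").

```python
def tukey_quick_count(a, b):
    """Return the end count for two samples, or 0 if the test does not apply."""
    if max(a) < max(b):
        a, b = b, a  # make `a` the sample holding the overall largest value
    if max(a) == max(b) or min(a) <= min(b):
        # One sample must hold the largest value and the other the smallest.
        return 0
    high_end = sum(1 for value in a if value > max(b))
    low_end = sum(1 for value in b if value < min(a))
    return high_end + low_end


# Made-up results from doing things two different ways.
new_way = [14, 15, 16, 17, 18, 19]
old_way = [10, 11, 12, 13, 14.5, 15.5]

count = tukey_quick_count(new_way, old_way)
print(count, "suggests a real difference" if count >= 7 else "is not convincing")
```

With lists this short you can do the same count by eye, which is what makes it a pocket test.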


Thursday, March 29, 2007

Skepticism, numbers, and making sense

Don't always trust what you read in print or hear in meetings. Mike Kellermann posted The answer is -3.9% (plus or minus 17.4%) on the Social Science Statistics Blog. Note that he had to dig deeper to understand the real situation. While he was writing about public information, the same guideline applies to internal business communications.


Monday, March 12, 2007

Data sources

I've written about the care with which we should attend to data and the care with which we should interpret it.

Now Aleks Jakulin has posted a classic short quotation ("Statistics") that paints a vivid picture of the importance of attending to data sources.

Where do you get your data?


Monday, February 26, 2007

Data: fundamental premises

About twenty years ago, I created a slide I called "Fundamental Premises" in reaction to what I saw at the time as an excessively eager approach to data collection in a particular manufacturing environment.

I rediscovered it recently. Here are its five points:


  1. One should only take data for a specific purpose; the quantity of data necessary for maintaining historical perspective and a report card is far less than we presently take;
  2. The value of the flow of information is epsilon less than the value of the flow of products, and the same attention should be paid to making both flows simple, easy to understand, and defect-free;
  3. Nothing beats talking to people for basic communications, but limited data helps to expand the capability of people to analyze a situation;
  4. Data collection is almost never free, although the costs are often well hidden;
  5. Manual data collection may be more valuable than computerized data collection (much as we have learned that manual, Kanban-oriented shop floor control may be preferred to computerized systems); for one thing, it is arguably easier to verify the accuracy of many kinds of data when manually collected and plotted.


While the original was an unnumbered list, I've added numbers to make commenting easier.

How might I modify those premises today?

Seemingly contrary to what I wrote in points 3 and 5, I do understand that automated data collection can be valuable, and I do understand that data helps us avoid subjective biases (even as talking with people helps us avoid missing important insights). I've described elsewhere a case in which people on a production line failed to report the most common problem they saw; when the problem was pointed out to them because it was evident in recorded data, they said, "Oh, that's not a problem; it happens all the time." Triangulation is important, as is paying serious attention to the data, not just letting a computer draw a few conclusions and accepting those conclusions without further thought.

I still stand by point 2 and the related point 4. Most of the organizational systems in which we work can be understood as feedback systems, and information feedback is a key determinant of system behavior in such systems. I would suggest that system dynamics can be a tool to help determine what data is important. The data feedback necessary to make the system dynamics model work well may be just the data needed to make the real system work well.

I'd largely stand by point 1, too. It's tempting to squirrel away all the data we can take and then have it just in case we need it. The problem comes in point 4; it costs time and money to ensure we're getting the data we think we're getting. If we don't need particular data, we're tempted to not worry about its accuracy as much. Then, later, if we do decide we need it, it may be hard to determine what it really means or how accurate it really is, and we may make bad and costly decisions by relying on data we only think we have.

What are your fundamental premises regarding data and its use in organizations?


Thursday, February 01, 2007

Reference behavior patterns and Data360

One of the aspects of defining a problem that plays out over time is to capture a "reference behavior pattern" (RBP). That's simply a graph of key variables over time. Drawing the graph of the RBP may help you spot patterns instead of just seeing events. It may remind you to consider a longer time horizon than you had originally conceived, making it easier to detect any pattern that might be present. It "draws a line in the sand," ensuring that your problem-solving activity is focused on a specific problem with specific data, thus keeping you from wandering off course. It's a fundamental start to system dynamics modeling.

Where do you get data for your RBP? If you think it involves public data, check out Data360, one of several new sites dedicated to aggregating and delivering up data on demand.

Tom Paper generously gave me a tour of Data360 yesterday. As he described Data360, it's a tool we can use to create balanced scorecards of, well, anything we're interested in.

Data360 has lots of potential power. Here's a quick example to show how you might use it.

Let's say you are interested in the size of the labor force in the U.S.A.


  1. Go to Data360.
  2. Click on Data Graphs.
  3. Click on Civilian Labor Force United States.
  4. Observe the graph. Note the source of the data, the date it was last updated, and other pertinent information.

    Note that I've embedded the graph in this Web page. It's the actual graph, so it will always have the latest data that Data360 contains. I could have captured a static image of the graph, instead. (Update: after publishing it, I discovered I had to change the height and width to make it fit. Tom told me I'd have to do that.)
  5. If this provides what you want, you can click "PDF" to generate a PDF file of the page, or you can click "Generate CSV" to download the data in a format you can open in a spreadsheet.
  6. You can also click "View Data Set" to see just the data and its attributes. If you wanted to capture the data without downloading a CSV file and you have the Firefox Table2Clipboard extension, you can hold down Ctrl and then select the data with the left mouse button. Select Edit > Copy Table Elements. Then paste the result into your favorite word processor, spreadsheet, or editor.


How do you know if the data meets your needs? There are all the obvious questions, including:


  • Does it show the variables you want to see? Check the definitions of the variables used to make sure they match your expectations.
  • If it's a time series chart, does the time horizon match your needs? A too-short time series can turn a recurrent pattern into what seems to be a precipitous and ongoing decline or increase.
  • Is the data credible? Check the source, and perhaps even check important data against the original source. Data360's Links page might help you, as might links on my Web site. As Tom said, calling on his CFO background: "Trust, but verify."
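The time-horizon question above can be illustrated with a small Python sketch. The series is entirely made up (a steady level of 100 with a 24-month cycle); it is not Data360 data, and the names are hypothetical. The point is only that the same series looks like a steady decline through a short window and like a recurring pattern through a long one.

```python
import math

# Hypothetical monthly series: a level of 100 with a 24-month cycle.
series = [100 + 10 * math.sin(2 * math.pi * month / 24) for month in range(72)]


def net_change(values):
    """Last value minus first value over the window."""
    return values[-1] - values[0]


short_window = series[6:19]  # months 6 through 18: from a peak down to a trough
long_window = series[0:49]   # months 0 through 48: two full cycles

print(round(net_change(short_window), 1))  # a drop of about 20
print(round(net_change(long_window), 1))   # essentially no net change
```

Anyone shown only the short window would see every month lower than the one before and might reasonably, and wrongly, conclude that something is in free fall.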


While you're there, check out Graph Groups to find graphs organized in groups to make them easier to find and Print Groups to see complete reports. Browse the data, too; you might learn something; I have.


Thursday, January 25, 2007

Still more on data

Last month, I blogged about Swivel. Now there are Many Eyes, courtesy of IBM, and Data360.

Aleks Jakulin suggests that Many Eyes may be the most polished of the genre to date. I might find it worthwhile to keep my eye on all of them.
