Thursday, June 25, 2009

Making more sense with numbers, part 8

I'm not a professional statistician; I am a professional who uses statistics in the course of my work. Increasingly I'm drawn to Bayesian approaches. Various people have asked me what Bayesian statistics is; when I was asked for the elevator speech version recently, I was stumped. I'll try to make up for it here.

Statistical problems have three parts: the setup, the calculation, and the presentation of results. By my understanding, Bayesian and classical (frequentist) statistics differ in all three.

In the setup, Bayesian statistics starts with the development of a probabilistic model and a set of prior probabilities for the parameters of interest. Classical statistics seems to start with the development of a null hypothesis (what if there is no effect from whatever intervention is being considered?) and an alternative hypothesis. There's a difference in how one considers information one has before the data collection starts. Some have taken Bayesian approaches to task for the sometimes subjective form of those prior probabilities, but others have pointed out that classical approaches also have their subjective moments in assuming that the classical assumptions apply in a particular situation. Some point out that one can pick prior probabilities in a way that doesn't rely on subjective assessments; those tend to be the weakly informative priors you can read about. I'm intrigued by this part of the difference, but it's not the telling difference for me.

In the calculation, the classical statistical approach relies on selecting the appropriate test to decide if one should reject or fail to reject the null hypothesis or to calculate confidence intervals for parameters of interest. As some have pointed out, this is not always an easy task, and the tests are not always easily matched to complex problems. With unique problems, one may have to modify the problem to match the method or invent new methods to match the problem.

The Bayesian approach relies on basic probability models, which makes it easier to develop an approach that meets the specific problem at hand. This is a telling difference for me.

There is a problem. Except for the simpler cases (for example, see the original Making sense with numbers), it's often hard to carry out the integration involved in making the calculations. Markov chain Monte Carlo (MCMC) approaches make that much more approachable, but they're not things one carries out on the back of an envelope.

Finally, there's the presentation of the data. This, too, is telling for me. While the classical approach gets tied up in explaining precisely what it means to reject the null hypothesis or what a confidence interval means, the Bayesian result means exactly what most of us likely think when we hear a statistical result: it states the probability of a particular event we care about happening.
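To make those three parts a bit more concrete, here is a minimal sketch of one of the simpler cases, written in Python. Everything specific in it is my assumption for illustration: the data (7 successes in 10 trials), the flat prior, and the function names `grid_posterior` and `prob_greater`. It replaces the integration with a sum over a grid of candidate values, which only works for small problems like this one; it is not MCMC.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Approximate the posterior for a success rate theta on an evenly spaced grid.

    Setup: a binomial model for the data and a flat prior on theta (both assumed).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior: every theta equally plausible beforehand
    likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in thetas]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)  # the sum stands in for the integral
    return thetas, [u / total for u in unnormalized]


def prob_greater(thetas, posterior, threshold):
    """Posterior probability that theta exceeds the threshold."""
    return sum(p for t, p in zip(thetas, posterior) if t > threshold)


# Hypothetical data: 7 successes in 10 trials.
thetas, posterior = grid_posterior(successes=7, trials=10)
print(f"P(theta > 0.5 | data) is about {prob_greater(thetas, posterior, 0.5):.2f}")
```

The last line is the presentation step: a direct statement of the probability that the success rate is better than even, given the data and the assumed model and prior.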

I'm still looking for a short, easy-to-read but complete elevator speech from a statistician on the topic that's consistent with some of what Andrew Gelman writes (I think he has some excellent writing on the subject, but I'm not sure I've found anything that fits the elevator speech model). In the meantime, Bayesian Statistical Inference for Psychological Research may help some begin to understand, even as it's somewhat old chronologically. Some might enjoy Why we (usually) don't have to worry about multiple comparisons. shows a simple but powerful application of Bayes Theorem, although it's rather more simple than what one would recognize today as Bayesian analysis.

Objections to Bayesian statistics actually does contain an elevator speech about Bayesian inference, even if it is a bit mathematically concise: "'Bayesian inference' represents statistical estimation as the conditional distribution of parameters and unobserved data, given observed data."

It's a bit longer than an elevator speech, but Dr. David Lucy of Lancaster University does have a short introduction to Bayesian methods that may help; it's part of his CFAS415a course materials.

If you've got a great but simple introduction that can explain the difference between Bayesian and classical inference well, please add it to the comments here! Thanks.


Monday, June 15, 2009

Causality

When we evaluate something, we typically are trying to understand and make claims about causal relationships. When we create a system dynamics model, we are mapping and modeling causal relationships. But how do we tell which relationships are causal and which are merely correlational?

Thanks to a recent pointer on the evaltalk mailing list, here's Sir Austin Bradford Hill's “The Environment and Disease: Association or Causation?” Hill gives nine considerations to ponder.

For a rather shorter read, see xkcd's take on causality. Be sure to see the alt tag.


Tuesday, June 09, 2009

Creating sustainability in complex ecosystems

I recently had the privilege of teaching a course in system dynamics for Willamette University's Sustainable Enterprise certificate program. The course lasted two days, with a follow-up two-hour web seminar. We focused on qualitative system dynamics, but we treated it at a somewhat more rigorous level than many such courses, I think.

I'm writing because of one particular lesson I learned—we all learned. Early in the course, we used a simulation game to help people have a common, shared experience of interacting in a challenging system environment.

As with many such games, the expected result is that people fail in making the system work. Typically, the debrief is used to help people understand the ways of thinking that led them into trouble and to prepare them for the material that's to come.

Unexpectedly, this class managed their challenges quite sustainably. While their skill wrecked the planned flow of that part of the session, I was really pleased to see their skill in action. We spent some time talking about what made them successful and how that might carry over to real-world situations. Their insights were useful enough that I wanted to share them (with the students' permission) with a larger audience: you.

I first asked what made them succeed in the game and what provided the most challenges.

Goals were the first. While the game tells them the goal they should have, they rapidly realized that focusing on the stated goals would lead to ruin, and so they decided to set a much longer-term goal.

Communications was the second factor. After the first round, they began to spend most of their time huddled in the center of the room, talking animatedly through their decision-making processes instead of working in isolated teams.

They noted that delays provided a key challenge. As they worked to establish trust in the social system they had set up, they were both trusting other teams' commitments and verifying that they were indeed living up to their commitments. That takes time: commitments made today may not show up for quite a while.

Those delay effects were complicated by the natural delays in the system. Without revealing the game we used, I will say that the dynamics of the game included natural delays between actions and results that complicated decision making.

Some noted this seemed analogous to the situation OPEC finds itself in. They rely on mutual agreement to limit production as a way to manage prices. If anyone in OPEC breaks that agreement, the system can collapse. OPEC's problems are complicated by uncertain demand and uncertain prices, factors that had no analogy in our game.

Math skills created another success factor, which some may find surprising. A subset of the players rather immediately began developing quite a useful understanding of their system based on a mathematical model they developed. Once others saw that their results were accurate, everyone became driven by the data. Without some in the group being able to pull that off, they would likely not have succeeded.

Interestingly, trust and math worked together. At one point, the analyst team made a numerical error and then made an especial effort to communicate that they had made that error to others so that the others would be able to differentiate that error from a breaking of the trust relationship. Apologies were key. Information and the lack of information thus played a key role in the group's success. Even then, it took time for the others to regain their trust in the analysts' team.

Playing into this was the lack of external shareholders. Everyone on the teams had a serious stake in the workings of the game; no one was in it just for the "money." Similarly, there were no new entrants into the field who might have upset the cartel relationship they had crafted.

I then asked them what they'd advise people in the real world.

Collaboration was the first clear answer. Work together across groups to align goals and actions.

They then said, "knowledge is power." After a bit of reflection, they revised that to "timely, transferrable, actionable knowledge is power."

They felt it was important for everyone to be clear on a vision.

They would encourage people to watch their egos and to be visibly trustworthy.

At one point, in an attempt to test the strength of their commitment (okay, as an attempt to derail their commitment), I as facilitator announced I was the government and was giving them something they really didn't want. (To be accurate, that idea came from Anne Murray Allen, the executive director of the program, who was running the simulation computer.) For a while, I felt as if I were about to experience the French Revolution, as some rather emotionally argued for standing up to government and refusing my help, a bit of resistance I wasn't accepting.

As a result, their last bit of advice was, "Don't trust the wisdom of government, of the private sector, ... of either." In other words, test the data and the reasoning yourselves instead of blindly accepting what others say is good for you.

This was an intense and very exciting two-day workshop. I think those in the class learned a lot; I know I learned as they taught themselves and me (and now perhaps you) how to make sustainability work.

Perhaps I'll see some of you there next year.


Monday, February 23, 2009

Cool tool

If you like making sense of (or with) numbers and use Linux, check out Qalculate!. The screenshots give you an idea of its power and ease of use.


Tuesday, February 17, 2009

Good graphs

Doing graphs well is important for communicating information (you do use graphs, don't you?). Rafe Donahue has published Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics. I think it's well worth our time to read and heed.

Thanks to Andrew Gelman for the tip.


Thursday, January 08, 2009

R in the news

If you're nervous about applying free software in your work, see what the New York Times had to say about R, the free statistics system. Thanks to Andrew Gelman for pointing out that article.

I'll highlight another free tool in my next posting.


Wednesday, December 31, 2008

Management Improvement Carnival: Annual Edition

John Hunter of Curious Cat has asked me to participate in the annual edition of the Management Improvement Carnival. I'm humbled to be invited and glad to participate.

The first station has to be Tom Peters' blog. I don't agree with everything he says, but I do find that he makes me think. Any of you who manage something are people, too. That's why my first link goes to his Christmas 2008. I share his sentiments, if not his bully pulpit. While I'm mentioning his blog, I'll also mention Repeat!.

The next station is MetaSD, the home of Tom Fiddaman and his Four Legs and a Tail. It's a good reminder of the leverage points we can seek in the systems in which we work as originally drafted by Dana Meadows, and it offers his notion that they don't necessarily compose an ordered list. Perhaps more importantly, it's a reminder that we need a mindset change to be successful in the world we're entering. Speaking of mindsets, Cynthia McEwen and John Schmidt of Avastone Consulting have published Leadership and the Corporate Sustainability Challenge: Mindsets in Action Report. While not a blog, that report does speak to mindset changes. Speaking of Tom Fiddaman, he has also posted My Bathtub is Nonlinear, an excellent reminder of the importance of grounding our assumptions in real data.

Times are tough, economically, and that's why I pick Paul Graham's Why to Start a Startup in a Bad Economy as the third stop. The message: don't do anything foolish, but don't think that the news from Wall Street necessarily predetermines your fate. He says it better than I. While I'm visiting non-conventional management sites, I'll stop at Elana Centor's Note to HR Folks: Hiring Over-Qualified People Is A Smart Strategy because you will need to hire again someday, if not today.

Speaking of saying things better, one of a manager's jobs is conveying information, and much of that information comes in the form of numbers and graphs. We do our organizations, our people, and ourselves a favor when we display such information clearly so others can make sense of it well. That's why the fourth stop in this carnival is at Andrew Gelman's An improved time-series graph instead of that notorious "spiraling down the drain" spiderweb. Follow the links, too, to see his earlier commentary. I'm a fan of Edward Tufte's approach to communicating information, and I'm a fan of the second graph in that posting. If you have to add drama, I like the third graph much better than the first, but I still think the second is the best of the three.

As important as data and statistics are, I'm reminded by xkcd's Decline that not everything we do, not even everything we do as managers, is best served by quantification and purely logical analysis. That brings me to Andrew Taylor's Not aloof and detached, but deeply, deeply human, a link to a Benjamin Zander TED presentation that, for me, brings together presentation skill and leadership in the service of his passion, music.

Finally, I'll take a view of another system we may not think of much, one that we very much need to be working well and one that may offer opportunities for some of us: food. Marilyn Holt's A Locavore Manifesto by Michael Pollan is a great education and reminder; click on the title of her post to get to the manifesto.

You may have thought I'd post about IT issues, about process improvement, or about systems or statistical analysis of management work. Those are indeed important, and I don't want to neglect them.

Yet I've found it helpful to start thinking at a high, systemic level to make sure I'm considering the important issues and to help me determine where I need more detailed information. While this, like most summaries of blog postings, can't claim to be as organized and logical as a book, I think it covers issues we need to concern ourselves about in business. From how we deal with people to the mindsets we bring to our work, from how to work in a tough economy to how to convey information, this covers a broad range. I closed with food systems because I wonder if we may be entering a period where the major systems we need for business—food, energy, the atmosphere and the overall environment—can no longer be safely taken for granted. That's why I think the desire and ability to view our challenges through a systems lens is particularly important as we enter 2009.

I'll conclude with Tom Asacker's Nine Predictions for 2009, thanks to Tom Peters' Must Reading.

Follow-up:

To find the rest of the Management Improvement Carnival, check out these links:


Thursday, December 04, 2008

Sustainable Energy without the hot air

You may have noticed it's sometimes hard to get good data on issues of current importance. We read and hear adjectives, but we too rarely hear numbers. When we do, they're often presented in ways that are not conducive to clear understanding. I've written about that from time to time, for grounding decisions in good data seems to be a fundamentally important skill.

I've also written about the environment, for I think we do and will face challenges of the sort our ancestors never had to address (There: more descriptive phrases! Relief is on the way.).

Today Andrew Gelman pointed to David MacKay's free book Sustainable Energy -- without the hot air as an example of a book that brings data to the fore of the discussion about sustainable energy. In general, he likes the way the data is portrayed, although he doesn't attempt to vet the book for its content. While I haven't yet double-checked any of the numbers, I have begun to read the book, and I find the data clearly, cogently, and interestingly put (quite a change from William Farr's advice to statisticians). I like that he seems to use a significant number of clear time series ("behavior over time") graphs, and the time horizons are long enough to see useful patterns developing. So far, it appears as if this work will help me put events in perspective; I'll be curious to see what I learn and what reactions I have as I finish the book.

While I'm reading it, I encourage you to get a copy, too, and see what you think. Perhaps we'll all learn something important, both about living on the planet successfully and about presenting data effectively.


Sunday, November 30, 2008

Making more sense with numbers, part 7

Tim O'Reilly tweeted about Florence Nightingale: The passionate statistician. While I'm not a fan of pie charts, Nightingale's work here is impressive, especially given the date. We can probably ignore William Farr's advice about statistical writing, though.

By the way, I got off on my numbering in this series, giving several postings the number 4. You can now find the real Making more sense with numbers, part 5 and Making more sense with numbers, part 6, as well as earlier postings in the series.


Thursday, November 13, 2008

Grounding beliefs in data

We all have beliefs about the way parts of the world work. Data is sometimes hard to come by; when we can find it, it's sometimes in a form that's hard to use.

Tom Fiddaman published State CO2 Emissions from Fossil Fuel Combustion on many eyes. Click through to the interactive version, and see if you find any surprises.

One set of lessons: it appears that industrial emissions have improved between 1990 and 2005 in about half the states of the USA. Emissions from electric power generation have increased in all but four states (all in the Northeast), and emissions from transportation have increased in all states but Kansas.

It would be interesting to see these results expressed in per capita terms, too.

While we're talking about data, see his experiments with his kids on how fast a bathtub drains. Before you do, though, what do you think a time series graph of the water left in a bathtub as it drains would look like? Then read the results.

To me, the bathtub experiment sounds a bit like Seth Roberts's self-experimentation, which leads to the obvious thought that we can all be amateur scientists, if we want.


Wednesday, August 20, 2008

Prediction, system dynamics, and Future-Fusion

Recently, I made the claim that we're better off focusing on adapting to the present than predicting the future. I've made similar claims in the past, too. I've even given one example in which predictions serve a useful purpose.

That's all a bit simplistic, of course. Even system dynamicists could be said to predict the future in a way: we show behavior over time we feel is more likely to occur (although we may warn people away from point predictions based on a behavior over time graph). In other words, I might suggest that your current policies could produce a boom and bust effect in your business, but I wouldn't want you to draw the conclusion that your business will grow another 172.3% by June 15, 2009 before taking a tumble that afternoon.

Because we all would like to know the future, I've experimented with blending system dynamics and Bayesian analysis to quantify the probability of a particular behavior pattern, for example. Of course, that probability is conditioned on both the historical data and the model being correct, which is a loophole big enough for a good-sized locomotive to run through: models are always incorrect. Still, I think this approach may give more useful insight in certain cases.

Now Kshanti Greene of Stottler Henke Associates, Inc. has shown me a Bayesian tool they've developed called Future-Fusion, and I've been exploring it a bit. They are using Bayesian networks and the power of groups to get a better handle on what the future holds. Much as Data360 looks at the past, Future-Fusion attempts to look at the future. As of this writing, they've created four test areas which you can explore: the 2008 US presidential election, the Iraq war, corporate strategy, and energy. Try it out: learn how to use the system, see current predictions, and add your own (I think you only have to create a free account if you want to add your own predictions). Perhaps you'll learn something, and perhaps they will, too.

Kshanti has pointed out a recent addition to Future-Fusion that may intrigue some of you: time. They've enhanced their technology to allow limited dynamic execution of a network model, which begins to narrow the gap between Bayesian networks and system dynamics from the Bayesian network side, much as what I've tried has narrowed it from the system dynamics side. To try that out, go to the energy model, select a prediction (e.g., "Reduced SUV sales"), click "view graph," note the numbers, and then click "Next Time Step."

I think this is all still experimental in many ways, but it's a good opportunity to learn a bit about this technology by trying it out on real-life issues. I'll be curious what you discover.


Thursday, December 27, 2007

Top postings of 2007

In the last 12 months (to be precise, from last December 28, the day after the Top postings of 2006 entry, through December 26, 2007), you have chosen ten top postings on Making Sense With Facilitated Systems as ranked by unique pageviews in Google Analytics.

As I noted last year, there are potential statistical problems with this list. Those who read my blog every day using the main URL don't get counted; both last year's and this year's tallies were made from those who landed on specific URLs as reported by Google Analytics (but excluding visits I may have made). That may be okay; those who linked to specific pages may have cared more about them. Recent entries have a more difficult hurdle, as they haven't been around as long to be viewed. The dates don't quite line up with the calendar year, although I suspect that makes little difference in the results. If you know of a better way, let me know.





  1. For some time now, I've been using an open source simulator for my system dynamics work because it seems to help me think more effectively. That doesn't mean I've given up on commercial tools; I still use iThink for creating interactive environments, and I will be teaching IMT 586 at the University of Washington using Vensim PLE (and I may be using it in professional applications, as well). Last April, I combined my interest in the arts with my interest in this new approach to system dynamics in a public article about a marketing program for symphony orchestras. You selected TAFTO 2007, the pointer to that article, as number ten on the list.


  2. I've written several articles about data and numbers. Making more sense with numbers part 3 offered an easy process to plot data you receive in email or reports.


  3. The words we use can be vitally important in helping us think productively about key business, organizational, and social challenges. In A systems language for business, number eight on the list, I described one team's evolution towards a better language for discussing business issues, thanks to a course they took from me in system dynamics modeling and simulation.


  4. Good data helps us ground our thinking in reality. Still more on data, a pointer to several online sources of data, captured the number seven spot.


  5. Growth can create problems (witness any of the bubbles that have occurred over history), but where are good examples of successful companies that intentionally don't grow? Number four on the list is Small Giants: the American Mittelstand?, pointing to a book that answers that question.


  6. Sometimes old technology still has utility; sometimes it still attracts interest. At number five, Technology comes full circle, a description of my continuing use of a slide rule in my work, certainly fits that description. For those who are interested, it points to a source for new slide rules.


  7. When I first started work as an engineer, PERT charts were done using mainframe computers or hand-drawn charts. Today, project management has become a profession with a certification process, and automated tools with graphical user interfaces have long since replaced tables of numbers and dates. Your sixth-most-popular entry was Critical chains: a decade later, my revisiting of Eliyahu Goldratt's critical chain theory that linked to Tom von Alten's revisiting of his views on the approach.


  8. Productivity is obviously important to you. Your third most popular posting of the year was a surprise to me: If you can say it, it's done, an entry about the array programming language J.


  9. Barry Richmond has a deserved place as an educator and thinker on system dynamics and systems thinking. I posted a link to an article he wrote about systems thinking and followed up with "Scientific thinking" the modern way, a differing view on the application of modern scientific thinking in system dynamics. That was your second favorite posting from 2007.


  10. The 2007 posting you viewed the most was the series Making musical sense by email, showcasing a conversation between music critic, composer, author, professor, and consultant Greg Sandow and me that used a system dynamics model to explore the aging of audiences for symphony orchestra concerts in the USA. Now I'm curious: was its popularity because of the topic (music), the approach (a somewhat novel approach to using system dynamics), or the fact it was a real conversation between two people? Let me know.


All of those postings were made in 2007. It wouldn't be fair to finish this list without noting that some postings from prior years did rank higher than some of these. Here's the all-time top ten list of postings from Making Sense With Facilitated Systems as measured by your viewings in the last twelve months:



  1. TAFTO 2007 (2007)


  2. Making more sense with numbers part 3 (2007)


  3. A systems language for business (2007)


  4. Still more on data (2007)


  5. Small Giants: the American Mittelstand? (2007)


  6. Technology comes full circle (2007)


  7. System Dynamics for Cheapskates (November 2006)


  8. Critical chains: a decade later (2007)


  9. If you can say it, it's done (2007)


  10. "Scientific thinking" the modern way (2007)


  11. Making musical sense by email (2007)


  12. System dynamics with MCSim (November 2006)


  13. In praise of the lazy employee (April 2005)


  14. System dynamics and program evaluation (June 2005)


  15. Making sense with numbers (November 2006)


That list includes the top ten postings written in 2007 plus the five entries written in prior years that were at least as popular as the top ten 2007 postings.

As 2007 draws to a close, I want to thank you who read Making Sense With Facilitated Systems and to invite you to continue with me in 2008. If you have suggestions or feedback for this blog, contact me.

I would be honored to be of service to you or your organization in 2008. If you're trying to make sense of tough business or organizational challenges, curious how I might be able to help, or just want to talk about some of the issues you face or that I write about, get in touch.


Wednesday, November 28, 2007

Making more sense with numbers, part 4

In the spirit of helping us all make better sense of data we read, I encourage you to read Mark Liberman's Thou shalt not report odds ratios in his Language Log if you write about data. If you read reports containing data (including the newspaper), read it, too, to help decipher what you read.

It's a somewhat long article, but you'll probably get the message by the end of the first example. (There is a possibly useful pointer to odds ratios and risk ratios on Wikipedia at the end of the article.) If you want another view on the same subject, see Odds ratios should be avoided when events are common, a letter by Douglas Altman, Jonathan Deeks, and David Sackett in BMJ. For an opposing view, see Stephen Senn's response.

Unless you're writing for a highly technical audience and making it clear (perhaps through context) what you mean, I agree with the first and second articles.
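For anyone who wants to see the effect in numbers, here is a small Python sketch of my own; it is not taken from any of those articles. The function names `risk_ratio` and `odds_ratio` and the counts are made up for illustration. It compares the two measures when the event is rare and when it is common.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Ratio of the event proportions in group A and group B."""
    return (events_a / n_a) / (events_b / n_b)


def odds_ratio(events_a, n_a, events_b, n_b):
    """Ratio of the odds (events to non-events) in group A and group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


# Hypothetical counts: (events in A, size of A, events in B, size of B).
scenarios = {
    "rare event": (2, 100, 1, 100),
    "common event": (80, 100, 60, 100),
}

for label, counts in scenarios.items():
    rr = risk_ratio(*counts)
    oratio = odds_ratio(*counts)
    print(f"{label}: risk ratio {rr:.2f}, odds ratio {oratio:.2f}")
```

With the rare event, the two measures nearly agree. With the common event, the odds ratio comes out about twice the risk ratio, which is the gap a general reader is likely to misread.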

Thanks to Jeremy Miles for the pointer.

Those curious about the title of this posting can read part 3 and find earlier parts.


Wednesday, November 21, 2007

You've got to see this graph!

One of the ways we make sense of situations is in how we portray data. I'm a fan of carefully crafted graphics, often trying to follow the lead Edward Tufte sets in his books and workshops.

That's why you have to see this graph on Statistical Modeling, Causal Inference, and Social Science. It ... ah, Phil says it better than I could; go take a look.

When you come back, note that the message is not to copy this design into the next five graphs you do (or at least the ones you show) but to have the courage to show the data in creative ways, breaking a few rules along the way if that helps to convey your information with clarity and integrity.


Monday, September 17, 2007

A better way to show data?

We all know that measured data comes with some uncertainty. Perhaps it's measurement error; perhaps it's sampling error. We even expect to see it mentioned explicitly in political polls, but I rarely see it published in business and financial reports. There are likely many reasons for that omission, only one of which is the difficulty of presenting the uncertainty concisely and informatively.

Thomas Louis and Scott Zeger recently published Effective Communication of Standard Errors and Confidence Intervals with a proposed approach to indicating such uncertainties. On the one hand, I like it. It makes a nice, neat display of two, three, or five numbers that can describe a statistic nicely, and it's relatively easy to understand and to incorporate into your reports (the authors give three lines of LaTeX code you can use directly). On the other hand, as Andrew Gelman notes, graphs are still better (thanks to Andrew for the pointer).

What's a person to do? I have some suggestions:



  • If you've got data in tables, strongly consider figuring out a graphical approach that conveys your information clearly and effectively instead of using a table, as easy as that might seem. In addition to the ideas in the paper Andrew references, consider boxplots among the potential candidates.

    If you don't have time to create useful graphics, consider whether your audience has the time to make sense of your tables. There are usually more of them than there are of you; taking 10 extra minutes to save 20 other people 2 minutes each sounds like a good trade-off (plug in your own numbers).

    Of course, if you're doing a balance sheet or income statement, you probably need the numbers at least once, although graphics may still help to convey your message.

  • If you've got isolated bits of data that you're using in flowing text, consider using Louis and Zeger's approach.

    It's easy to do in LaTeX and pretty easy in OpenOffice.org's Write. It seems a bit harder in Word (I have Word 2000) because I don't see a way to have subscripts on subscripts, but you can select a smaller font on some numbers.

    Unless and until this becomes a standard idiom, you'll probably need an explanatory note somewhere in your report.

  • Consider sparklines as a way to convey graphical data (time series graphs or histograms) in flowing text. Sparklines can be generated with a number of different approaches for a number of different document formats; if you'd rather, you can generate them online.
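As one illustration of the last suggestion, here is a minimal Python sketch of a text sparkline built from Unicode block characters. The function name `sparkline` and the sample values are made up for illustration; this is only one of many ways to generate sparklines, not the method behind any particular tool.

```python
# Eight block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to a block character scaled between the minimum and maximum."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)

# Illustrative series only.
series = [150, 200, 250, 225, 260, 254]
print("Amount by year:", sparkline(series))
```

Because the result is ordinary text, it can sit inside a sentence, provided the reader's font includes the block characters.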



Perhaps the real answer is that we now have yet another way to portray data, from which we can pick and choose to fit our current needs.


Thursday, August 30, 2007

Visualizing data

Tuesday, August 14, 2007

Making more sense with numbers, part 4

Now that you've got an easy way to capture numbers out of emails and documents, how do you get numbers back into emails?

Graphs are great, but perhaps you don't want to use attachments? Check out Gnuplot's dumb terminal mode as a way to create plain text graphics. If you keep it simple, you can convey decent graphical information with plain text (as long as your recipients use a non-proportional font in their email client for plain text emails—a very good idea, anyway). I tested this approach in a public discussion and found some liked it and some didn't.
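If you don't have Gnuplot at hand, the same idea can be sketched in a few lines of Python. This is not Gnuplot's dumb terminal, just a hypothetical `text_chart` function that draws one bar of `#` characters per value; the numbers are the small year-and-amount table from part 3.

```python
def text_chart(labels, values, width=40):
    """Return one line per value: the label, a bar of '#' scaled to width, and the value."""
    top = max(values)
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(value / top * width)  # the largest value gets the full width
        lines.append(f"{label} {bar} {value}")
    return "\n".join(lines)

years = [2000, 2001, 2002, 2003, 2004, 2005]
amounts = [150, 200, 250, 225, 260, 254]
chart = text_chart(years, amounts)
print(chart)
```

As with Gnuplot's output, the bars only line up if the recipient reads the email in a non-proportional font.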

Perhaps you really do want to include a table of numbers or numbers and words. If you're working in J, it's pretty straightforward to create the table you want and then use J's clipfmt and wdclipwrite verbs to create something you can simply paste into your email or other document.
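For readers not working in J, here is a rough Python sketch of the formatting half of that job. It does not touch the clipboard and is not what clipfmt or wdclipwrite do internally; the function name `format_table` is made up, and the result is plain text you would copy or write into your message yourself.

```python
def format_table(header, rows):
    """Return the header and rows as lines of right-aligned, space-padded columns."""
    cells = [[str(c) for c in header]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.rjust(w) for cell, w in zip(row, widths)) for row in cells
    )

table = format_table(["Year", "Amount"], [(2000, 150), (2001, 200), (2002, 250)])
print(table)
```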

If you're using J, you can create your (text or other) graphics in Gnuplot, if you prefer, or you can create them in J directly.

Incidentally, this note and its predecessor have addressed specific cases of the more general problem of getting data into and out of J, a problem I think lots of newcomers to J discover early on. It's easy to do powerful calculations in J, but manually transcribing the data from another window into J or from J into another document loses all the benefits. The J Wiki has a page called Interfaces that might help. I've found the Text Files page quite helpful in getting data out of plain text files. Any statisticians reading this might find the interface to R useful.


Tuesday, August 07, 2007

Making more sense with numbers, part 3

One of the early mantras one hears in statistics is "Plot the data." When I first heard it, it was followed by "by hand"; I suspect that part gets elided these days. Still, the advice is good. It's often easier to make sense of a list of numbers if you can visualize them.

Most of the time, that takes time we don't have. When we get an email or a report with a table of numbers, we know that plotting the numbers means grabbing a piece of graph paper (does your office supply cabinet even stock graph paper anymore?) or opening up your favorite spreadsheet, copying numbers, and drawing a graph. I rarely take the time.

Last week, I got yet another email with a table of numbers showing how something had changed over time. I was curious, so I wrote a short J script (now edited into a one-line script) to turn the clipboard into data and another to plot the data.

Voilà! Now I had an easy and quick way to grab and plot data. I tried grabbing data out of an OpenOffice.org Writer document, and it worked, too. Grabbing data out of a Writer table was almost as good; my script lost the shape of the table, but that's easy to fix.

What's more, when you've got it in J, you can also apply various J statistical routines to the data, or you can pass it to R for more advanced statistical processing.

Yet another simple productivity tool, yet another reason to learn J as a tool for thinking and doing, yet another way to make sense with numbers.

I don't really care if you use J or some other tool; just pay appropriate attention to the data you handle. I just happen to think J is a powerful tool for this task (and for many other tasks). If you're learning J, check out the J lab called "An Introductory Course in J" by Henry Rich. (Thanks to Kip Murray of the University of Houston for pointing that out recently on the J Programming forum. Kip notes that Henry's lab covers a lot of territory very clearly but with a steep learning curve. If you are just seeing J for the first time, check out the J Primer.)

Readers might also be interested in tables2graphs.com and Using Graphs Instead of Tables.

So, if you have a table in email that looks like


Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254


and you'd like to graph it, one J program is


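NB. load the standard scripts the one-liner relies on
require 'format misc files plot'
NB. sd: read the clipboard, unformat it into a boxed table, transpose it,
NB. convert each cell to a number, and unbox the result
sd=: > @: (". each ) @: |: @: clipunfmt @: wdclipread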


Just copy the numbers, and type


plot ;/ sd''


to see your graph. I'll let you figure out how to add options and how to deal with multi-column data tables (it's easy).
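The same grab-the-numbers step can be sketched in Python for anyone without J. This version takes the pasted text as a string rather than reading the clipboard, and it skips any row that isn't entirely numeric (such as the header); `parse_table` is a made-up name, and plotting the resulting columns is left to whatever tool you prefer.

```python
def parse_table(text):
    """Turn whitespace-separated rows of text into a list of numeric columns."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # skip rows with non-numeric cells, e.g. "Year Amount"
    # Transpose rows into columns, much as |: does in the J version.
    return [list(column) for column in zip(*rows)]

pasted = """Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254"""

years, amounts = parse_table(pasted)
print(years)
print(amounts)
```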

Why is this part 3? Because there have already been a first and a second "Making sense with numbers," of course.


Thursday, July 26, 2007

If you can say it, it's done

Even in this day and age, computing is a problem. How many of us take the time to do some of the calculations mentioned here when faced with business or economic data, and how many of us just read the analyst's summary and take the analyst's advice?

To some degree, that's because it takes time and effort to double-check such work, and that only gets worse if the subject is complex. It's also because the tools we have aren't always set up to help us do such things on the fly, and we're often on the fly (or in meetings, which can be as challenging).

That's one reason I've encouraged some of you who are interested to learn alternative approaches.

At least one APLer, Randy MacDonnell, has written about APL, "If you can say it, it's done." The same is true, of course, about J, its descendant. I had occasion recently to write a program to calculate whether a certain Monte Carlo simulation was done. I found a quotation by Andrew Gelman describing the Gelman-Rubin statistic:

For any given parameter, R-hat is the estimated posterior variance of the parameter, based on the mixture of all the simulated sequences, divided by the average of the variances within each sequence.


That looked easy enough, so I just wrote it down:


R=: var @: , % mean @: var


In English, that's "the variance of the entire set of data" (var @: ,)
"divided by" (%) "the mean of the variance of each data sequence" (mean @: var).

"If you can say it, it's done."
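For comparison, here is the same sentence sketched in Python. It follows the quotation literally, just as the J one-liner does, and is not the full procedure from the papers; the function name `r_hat` and the two short sequences are made up, and I've assumed the sample variance (dividing by n - 1).

```python
from statistics import mean, variance

def r_hat(sequences):
    """Variance of all draws pooled together, divided by the mean within-sequence variance."""
    pooled = [x for seq in sequences for x in seq]
    within = [variance(seq) for seq in sequences]
    return variance(pooled) / mean(within)

# Two short, invented sequences that sit slightly apart from each other.
chains = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(r_hat(chains))
```

It takes more lines to say, but the shape of the calculation is the same: pooled variance over average within-sequence variance.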

And you thought this was a blog about business, not programming, right? You were right. While J is a language that can be used by programmers, it's also a language that can be used by you and me to express quantitative ideas more powerfully and concisely than a spreadsheet. If you're ever interested in numerical answers from a spreadsheet, you could be interested in J. Perhaps, for some of you, it's worth downloading and trying out. Much as in learning a foreign (human) language, you won't be able to do much at first, but, eventually, you might be surprised what you can do. In a way, it's as much about thinking as about computing, and yet you can process some pretty large data sets with pretty concise "programs," too.

Thanks to Randy and Andrew for the quotations. For those of you interested in the Gelman-Rubin statistic, Andrew has pointed me to two papers giving more information: his Inference from Iterative Simulation Using Multiple Sequences with Donald Rubin and his General Methods for Monitoring Convergence of Iterative Simulations with Steve Brooks.


Friday, July 20, 2007

Making more sense with numbers

Some time ago, I walked through an exercise to make sense of one sort of numerical claim we often encounter in our daily lives, whether in business reports or the daily news. Now Andrew Gelman has pointed to a presentation by Dick De Veaux called Math is Music – Stats is Literature, pointing out that thinking statistically is hard work.

Andrew recommended it for teachers of statistics; I'd suggest it for anyone interested in making sense of numbers. Even assimilating the lessons of those slides might help us remember to plot the data we've been given and then to look at the graphs, to think about assumptions that are being made, to think critically and be skeptical, and to make better decisions.

Incidentally, you can find links to tools to convert data tables you might see online to graphs in my comments to Andrew's tables2graphs.com.


Friday, June 01, 2007

Pie charts: the exception that proves the rule

Pie charts: don't use them. That's been my motto, and it's a (non-) feature of one of the graphics applications I use. Now Masanao at Statistical Modeling, Causal Inference, and Social Science posts the Color of Flags, possibly an interesting use for such a tool. See also Information Extraction from Different Data Representation Forms on a CRT: Charts and Tables by Janice M. Engberg and F. Layne Wallace.


Wednesday, April 04, 2007

Exploratory data analysis

Most of you (well, I presume most of you; perhaps someday I should do a poll) are busy enough with management and business activities that you don't have time to become a statistician (or system dynamicist or soft systems expert or facilitator or ...). You rely on others, whether internal or external to your organization, to do the technical work in such areas.

Nonetheless, you see data all the time, and you may have need of simple tools to help make sense of what you're seeing, either before you can get to your statistician or to double-check what you're hearing from a statistician to see if it makes sense.

In the 1970s, statistician John Tukey assembled a body of techniques into a methodology he called "exploratory data analysis" (EDA), and some of its tools may be of use to any of us. While there is software available to perform these techniques, many of them can be done with paper and pencil, on the spot. That's when it likely becomes most useful for those of you managing operations or organizations.

Even that may be too much for the time some of you have. You may need something you can do without even paper and pencil, something you can do to evaluate the results you're hearing or reading.

For example, let's say you're presented the results of doing things two different ways, and the speaker or writer claims that one approach is obviously better than the other (or asks you which is the better approach). Tukey developed a so-called pocket test that you can likely do in your head. It's so easy to describe that the abstract gives almost the entire process.
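The counting rule, as it is usually stated, goes like this: if one group holds the overall largest value and the other holds the overall smallest, count the values in the first group that exceed everything in the second, add the values in the second group that fall below everything in the first, and treat a total of about 7 or more as a difference at roughly the 5% level. Here is a minimal Python sketch of that count. The function name and the sample numbers are invented, ties are simply not counted, and the threshold assumes two groups of roughly similar size, so check the paper before leaning on it.

```python
def tukey_quick_count(a, b):
    """Return Tukey's end count for two samples, or 0 if the test does not apply."""
    # One sample must hold the overall maximum and the other the overall minimum.
    if max(a) > max(b) and min(a) > min(b):
        high, low = a, b
    elif max(b) > max(a) and min(b) > min(a):
        high, low = b, a
    else:
        return 0
    above = sum(1 for x in high if x > max(low))   # values beyond the other group's top
    below = sum(1 for x in low if x < min(high))   # values beneath the other group's bottom
    return above + below

# Invented results from doing things two different ways.
way_a = [12, 14, 15, 17, 18, 20, 21]
way_b = [8, 9, 10, 11, 13, 14, 16]
count = tukey_quick_count(way_a, way_b)
print(count, "suggests a real difference" if count >= 7 else "is inconclusive")
```

With a handful of numbers on a slide, the same count can be done by eye, which is the point of a pocket test.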


Thursday, March 29, 2007

Skepticism, numbers, and making sense

Don't always trust what you read in print or hear in meetings. Mike Kellermann posted The answer is -3.9% (plus or minus 17.4%) on the Social Science Statistics Blog. Note that he had to dig deeper to understand the real situation. While he was writing about public information, the same guideline applies to internal business communications.


Wednesday, February 14, 2007

R and some updated links

The free statistical package R apparently has a reputation in some quarters for being hard to use. That discussion arose recently on the evaltalk mailing list, which led to many good references.

One of the better ones I found for starting with R was Regression Using R on Jeremy Miles' site. In nine steps (the first—installing R—should take longer than the rest), he walks us through installing R and doing our first regression analysis from data in a file on our systems. It can give a lot of confidence to get your first results with real data in very few steps.

Jeremy also links to Bob Muenchen's R for SAS and SPSS Users, which seems great for, well, SAS and SPSS users, although it may be useful for some of the rest of us, too. Paul Johnson's Rtips page appears to have more advanced tips.

Speaking of links, I've updated the Facilitated Systems' Links page. I've added to the Data, Knowledge management, and Project management sections and cleaned up a few links that had changed or disappeared.
