Thursday, December 27, 2007

Top postings of 2007

In the last 12 months (to be precise, from last December 28, the day after the Top postings of 2006 entry, through December 26, 2007), you have chosen ten top postings on Making Sense With Facilitated Systems, as ranked by unique pageviews in Google Analytics.

As I noted last year, there are potential statistical problems with this list. Those who read my blog every day using the main URL don't get counted; both last year's and this year's tallies were made from those who landed on specific URLs as reported by Google Analytics (but excluding visits I may have made). That may be okay; those who linked to specific pages may have cared more about them. Recent entries have a more difficult hurdle, as they haven't been around as long to be viewed. The dates don't quite line up with the calendar year, although I suspect that makes little difference in the results. If you know of a better way, let me know.





  10. For some time now, I've been using an open source simulator for my system dynamics work because it seems to help me think more effectively. That doesn't mean I've given up on commercial tools; I still use iThink for creating interactive environments, and I will be teaching IMT 586 at the University of Washington using Vensim PLE (and I may use it in professional applications as well). Last April, I combined my interest in the arts with my interest in this new approach to system dynamics in a public article about marketing programs for symphony orchestras. You selected TAFTO 2007, the pointer to that article, as number ten on the list.


  9. I've written several articles about data and numbers. At number nine, Making more sense with numbers part 3 offered an easy process for plotting data you receive in email or reports.


  8. The words we use can be vitally important in helping us think productively about key business, organizational, and social challenges. In A systems language for business, number eight on the list, I described one team's evolution towards a better language for discussing business issues, thanks to a course they took from me in system dynamics modeling and simulation.


  7. Good data helps us ground our thinking in reality. Still more on data, a pointer to several online sources of data, captured the number seven spot.


  6. Growth can create problems (witness any of the bubbles that have occurred throughout history), but where are good examples of successful companies that intentionally don't grow? Number six on the list is Small Giants: the American Mittelstand?, pointing to a book that answers that question.


  5. Sometimes old technology still has utility; sometimes it still attracts interest. At number five, Technology comes full circle, a description of my continuing use of a slide rule in my work, certainly fits that description. For those who are interested, it points to a source for new slide rules.


  4. When I first started work as an engineer, PERT charts were done on mainframe computers or drawn by hand. Today, project management has become a profession with a certification process, and automated tools with graphical user interfaces have long since replaced tables of numbers and dates. Your fourth-most-popular entry was Critical chains: a decade later, my revisit of Eliyahu Goldratt's critical chain theory, which linked to Tom von Alten's revisiting of his own views on the approach.


  3. Productivity is obviously important to you. Your third-most-popular posting of the year was a surprise to me: If you can say it, it's done, an entry about the array programming language J.


  2. Barry Richmond has a deserved place as an educator and thinker on system dynamics and systems thinking. I posted a link to an article he wrote about systems thinking and followed up with "Scientific thinking" the modern way, a differing view on the application of modern scientific thinking in system dynamics. That was your second favorite posting from 2007.


  1. The 2007 posting you viewed the most was the series Making musical sense by email, showcasing a conversation between music critic, composer, author, professor, and consultant Greg Sandow and me that used a system dynamics model to explore the aging of audiences for symphony orchestra concerts in the USA. Now I'm curious: was its popularity due to the topic (music), the approach (a somewhat novel way of using system dynamics), or the fact that it was a real conversation between two people? Let me know.


All of those postings were made in 2007. It wouldn't be fair to finish this list without noting that some postings from prior years ranked higher than some of these. Here's the all-time list of the fifteen postings from Making Sense With Facilitated Systems you viewed most in the last twelve months, counting down to the most viewed:



  15. TAFTO 2007 (2007)


  14. Making more sense with numbers part 3 (2007)


  13. A systems language for business (2007)


  12. Still more on data (2007)


  11. Small Giants: the American Mittelstand? (2007)


  10. Technology comes full circle (2007)


  9. System Dynamics for Cheapskates (November 2006)


  8. Critical chains: a decade later (2007)


  7. If you can say it, it's done (2007)


  6. "Scientific thinking" the modern way (2007)


  5. Making musical sense by email (2007)


  4. System dynamics with MCSim (November 2006)


  3. In praise of the lazy employee (April 2005)


  2. System dynamics and program evaluation (June 2005)


  1. Making sense with numbers (November 2006)


That list includes the top ten postings written in 2007 plus the five entries written in prior years that were at least as popular as the least-viewed of those ten.

As 2007 draws to a close, I want to thank those of you who read Making Sense With Facilitated Systems and invite you to continue with me in 2008. If you have suggestions or feedback for this blog, contact me.

I would be honored to be of service to you or your organization in 2008. If you're trying to make sense of tough business or organizational challenges, curious how I might be able to help, or just want to talk about some of the issues you face or that I write about, get in touch.


Wednesday, November 28, 2007

Making more sense with numbers, part 5

In the spirit of helping us all make better sense of the data we read: if you write about data, I encourage you to read Mark Liberman's Thou shalt not report odds ratios on Language Log. If you read reports containing data (including the newspaper), read it too; it will help you decipher what you read.

It's a somewhat long article, but you'll probably get the message by the end of the first example. (At the end of the article there is a possibly useful pointer to Wikipedia's entries on odds ratios and risk ratios.) If you want another view on the same subject, see Odds ratios should be avoided when events are common, a letter by Douglas Altman, Jonathan Deeks, and David Sackett in BMJ. For an opposing view, see Stephen Senn's response.

Unless you're writing for a highly technical audience and making clear (perhaps through context) what you mean, I agree with the first and second articles.
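The difference is easy to see with numbers. Here is a minimal Python sketch (my illustration, not taken from either article) that compares a risk ratio and an odds ratio for two groups in which the event rate doubles. The counts are made up.

```python
def risk_ratio(a_events, a_total, b_events, b_total):
    """Ratio of event probabilities (risks) in group A versus group B."""
    return (a_events / a_total) / (b_events / b_total)


def odds_ratio(a_events, a_total, b_events, b_total):
    """Ratio of event odds, where odds = events / non-events."""
    odds_a = a_events / (a_total - a_events)
    odds_b = b_events / (b_total - b_events)
    return odds_a / odds_b


# Hypothetical counts: (events in A, size of A, events in B, size of B).
common = (60, 100, 30, 100)   # 60% versus 30%: events are common
rare = (6, 1000, 3, 1000)     # 0.6% versus 0.3%: events are rare

for label, counts in (("common", common), ("rare", rare)):
    print(label, round(risk_ratio(*counts), 3), round(odds_ratio(*counts), 3))
```

In both cases the risk doubles, and the risk ratio is 2.0. When events are common, though, the odds ratio comes out at 3.5. When events are rare, the two measures nearly agree at about 2.0. A reader who hears "3.5" and thinks "three and a half times as likely" has been misled.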

Thanks to Jeremy Miles for the pointer.

Those curious about the title of this posting can read part 3 and find earlier parts.


Wednesday, November 21, 2007

You've got to see this graph!

One of the ways we make sense of situations is through how we portray data. I'm a fan of carefully crafted graphics and often try to follow the lead Edward Tufte sets in his books and workshops.

That's why you have to see this graph on Statistical Modeling, Causal Inference, and Social Science. It ... ah, Phil says it better than I could; go take a look.

When you come back, note that the message is not to copy this design into the next five graphs you do (or at least the ones you show) but to have the courage to show the data in creative ways, breaking a few rules along the way if that helps to convey your information with clarity and integrity.


Monday, September 17, 2007

A better way to show data?

We all know that measured data comes with some uncertainty. Perhaps it's measurement error; perhaps it's sampling error. We even expect to see it mentioned explicitly in political polls, but I rarely see it published in business and financial reports. There are likely many reasons for that omission, only one of which is the difficulty of presenting the uncertainty concisely and informatively.

Thomas Louis and Scott Zeger recently published Effective Communication of Standard Errors and Confidence Intervals with a proposed approach to indicating such uncertainties. On the one hand, I like it. It makes a nice, neat display of two, three, or five numbers that can describe a statistic nicely, and it's relatively easy to understand and to incorporate into your reports (the authors give three lines of LaTeX code you can use directly). On the other hand, as Andrew Gelman notes, graphs are still better (thanks to Andrew for the pointer).
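Louis and Zeger's proposal is about how to lay the numbers out on the page. Computing the numbers is the easy part. Here is a minimal Python sketch for a sample mean, assuming a normal approximation (z = 1.96 for roughly 95% coverage). It produces the estimate, its standard error, and the confidence limits that you would then set in whatever notation you choose. The data are hypothetical, and the plain-text output format is mine, not theirs.

```python
import math
import statistics


def estimate_with_uncertainty(sample, z=1.96):
    """Return (mean, standard error, lower limit, upper limit) for a sample mean.

    Uses a normal approximation; for small samples a t quantile
    would give a somewhat wider interval.
    """
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, se, mean - z * se, mean + z * se


# Hypothetical measurements, for illustration only.
data = [150, 200, 250, 225, 260, 254]
mean, se, lower, upper = estimate_with_uncertainty(data)
print(f"{mean:.1f} (SE {se:.1f}; 95% CI {lower:.1f} to {upper:.1f})")
```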

What's a person to do? I have some suggestions:



  • If you've got data in tables, strongly consider a graphical approach that conveys your information clearly and effectively instead of a table, however easy the table might seem. In addition to the ideas in the paper Andrew references, consider boxplots among the candidates.

    If you don't have time to create useful graphics, consider whether your audience has the time to make sense of your tables. There are usually more of them than there are of you; taking 10 extra minutes to save 20 other people 2 minutes each sounds like a good trade-off (plug in your own numbers).

    Of course, if you're doing a balance sheet or income statement, you probably need the numbers at least once, although graphics may still help to convey your message.

  • If you've got isolated bits of data that you're using in flowing text, consider using Louis and Zeger's approach.

    It's easy to do in LaTeX and pretty easy in OpenOffice.org Writer. It seems a bit harder in Word (I have Word 2000) because I don't see a way to put subscripts on subscripts, but you can use a smaller font for some of the numbers.

    Unless and until this becomes a standard idiom, you'll probably need an explanatory note somewhere in your report.

  • Consider sparklines as a way to convey graphical data (time series graphs or histograms) in flowing text. Sparklines can be generated with a number of different approaches for a number of different document formats; if you'd rather, you can generate them online.
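Tufte's sparklines are small, word-sized line graphs. In plain text you can get a rough bar-style approximation from the Unicode block characters. Here is a minimal Python sketch of that idea. It assumes a font that includes those characters, and it is not one of the sparkline tools mentioned above.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight Unicode block characters, lowest to highest


def sparkline(values):
    """Map each value onto one of eight bar heights, scaled from min to max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    return "".join(BARS[round((v - lo) / span * (len(BARS) - 1))] for v in values)


print(sparkline([150, 200, 250, 225, 260, 254]))  # hypothetical yearly amounts
```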



Perhaps the real answer is that we now have yet another way to portray data, from which we can pick and choose to fit our current needs.


Thursday, August 30, 2007

Visualizing data

Tuesday, August 14, 2007

Making more sense with numbers, part 4

Now that you've got an easy way to capture numbers out of emails and documents, how do you get numbers back into emails?

Graphs are great, but perhaps you don't want to use attachments? Check out Gnuplot's dumb terminal mode as a way to create plain text graphics. If you keep it simple, you can convey decent graphical information with plain text, as long as your recipients read plain text email in a non-proportional (monospaced) font, which is a very good idea anyway. I tested this approach in a public discussion and found that some people liked it and some didn't.
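Gnuplot's dumb terminal is one route. For a simple bar chart, a few lines of code can do something similar. The following Python function is my own minimal sketch, not Gnuplot. It draws a horizontal bar chart with ordinary characters, assumes non-negative values, and reads correctly only in a monospaced font.

```python
def text_bar_chart(rows, width=40, mark="#"):
    """Render (label, value) pairs as a plain-text horizontal bar chart.

    Assumes non-negative values with at least one value above zero;
    bars are scaled so the largest value spans `width` characters.
    """
    largest = max(value for _, value in rows)
    label_width = max(len(str(label)) for label, _ in rows)
    lines = []
    for label, value in rows:
        bar = mark * round(value / largest * width)
        lines.append(f"{str(label):>{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical yearly amounts.
print(text_bar_chart([(2000, 150), (2001, 200), (2002, 250)], width=20))
```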

Perhaps you really do want to include a table of numbers or numbers and words. If you're working in J, it's pretty straightforward to create the table you want and then use J's clipfmt and wdclipwrite verbs to create something you can simply paste into your email or other document.
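The same approach works outside J: format the table once, then paste it. Here is a Python sketch of that idea. The helper is hypothetical. It pads each column to a common width so the table lines up in a monospaced email. It doesn't touch the clipboard, so you copy the printed output yourself.

```python
def format_table(header, rows):
    """Return a fixed-width text table: numbers right-aligned, text left-aligned."""
    table = [list(header)] + [list(row) for row in rows]
    widths = [max(len(str(row[i])) for row in table) for i in range(len(header))]
    lines = []
    for row in table:
        cells = []
        for cell, width in zip(row, widths):
            text = str(cell)
            # Numbers line up on the right; everything else on the left.
            if isinstance(cell, (int, float)):
                cells.append(text.rjust(width))
            else:
                cells.append(text.ljust(width))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


print(format_table(["Year", "Amount"], [[2000, 150], [2001, 200], [2002, 250]]))
```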

If you're using J, you can create your (text or other) graphics in Gnuplot, if you prefer, or you can create them in J directly.

Incidentally, this note and its predecessor have addressed specific cases of the more general problem of getting data into and out of J, a problem I think lots of newcomers to J discover early on. It's easy to do powerful calculations in J, but manually transcribing the data from another window into J or from J into another document loses all the benefits. The J Wiki has a page called Interfaces that might help. I've found the Text Files page quite helpful in getting data out of plain text files. Any statisticians reading this might find the interface to R useful.


Tuesday, August 07, 2007

Making more sense with numbers, part 3

One of the early mantras one hears in statistics is "Plot the data." When I first heard it, it was followed by "by hand"; I suspect that part gets elided these days. Still, the advice is good. It's often easier to make sense of a list of numbers if you can visualize them.

Most of the time, that takes time we don't have. When we get an email or a report with a table of numbers, we know that plotting the numbers means grabbing a piece of graph paper (does your office supply cabinet even stock graph paper anymore?) or opening our favorite spreadsheet, copying the numbers, and drawing a graph. I rarely take the time.

Last week, I got yet another email with a table of numbers showing how something had changed over time. I was curious, so I wrote a short J script (now edited into a one-line script) to turn the clipboard into data and another to plot the data.

Voilà! Now I had a quick, easy way to grab and plot data. I tried grabbing data out of an OpenOffice.org Writer document, and it worked, too. Grabbing data out of a Writer table was almost as good; my script lost the shape of the table, but that's easy to fix.

What's more, when you've got it in J, you can also apply various J statistical routines to the data, or you can pass it to R for more advanced statistical processing.

Yet another simple productivity tool, yet another reason to learn J as a tool for thinking and doing, yet another way to make sense with numbers.

I don't really care whether you use J or some other tool; just pay appropriate attention to the data you handle. I happen to think J is a powerful tool for this task (and for many others). If you're learning J, check out the J lab called "An Introductory Course in J" by Henry Rich. (Thanks to Kip Murray of the University of Houston for pointing it out recently on the J Programming forum. Kip notes that Henry's lab covers a lot of territory very clearly but has a steep learning curve.) If you're seeing J for the first time, check out the J Primer.

Readers might also be interested in tables2graphs.com and Using Graphs Instead of Tables.

So, if you have a table in email that looks like


Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254


and you'd like to graph it, one J program is


require 'format misc files plot'
sd=: > @: (". each ) @: |: @: clipunfmt @: wdclipread


Just copy the numbers, and type


plot ;/ sd''


to see your graph. I'll let you figure out how to add options and how to deal with multi-column data tables (it's easy).
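For readers not using J, here is a rough Python analogue of the sd step above. It is a minimal sketch that turns whitespace-separated text pasted from an email into numeric columns. It skips any line, such as a header, that isn't entirely numeric. Reading the clipboard and drawing the plot are left out. With matplotlib installed, for example, you could pass the two columns to its plot function.

```python
def parse_columns(text):
    """Turn whitespace-separated text into a list of numeric columns.

    Blank lines and lines that don't parse entirely as numbers (headers,
    notes) are skipped. Assumes every data line has the same number of fields.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # header or other non-numeric line
    return [list(column) for column in zip(*rows)]


pasted = """Year Amount
2000 150
2001 200
2002 250
2003 225
2004 260
2005 254"""

years, amounts = parse_columns(pasted)
print(years)
print(amounts)
# With matplotlib installed: import matplotlib.pyplot as plt; plt.plot(years, amounts); plt.show()
```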

Why is this part 3? Because there already has been a first and a second making sense with numbers, of course.


Thursday, July 26, 2007

If you can say it, it's done

Even in this day and age, computing is a problem. How many of us take the time to do some of the calculations mentioned here when faced with business or economic data, and how many of us just read the analyst's summary and take the analyst's advice?

To some degree, that's because it takes time and effort to double-check such work, and that only gets worse if the subject is complex. It's also because the tools we have aren't always set up to help us do such things on the fly, and we're often on the fly (or in meetings, which can be as challenging).

That's one reason I've encouraged some of you who are interested to learn alternative approaches.

At least one APLer, Randy MacDonnell, has written about APL, "If you can say it, it's done." The same is true, of course, of J, its descendant. I recently had occasion to write a program to determine whether a certain Monte Carlo simulation had run long enough (that is, whether it had converged). I found a quotation by Andrew Gelman describing the Gelman-Rubin statistic:

For any given parameter, R-hat is the estimated posterior variance of the parameter, based on the mixture of all the simulated sequences, divided by the average of the variances within each sequence.


That looked easy enough, so I just wrote it down:


R=: var @: , % mean @: var


In English, that's "the variance of the entire set of data" (var @: ,)
"divided by" (%) "the mean of the variance of each data sequence" (mean @: var).

"If you can say it, it's done."

And you thought this was a blog about business, not programming, right? You were right. While J is a language that can be used by programmers, it's also a language that can be used by you and me to express quantitative ideas more powerfully and concisely than a spreadsheet. If you're ever interested in numerical answers from a spreadsheet, you could be interested in J. Perhaps, for some of you, it's worth downloading and trying out. Much as in learning a foreign (human) language, you won't be able to do much at first, but, eventually, you might be surprised at what you can do. In a way, it's as much about thinking as about computing, and yet you can process some pretty large data sets with pretty concise "programs," too.
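For comparison, here is the same R-hat sentence written out in Python. It's a minimal sketch that follows the quoted description literally, using sample variances, with each sequence given as a list of draws. The statistic as published by Gelman and Rubin weights the between- and within-sequence variances and takes a square root. Treat this as an illustration of the idea, not a replacement for their formula.

```python
import statistics


def r_hat_simple(sequences):
    """Variance of all draws pooled together, divided by the mean of the
    within-sequence variances (sample variances throughout).

    Follows the verbal description quoted above; the published
    Gelman-Rubin statistic adds corrections and a square root.
    """
    pooled = [draw for sequence in sequences for draw in sequence]
    within = [statistics.variance(sequence) for sequence in sequences]
    return statistics.variance(pooled) / statistics.mean(within)


# Hypothetical draws: two chains that agree, and two that have not mixed.
mixed = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
unmixed = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
print(r_hat_simple(mixed))    # near 1: the chains agree
print(r_hat_simple(unmixed))  # far above 1: keep simulating
```

Even in Python, the function is barely longer than the sentence it implements. The J version is shorter still.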

Thanks to Randy and Andrew for the quotations. For those of you interested in the Gelman-Rubin statistic, Andrew has pointed me to two papers giving more information: his Inference from Iterative Simulation Using Multiple Sequences with Donald Rubin and his General Methods for Monitoring Convergence of Iterative Simulations with Steve Brooks.


Monday, July 02, 2007

You have to pay attention to the data, too

I've written about data in the past. Now Wade Schuette has put me onto a blog by Stephanie Pearl-McPhee called The way things are that is required reading, both because it's a great reminder that sometimes you really do have to pay attention to the data and because it's a good source of a chuckle, too.


Yes, I know I haven't finished the series on decision making. I will do that, and I'll create a table of contents page when I'm done so you can find all the postings from one spot.


Friday, June 22, 2007

Debunking myths with data

Normally I don't recommend video blogs, because I know people's time is scarce (or I presume it is; mine is), and it's easier to control one's time by reading than by watching.

This morning, I found Hans Rosling's Debunking Myths about the World. It's a video of his talk at TED showing Trendalyzer's use in helping us think more productively about the world. It's worth its twenty minutes, both to learn a bit about the world and to learn a bit about another way to look at data.

Now look at Gapminder to learn more on your own.


Friday, June 01, 2007

Pie charts: the exception that proves the rule

Pie charts: don't use them. That's been my motto, and it's a (non-) feature of one of the graphics applications I use. Now Masanao at Statistical Modeling, Causal Inference, and Social Science posts the Color of Flags, possibly an interesting use for such a tool. See also Information Extraction from Different Data Representation Forms on a CRT: Charts and Tables by Janice M. Engberg and F. Layne Wallace.


Monday, May 21, 2007

A leisurely snapshot of the USA

How do we in the USA relax? How has that changed over the past few decades? Normally I try to write from a more global perspective, but today's link specifically refers to the USA. Perhaps those of you outside the USA will find it helpful (or amusing) to learn and ponder a bit more about us. Perhaps some of you will comment here, leaving similar information about the culture and nation in which you live.

David Touve and Steven Tepper of the Curb Center for Art, Enterprise and Public Policy at Vanderbilt University have put together "Leisure in America: Searching for the forest amongst the trees." It may seem out of date in some cases (it talks about MySpace and IM but doesn't mention Twitter; then again, it was published in April 2007 :-), and it may lack a bit of statistical rigor (I don't know if differences it cites are statistically significant), but it seems interesting if sometimes paradoxical (which may be an apt description of us as a culture).

If that's who we are, what does it mean for you and your enterprise (in all senses of the word), no matter the field?

Thanks to Andrew Taylor and The Artful Manager for the link and for more information he gives about the related conference.


Thursday, May 17, 2007

Are you good with data?

Do you pay attention to data? That's important when we work with organizations and when we look for patterns in data.

Check out the amazing colour changing card trick.

Do you pay attention to data? Really?

Thanks to Nancy White for the tip!


Wednesday, April 04, 2007

Exploratory data analysis

Most of you (well, I presume most of you; perhaps someday I should do a poll) are busy enough with management and business activities that you don't have time to become a statistician (or system dynamicist or soft systems expert or facilitator or ...). You rely on others, whether internal or external to your organization, to do the technical work in such areas.

Nonetheless, you see data all the time, and you may have need of simple tools to help make sense of what you're seeing, either before you can get to your statistician or to double-check what you're hearing from a statistician to see if it makes sense.

In the 1970s, statistician John Tukey assembled a body of techniques into a methodology he called "exploratory data analysis" (EDA), and some of its tools may be of use to any of us. While there is software available to perform these techniques, many of them can be done with paper and pencil, on the spot. That's when it likely becomes most useful for those of you managing operations or organizations.
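One of EDA's paper-and-pencil tools is Tukey's five-number summary: the minimum, lower hinge, median, upper hinge, and maximum. You don't need software for it, but a short sketch makes the rules explicit. This Python version computes each hinge as the median of the lower or upper half of the sorted data. When the count is odd, the median is included in both halves. Other quartile conventions give slightly different values.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Tukey's (minimum, lower hinge, median, upper hinge, maximum)."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2  # size of each half; the halves share the median when n is odd
    return s[0], median(s[:half]), median(s), median(s[n - half:]), s[-1]


print(five_number_summary([150, 200, 250, 225, 260, 254]))  # hypothetical data
```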

Even that may be too much for the time some of you have. You may need something you can do without even paper and pencil, something you can do to evaluate the results you're hearing or reading.

For example, let's say you're presented the results of doing things two different ways, and the speaker or writer claims that one approach is obviously better than the other (or asks us which is the better approach). Tukey developed a so-called pocket test that you can likely do in your head. It's so easy to describe that the abstract gives almost the entire process.
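Here is the counting rule as a minimal Python sketch of Tukey's quick test, as I understand it. First, count how many values in the sample holding the overall maximum exceed everything in the other sample. Then add how many values in the other sample fall below everything in the first. Compare the total to 7, 10, and 13. Those cutoffs are commonly cited as roughly the two-sided 5%, 1%, and 0.1% levels for samples of similar size. If one sample holds both the highest and the lowest values, the count is zero. Tukey's treatment of ties (counting them as one half) is omitted here, so ties simply don't count.

```python
def tukey_quick_count(a, b):
    """Tukey's quick (pocket) two-sample count; ties are not counted."""
    high, low = (a, b) if max(a) > max(b) else (b, a)
    # The test only applies if the other sample holds the overall minimum.
    if min(low) >= min(high):
        return 0
    above = sum(1 for x in high if x > max(low))
    below = sum(1 for x in low if x < min(high))
    return above + below


def quick_test_verdict(count):
    """Rough two-sided significance levels often cited for Tukey's quick test."""
    if count >= 13:
        return "about 0.1% level"
    if count >= 10:
        return "about 1% level"
    if count >= 7:
        return "about 5% level"
    return "not significant"


# Hypothetical results from doing things two different ways.
way_a = [5, 6, 7, 8, 9]
way_b = [1, 2, 3, 6, 7]
count = tukey_quick_count(way_a, way_b)
print(count, quick_test_verdict(count))
```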


Thursday, March 29, 2007

Skepticism, numbers, and making sense

Don't always trust what you read in print or hear in meetings. Mike Kellermann posted The answer is -3.9% (plus or minus 17.4%) on the Social Science Statistics Blog. Note that he had to dig deeper to understand the real situation. While he was writing about public information, the same guideline applies to internal business communications.
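The arithmetic behind that title is worth doing once. An estimate of -3.9% with a stated margin of plus or minus 17.4% spans a range that includes zero, so the headline number alone doesn't even settle which direction things moved:

```latex
-3.9\% \pm 17.4\% \;\Rightarrow\; [\,-3.9 - 17.4,\; -3.9 + 17.4\,]\% = [\,-21.3\%,\; 13.5\%\,] \ni 0
```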


Monday, March 12, 2007

Data sources

I've written about the care with which we should attend to data and the care with which we should interpret it.

Now Aleks Jakulin has posted a classic short quotation ("Statistics") that paints a vivid picture of the importance of attending to data sources.

Where do you get your data?


Monday, February 26, 2007

Data: fundamental premises

About twenty years ago, I created a slide I called "Fundamental Premises" in reaction to what I saw at the time as an excessively eager approach to data collection in a particular manufacturing environment.

I rediscovered it recently. Here are its five points:


  1. One should only take data for a specific purpose; the quantity of data necessary for maintaining historical perspective and a report card is far less than we presently take;
  2. The value of the flow of information is epsilon less than the value of the flow of products, and the same attention should be paid to making both flows simple, easy to understand, and defect-free;
  3. Nothing beats talking to people for basic communications, but limited data helps to expand the capability of people to analyze a situation;
  4. Data collection is almost never free, although the costs are often well hidden;
  5. Manual data collection may be more valuable than computerized data collection (much as we have learned that manual, Kanban-oriented shop floor control may be preferred to computerized systems); for one thing, it is arguably easier to verify the accuracy of many kinds of data when manually collected and plotted.


While the original was an unnumbered list, I've added numbers to make commenting easier.

How might I modify those premises today?

Seemingly contrary to what I wrote in points 3 and 5, I do understand that automated data collection can be valuable, and I do understand that data helps us avoid subjective biases (even as talking with people helps us avoid missing important insights). I've described elsewhere a case in which people on a production line failed to report the most common problem they saw; when the problem was pointed out to them because it was evident in recorded data, they said, "Oh, that's not a problem; it happens all the time." Triangulation is important, as is paying serious attention to the data, not just letting a computer draw a few conclusions and accepting those conclusions without further thought.

I still stand by point 2 and the related point 4. Most of the organizational systems in which we work can be understood as feedback systems, and information feedback is a key determinant of behavior in such systems. I would suggest that system dynamics can be a tool to help determine what data is important. The data feedback necessary to make the system dynamics model work well may be just the data needed to make the real system work well.
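To make the point about information feedback concrete, here is a minimal system dynamics sketch in Python. It is a hypothetical stock-adjustment model integrated with Euler's method, not a model from any client work. A decision rule steers a stock toward a target based on the *perceived* level of the stock. When perception is immediate, the stock settles smoothly. When the same rule relies on a delayed (smoothed) measurement, the stock overshoots. The structure and the rule are the same in both runs; only the information feedback differs.

```python
def simulate(perception_delay, target=100.0, adjust_time=4.0,
             dt=0.25, horizon=100.0):
    """Euler-integrate a stock that is steered toward a target.

    inflow = (target - perceived stock) / adjust_time
    The perceived stock follows the actual stock through a first-order
    smoothing delay; a delay of 0 means the stock is observed instantly.
    Returns the stock value at each time step.
    """
    stock = perceived = 0.0
    history = [stock]
    for _ in range(int(horizon / dt)):
        inflow = (target - perceived) / adjust_time
        if perception_delay > 0:
            # Perception catches up with the (old) actual stock gradually.
            perceived += (stock - perceived) / perception_delay * dt
        stock += inflow * dt
        if perception_delay == 0:
            perceived = stock  # perfect, immediate information
        history.append(stock)
    return history


print(max(simulate(0.0)) <= 100.0)  # True: no overshoot with immediate information
print(max(simulate(8.0)) > 100.0)   # True: overshoot with delayed information
```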

I'd largely stand by point 1, too. It's tempting to squirrel away all the data we can take and then have it just in case we need it. The problem comes in point 4; it costs time and money to ensure we're getting the data we think we're getting. If we don't need particular data, we're tempted to not worry about its accuracy as much. Then, later, if we do decide we need it, it may be hard to determine what it really means or how accurate it really is, and we may make bad and costly decisions by relying on data we only think we have.

What are your fundamental premises regarding data and its use in organizations?


Thursday, February 01, 2007

Reference behavior patterns and Data360

One of the aspects of defining a problem that plays out over time is to capture a "reference behavior pattern" (RBP). That's simply a graph of key variables over time. Drawing the graph of the RBP may help you spot patterns instead of just seeing events. It may remind you to consider a longer time horizon than you had originally conceived, making it easier to detect any pattern that might be present. It "draws a line in the sand," ensuring that your problem-solving activity is focused on a specific problem with specific data, thus keeping you from wandering off course. It's a fundamental start to system dynamics modeling.

Where do you get data for your RBP? If you think it involves public data, check out Data360, one of several new sites dedicated to aggregating and delivering up data on demand.

Tom Paper generously gave me a tour of Data360 yesterday. As he described Data360, it's a tool we can use to create balanced scorecards of, well, anything we're interested in.

Data360 has lots of potential power. Here's a quick example to show how you might use it.

Let's say you are interested in the size of the labor force in the U.S.A.


  1. Go to Data360.
  2. Click on Data Graphs.
  3. Click on Civilian Labor Force United States.
  4. Observe the graph. Note the source of the data, the date it was last updated, and other pertinent information.

    Note that I've embedded the graph in this Web page. It's the actual graph, so it will always have the latest data that Data360 contains. I could have captured a static image of the graph, instead. (Update: after publishing it, I discovered I had to change the height and width to make it fit. Tom told me I'd have to do that.)
  5. If this provides what you want, you can click "PDF" to generate a PDF file of the page, or you can click "Generate CSV" to download the data in a format you can open in a spreadsheet.
  6. You can also click "View Data Set" to see just the data and its attributes. If you wanted to capture the data without downloading a CSV file and you have the Firefox Table2Clipboard extension, you can hold down Ctrl and then select the data with the left mouse button. Select Edit > Copy Table Elements. Then paste the result into your favorite word processor, spreadsheet, or editor.
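Once you've clicked "Generate CSV," almost any tool can read the file, not just a spreadsheet. Here is a minimal Python sketch that uses the standard csv module. The column names "Date" and "Value" are hypothetical placeholders, since I'm not describing the exact layout of Data360's export. Substitute whatever headings your file actually has. To read a downloaded file, pass its contents, or use `csv.DictReader` on the open file directly.

```python
import csv
import io


def read_series(csv_text, date_column="Date", value_column="Value"):
    """Read (date, value) pairs from CSV text that has a header row.

    Column names are placeholders; rows with a blank value are skipped,
    and thousands separators inside quoted numbers are removed.
    """
    series = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(value_column) or "").replace(",", "").strip()
        if raw:
            series.append((row[date_column], float(raw)))
    return series


sample = "Date,Value\n2005,100\n2006,105\n"  # hypothetical numbers
print(read_series(sample))
```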


How do you know if the data meets your needs? There are all the obvious questions, including:


  • Does it show the variables you want to see? Check the definitions of the variables used to make sure they match your expectations.
  • If it's a time series chart, does the time horizon match your needs? A too-short time series can turn a recurrent pattern into what seems to be a precipitous and ongoing decline or increase.
  • Is the data credible? Check the source, and perhaps even check important data against the original source. Data360's Links page might help you, as might links on my Web site. As Tom said, calling on his CFO background: "Trust, but verify."
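The time-horizon warning is easy to demonstrate. In this minimal Python sketch, a hypothetical series simply cycles up and down with no long-term trend. A window cut from the downswing looks like a steady decline. The full record shows the level returning to where it started.

```python
import math

PERIOD = 40  # hypothetical cycle length, in time steps

# A made-up series that cycles around 100 with no long-term trend at all.
series = [100 + 10 * math.sin(2 * math.pi * t / PERIOD) for t in range(2 * PERIOD)]


def is_steadily_declining(values):
    """True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


short_window = series[11:30]  # a slice that happens to fall on a downswing

print(is_steadily_declining(short_window))  # True: looks like a collapse
print(is_steadily_declining(series))        # False: the full record is a cycle
```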


While you're there, check out Graph Groups, which organizes graphs into groups to make them easier to find, and Print Groups, which shows complete reports. Browse the data, too; you might learn something. I have.


Thursday, January 25, 2007

Still more on data

Last month, I blogged about Swivel. Now there are Many Eyes, courtesy of IBM, and Data360.

Aleks Jakulin suggests that Many Eyes may be the most polished of the genre to date. It may be worth keeping an eye on all of them.
