Better sampling, more analyses, bigger laboratories?

http://www.adelaide.edu.au
By T P Hutchinson, Centre for Automotive Safety Research, University of Adelaide
Tuesday, 08 February, 2005


I'm a statistician. I don't know a mass spectrometer from a resonating horoscope. But it seems to me that:

  • Many practitioners and researchers do not have an optimal balance between the various stages of their data collection;
  • A better balance would involve them making more use of the fancy bits of kit back in the laboratory; and
  • This should be of great interest to those who sell laboratory services or the fancy bits of kit.

The key point is that the user typically wants (or should want) some idea of the accuracy of the results - the accuracy of the sample in representing the relevant population, that is, not merely the accuracy of the fancy bits of kit themselves. I am not making the familiar plea that an accurate estimate of a mean requires a large sample size. That's often true and important, but it's not my point. Rather, I am saying that a larger sample size will enable certain components of variability to be estimated that are vital for answering certain questions.

Scenario

I have in mind a scenario with the following features:

  1. There is a population of objects (eg, rocks in a desert, used cars, droplets of blood, images). The user collects one of these at random.
  2. Actually, there are several populations (eg, rocks at different sites, used cars from different fleets). The question of interest concerns how these vary, one from another.
  3. The rock (or whatever) that has been collected is sliced and diced, and fed into a machine.
  4. Out pops a measurement (or perhaps a list of measurements of different properties).
  5. A statistician may or may not be consulted about what the measurement means.

Point 1 is utterly vital in interpreting the numbers, yet there are many fields where this is not thought about. The user typically wants to know how accurate the measurement is, but does not turn his or her mind to the question of what type of accuracy. The distinction here is between the accuracy of this number:

  A. In representing this rock; and
  B. In representing all the rocks of equal relevance, eg, those at a particular site.

(In another context, A may be a vehicle, and B the fleet.) The users will typically know A, from experience or from the instrument's specification. There may even be elaborate procedures in place to cross-check results from different laboratories. But they may not have thought about B. (Not until the statistician starts quizzing them.) And usually it is B, not A, that is of importance.
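The distinction between A and B can be made concrete with a small calculation. What follows is a minimal sketch in Python, not a procedure taken from any instrument's documentation. It assumes a balanced design in which several rocks from one site are each measured the same number of times; the function name `accuracy_components` and any readings fed to it are hypothetical. It separates the variance due to the machine (which governs A) from the variance between rocks (which governs B).

```python
from statistics import mean, variance


def accuracy_components(readings_by_rock):
    """Split variability into instrument (A) and between-rock (B) parts.

    readings_by_rock: list of lists; each inner list holds the repeat
    machine readings on one rock. A balanced design is assumed, ie the
    same number of repeats on every rock.
    """
    repeats = len(readings_by_rock[0])
    if len(readings_by_rock) < 2 or repeats < 2:
        raise ValueError("need at least 2 rocks and 2 repeats per rock")
    if any(len(r) != repeats for r in readings_by_rock):
        raise ValueError("balanced design assumed")

    # A: instrument variance, pooled over the rocks.
    var_instrument = mean([variance(r) for r in readings_by_rock])

    # The variance of the rock means contains both sources:
    # between-rock variance + instrument variance / repeats.
    var_of_means = variance([mean(r) for r in readings_by_rock])

    # B: what is left after removing the instrument's share
    # (truncated at zero, since a variance cannot be negative).
    var_between_rocks = max(0.0, var_of_means - var_instrument / repeats)
    return var_instrument, var_between_rocks
```

With made-up readings such as `[[10.0, 10.2], [12.0, 12.2], [14.0, 14.2]]`, the instrument variance comes out at about 0.02 and the between-rock variance at about 3.99. A user who knew only the first figure would greatly overstate how well one rock represents the site. Note that the calculation cannot be done at all with a single rock.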

I believe it is in the interests of laboratories offering specialist analytical services, and of equipment manufacturers, to educate users about the initial sampling of the object(s). The users' optimal policy may be to collect more objects than they do at present; these will need to be sliced, diced, and fed into the machine, and so more analyses (and perhaps more or bigger instruments) will be needed. The user I am referring to is the person who collects the rock and wants to know the answer. He or she may be different from the person who operates the machine; indeed, the machine may belong to a different company, or be located in a different city. Thus, I admit the laboratory or the manufacturer may need to make a special effort to communicate with the ultimate user.

The point is still valid that laboratories and statisticians are natural allies in educating people to take a sample large enough for its purpose. (I am not thinking of situations where the laboratory process is so expensive, in money or time or convenience, that it is the limiting factor. Rather, I have in mind that once you have reached the desert, you might as well pick up three or four rocks rather than one, and send all for analysis.)

Discussion

Two further points, rather subtle ones, need discussion. The first concerns the relationship of the data to the question. Sometimes, interest may lie in a set of objects (eg, rocks from different sites) as a unit, and how it compares with other sets, or with reference values. For example, how much yttrium is in the rocks at this set of sites? In such a case, the variability between the rocks will be considered random and there is no need to collect replicate rocks at each individual site. But other times, the question concerns the differences within the set (eg, between one site and another), in which case the rocks at each site do need to be replicated. For example, is the eastern site different from the western site in respect of yttrium? I think it is common for practitioners and researchers not to plan ahead sufficiently well to realise that though a single rock from each site will be sufficient for some purposes, it will not be for others.
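The difference between the two kinds of question shows up as soon as one tries to attach a standard error to a site-to-site difference. The sketch below is illustrative only: the helper `site_difference` is hypothetical, and the unequal-variance (Welch-style) standard error it uses is one reasonable choice rather than the only one. With a single rock per site, the difference can be computed but its uncertainty cannot.

```python
from math import sqrt
from statistics import mean, variance


def site_difference(east, west):
    """Return (difference in means, standard error of that difference).

    east, west: lists of readings, one reading per rock at each site.
    The standard error is None when either site has fewer than two
    rocks, because within-site variability then cannot be estimated.
    """
    diff = mean(east) - mean(west)
    if len(east) < 2 or len(west) < 2:
        return diff, None
    se = sqrt(variance(east) / len(east) + variance(west) / len(west))
    return diff, se
```

For example, `site_difference([5.0], [7.0])` gives a difference of -2.0 with no standard error, so there is no way of judging whether the sites really differ. With two rocks per site, `site_difference([4.0, 6.0], [6.0, 8.0])` gives the same difference together with a standard error of about 1.41. These readings are invented purely to show the mechanics.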

The second concerns how the 'random' sample is to be taken. Consider why three or four rocks from each site are better than one. In part, it is because they are a bigger sample, and thus the sample mean is a better estimate of the population mean. But that isn't the point I am making. I am saying that three or four rocks are needed in order to give information about the variability in the population at that site. And this has an implication for how they should be chosen.

If all you are interested in is the mean, no great harm is done if you choose three or four rocks, each of which is typical. But if, as I am suggesting, you should also be interested in the variability, then you need to capture that variability in your sample. (If they are all typical, they are less variable than they should be.)
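The effect of picking only typical rocks can be sketched numerically. The code below is a toy illustration under stated assumptions: the whole population is taken to be known (which it never is in the field), "typical" is taken to mean "closest to the population mean", and the function name `typical_sample` is hypothetical.

```python
from statistics import mean, pstdev, stdev


def typical_sample(population, k):
    """Pick the k values closest to the population mean."""
    centre = mean(population)
    return sorted(population, key=lambda x: abs(x - centre))[:k]


# Hypothetical population: 21 rocks with readings 1, 2, ..., 21.
population = list(range(1, 22))
picked = typical_sample(population, 4)

# The mean is hardly affected, but the spread is badly understated.
print("population mean:", mean(population), "sample mean:", mean(picked))
print("population sd:", pstdev(population), "sample sd:", stdev(picked))
```

In this toy case the four typical rocks have a mean very close to the population mean, but their standard deviation is only a fraction of the population's. Any conclusion that relies on the within-site variability would therefore be too confident.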

Now, I concede that in order to take the sample, you cannot number the thousands of rocks and consult a table of random numbers! But perhaps you can get one that is darker than average, one that is lighter than average, one that is more speckled than average, one that is less. Obviously, in such a situation, it is next to impossible to do things properly. But I do think that getting some feel for the population variability is practicable, and that it will pay dividends when attempting to interpret the numbers.

In short, I suggest a laboratory offering specialist analytical services, or an equipment manufacturer, would be well advised to know about the principles of statistics and about the details of the user's purposes, in order to argue intelligently for a larger, appropriately structured, sample.


  • All content Copyright © 2024 Westwick-Farrow Pty Ltd