Education And Debate

# ABC of Medical Computing: MANIPULATING AND ANALYSING DATA

BMJ 1995; 311 (Published 02 September 1995) Cite this as: BMJ 1995;311:614
1. Nicholas Lee,
2. Andrew Millman

Spreadsheets form part of a family of tools for storing and manipulating numerical data. At their most basic, they perform the simple arithmetical calculations normally done on a handheld calculator. Several good programs are available that will simulate such calculators. Many of the calculator programs available provide facilities such as scientific functions, weight and measure conversions, and statistical and financial functions. They are often included in a package with other programs such as Windows or WordPerfect.

If you need to perform multiple calculations on a collection of data a spreadsheet program becomes invaluable. Spreadsheets were used before computers for bookkeeping and business accounts. However, maintaining accuracy when adding or performing the calculations on the figures within a spreadsheet was always a problem. A computer spreadsheet overcomes this problem. The calculations are always accurate, and if you make an alteration in a massive spreadsheet, the hundreds of calculations required to update the figures and totals are done in seconds.

WordPerfect's calculator can be turned on by running the macro calc.wpm.

A spreadsheet consists of columns labelled on the top A, B, C, D … and rows labelled on the left 1, 2, 3, 4…. The cells within the table are identified by the column and row label--for example, A1, A2, B1, B2. The column letter always goes first. You can enter numbers, formulas, or words into the cells. Entering a formula instructs the spreadsheet to carry out a calculation on data in other cells. For example, to add all the numbers in a column you would enter the formula B1+B2+B3+B5+B6+B7 into the cell you wish the total to be placed or more simply use the summing function. If any of the numbers are altered the spreadsheet automatically recalculates the total.

The sum function is a quick way of adding up data in a spreadsheet.

The standard spreadsheet has available 9999 rows and 255 columns, making a total of over two million cells, all of which can be interlinked and used in mathematical calculations. This makes it a powerful tool that allows you to do complex analysis on small or large amounts of numerical data very rapidly.

## Functions

Spreadsheet programs all have several built in functions--that is, formulas that will automatically carry out calculations for which it would be time consuming to write the instructions in full each time using normal arithmetical symbols. They take the tedium out of long calculations and avoid the errors that inevitably occur when entering lengthy instructions. The big spreadsheets have over 300 of these functions available.

The date function is very useful as it allows dates or times to be used as numerical data. You can therefore automatically calculate a person's age from the date of birth. The problem solving function allows “what if?” questions to be asked. Here the problem solver tells you what values to use to make a particular formula equal to a value you specify.

## Sorting data

Sorting data into ranges to create histograms or tables is laborious by hand. The advantage of storing data in a spreadsheet is that the program will create an accurate frequency distribution table, which can then be taken into a charting program or used by the charting functions of the spreadsheet program. Spreadsheets often have powerful charting capabilities which can also be incorporated into the spreadsheet itself to produce a professional looking report.

Spreadsheets are an excellent way of storing experimental or trial data. Each patient is allocated a separate row and each test result is then entered into the appropriate column, giving a clear representation of the data collected.

Spreadsheets will automatically sort your data to produce frequency distribution tables or charts.

Once the data have been entered into the spreadsheet it is easy to sort them into alphabetical or numerical order with the sorting function or perform complex calculations.

Many people prefer to use spreadsheets for small or moderate amounts of data rather than a database because of the complexities of setting up a database. However, for large amounts of data spreadsheets become unwieldy, and a database is much more suitable.

The data you have collected is invaluable so it is important to make regular back ups and keep a copy in a secure place away from work.

Many of the leading word processing packages contain quite powerful spreadsheets, though they are often referred to as tables. The advantages of using your word processor's spreadsheet are that you are working within a familiar program and you can produce a high quality looking document incorporating your spreadsheet. The formatting and printing abilities of some spreadsheet programs are more difficult to use than those of word processors and often not as good.

A spreadsheet can be imported into a word processing program for presentation.

However, spreadsheets within a word processor, are not as easy to use to manipulate data as stand alone programs. If you work regularly with data you should use a proper spreadsheet program. To produce a high quality report you can import the spreadsheet into the word processing package or produce a direct link to your spreadsheet from the word processor. With a direct link the spreadsheet in your word processor will automatically be updated should you make any alterations in the spreadsheet.

Three dimensional spreadsheet. Note the tabs at the bottom allowing you to switch between pages.

Manufacturers have tried hard to make spreadsheets more user friendly. One way they have done this is to allow labels such as column B to be replaced with more meaningful plain English terms such as “Games.” To perform calculations you enter the names instead of abstract numbers and letters--for example Sum((Games:Sports)). Although this is longwinded for simple spreadsheets, it greatly simplifies the use of multiple spreadsheets with complex arrangements of data.

Some spreadsheets let you label the columns and rows with proper names.

Spreadsheets have developed beyond three dimensions into multidimensional programs. This is not an easy concept to understand but considerably extends the capabilities of spreadsheets. In the figure the row numbers represent each entry and are the first dimension, the columns represent the description, amount, etc and are the second dimension, the expenses/income is the third, the month the fourth, the year the fifth, and the name of the account the sixth dimension.

In the example shown, by clicking with the mouse button on any of the dimensions it is possible to change the display to show, for example, the “Expenses” spreadsheet for the month of “January” in “1995” for “Mr Harvey's account.” The display will show only two dimensions at once, but any combination of the dimensions can be applied to the x and y axes, resulting in a powerful analytical tool.

## Statistical analysis

Statistical analyses are an integral part of most research projects. Until recently, statistical programs were both expensive and non-user friendly, but now many of the latest ones are menu driven and use the Windows or Macintosh graphic user interfaces. This has made the programs simpler to use, and budget packages will fit the pocket and needs of many people. Once you have entered the data into the program you simply select the correct statistical analysis from the menu and the calculations are done automatically.

Statistics programs are invaluable for research.

The choice of statistical package will depend not only on your budget and experience but on the size and complexity of the analyses required. If you are doing a large study it is advisable to seek the advice of a statistician before starting the study to plan how the data will be collected and analysed. The planning stage should include a trial analysis of data with the proposed program.

Although these programs are not designed to replace the advice or work of a qualified statistician, they have a valuable role in allowing non-statisticians to analyse data without needing to use or understand fully the complex formulas behind them. All programs come with various levels of guidance on selecting the correct statistical analysis to use.

Arcus Pro-Stat offers comprehensive statistical guidance.

Arcus, which is aimed at biomedical researchers in particular, has good online help which is easy for beginners to understand yet is detailed enough to remain useful for experienced users. The heavyweight statistical packages like SPSS are supplied with several large manuals. These are fine for the statistician but too complex for the novice. Advanced programs also enable you to ask “what if?” questions--for example, which data values would be needed to reach statistical significance.

The SPSS package offers complex statistical programs.

## Entering and presenting data

One of the great benefits of computers is that you have to enter your data into a program only once. You can then usually transfer them to other programs. So, for instance, you could use a database to enter your data. This can make a lot of sense as the data entry mechanism is much more sophisticated and quicker than with a spreadsheet or a word processor. Thereafter you could transfer the data into a spreadsheet, into a statistics program for analysis, and finally into a word processor for printing the final report.

Some of the statistics programs have data entry facilities that are like those of a database. Once you have entered the number or name the cursor will either automatically move on to the next entry box or may be moved by pressing the return key. Many statistics programs can also create charts directly from your data and offer some of the specialist plots that are not routinely found on the standard business charting programs.

Some statistics programs offer database-like systems to minimise the number of keystrokes required to enter data.

Sources of programs for manipulating data

View this table:
View Abstract