ABC of Medical Computing: MANIPULATING AND ANALYSING DATABMJ 1995; 311 doi: https://doi.org/10.1136/bmj.311.7005.614 (Published 02 September 1995) Cite this as: BMJ 1995;311:614
- Nicholas Lee,
- Andrew Millman
Spreadsheets form part of a family of tools for storing and manipulating numerical data. At their most basic, they perform the simple arithmetical calculations normally done on a handheld calculator. Several good programs are available that will simulate such calculators. Many of the calculator programs available provide facilities such as scientific functions, weight and measure conversions, and statistical and financial functions. They are often included in a package with other programs such as Windows or WordPerfect.
If you need to perform multiple calculations on a collection of data a spreadsheet program becomes invaluable. Spreadsheets were used before computers for bookkeeping and business accounts. However, maintaining accuracy when adding or performing the calculations on the figures within a spreadsheet was always a problem. A computer spreadsheet overcomes this problem. The calculations are always accurate, and if you make an alteration in a massive spreadsheet, the hundreds of calculations required to update the figures and totals are done in seconds.
Structure of spreadsheets
A spreadsheet consists of columns labelled on the top A, B, C, D … and rows labelled on the left 1, 2, 3, 4…. The cells within the table are identified by the column and row label--for example, A1, A2, B1, B2. The column letter always goes first. You can enter numbers, formulas, or words into the cells. Entering a formula instructs the spreadsheet to carry out a calculation on data in other cells. For example, to add all the numbers in a column you would enter the formula B1+B2+B3+B5+B6+B7 into the cell you wish the total to be placed or more simply use the summing function. If any of the numbers are altered the spreadsheet automatically recalculates the total.
The standard spreadsheet has available 9999 rows and 255 columns, making a total of over two million cells, all of which can be interlinked and used in mathematical calculations. This makes it a powerful tool that allows you to do complex analysis on small or large amounts of numerical data very rapidly.
Spreadsheet programs all have several built in functions--that is, formulas that will automatically carry out calculations for which it would be time consuming to write the instructions in full each time using normal arithmetical symbols. They take the tedium out of long calculations and avoid the errors that inevitably occur when entering lengthy instructions. The big spreadsheets have over 300 of these functions available.
Sorting data into ranges to create histograms or tables is laborious by hand. The advantage of storing data in a spreadsheet is that the program will create an accurate frequency distribution table, which can then be taken into a charting program or used by the charting functions of the spreadsheet program. Spreadsheets often have powerful charting capabilities which can also be incorporated into the spreadsheet itself to produce a professional looking report.
Spreadsheets are an excellent way of storing experimental or trial data. Each patient is allocated a separate row and each test result is then entered into the appropriate column, giving a clear representation of the data collected.
Once the data have been entered into the spreadsheet it is easy to sort them into alphabetical or numerical order with the sorting function or perform complex calculations.
Many people prefer to use spreadsheets for small or moderate amounts of data rather than a database because of the complexities of setting up a database. However, for large amounts of data spreadsheets become unwieldy, and a database is much more suitable.
Choosing a spreadsheet program
Many of the leading word processing packages contain quite powerful spreadsheets, though they are often referred to as tables. The advantages of using your word processor's spreadsheet are that you are working within a familiar program and you can produce a high quality looking document incorporating your spreadsheet. The formatting and printing abilities of some spreadsheet programs are more difficult to use than those of word processors and often not as good.
However, spreadsheets within a word processor, are not as easy to use to manipulate data as stand alone programs. If you work regularly with data you should use a proper spreadsheet program. To produce a high quality report you can import the spreadsheet into the word processing package or produce a direct link to your spreadsheet from the word processor. With a direct link the spreadsheet in your word processor will automatically be updated should you make any alterations in the spreadsheet.
Modern spreadsheet programs allow you to load and view more than one spreadsheet at a time and to interlink the spreadsheets. For example, monthly financial data can be entered as a page for each month and then the whole linked as a single spreadsheet so that calculations can be performed between any cell on any page. Many of these three dimensional spreadsheets allow you to label each spreadsheet page with a helpful name, and you can then use the mouse to click on the label or tab and bring up that spreadsheet page.
Manufacturers have tried hard to make spreadsheets more user friendly. One way they have done this is to allow labels such as column B to be replaced with more meaningful plain English terms such as “Games.” To perform calculations you enter the names instead of abstract numbers and letters--for example Sum((Games:Sports)). Although this is longwinded for simple spreadsheets, it greatly simplifies the use of multiple spreadsheets with complex arrangements of data.
Spreadsheets have developed beyond three dimensions into multidimensional programs. This is not an easy concept to understand but considerably extends the capabilities of spreadsheets. In the figure the row numbers represent each entry and are the first dimension, the columns represent the description, amount, etc and are the second dimension, the expenses/income is the third, the month the fourth, the year the fifth, and the name of the account the sixth dimension.
In the example shown, by clicking with the mouse button on any of the dimensions it is possible to change the display to show, for example, the “Expenses” spreadsheet for the month of “January” in “1995” for “Mr Harvey's account.” The display will show only two dimensions at once, but any combination of the dimensions can be applied to the x and y axes, resulting in a powerful analytical tool.
Statistical analyses are an integral part of most research projects. Until recently, statistical programs were both expensive and non-user friendly, but now many of the latest ones are menu driven and use the Windows or Macintosh graphic user interfaces. This has made the programs simpler to use, and budget packages will fit the pocket and needs of many people. Once you have entered the data into the program you simply select the correct statistical analysis from the menu and the calculations are done automatically.
The choice of statistical package will depend not only on your budget and experience but on the size and complexity of the analyses required. If you are doing a large study it is advisable to seek the advice of a statistician before starting the study to plan how the data will be collected and analysed. The planning stage should include a trial analysis of data with the proposed program.
Although these programs are not designed to replace the advice or work of a qualified statistician, they have a valuable role in allowing non-statisticians to analyse data without needing to use or understand fully the complex formulas behind them. All programs come with various levels of guidance on selecting the correct statistical analysis to use.
Arcus, which is aimed at biomedical researchers in particular, has good online help which is easy for beginners to understand yet is detailed enough to remain useful for experienced users. The heavyweight statistical packages like SPSS are supplied with several large manuals. These are fine for the statistician but too complex for the novice. Advanced programs also enable you to ask “what if?” questions--for example, which data values would be needed to reach statistical significance.
Entering and presenting data
One of the great benefits of computers is that you have to enter your data into a program only once. You can then usually transfer them to other programs. So, for instance, you could use a database to enter your data. This can make a lot of sense as the data entry mechanism is much more sophisticated and quicker than with a spreadsheet or a word processor. Thereafter you could transfer the data into a spreadsheet, into a statistics program for analysis, and finally into a word processor for printing the final report.
Some of the statistics programs have data entry facilities that are like those of a database. Once you have entered the number or name the cursor will either automatically move on to the next entry box or may be moved by pressing the return key. Many statistics programs can also create charts directly from your data and offer some of the specialist plots that are not routinely found on the standard business charting programs.