Guide to the Internet: The world wide webBMJ 1995; 311 doi: https://doi.org/10.1136/bmj.311.7019.1552 (Published 09 December 1995) Cite this as: BMJ 1995;311:1552
- Mark Pallen, senior lecturera
- Correspondence to: firstname.lastname@example.org.
The world wide web provides a uniform, user friendly interface to the Internet. Web pages can contain text and pictures and are interconnected by hypertext links. The addresses of web pages are recorded as uniform resource locators (URLs), transmitted by hypertext transfer protocol (HTTP), and written in hypertext markup language (HTML). Programs that allow you to use the web are available for most operating systems. Powerful on line search engines make it relatively easy to find information on the web. Browsing through the web— “net surfing”—is both easy and enjoyable. Contributing to the web is not difficult, and the web opens up new possibilities for electronic publishing and electronic journals.
The world wide web (WWW, W3, or simply “the web”) is the crowning glory of the Internet, providing a uniform, user friendly interface to the net. It allows information to be presented in a sophisticated and attractive format, interlacing pictures with text. Simply by clicking on highlighted text, you can surf the net or search for information.
The web has fuelled such an explosion of interest in the Internet that it is easy to forget quite how new it all is. Although Tim Berners-Lee and his coworkers first put forward proposals for the world wide web in 1989-90, the web was catapulted to success only with the release of Macintosh and Windows versions of Marc Andreessen's world wide web client program (or “web browser”), Mosaic, in the autumn of 1993. Since then, the web has shown astonishing exponential growth.
Anatomy of the web
The web page is the basic unit of information on the web. Four elements are needed for its creation, transmission, or retrieval: hypertext; uniform resource locators (URLs); hypertext transfer protocol (HTTP); and hypertext markup language (HTML).
Hypertext1 underpins the web. The term “hypertext” was coined by Ted Nelson in the 1960s, although the concept was probably first proposed by Vannevar Bush in the 1940s.2 The hypertext idea was later incorporated into the Macintosh program Hypercard and now also features in the help program built into the Windows operating system.
In its most basic form, hypertext functions like an electronic footnoting system, where a hypertext link takes you to more detailed information about the issue in question. However, hypertext links on the world wide web, unlike conventional footnotes, may lead you to information held anywhere on the Internet—in another town or country or even on another continent—and the information you receive may be a text, graphics, or sound file, or even a movie. Unlike in gopher (see box), the web's hypertext links can be embedded within any part of a web page and are not necessarily arranged in a hierarchical format. How hypertext links are highlighted varies according to the web browser—most browsers use underlining or display hypertext in a different colour (fig 1).
Gopher, forerunner of the web
Gopher is an Internet service similar to, but predating, the web. Gopher presents you with a series of menus; selecting items from a menu might take you to other menus, to text based information, or to any other sort of file. Indeed, these menu driven links, like the web's hypertext links, can take you to information anywhere on the Internet. Like the web, gopher allows easy “net surfing”—you keep making choices at each menu to see what turns up next. The term “gopherspace” is often used to denote the information space served by gopher. You can search gopherspace with a tool called “Veronica” that you can access on most gophers by selecting “other gopher and information services” at the main menu, then “searching through gopherspace using Veronica.”
Although gopher can be used through dedicated client programs, via certain open access UNIX sites, or even via email,29 it is now most often accessed through the web; most web browsers also act as gopher clients, and hypertext links can be made from the web pages to gopher sites using gopher-specific URLs.
There are numerous medical and biological gophers30 covering almost everything from AIDS31 to the WHO.32 Readers keen to explore medical gopherspace should start at lists held at the National Institute of Allergy and Infectious Disease gopher30 or else perform a Veronica search with the keywords “Medical Gophers.”
In following a series of the web's hypertext links you trace a path through “information space” rather than through geography. A single click of your mouse on a web page in Europe might take you to a web site in Asia, Africa, the Americas, or Oceania. Hypertext encourages you to follow non-linear paths through information—metaphorically you explore a web rather than a tree.
URLs—uniform resource locators—are the standard form of address used in hypertext links to retrieve or send information. You can use them not only to access other web pages but also to interact with other Internet services, such as gopher (box), file transfer protocol, or email.
Most URLs contain three pieces of information: the service (or in jargon “the protocol”) to be used; the Internet address or host name of the server; and the file path to follow to access the specified file on that server. The format is typically: protocol://server-name/path For example, this will take you to a web page at Queen Mary Westfield: http://www.qmw.ac.uk/~rhbm001/index.html whereas this will take you to the Queen Mary Westfield gopher: gopher://gopher.qmw.ac.uk and this will retrieve a file from Queen Mary Westfield's file transfer protocol server: ftp://ftp.qmw.ac.uk/pub/aids/aids9409.rmf Other examples of URLs can be seen in the references to this article.
There are three ways in which you can use a URL to take you somewhere on the web. Firstly, you can simply click on a hypertext link on a web page. In this case, the URL is buried inside the link and is invisible to you, the user. Secondly, you can supply your web browser with the URL by typing it in manually. This option is useful if you find a URL for an interesting site printed in a magazine, journal, or book. Finally, if you are using a suitable operating system you can cut and paste a URL from an electronic document (such as an email message) into a window on your web browser.
Hypertext transfer protocol (HTTP) is used to transfer information on the web. An HTTP connection lasts just long enough for a web page or image to be requested by the browser, then sent by the web server. As a separate connection is needed to retrieve each image on a web page, displaying pages with graphics can be time consuming for some clients or servers. The web browser Netscape speeds things up by running mutiple connections in parallel when accessing such pages.
Web pages are plain text documents, marked up in hypertext markup language (HTML), which defines the format of areas of text by “tagging” them. The tags are interpreted by a web browser so that tagged areas are displayed differently from the normal style of text (for example, in bold; fig >1). Tags usually surround the tagged area in the format <tag>text</tag>. For example, a top level (level 1) heading is tagged as follows: <h1>This is a top level heading</h1> Tags are usually defined functionally, rather than visually, so that the way in which a given tag is displayed varies from browser to browser. Take, for example, text tagged with <h1> </h1>. Both Mosaic and Netscape would display this text in a large, bold font; Win Web would also underline the text; Lynx would centre the text and then capitalise all the letters.
Hypertext links are marked up in a special format: <A HREF=“URL”>hypertext</A> For example: <A HREF=“http://www.qmw.ac.uk/~rhbm001/index.html”>The Microbial Underground</A> Links marked up in this way can be used to take the reader elsewhere in the same document, to other marked up documents, or to completely different types of file.
There are now web browsers for almost all operating systems. The simplest web browsers, such as Lynx, allow users with text only Internet access to explore the web, albeit missing out on its graphical richness. If you do not have full Internet access, but do have a UNIX account, try typing “lynx” next time you are logged on; if that does not work, try one of the open access sites that allow free exploration of the web using a text only browser.3 4
NCSA Mosaic was the first widely used web browser that not only displayed in line images but also provided access to gopher and other non-web Internet resources. Although Mosaic revolutionised the web, it has now been replaced by Netscape Navigator as the most commonly used web browser. The latest version of Netscape Navigator supports extensions to HTML that, for example, allow authors to specify a background to web pages. Both Mosaic and Netscape are available free of charge to users in educational settings.
What if you have only email access to the Internet? You can still access the web by using a “WebMail” server to retrieve the text of web pages. Send an email message to email@example.com with the command GO<URL> in the body of the message, where <URL> is the uniform resource locator of the document you wish to retrieve. There is no easy way to use this system to follow links embedded in the web page. You have to repost any locators found in the retrieved document back to the web server. Also you cannot enter data into forms using this approach.
Searching the web
One of the great strengths of the web that underpins its astonishing growth is that anyone with access to a web server can contribute to the web; apart from local rules as to what may be placed on the server, you do not need anyone's permission to publish on the web. This is also a weakness, as it means that no one holds a central index of the web; the information you want may be out there somewhere, but you may not be able to find it.
Let us imagine that you are giving some lectures on orthopaedic surgery and want some images related to the subject. There are two approaches you can try. The first is to start your search at a site that provides a hierarchical index of the web. You might begin your search on the top menu of the popular web site, Yahoo.5 You select the link to “health” (fig 2), then the link to “medicine,” which takes you to a list of a medical specialties, from which you select “orthopaedics.” You find five orthopaedic institutes and one orthopaedic organisation listed. You select the hypertext link to orthopaedics at Queen's University, Belfast. This takes you to a web page6 detailing the activities and resources of the orthopaedic department there (fig 3). Now all you need do is click on the link to the Orthopaedic Image Database7 to access the department's collection of slides (fig 4).
An alternative approach to finding information on the web is to use one of the “search engines” that collect and index data published on the web. The most comprehensive of these is Lycos,8 which currently holds information on over five million web pages. You can query several search engines one at a time via the Configurable Unified Search Interface (CUSI),9 or in parallel, using SavvySearch.10 You choose to use SavvySearch. You type the query term “orthopaedic” into the search window and cross check the “images” box to tell SavvySearch what sort of sites to search for. SavvySearch then lists numerous links, most of which take you back to Queen's University.
Once you have found an interesting site on the web, you can easily revisit it by using your history list or bookmark list. The history list is a temporary log of all of the sites that you have visited since you opened the program; you can scroll back through to revisit any of the sites listed, or just go back to the most recent site with a “go back” command. Your bookmark list is similar to the history list, except that it contains only those sites that you have deemed important enough to bookmark and, unlike the history list, it survives quitting and restarting the program.
Surfing the web
The web has turned net surfing into a pleasurable pastime and an occupational hazard—fascinating but also distracting. Simply by following one hypertext link after another in a stream of consciousness fashion, one can travel vast distances in both information and geographical space, often stumbling on hidden treasures.
To illustrate this point, let us begin a net surfing session, starting again in Yahoo's Health:Medicine Section.11 A click on the Education link takes us to Yahoo's list of 11 links to medical education sites. We choose what seems to be the most comprehensive site, Shlomi Codish's index on medical education,12 which is held on a server at the Ben-Gurion University of the Negev in Israel. Here, we select a link to a list of web based educational material on cardiology; from this list we select a link entitled “The Virtual Heart.”13 This takes us to some stunning web pages on the heart maintained by the the Franklin Institute Science Museum in Philadelphia. These describe all aspects of heart function and disease and contain links to quicktime movies, to sound files, and to many other cardiology pages on the web.
We start to download a movie of a trip down a coronary artery.14 That will takes us 10 minutes or more, so in the meantime we open a new window on our web browser and explore the institute's information about Benjamin Franklin. A link takes us to the American Declaration of Independence.15 The bad press given to King George III in that document prompts searches that take us to a review of The Madness of King George16 and to information on the crystal structure of porphobilinogen deaminase (a defective version of which was probably responsible for George's illness).17 Finally the movie of the coronary artery arrives (fig 5) and our net surfing experience ends.
Contributing to the web
The richness of the web stems largely from the fact that it is so easy to contribute material to it. You need no training in computer science or graphic design, just the ability to write hypertext markup language and access to a web server. Creating good quality web pages is not difficult as long as you follow the basic rules spelt out in several good introductions to web authorship (most of them available only on the web).18 19 20 21 22 23 24 25 How easy it will be for you to gain access to a web server will vary. Some academic institutions or departments have very strict rules governing who may put what on the web, whereas others allow any bona fide academic to launch web pages. For those outside academia, several commercial Internet service providers will make space on a web server available to their customers, albeit at a cost.
What should you include in your web pages? This will depend on whether you are an official author of your department's or institution's web pages or whether you are acting independently. Departmental and institutional web sites usually include general information about the department or institution, with local phone numbers and email addresses; specific information about local research, educational, and clinical interests; and links to related sites elsewhere on the web. Several British medical schools now have their own home pages and Brighton Health Care Trust26 has led the way in producing the first nonacademic NHS hospital site on the web. Many professional associations, such as the Association of Clinical Biochemists,27 are also now on the web. Many individuals have created their own “home pages” containing some biographical data, a picture, and links to sites that interest the author.
Publishing on the web
Although some scientists have made their research papers available through the Internet for many years, the arrival of the web presents a more attractive and powerful medium for electronic publishing. Several medical journals have now moved on to the web (box). Some provide only a table of contents for each issue; others, like the BMJ, publish a selection of their articles in electronic form; yet others, such as the Journal of Biological Chemistry, are available in a full text format.
For the professional medical publisher moving on to the web, having to use HTML is currently a problem, as it is not yet as flexible as desktop publishing software; it forces the publisher to give up control of the precise appearance of the product (each web browser will display it differently); and it is time consuming to produce paper and HTML versions of the same publication.
There are several solutions to these problems. One approach, followed by the BMJ, is to mark up the publication in a more powerful markup language, SGML (standard generalised markup language), that can be interpreted both as HTML and by desktop publishing software. Another is to make documents accessible through the web, not as web pages but in one of several portable document formats (Adobe Acrobat, Common Ground, Envoy, Replica, etc). Material produced by desktop publishing software can be exported to these formats. This approach also affords more control over how the publication will appear to the reader. Communicable Diseases Report, Mortality and Morbidity Weekly Review, and the Journal of Biological Chemistry are all published in Acrobat format.
Medical and scientific publication on the Internet raises some problems. There is anxiety that material can be freely distributed without being peer reviewed. This might be a problem for the naive reader, but it is usually quite clear whether electronically published information has been peer reviewed or not—and implementation of electronic peer review is likely to overcome many of the deficiencies of current peer review practice.28
Another problem is the evanescent nature of pages on the web; links expire as web pages are moved or removed, or as servers close down. This makes it difficult to cite data on the web. An author citing an article in the paper version of the BMJ can be fairly certain that readers a hundred years from now will still be able to find that article; those providing a reference to the equivalent item on the web cannot be sure that readers will be able to find it the following week. Indeed, the BMJ changed all of its URLs while these articles were being written.
Finally, there are potential ethical problems regarding the reproduction of clinical data, such as images of patients, on the web. In particular, it is unclear whether additional permission should be sought from patients when existing photographic images are transferred onto the web. As a result of such considerations, the orthopaedics department at Queen's University, Belfast, has now restricted access to the image database7 to local users.