BMJ 2002;324:577-581 (9 March)

Papers

Breast cancer on the world wide web: cross sectional survey of quality of information and popularity of websites

Funda Meric, assistant professor (a); Elmer V Bernstam, assistant professor (c); Nadeem Q Mirza, research investigator (a); Kelly K Hunt, associate professor (a); Frederick C Ames, professor (a); Merrick I Ross, professor (a); Henry M Kuerer, assistant professor (a); Raphael E Pollock, professor (a); Mark A Musen, associate professor (b); S Eva Singletary, professor (a)

(a) Section of Breast Surgery, Department of Surgical Oncology, University of Texas M D Anderson Cancer Center, 1515 Holcombe Boulevard, Box 444, Houston, TX 77030, USA
(b) Stanford Medical Informatics, Department of Internal Medicine, Stanford University Medical School, Stanford, CA, USA
(c) University of Texas Health Science Center at Houston, School of Health Information Sciences, Houston, TX, USA

Correspondence to: F Meric fmeric{at}mail.mdanderson.org


    Abstract

Objectives: To determine the characteristics of popular breast cancer related websites and whether more popular sites are of higher quality.
Design: The search engine Google was used to generate a list of websites about breast cancer. Google ranks search results by measures of link popularity---the number of links to a site from other sites. The top 200 sites returned in response to the query "breast cancer" were divided into "more popular" and "less popular" subgroups by three different measures of link popularity: Google rank and number of links reported independently by Google and by AltaVista (another search engine).
Main outcome measures: Type and quality of content.
Results: More popular sites according to Google rank were more likely than less popular ones to contain information on ongoing clinical trials (27% v 12%, P=0.01), results of trials (12% v 3%, P=0.02), and opportunities for psychosocial adjustment (48% v 23%, P<0.01). These characteristics were also associated with a higher number of links as reported by Google and AltaVista. More popular sites by number of linking sites were also more likely to provide updates on other breast cancer research, information on legislation and advocacy, and a message board service. Measures of quality such as display of authorship, attribution or references, currency of information, and disclosure did not differ between groups.
Conclusions: Popularity of websites is associated with type rather than quality of content. Sites that include content correlated with popularity may best meet the public's desire for information about breast cancer.

What is already known on this topic
Patients are using the world wide web to search for health information

Breast cancer is one of the most popular search topics

Characteristics of popular websites may reflect the information needs of patients

What this study adds
Type rather than quality of content correlates with popularity of websites

Measures of quality correlate with accuracy of medical information




    Introduction

Recent surveys show that 40-54% of patients access medical information via the internet and that this information affects their choice of treatment.1-5 Although the quality of medical information on the world wide web has been an area of increasing concern,6-11 the factors that contribute to popularity of websites have not been systematically studied.

Understanding the determinants of website popularity has implications for clinicians and medical centres that recognise the need to provide information about themselves via the internet. Website designers who understand the information needs of the public can attract visitors to their site. Knowing what patients are investigating on the web may help clinicians to educate themselves and their patients.

Two measures of website popularity are "click popularity" and "link popularity."12 Click popularity is the frequency with which users have visited (clicked on) a site.13 Although some search engines, such as Direct Hit, measure click popularity, this information is not publicly available for a large number of websites. Furthermore, click popularity is subject to artificial marketing manipulations.14 Link popularity, which is less susceptible to manipulation,15 relies on links from sites to other sites rather than on statistics about usage. High link popularity is thought to dramatically increase traffic to a site.16 Link popularity, sometimes referred to as "peer review popularity," has been proposed as an objective way of identifying high quality websites.17-19 Google ranks results of searches by using a proprietary link popularity algorithm that takes into account the number of links and the "importance" of the linking sites. 15 17

Breast cancer is one of the most common health related search topics among users of the internet.20 Previous studies have evaluated use of the internet by women with breast cancer and the quality of selected sites.10 21 22 A recent study found information about breast cancer on the web to be more complete and accurate than information about other conditions.22 We are not aware, however, of work that attempts to determine what makes some sites more popular than others. The purpose of our study was to identify the determinants of link popularity of websites about breast cancer and to test the hypothesis that more popular sites are of higher quality.


    Methods

Selection of websites
We used the search term "breast cancer" on Google (www.google.com, accessed 19 Oct 2000) to generate a list of sites and examined the first 200 of approximately 600 000 English language sites returned. Of these, 185 (93%) were accessible; one was excluded because its content was only peripherally related to breast cancer, leaving 184 sites for evaluation.

Determination of popularity
Because there is no standard way to assess link popularity, we used three different measures: Google rank and number of links reported by Google and by AltaVista (on www.altavista.com). Of the top 200 sites returned by Google, we defined the first 100 (Google rank 1-100) as "more popular" and the second 100 (Google rank 101-200) as "less popular." We obtained the number of links in Google and AltaVista by entering each site's uniform resource locator (URL) into the search string "link:URL".

Google provided the number of linking sites for 162 sites and AltaVista for 148 sites. We excluded from analysis any sites for which the number of links was not available. The median number of links was 51 according to Google and 21 according to AltaVista. We considered sites with a number of links greater than the median to be "more popular" and sites with fewer links to be "less popular." To assess whether popular sites were displayed by multiple search engines, we repeated the search on four search engines often used by patients: Yahoo (categorical), Excite, AltaVista, and Infoseek.4
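The two classification rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the site labels, and the link counts in the usage example are hypothetical, and the sketch assumes that a site whose link count equals the median is counted as less popular, which the text does not state.

```python
from statistics import median
from typing import Optional


def popular_by_rank(google_rank: int, cutoff: int = 100) -> bool:
    """Google rank 1-100 counts as 'more popular', rank 101-200 as 'less popular'."""
    if not 1 <= google_rank <= 2 * cutoff:
        raise ValueError("rank lies outside the top 200 sites studied")
    return google_rank <= cutoff


def popular_by_links(link_counts: dict[str, Optional[int]]) -> dict[str, bool]:
    """Median split on the link counts reported by one search engine.

    Sites with no reported count (None) are excluded from the analysis.
    A site is 'more popular' (True) only if its count exceeds the median;
    ties at the median fall into the 'less popular' group (an assumption).
    """
    available = {site: n for site, n in link_counts.items() if n is not None}
    cut = median(available.values())
    return {site: n > cut for site, n in available.items()}


# Hypothetical counts from a "link:URL" query; None = count not reported.
counts = {"site-a": 120, "site-b": 51, "site-c": None, "site-d": 8, "site-e": 300}
print(popular_by_links(counts))
print(popular_by_rank(42), popular_by_rank(150))
```

The median is recomputed for each engine separately, since Google and AltaVista reported different numbers of links for the same sites.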

Evaluation of websites
A breast oncologist (FM) evaluated the sites within four weeks of the original search. Links within each site were followed until all medical information about breast cancer had been evaluated; a median of four pages (range 1-11) was evaluated per site. Type and quality of content were recorded. Affiliation was determined from the information provided by the site, and sites were classified as professional (government, universities, major medical centres), non-profit organisation, or commercial (all others).


                              

Table 1. Characteristics of breast cancer websites evaluated (n=184). Values are numbers (percentages)

We assessed quality of content by criteria known as the "JAMA benchmarks"6: display of authorship of medical content; source (attribution or references); date of update; and disclosure of ownership, sponsorship, advertising policies, or conflicts of interest. We also documented whether each site displayed its webmaster's email address or a Health on the Net (www.hon.ch/) seal. Health on the Net is a non-profit foundation with an eight point code of conduct for sites providing health information.23 Sites that comply with the Health on the Net code are allowed to display the seal, but continued compliance is not systematically enforced.
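One way to record the four JAMA benchmarks for each site and count how many are met is sketched below. The field and function names are hypothetical; the cut-off of three benchmarks for a "higher quality" site is taken from the Results, and the webmaster email and Health on the Net seal are recorded but, as in the text, are not counted among the benchmarks.

```python
from dataclasses import dataclass


@dataclass
class SiteReview:
    """One reviewer's observations for a single website (hypothetical record)."""
    shows_authorship: bool    # authorship of medical content displayed
    shows_sources: bool       # attribution or references given
    shows_update_date: bool   # currency: date of last update displayed
    shows_disclosure: bool    # ownership, sponsorship, advertising, or conflicts disclosed
    shows_webmaster_email: bool = False  # recorded, not a JAMA benchmark
    shows_hon_seal: bool = False         # recorded, not a JAMA benchmark


def jama_score(review: SiteReview) -> int:
    """Number of JAMA benchmarks met, from 0 to 4."""
    return sum([
        review.shows_authorship,
        review.shows_sources,
        review.shows_update_date,
        review.shows_disclosure,
    ])


def quality_group(review: SiteReview, threshold: int = 3) -> str:
    """'higher' if at least `threshold` benchmarks are met, otherwise 'lower'."""
    return "higher" if jama_score(review) >= threshold else "lower"


site = SiteReview(shows_authorship=True, shows_sources=True,
                  shows_update_date=False, shows_disclosure=True,
                  shows_hon_seal=True)
print(jama_score(site), quality_group(site))
```

Keeping the seal as a separate field makes it straightforward to check whether sites displaying it actually meet the benchmarks.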

Analysis
We used Pearson χ2 analysis to compare more popular and less popular websites, performing a separate analysis for each of the three measures of link popularity. We considered groups to be significantly different if P≤0.05 in at least two of the three analyses.
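The test and the decision rule can be sketched with the Python standard library alone. This is an illustration, not the authors' analysis code: it assumes the plain Pearson statistic without a continuity correction, and it computes the upper-tail probability of the chi-square distribution from the closed forms for 1 and 2 degrees of freedom plus a standard recurrence. Applied to the Health on the Net seal counts reported in the Results (21/84 commercial, 3/36 professional, and 3/64 organisation sites), it gives χ2 of about 13.4 on 2 degrees of freedom and P of about 0.001, in line with the reported value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then step up two degrees of freedom at a time:
    # Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def pearson_chi2(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, P). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def significant_by_two_of_three(p_values: list[float], alpha: float = 0.05) -> bool:
    """Decision rule from the Methods: P <= alpha in at least two of the three analyses."""
    return sum(p <= alpha for p in p_values) >= 2


# Health on the Net seal (displayed, not displayed) by affiliation, from the Results.
seal_table = [[21, 63],   # commercial
              [3, 33],    # professional
              [3, 61]]    # non-profit organisation
stat, df, p = pearson_chi2(seal_table)
print(f"chi2 = {stat:.1f}, df = {df}, P = {p:.3f}")
```

For each content item, the three P values (one per popularity measure) would be passed to `significant_by_two_of_three`; the function name is hypothetical.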




    Results

Website characteristics
Table 1 lists the characteristics of the top 184 accessible sites returned by Google. Twenty seven (15%) provided medical facts on the site as well as through links to other sites, and 125 (68%) had medical facts displayed at the website only. Table 2 shows the medical facts displayed in this second group of websites.


                              

Table 2. Medical facts contained in breast cancer websites (n=125).* Values are numbers (percentages)

Table 3 shows indicators of quality. Of the 184 sites, 105 (57%) displayed some evidence of authorship, but only 32 (17%) displayed the name, qualifications, and institutional affiliation of the author. Sixteen sites (9%) met all four JAMA benchmarks (authorship, references, currency, and disclosure), 48 (26%) met three, 68 (37%) two, 43 (23%) one, and 9 (5%) none. Forty five sites (24%) displayed a disclaimer that the information provided should not substitute for consultation with a physician.


                              

Table 3. Quality of medical content (n=184). Values are numbers (percentages)

A Health on the Net seal was displayed on 27 (15%) sites. Commercial sites were more likely than sites of professional groups or of organisations to display the seal---21/84 (25%) v 3/36 (8%) v 3/64 (5%) (P=0.001). None of the sites with a seal actually complied with all eight Health on the Net criteria or with all four JAMA benchmarks.

Of the 184 sites, 12 (7%) contained inaccurate medical statements. Commercial sites contained inaccurate statements more often than did sites of professional groups or of organisations: 11/84 (13%) v 1/36 (3%) v 0/64 (P=0.004). Three (16%) of 19 commercial sites that displayed the Health on the Net seal contained inaccurate statements. Higher quality sites (at least three JAMA benchmarks) were less likely than lower quality sites (fewer than three JAMA benchmarks) to contain inaccurate information: 1/64 (2%) v 11/120 (9%) (P=0.047) (figure). None of the 16 sites that met all four JAMA benchmarks contained inaccurate information.



Number of accurate and inaccurate websites, based on number of JAMA benchmarks met. A website was considered inaccurate if it contained one or more inaccurate statements
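For a 2 x 2 comparison such as the accuracy of higher versus lower quality sites, the Pearson statistic has the shortcut form N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)) with one degree of freedom. The minimal sketch below, which again assumes no continuity correction, applies it to the counts in the text (1 of 64 higher quality and 11 of 120 lower quality sites inaccurate) and gives χ2 of about 3.96 and P of about 0.047, consistent with the reported value.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Uses the shortcut N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)); with 1 degree of
    freedom the upper-tail probability is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denominator
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: higher quality (>= 3 JAMA benchmarks), lower quality (< 3 benchmarks)
# Columns: contains inaccurate statements, accurate
stat, p = chi2_2x2(1, 63, 11, 109)
print(f"chi2 = {stat:.2f}, P = {p:.3f}")
```

With only 12 inaccurate sites in total, the expected count in one cell is about 4, so an exact test would be a reasonable cross-check; the sketch does not attempt one.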

Determinants of popularity
Type of content differed significantly between more popular and less popular websites (table 4). Sites that were more popular by at least two of three popularity measures were more likely to contain information about ongoing clinical trials, results of randomised clinical trials, results of other breast cancer research, information on legislation and advocacy, and information on opportunities for psychosocial adjustment and to allow interaction through a message board service.


                              

Table 4. Content of breast cancer websites by popularity. Values are numbers (percentages) unless stated otherwise

We then evaluated differences in the topics of medical facts presented between the more popular and less popular sites (table 2). This analysis was carried out only for the 125 sites that displayed medical information. More popular sites were more likely to discuss breast reconstruction: 15/57 (26%) v 8/68 (12%) by Google rank (P=0.037), 15/51 (29%) v 5/57 (9%) by number of links in Google (P=0.002), and 16/50 (32%) v 6/48 (13%) by number of links in AltaVista (P=0.018). They were also more likely to discuss psychological topics such as depression: 11/51 (22%) v 1/57 (2%) by number of links in Google (P=0.001) and 11/50 (22%) v 1/48 (2%) by number of links in AltaVista (P=0.002).

More popular and less popular websites did not differ in any of the quality measures studied (table 4). Furthermore, the presence of inaccurate information did not differ between more popular and less popular sites.

Evaluation of popularity measures
We evaluated the concordance between our measures of popularity. The median number of linking sites as measured by Google and AltaVista was significantly higher for sites that were more popular by Google rank than for less popular sites (Google: 82 v 21, P<0.001; AltaVista: 48 v 10, P<0.001). The number of links as measured by Google strongly correlated with the number of links as measured by AltaVista (Pearson coefficient 0.806, P<0.001).
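The correlation between the two engines' link counts can only use sites for which both counts were reported. A minimal sketch of that pairing and of the Pearson coefficient follows; the site names and counts are hypothetical, and on Python 3.10 or later `statistics.correlation` computes the same coefficient.

```python
import math
from typing import Optional


def paired_counts(google: dict[str, Optional[int]],
                  altavista: dict[str, Optional[int]]) -> tuple[list[int], list[int]]:
    """Keep only sites with a link count from both engines, in a fixed order."""
    sites = sorted(s for s in google.keys() & altavista.keys()
                   if google[s] is not None and altavista[s] is not None)
    return [google[s] for s in sites], [altavista[s] for s in sites]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical link counts; None = count not reported by that engine.
google = {"site-a": 120, "site-b": 51, "site-c": 8, "site-d": None, "site-e": 300}
altavista = {"site-a": 60, "site-b": 21, "site-c": None, "site-d": 5, "site-e": 140}
x, y = paired_counts(google, altavista)
print(x, y, round(pearson_r(x, y), 3))
```

Because link counts are heavily skewed, a rank-based coefficient is a common complementary check; the sketch computes only the Pearson coefficient named in the text.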

We hypothesised that link popularity would correlate with a site being displayed by multiple search engines. Of the top 184 accessible sites displayed by Google, AltaVista displayed 24 (13%), Yahoo displayed 41 (22%), Infoseek displayed 58 (32%), and Excite displayed 84 (46%). Indeed, more popular sites were more likely than less popular sites to be displayed by multiple search engines (table 5).


                              

Table 5. Display by other search engines on the basis of website link popularity. Values are numbers (percentages) unless stated otherwise
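Tallying how many of the other four engines returned each site, separately for more and less popular sites, is the kind of count summarised in table 5. A minimal sketch follows; the result sets, popularity labels, and function names are hypothetical, and the sketch does not claim to reproduce the layout of table 5.

```python
from collections import Counter

OTHER_ENGINES = ("AltaVista", "Yahoo", "Infoseek", "Excite")


def display_count(site: str, results_by_engine: dict[str, set[str]]) -> int:
    """Number of the other search engines whose results include the site."""
    return sum(site in results_by_engine.get(engine, set()) for engine in OTHER_ENGINES)


def tally_by_popularity(popular: dict[str, bool],
                        results_by_engine: dict[str, set[str]]) -> dict[str, Counter]:
    """For each popularity group, count sites by how many other engines displayed them."""
    tallies = {"more popular": Counter(), "less popular": Counter()}
    for site, is_popular in popular.items():
        group = "more popular" if is_popular else "less popular"
        tallies[group][display_count(site, results_by_engine)] += 1
    return tallies


# Hypothetical result sets for the query "breast cancer" on each engine.
results = {
    "AltaVista": {"site-a"},
    "Yahoo": {"site-a", "site-b"},
    "Infoseek": {"site-a", "site-c"},
    "Excite": {"site-b", "site-c", "site-d"},
}
popular = {"site-a": True, "site-b": True, "site-c": False, "site-d": False}
print(tally_by_popularity(popular, results))
```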

To assess the correlation between click popularity and link popularity, we evaluated the top 10 sites returned by Direct Hit, which ranks sites based on click popularity and duration of visits. Nine of the 10 sites were in the top 200 by Google rank, eight were among the more popular sites by Google rank, and five were among the top 20 by Google rank.
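The overlap check against Direct Hit amounts to looking up each of its top 10 sites in the Google ranking and counting how many fall within successive rank cut-offs. A small sketch, with hypothetical site names, ranks, and function name:

```python
from typing import Optional


def overlap_with_google(top_sites: list[str], google_rank: dict[str, int],
                        cutoffs: tuple[int, ...] = (200, 100, 20)) -> dict[int, int]:
    """Count how many of `top_sites` are ranked within the top `cut` Google results."""
    ranks: list[Optional[int]] = [google_rank.get(site) for site in top_sites]
    return {cut: sum(r is not None and r <= cut for r in ranks) for cut in cutoffs}


direct_hit_top = ["site-a", "site-b", "site-c"]
google_rank = {"site-a": 5, "site-b": 150, "site-x": 12}
print(overlap_with_google(direct_hit_top, google_rank))
```

A site absent from the Google list (here `site-c`) counts towards none of the cut-offs.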




    Discussion

To meet the demand for health information on the web, it is important to identify the factors that influence popularity of websites. Our results show that type rather than quality of content determines popularity. To our knowledge, ours is the first study to assess the popularity as well as the quality and accuracy of health related websites.

We found that many breast cancer websites do not comply with the JAMA benchmarks, but we found higher compliance than previously reported. 10 11 This may reflect an improvement in quality of websites over the past few years or a difference between search engines used in the studies.

Since accessibility and ranking of websites vary with the search engine used, the overlap between Google and other search engines of only 13-46% is not surprising. We found that "more popular" sites were more likely to be displayed by multiple search engines. If a site is not displayed, it is unlikely to be visited by users of the internet; thus the more popular sites by our measures of link popularity should indeed be the more popular sites among users of the internet. Our finding that eight of the top 10 sites according to Direct Hit were among the more popular sites by Google rank also supports this assertion.

Our results confirm those of an earlier study that found no correlation between measures of quality and link popularity.24 We may have selected higher quality sites by examining the top 200 of about 600 000 sites returned by Google. Significant differences in quality might have emerged if we had increased our sample size or compared the top 100 sites with sites of lower popularity, such as those ranked 1000-1100. Using less popular sites, however, would not have allowed us to correlate multiple measures of link popularity, as most sites would have no incoming links.

One limitation of our study is that we performed multiple comparisons. Another limitation is that a single reviewer (FM) assessed quality and accuracy. To mitigate this, we used objective criteria whenever possible. For example, we used the presence or absence of authorship information rather than author authority. Accuracy is inherently subjective, so our results should be confirmed by studies using a panel of experts. Multiple, non-expert reviewers, however, may not be better than a single expert reviewer.

In one survey, only 14% of patients expressed uncertainty about the accuracy of medical information on the web.4 We found that higher quality sites contain more accurate information. Objective measures of quality may help lay users to assess online health information.

Self regulation has been advocated as a way of maintaining the quality of online medical content. Our finding that sites displaying a Health on the Net seal did not comply with the Health on the Net code emphasises the limitation of self regulation. It remains the responsibility of the medical community to ensure adequate quality of online medical content, to educate the public regarding quality measures, and to direct patients to sites of known quality.

Link popularity, which can be assessed automatically, has been proposed as an indirect measure of quality.19 This is analogous to citation analysis, a somewhat controversial approach to measuring quality in the printed literature. Although link popularity may identify sites of interest, it does not correlate with quality of content. The growing number of users of the internet searching for health information indicates an unmet need for information. Understanding what patients are looking for online may help us meet their need for health information.

    Acknowledgments

We thank Valerie Natale and Stephanie Deming for editorial assistance and Herbert Kaizer and Soumya Raychaudhuri for critical reading of the manuscript.

Contributors: FM was involved in the conception and design of the study, reviewed all the websites, performed the analysis and interpretation of data, and drafted the paper. EVB was involved in the conception and design of the study, collection of data on links, interpretation of data, and drafting of the paper. NQM carried out the statistical analysis. KKH, FCA, MIR, HMK, and REP were involved in interpreting the data and revised the paper for intellectual content. MAM and SES advised on methods and revised the paper for intellectual content. FM and EVB are the guarantors.

    Footnotes

Funding: Supported in part by Grant LM06594 from the National Library of Medicine (EVB).

Competing interests: None declared.


    References

1. Metz JM, Devine P, DeNittis A, Stambaugh M, Jones H, Goldwein J, et al. Utilization of the internet by oncology patients to obtain cancer related information. Proc Am Soc Clin Oncol 2001; 20: 395a (abstract 1575).
2. Yakren S, Shi W, Thaler H, Agre P, Bach PB, Schrag D, et al. Use of internet and other information resources among adult cancer patients and their companions. Proc Am Soc Clin Oncol 2001; 20: 398a (abstract 1589).
3. Helft PR, Hlubocky FJ, Gordon EJ, Ratain MJ, Daugherty CK. Hope and the media in advanced cancer patients. Proc Am Soc Clin Oncol 2000; 19: 633a (abstract 2497).
4. O'Connor JB, Johanson JF. Use of the web for medical information by a gastroenterology clinic population. JAMA 2000; 284: 1962-1964.
5. Health on the Net. Survey on the evolution of internet use for health purposes: raw data for the survey February-March 2001. www.hon.ch/Survey/FebMar2001/ (accessed 14 Jan 2002).
6. Silberg WM, Lundberg GD, Musacchio RA. Assessing, controlling, and assuring the quality of medical information on the internet: Caveant lector et viewor - let the reader and viewer beware [editorial]. JAMA 1997; 277: 1244-1245.
7. Jadad AR, Gagliardi A. Rating health information on the internet: navigating to knowledge or to Babel? JAMA 1998; 279: 611-614.
8. Bichakjian CK, Schwartz JL, Wang TS, Hall JM, Johnson TM, Sybil Biermann J. Melanoma information on the internet: often incomplete - a public health opportunity? J Clin Oncol 2002; 20: 134-141.
9. Price SL, Hersh WR. Filtering web pages for quality indicators: an empirical approach to finding high quality consumer health information on the world wide web. Proc AMIA Symp 1999:911-5.
10. Shon J, Musen MA. The low availability of metadata elements for evaluating the quality of medical information on the world wide web. Proc AMIA Symp 1999:945-9.
11. Hoffman-Goetz L, Clarke JN. Quality of breast cancer sites on the world wide web. Can J Public Health 2000; 91: 281-284.
12. What is popularity: thewritemarket.com. www.thewritemarket.com/search/popularity.htm (accessed 14 Mar 2001).
13. Kerber R. Direct hit uses popularity to narrow internet searches. Wall Street Journal 1998; 232(July 2): B4, Op 1.
14. Search engines take a quantum leap: 19 out of 20 now use link popularity to determine relevancy. www.webseed.com/page1007.html (accessed 14 Mar 2001).
15. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 1998; 30: 107-117.
16. Linkpopularity.com. www.linkpopularity.com/ (accessed 21 Dec 2001).
17. Google. http://google.com/ (accessed 14 Mar 2001).
18. Rumsey E. Peer-review popularity vs. dotcom popularity. www.lib.uiowa.edu (accessed 7 Feb 2002).
19. Eysenbach G, Diepgen TL. Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information. BMJ 1998; 317: 1496-1500.
20. Lacroix E-M. Health topics most hit March 2000. www.nlm.nih.gov/pubs/staffpubs/lo/medlineplus/sld013.htm (accessed 27 Jan 2001).
21. Bateman M, Rittenberg CN, Gralla RJ. Is the internet a reliable and useful resource for patients and oncology professionals: a randomized evaluation of breast cancer information. Proc Am Soc Clin Oncol 1998; 17: 419a (abstract 1616).
22. Berland GK, Elliott MN, Morales LS, Algazy JI, Kravitz RL, Broder MS, et al. Health information on the internet: accessibility, quality, and readability in English and Spanish. JAMA 2001; 285: 2612-2621.
23. Health On the Net Foundation. HON code of conduct (HONcode) for medical and health web sites: principles. www.hon.ch/HONcode/Conduct.html (accessed 25 May 2001).
24. Sandvik H. Health information and interaction on the internet: a survey of female urinary incontinence. BMJ 1999; 319: 29-32.

(Accepted 22 January 2002)


© BMJ 2002
