- Funda Meric, assistant professor (a),
- Elmer V Bernstam, assistant professor (c),
- Nadeem Q Mirza, research investigator (a),
- Kelly K Hunt, associate professor (a),
- Frederick C Ames, professor (a),
- Merrick I Ross, professor (a),
- Henry M Kuerer, assistant professor (a),
- Raphael E Pollock, professor (a),
- Mark A Musen, associate professor (b),
- S Eva Singletary, professor (a)
- (a) Section of Breast Surgery, Department of Surgical Oncology, University of Texas M D Anderson Cancer Center, 1515 Holcombe Boulevard, Box 444, Houston, TX 77030, USA
- (b) Stanford Medical Informatics, Department of Internal Medicine, Stanford University Medical School, Stanford, CA, USA
- (c) University of Texas Health Science Center at Houston, School of Health Information Sciences, Houston, TX, USA
- Correspondence to: F Meric
- Accepted 22 January 2002
Objectives: To determine the characteristics of popular breast cancer related websites and whether more popular sites are of higher quality.
Design: The search engine Google was used to generate a list of websites about breast cancer. Google ranks search results by measures of link popularity—the number of links to a site from other sites. The top 200 sites returned in response to the query “breast cancer” were divided into “more popular” and “less popular” subgroups by three different measures of link popularity: Google rank and number of links reported independently by Google and by AltaVista (another search engine).
Main outcome measures: Type and quality of content.
Results: More popular sites according to Google rank were more likely than less popular ones to contain information on ongoing clinical trials (27% v 12%, P=0.01), results of trials (12% v 3%, P=0.02), and opportunities for psychosocial adjustment (48% v 23%, P<0.01). These characteristics were also associated with higher number of links as reported by Google and AltaVista. More popular sites by number of linking sites were also more likely to provide updates on other breast cancer research, information on legislation and advocacy, and a message board service. Measures of quality such as display of authorship, attribution or references, currency of information, and disclosure did not differ between groups.
Conclusions: Popularity of websites is associated with type rather than quality of content. Sites that include content correlated with popularity may best meet the public's desire for information about breast cancer.
What is already known on this topic
Patients are using the world wide web to search for health information
Breast cancer is one of the most popular search topics
Characteristics of popular websites may reflect the information needs of patients
What this study adds
Type rather than quality of content correlates with popularity of websites
Measures of quality correlate with accuracy of medical information
Recent surveys show that 40-54% of patients access medical information via the internet and that this information affects their choice of treatment.1–5 Although the quality of medical information on the world wide web has been an area of increasing concern,6–11 the factors that contribute to popularity of websites have not been systematically studied.
Understanding the determinants of website popularity has implications for clinicians and medical centres that recognise the need to provide information about themselves via the internet. Website designers who understand the information needs of the public can attract visitors to their site. Knowing what patients are investigating on the web may help clinicians to educate themselves and their patients.
Two measures of website popularity are “click popularity” and “link popularity.”12 Click popularity is the frequency with which users have visited (clicked on) a site.13 Although some search engines, such as Direct Hit, measure click popularity, this information is not publicly available for a large number of websites. Furthermore, click popularity is subject to artificial marketing manipulations.14 Link popularity, which is less susceptible to manipulation,15 relies on links from sites to other sites rather than on statistics about usage. High link popularity is thought to dramatically increase traffic to a site.16 Link popularity, sometimes referred to as “peer review popularity,” has been proposed as an objective way of identifying high quality websites.17–19 Google ranks results of searches by using a proprietary link popularity algorithm that takes into account the number of links and the “importance” of the linking sites. 15 17
Breast cancer is one of the most common health related search topics among users of the internet.20 Previous studies have evaluated use of the internet by women with breast cancer and the quality of selected sites. 10 21 22 A recent study found information about breast cancer on the web to be more complete and accurate than for other conditions.22 We are not aware, however, of work that attempts to determine what makes some sites more popular than others. The purpose of our study was to identify the determinants of link popularity of websites about breast cancer and to test the hypothesis that more popular sites are of higher quality.
Selection of websites
We used the search term “breast cancer” on Google (http://www.google.com/ accessed 19 Oct 2000) to generate a list of sites. We examined the first 200 of approximately 600 000 English language sites. Of these, 185 (93%) were accessible, but one was excluded as its content was only peripherally related to breast cancer.
Determination of popularity
Because there is no standard way to assess link popularity, we used three different measures: Google rank and number of links reported by Google and by AltaVista (on http://www.altavista.com/). Of the top 200 sites returned by Google, we defined the first 100 (Google rank 1-100) as “more popular” and the second 100 (Google rank 101-200) as “less popular.” We obtained the number of links in Google and AltaVista by entering each site's uniform resource locator (URL) into the search string “link:URL”.
Google provided the number of linking sites for 162 sites and AltaVista for 148 sites. We excluded from analysis any sites for which the number of links was not available. The median number of links was 51 according to Google and 21 according to AltaVista. We considered sites with a number of links greater than the median to be “more popular” and sites with fewer links to be “less popular.” To assess whether popular sites were displayed by multiple search engines, we repeated the search on four search engines often used by patients: Yahoo (categorical), Excite, AltaVista, and Infoseek.4
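The median split described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure the authors ran: the function name `median_split`, the site names, and the link counts are hypothetical, and the handling of a site that falls exactly on the median (labelled “less popular” here) is an assumption, since the text does not say how ties were treated.

```python
from statistics import median
from typing import Dict, Optional


def median_split(link_counts: Dict[str, Optional[int]]) -> Dict[str, str]:
    """Label each site 'more popular' or 'less popular' by a median split."""
    # Sites with no reported link count (None) are excluded from analysis.
    known = {site: n for site, n in link_counts.items() if n is not None}
    cutoff = median(known.values())
    # Assumption: a site exactly at the median is labelled 'less popular'.
    return {
        site: "more popular" if n > cutoff else "less popular"
        for site, n in known.items()
    }


# Hypothetical sites and link counts, for illustration only.
example = {"site-a": 120, "site-b": 51, "site-c": 8, "site-d": None, "site-e": 300}
labels = median_split(example)
```

Here `site-d` drops out because its count is unavailable, and the remaining four sites are split around the median of their counts.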
Evaluation of websites
A breast oncologist (FM) evaluated the sites within four weeks of the original search. Links within each site were pursued until all medical information about breast cancer was evaluated. A median of four pages (range 1-11) were evaluated for each site. Type and quality of content were recorded. Affiliation was determined on the basis of the information provided by the site. Sites were divided into professional (government, universities, major medical centres), non-profit organisation, and commercial (all others).
We assessed quality of content by criteria known as the “JAMA benchmarks”6: display of authorship of medical content; source (attribution or references); date of update; and disclosure of ownership, sponsorship, advertising policies, or conflicts of interest. We also documented whether each site displayed its webmaster's email address or a Health on the Net (www.hon.ch/) seal. Health on the Net is a non-profit foundation with an eight point code of conduct for sites providing health information.23 Sites that comply with the Health on the Net code are allowed to display the seal, but continued compliance is not systematically enforced.
We used Pearson χ² analysis to compare more popular and less popular websites. We performed separate analyses for each of the three measures of link popularity. We considered groups to be significantly different if P≤0.05 in at least two of three analyses.
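As a worked illustration of this analysis, the sketch below computes a Pearson χ² statistic for a 2×2 table and applies the “two of three” rule. It assumes no continuity correction, which the text does not specify, and the function names are hypothetical. The example counts (15/57 v 8/68) are the breast reconstruction figures by Google rank given in the results; under these assumptions the P value comes out close to the reported 0.037.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows are the two groups, columns are feature present / absent:
        group 1: a present, b absent
        group 2: c present, d absent
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def significant_in_two_of_three(p_values, alpha=0.05):
    """Apply the rule: P <= alpha in at least two of the three analyses."""
    return sum(p <= alpha for p in p_values) >= 2


# Breast reconstruction by Google rank: 15/57 more popular v 8/68 less popular.
chi2, p = pearson_chi2_2x2(15, 57 - 15, 8, 68 - 8)
```

A characteristic would then count as differing between groups only if `significant_in_two_of_three` returns `True` for its three P values (Google rank, Google links, AltaVista links).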
Table 1 lists the characteristics of the top 184 accessible sites returned by Google. Twenty seven (15%) provided medical facts on the site as well as through links to other sites, and 125 (68%) had medical facts displayed at the website only. Table 2 shows the medical facts displayed in this second group of websites.
Table 3 shows indicators of quality. Of the 184 sites, 105 (57%) displayed some evidence of authorship, but only 32 (17%) displayed the name, qualifications, and institutional affiliation of the author. Sixteen (9%) of sites had all four JAMA benchmarks (authorship, references, currency, and disclosure), 48 (26%) had three, 68 (37%) had two, 43 (23%) had one, and 9 (5%) had none. Forty five sites (25%) displayed a disclaimer that the information provided should not substitute for consultation with a physician.
A Health on the Net seal was displayed on 27 (15%) sites. Commercial sites were more likely than sites of professional groups or of organisations to display the seal—21/84 (25%) v 3/36 (8%) v 3/64 (5%) (P=0.001). None of the sites with a seal actually complied with all eight Health on the Net criteria or with all four JAMA benchmarks.
Of the 184 sites, 12 (7%) contained inaccurate medical statements. Commercial sites contained inaccurate statements more often than did sites of professional groups or of organisations—11/84 (13%) v 1/36 (3%) v 0/64 (P=0.004). Three (16%) of 19 commercial sites that displayed the Health on the Net seal contained inaccurate statements. Higher quality sites (at least three JAMA benchmarks) were less likely to contain inaccurate information than lower quality sites (fewer than three JAMA benchmarks)—1/64 (2%) v 11/120 (10%) (P=0.047) (figure). None of the 16 sites that met all four JAMA benchmarks contained inaccurate information.
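The benchmark tally used above can be expressed as a simple score out of four. The sketch below is illustrative: the class name `SiteBenchmarks` and its field names are hypothetical, and only the threshold (at least three benchmarks for “higher quality”) and the reported distribution of scores come from the text.

```python
from dataclasses import dataclass


@dataclass
class SiteBenchmarks:
    """Presence or absence of each of the four JAMA benchmarks for one site."""
    authorship: bool
    attribution: bool
    currency: bool
    disclosure: bool

    def score(self) -> int:
        # Number of benchmarks met, from 0 to 4.
        return sum([self.authorship, self.attribution, self.currency, self.disclosure])

    def is_higher_quality(self) -> bool:
        # "Higher quality" is defined in the text as at least three benchmarks.
        return self.score() >= 3


# A hypothetical site that lacks only a date of update.
site = SiteBenchmarks(authorship=True, attribution=True, currency=False, disclosure=True)

# Reported number of sites meeting each count of benchmarks.
reported = {4: 16, 3: 48, 2: 68, 1: 43, 0: 9}
higher = sum(n for k, n in reported.items() if k >= 3)
lower = sum(n for k, n in reported.items() if k < 3)
```

Summing the reported distribution this way gives the 64 higher quality and 120 lower quality sites that serve as denominators in the comparison of inaccurate statements.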
Determinants of popularity
Type of content differed significantly between more popular and less popular websites (table 4). Sites that were more popular by at least two of three popularity measures were more likely to contain information about ongoing clinical trials, results of randomised clinical trials, results of other breast cancer research, information on legislation and advocacy, and information on opportunities for psychosocial adjustment and to allow interaction through a message board service.
We then evaluated differences in the topics of medical facts presented between the more popular and less popular sites (table 2). This analysis was carried out only for the 125 sites that displayed medical information. More popular sites were more likely to discuss breast reconstruction—15/57 (26%) v 8/68 (12%) by Google rank (P=0.037), 15/51 (29%) v 5/57 (9%) by number of links in Google (P=0.002), and 16/50 (32%) v 6/48 (13%) by number of links in AltaVista (P=0.018)—and psychology topics such as depression—11/51 (21%) v 1/57 (2%) by number of links in Google (P=0.001) and 11/50 (22%) v 1/48 (2%) by number of links in AltaVista (P=0.002).
More popular and less popular websites did not differ in any of the quality measures studied (table 4). Furthermore, the presence of inaccurate information did not differ between more popular and less popular sites.
Evaluation of popularity measures
We evaluated the concordance between our measures of popularity. The median number of linking sites as measured by Google and AltaVista was significantly higher for sites that were more popular by Google rank than for less popular sites (Google: 82 v 21, P<0.001; AltaVista: 48 v 10, P<0.001). The number of links as measured by Google strongly correlated with the number of links as measured by AltaVista (Pearson coefficient 0.806, P<0.001).
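For readers who want to reproduce this kind of concordance check on their own data, a Pearson correlation coefficient can be computed as sketched below. The function name and the paired link counts are hypothetical and are not the study data; the example only shows the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical link counts for five sites, as reported by two search engines.
google_links = [82, 21, 150, 5, 40]
altavista_links = [48, 10, 90, 2, 30]
r = pearson_r(google_links, altavista_links)
```

Only sites with a link count available from both engines can enter such a calculation, so the paired sample may be smaller than either engine's individual total.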
We hypothesised that link popularity would correlate with a site being displayed by multiple search engines. Of the top 184 accessible sites displayed by Google, AltaVista displayed 24 (13%), Yahoo displayed 41 (22%), Infoseek displayed 58 (32%), and Excite displayed 84 (46%). Indeed, more popular sites were more likely than less popular sites to be displayed by multiple search engines (table 5).
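The overlap figures above are simple proportions of the 184 accessible Google sites. The sketch below recomputes the percentages from the reported counts and shows how such an overlap could be derived from two result lists. The function name `overlap_share` and the site identifiers are hypothetical, and matching sites by exact identifier is an assumption about how overlap would be determined.

```python
def overlap_share(reference_sites, other_engine_sites):
    """Fraction of the reference sites that another engine also displays."""
    reference = set(reference_sites)
    if not reference:
        raise ValueError("reference list is empty")
    # Assumption: two results are the same site if their identifiers match exactly.
    return len(reference & set(other_engine_sites)) / len(reference)


# Reported counts of the 184 accessible Google sites displayed by each engine.
reported = {"AltaVista": 24, "Yahoo": 41, "Infoseek": 58, "Excite": 84}
percentages = {engine: round(100 * n / 184) for engine, n in reported.items()}

# Hypothetical result lists, for illustration only.
share = overlap_share(["a", "b", "c", "d"], ["b", "d", "x"])
```

The recomputed percentages span the 13-46% range of overlap quoted in the discussion.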
To assess the correlation between click popularity and link popularity, we evaluated the top 10 sites returned by Direct Hit, which ranks sites based on click popularity and duration of visits. Nine of the 10 sites were in the top 200 by Google rank, eight were among the more popular sites by Google rank, and five were among the top 20 by Google rank.
To meet the demand for health information on the web, it is important to identify the factors that influence popularity of websites. Our results show that type rather than quality of content determines popularity. To our knowledge, ours is the first study to assess the popularity as well as the quality and accuracy of health related websites.
We found that many breast cancer websites do not comply with the JAMA benchmarks, but we found higher compliance than previously reported. 10 11 This may reflect an improvement in quality of websites over the past few years or a difference between search engines used in the studies.
Since accessibility and ranking of websites vary with the search engine used, the overlap between Google and other search engines of only 13-46% is not surprising. We found that “more popular” sites were more likely to be displayed by multiple search engines. If a site is not displayed, it is unlikely to be visited by users of the internet; thus the more popular sites by our measures of link popularity should indeed be the more popular sites among users of the internet. Our finding that eight of the top 10 sites according to Direct Hit were among the more popular sites by Google rank also supports this assertion.
Our results confirm those of an earlier study that found no correlation between measures of quality and link popularity.24 We may have selected higher quality sites by examining the top 200 of about 600 000 sites returned by Google. Significant differences in quality might have emerged if we had increased our sample size or compared the top 100 sites with sites of lower popularity, such as those ranked 1000-1100. Using less popular sites, however, would not have allowed us to correlate multiple measures of link popularity, as most sites would have no incoming links.
One limitation of our study is that we performed multiple comparisons. Another limitation is that a single reviewer (FM) assessed quality and accuracy. To mitigate this, we used objective criteria whenever possible. For example, we recorded the presence or absence of authorship information rather than judging the authority of the author. Accuracy is inherently subjective, so our results should be confirmed by studies using a panel of experts. Multiple non-expert reviewers, however, may not be better than a single expert reviewer.
In one survey, only 14% of patients expressed uncertainty about the accuracy of medical information on the web.4 We found that higher quality sites contain more accurate information. Objective measures of quality may help lay users to assess online health information.
Self regulation has been advocated as a way of maintaining the quality of online medical content. Our finding that sites displaying a Health on the Net seal did not comply with the Health on the Net code emphasises the limitation of self regulation. It remains the responsibility of the medical community to ensure adequate quality of online medical content, to educate the public regarding quality measures, and to direct patients to sites of known quality.
Link popularity, which can be assessed automatically, has been proposed as an indirect measure of quality.19 This is analogous to citation analysis, a somewhat controversial approach to measuring quality in the printed literature. Although link popularity may identify sites of interest, it does not correlate with quality of content. The growing number of users of the internet searching for health information indicates an unmet need for information. Understanding what patients are looking for online may help us meet their need for health information.
We thank Valerie Natale and Stephanie Deming for editorial assistance and Herbert Kaizer and Soumya Raychaudhuri for critical reading of the manuscript.
Contributors: FM was involved in the conception and design of the study, reviewed all the websites, performed the analysis and interpretation of data, and drafted the paper. EVB was involved in the conception and design of the study, collection of data on links, interpretation of data, and drafting of the paper. NQM carried out the statistical analysis. KKH, FCA, MIR, HMK, and REP were involved in interpreting the data and revised the paper for intellectual content. MAM and SES advised on methods and revised the paper for intellectual content. FM and EVB are the guarantors.
Funding: Supported in part by Grant LM06594 from the National Library of Medicine (EVB).
Competing interests: None declared.