Introduction

The study reported here builds on this prior work by analyzing cancer-related searches conducted in the United States from 2001 to 2003 using the search engine Yahoo! Specifically, we investigated three potential correlates of Yahoo! cancer search activity: estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage. Cancers that afflicted more individuals, claimed more lives, and generated more news coverage were expected to be associated with more Internet search activity than other cancers, given the interest generated by relevance and publicity. In addition, we assessed the periodicity of Yahoo! cancer search activity and examined sharp increases in Yahoo! search activity related to specific cancer types.

Methods

This analysis included three types of 2001–2003 US data: Yahoo! cancer search activity, cancer burden (estimated incidence and mortality), and cancer news coverage. The study protocol was reviewed by the Institutional Review Board of the National Center for Chronic Disease Prevention and Health Promotion and was designated as “research not involving human subjects.”

Yahoo! Cancer Search Activity

Yahoo! employs professional “surfers” or content indexers who manually classify Web pages into one of more than 2000 content categories, such as “movies,” “footwear,” “astrology,” or “cancer or neoplasms.” The Yahoo! Buzz Index classifies search terms in the same content category as the first Web page link that a user “clicks” or activates after conducting a search. For instance, if a user entered the search term “colon” and then clicked on a cancer website, “colon” would be classified as a “cancer or neoplasms” search term. If the user clicked on a grammar website, however, “colon” in that instance would be classified as an “education” search term.
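The first-click rule described above can be illustrated with a minimal sketch. It is not Yahoo!'s implementation: the function name, the URLs, and the lookup table are hypothetical and exist only to make the rule concrete. The function returns None whenever the first-click rule alone cannot assign a category.

```python
from typing import Mapping, Optional

# Hypothetical catalogue of manually classified pages (URL -> content category).
# The URLs are invented for illustration; the category names come from the text.
PAGE_CATEGORIES = {
    "https://cancer.example/colon": "cancer or neoplasms",
    "https://grammar.example/colon": "education",
}


def classify_search_term(
    first_clicked_url: Optional[str],
    page_categories: Mapping[str, str],
) -> Optional[str]:
    """Assign a search term the category of the first page clicked after the search.

    Returns None when the user clicked nothing or when the clicked page has
    not been classified, i.e. when the first-click rule does not apply.
    """
    if first_clicked_url is None:
        return None
    return page_categories.get(first_clicked_url)


# The same term, "colon", lands in different categories depending on the click.
cancer_click = classify_search_term("https://cancer.example/colon", PAGE_CATEGORIES)
grammar_click = classify_search_term("https://grammar.example/colon", PAGE_CATEGORIES)
```

Under this sketch the category is a property of each individual search event rather than of the search term itself, which is why one term can contribute to several content categories.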
When a user does not click on a Web page link or when a user clicks on a Web page link that has not been classified, the Yahoo! Buzz Index categorizes the search term using a variety of algorithms that analyze recent content viewed by the user.

Cancer Burden

Cancer News Coverage

Wall Street Journal Baltimore Sun,

Analysis

Descriptive statistics were calculated for the Yahoo! search activity score, estimated incidence, estimated mortality, and news coverage volume associated with the cancers included in the study. Spearman rank correlations were used to establish the consistency of these variables across the study period, and the data were aggregated. Next, the relationships between Yahoo! search activity and the potential correlates of interest were tested using Spearman rank correlations.

Results

The highest mean daily Yahoo! search activity scores were generated by breast cancer (mean = 14.37), lung cancer (mean = 9.08), and leukemia (mean = 7.15). Cancers with the highest US 2001–2003 incidences were breast (n = 611300), prostate (n = 608000), and lung (n = 510800). For cancer mortality, lung (n = 469500), colorectal (n = 170400), and breast (n = 120800) cancer were the leading causes of death. Breast cancer (n = 5840), leukemia (n = 2143), and prostate cancer (n = 1822) were associated with the most US news reports from 2001 to 2003. Some cancers, such as leukemia, ovarian cancer, and testicular cancer, appeared to be associated with more Internet search activity than their burden would predict.

Table 1. Mean daily Yahoo! search activity score (United States, 2001–2003), estimated incidence, estimated mortality, and number of news reports, by cancer

| Cancer | Yahoo! Search Terms * | Mean Daily Yahoo! Search Activity Score (Rank) | Estimated Incidence (Rank) | Estimated Mortality (Rank) | Number of News Reports (Rank) |
|---|---|---|---|---|---|
| Breast | “breast cancer” | 14.37 (1) | 611300 (1) | 120800 (3) | 5840 (1) |
| Lung | “lung cancer” | 9.08 (2) | 510800 (3) | 469500 (1) | 918 (5) |
| Leukemia | “leukemia” | 7.15 (3) | 92900 (10) | 65100 (7) | 2143 (2) |
| Colorectal | “colon cancer” | 7.08 (4) | 43120 (4) | 170400 (2) | 617 (6) |
| Prostate | “prostate cancer” | 6.13 (5) | 608000 (2) | 90600 (4) | 1822 (3) |
| Ovary | “ovarian cancer” | 3.71 (6) | 72100 (13) | 42100 (9) | 458 (8) |
| Lymphoma | “lymphoma” | 3.54 (7) | 185500 (5) | 78100 (6) | 480 (7) |
| Uterine, cervix | “cervical cancer” | 2.53 (8) | 38100 (20) | 12600 (19) | 392 (9) |
| Melanoma | “melanoma” | 2.25 (9) | 159200 (7) | 22800 (16) | 376 (10) |
| Brain | | 1.52 (10) | 52500 (16) | 39300 (10) | 925 (4) |
| Liver | “liver cancer” | 0.70 (11) | 50100 (17) | 42600 (8) | 110 (14) |
| Testis | “testicular cancer” | 0.62 (12) | 22300 (23) | 1200 (23) | 50 (17) |
| Pancreas | “pancreatic cancer” | 0.23 (13) | 90200 (11) | 88600 (5) | 185 (11) |
| Multiple myeloma | | 0.11 (14) | 43600 (18) | 32900 (15) | 185 (11) |
| Stomach | “stomach cancer” | 0.08 (15) | 65700 (14) | 37300 (13) | 50 (17) |
| Uterine, corpus | “uterine cancer” | 0.012 (16) | 117700 (8) | 20000 (18) | 17 (22) |
| Larynx | “throat cancer” | 0.012 (16) | 28400 (21) | 11500 (21) | 30 (20) |
| Bladder | “bladder cancer” | 0.010 (18) | 168200 (6) | 37500 (12) | 118 (13) |
| Soft tissue | “sarcoma” | 0.009 (19) | 25300 (22) | 12200 (20) | 25 (21) |
| Thyroid | “thyroid cancer” | 0.002 (20) | 62200 (15) | 4000 (22) | 40 (19) |
| Kidney | “kidney cancer” | 0.001 (21) | 94500 (9) | 35600 (14) | 77 (15) |
| Oral cavity | - | 0.000 (22) | 86700 (12) | 22400 (17) | 69 (16) |
| Esophagus | - | 0.000 (22) | 40200 (19) | 38100 (11) | 13 (23) |

Correlates of Yahoo! Cancer Search Activity

Table 2. Spearman rank correlations between mean daily Yahoo! search activity score (United States, 2001–2003), estimated incidence, estimated mortality, and number of news reports *

| | Mean Daily Yahoo! Search Activity Score | Estimated Incidence | Estimated Mortality |
|---|---|---|---|
| Number of news reports | † | ‡ | † |
| Estimated mortality | † | † | - |
| Estimated incidence | § | - | - |

* Table 1 † P ‡ P § P

Table 3. Mean daily Yahoo! search activity score (United States, 2001–2003), by number of news reports published daily and cancer

| Cancer * † (Number of News Reports) | Days With 0 News Reports | Days With 1–2 News Reports | Days With 3–4 News Reports | Days With 5+ News Reports |
|---|---|---|---|---|
| Breast | 10.09 (81) | 11.49 (278) | 13.36 (252) | |
| Lung | 8.27 (633) | 10.00 (362) | 10.54 (71) | |
| Leukemia | 6.89 (248) | 7.07 (523) | 7.18 (232) | |
| Colorectal | 6.72 (739) | 7.44 (297) | 8.25 (43) | |
| Prostate | 5.30 (390) | 6.40 (467) | 6.72 (150) | |

* † P

Periodicity of Yahoo! Cancer Search Activity and News Coverage

Table 4. Periodicity of mean daily Yahoo! search activity score (United States, 2001–2003) and mean daily number of news reports, by cancer

| Cancer | Measure | Weekdays | Weekends | P | Awareness Month / Non-Awareness Months | P | Summer (June–August) / Non-Summer | P |
|---|---|---|---|---|---|---|---|---|
| Breast | Mean Daily Yahoo! Search Activity Score * | 15.78 | 10.84 | < .001 | 13.26 | < .001 | 10.78 / 15.58 | < .001 |
| Breast | Mean Daily Number of News Reports | 6.26 | 3.02 | < .001 | 15.30 / 4.41 | < .001 | 4.19 / 5.72 | < .001 |
| Lung | Mean Daily Yahoo! Search Activity Score | 10.31 | 6.00 | < .001 | 8.84 | < .001 | 5.76 / 10.20 | < .001 |
| Lung | Mean Daily Number of News Reports | 1.03 | 0.37 | < .001 | 1.03 / 0.82 | .226 | 0.89 | .086 |
| Leukemia | Mean Daily Yahoo! Search Activity Score | 8.13 | 4.70 | < .001 | 7.20 | .093 | 5.65 / 7.66 | < .001 |
| Leukemia | Mean Daily Number of News Reports | 2.20 | 1.34 | < .001 | 1.51 / 2.00 | .036 | 1.88 / 1.98 | .506 |
| Colorectal | Mean Daily Yahoo! Search Activity Score | 7.73 | 5.44 | < .001 | 6.77 | < .001 | 6.83 / 7.17 | .081 |
| Colorectal | Mean Daily Number of News Reports | 0.68 | 0.27 | < .001 | 1.55 / 0.47 | < .001 | 0.49 / 0.59 | .214 |
| Prostate | Mean Daily Yahoo! Search Activity Score | 6.82 | 4.41 | < .001 | 6.18 | .044 | 6.14 / 6.13 | .997 |
| Prostate | Mean Daily Number of News Reports | 2.03 | 0.74 | < .001 | 2.39 / 1.60 | .007 | 2.14 / 1.50 | .010 |

Peaks in Yahoo! Cancer Search Activity and News Coverage

Figure 1. 2003 US prostate cancer Yahoo! search activity (each point of a Yahoo! search activity score equals 0.001% of the population searching Yahoo! on any day)

Discussion

We detected several periodicity effects in US Yahoo!
cancer search activity, which tended to be higher on weekdays and during national cancer awareness months but lower during the summer months. These observations are not artifacts of the size of the online population during these periods, because Yahoo! search activity scores are based on the percentage, not the number, of total users. One explanation for these results is that the volume of cancer news coverage tended to follow these trends. It is also possible that users tend to search for online cancer information from school or work settings. If so, Yahoo! cancer search activity would be expected to drop during weekends, when people are at home, and over the summer months, when many students are out of school and many workers go on vacation.

Although Yahoo! is a leading US Internet search engine, the extent to which the findings of this study can be generalized to other search engines is not known. Also, we were unable to discern the motivations of Yahoo! users searching for cancer information. For instance, news coverage of a breast cancer drug might be associated with an increase in “breast cancer” search activity. While the Yahoo! Buzz Index would detect this rise, it cannot tell how many searchers were breast cancer patients or family members and how many were investors interested in buying stock in the company developing the drug.
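As an illustration of the rank-based analysis described in the Analysis section, the following is a minimal plain-Python sketch of a Spearman rank correlation with tie handling. It is not the authors' code or software, and the function names are hypothetical. The input values are the five highest-ranked rows of Table 1 (breast, lung, leukemia, colorectal, prostate), so the result does not correspond to any coefficient reported in the study.

```python
# Illustrative sketch only: Spearman rank correlation computed as the
# Pearson correlation of (tie-averaged) ranks. Function names are hypothetical.


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Values copied from Table 1: breast, lung, leukemia, colorectal, prostate.
search_scores = [14.37, 9.08, 7.15, 7.08, 6.13]
news_reports = [5840, 918, 2143, 617, 1822]

rho = spearman(search_scores, news_reports)
```

Because the calculation uses only ranks, it is insensitive to the very different scales of the variables (scores below 15 versus news counts in the thousands), which is one reason a rank-based measure suits data of this kind. On this five-row subset the function returns 0.5.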