Introduction

To address the issue of attrition, a defined set of health-related Web sites was examined at two separate time intervals.

Methods

In this study, to determine the degree of attrition, each of the 184 Web sites obtained and recorded from the previous study was revisited at a later time. During May 2002, the previously recorded URL for each Web site was entered into the address field of the browser Netscape Navigator (version 4.7, Netscape Communications Corporation, Mountain View, California). For each site, it was documented whether the original Web site could not be found, had moved to a different URL, or was found unchanged at the original URL. For a Web site whose URL remained unchanged, it was also noted whether the site had maintained currency (i.e., had been updated) since the original posting.

Since inaccessibility of a Web site may be due to temporary server problems, a second attempt was made to access the sites that could not be reached. For each "HTTP Error 404" or similar message obtained from the initial URL checks, the site was revisited during June 2002, on various days and at various times of day, in the manner described above.

Results

Table 1

In this study, attrition is defined as the unavailability of a Web site that was previously accessible at a known URL. This definition does not include sites that were redirected to a new URL. Approximately three years after initial posting, over two-thirds of the health-related Web sites reviewed could not be found or had moved with no forwarding URL, and about one-third of the remaining sites maintained currency of information. It appears that links are terminated as Web sites are moved or removed, or as servers close down.
This supports the notion that it is difficult, if not impossible, to locate information that was previously found on the Web, and that providing a reference to an item does not guarantee that viewers will be able to find the site at a later date.

In this study, a comprehensive data set of Web sites on a specific health-related topic was obtained, and attrition was examined. An example from a single health-related topic limits the conclusions that can be drawn, and these findings cannot be generalized to other medical topics. They do, however, raise the question of whether other health-related sites on the World Wide Web vary in their degree of attrition, and they warrant further research into methods of dealing with attrition in other medical topics.

Discussion

Enhancements in Web technologies may help to alleviate the problem of attrition. A prime example is the Internet Archive (http://www.archive.org). Though a significant accomplishment towards recovering lost Web pages, the Archive's Wayback Machine has limitations. It is not searchable by keywords or text in the manner of a general search engine; the user must know the precise URL of a particular Web page or site to access the Archive. Having entered a URL, the viewer is presented with a list of dates that designate when a particular page was archived. Also, though the Internet Archive contains more than 100 terabytes of data, much is still missing. For example, it does not contain the older gopher content and other non-Web files from before 1996, and only a relatively small number of pages exists from 1996, with the amount of content increasing towards recent times.

Issues of Quality and Content

The question may arise as to whether a relationship exists between Web site quality and attrition. Are poor quality sites more likely to disappear over time than sites of higher quality?
Table 2

Although the high quality sites make up only a small portion of the total number of sites retrieved (15%), half of the original high quality sites (14 of 28) could be located from the original URL or were redirected to a new URL. Conversely, only 10 of the 73 poor quality sites were accessible from the original URL, and only one poor quality site was redirected to another URL from the original site. This suggests that Web sites of higher quality may be less subject to attrition than those of poorer quality, and it warrants further research on the relationship between Web site quality and attrition in other medical topics.

Considering subject matter and attrition, it may be that certain topics (such as herbal remedies) enjoy periods of public enthusiasm that later wane, which may be the case with these sites. Perhaps information on more mainstream topics (such as health risks and smoking) is less vulnerable to attrition.

Future Considerations

The forces that influence the survival of Web sites have yet to be determined with certainty. Given the complex and dynamic nature of information flow on the Web, is there a form of "natural selection" at work in the survival of health Web sites? If attrition is not related to a site's quality or subject matter, perhaps sites with strong commercial backing survive with the greatest frequency. At this point we can only speculate about what will endure.
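
As an aid to checking the quality comparison reported under Issues of Quality and Content, the counts given in the text can be tallied as in the minimal sketch below. The counts (14 of 28 high quality sites, 10 plus 1 of 73 poor quality sites, 184 sites in total) are taken from the text; the function name `retention_rate`, the variable names, and the data layout are illustrative assumptions and are not part of the original study.

```python
# Illustrative sketch only: names and layout are assumptions, not the study's method.
# The counts themselves are those reported in the text.

TOTAL_SITES = 184  # Web sites revisited in the study


def retention_rate(retained: int, total: int) -> float:
    """Return the fraction of sites still reachable (original or redirected URL)."""
    if total <= 0:
        raise ValueError("total must be a positive number of sites")
    if not 0 <= retained <= total:
        raise ValueError("retained must lie between 0 and total")
    return retained / total


# (retained, total) per quality group.
# Poor quality: 10 found at the original URL plus 1 redirected.
groups = {
    "high quality": (14, 28),
    "poor quality": (10 + 1, 73),
}

for name, (retained, total) in groups.items():
    rate = retention_rate(retained, total)
    share = total / TOTAL_SITES  # share of all revisited sites in this group
    print(f"{name}: {retained}/{total} retained ({rate:.1%}); "
          f"{share:.1%} of all {TOTAL_SITES} sites")
```

Under these counts, about half of the high quality sites remained reachable compared with roughly 15% of the poor quality sites. The sketch only restates the reported proportions and does not test whether the difference is statistically significant.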