Web scraping stories at Techdirt.


from the phew department

For years, we have been covering cases related to web scraping and the Computer Fraud and Abuse Act (CFAA). The CFAA is an extremely poorly drafted law, which has been stretched by both law enforcement and civil plaintiffs to argue that all sorts of things are “unauthorized access” and therefore hacking. The courts have at least begun to push back against some of the more extreme interpretations of the law, although it remains problematic.

More than a decade ago, we followed a case that I think is still one of the most problematic decisions for the internet: when Facebook sued a small startup called Power.com. Power had created a social media aggregator, allowing you to access all of your different social media accounts through a single interface and even post across multiple platforms from that interface. To do this, you had to provide your login to Power, which would then access your social media accounts and pull the data out (or push data in to post it). Again, it was the user voluntarily handing over their login credentials. Leaving aside whether or not it is wise to share your login details with a third party, it was always the choice of the user.

However, Facebook decided it was hacking and a violation of the CFAA…and the courts (tragically) agreed, allowing Facebook to effectively shut down a useful service that would have prevented Facebook from locking up so much data (and becoming such a dominant player). The main reason the court sided with Facebook was that it concluded that, once Facebook sent a cease and desist letter, any further scraping was “unauthorized”. I still think we would see a wildly different competitive landscape today if the Power case had turned out differently. It would have significantly limited the ability of the major social media players to lock in their users. Instead, the ruling has more or less turned Facebook into a roach motel where your data checks in, but can never check out.

Other internet companies have unfortunately followed suit, using similar lawsuits against websites providing useful add-on services. Craigslist went after 3Taps, which made Craigslist data available to third-party apps. LinkedIn went after a company called HiQ that scraped and used LinkedIn data. Here, unlike the Power case, the courts actually ruled against LinkedIn, saying that LinkedIn could not use the CFAA to block the scraping of public data. The main difference between this case and the Power case was that HiQ was scraping public information (i.e. it didn’t need to log into LinkedIn with someone’s credentials to access the data). LinkedIn appealed…and lost again. LinkedIn then asked the Supreme Court to intervene, which resulted in the Supreme Court vacating the 9th Circuit’s decision and sending the case back to that court for reconsideration in light of last summer’s big Van Buren decision, which limited parts of the CFAA.

So now, with yet another chance…the 9th Circuit has properly reached the same conclusion. The scraping of public information by HiQ still does not violate the CFAA. There are a few different legal issues involved here, but the CFAA claims are the main event. LinkedIn argued that it had sent a cease and desist letter to HiQ, so under the Power ruling, HiQ’s continued scraping violated the law.

The panel examining this case digs deeper into the CFAA, why it exists and what it is supposed to do, before concluding that LinkedIn’s interpretation cannot be the correct one, noting that “the CFAA is best understood as an anti-trespassing law not as a ‘misappropriation law’” and that, as such, access to public information should not constitute a violation.

In other words, the CFAA contemplates the existence of three types of computer systems: (1) computers for which access is open to the general public and authorization is not required, (2) computers for which authorization is required and has been granted, and (3) computers for which authorization is required but has not been granted (or, in the case of the prohibition on exceeding authorized access, has not been granted for the part of the system accessed). Public LinkedIn profiles, accessible to anyone with an internet connection, fall into the first category. With respect to websites made freely available on the Internet, the analog of “breaking and entering” so frequently invoked in congressional deliberations has no application, and the concept of “without authorization” is misplaced.

As for the reconsideration in light of the Van Buren ruling, that ruling does not change anything.

Van Buren’s “gates-up-or-down inquiry” is consistent with our interpretation of the CFAA as contemplating three categories of computer systems.


Van Buren’s distinction between computer users who “can or cannot access a computer system” suggests a baseline in which there are “access limitations” that prevent certain users from accessing the system (i.e. a “gate” exists and can be either up or down). The Court’s gates-up-or-down inquiry therefore applies to the last two categories of computers we have identified: if authorization is required and has been given, the gates are up; if authorization is required and has not been given, the gates are down. As we have noted, however, a defining characteristic of public websites is that their publicly accessible sections have no access limitations; instead, these sections are open to anyone with a web browser. In other words, applying the “gates” analogy to a computer hosting publicly accessible web pages, that computer erected no gates to raise or lower in the first place. Van Buren thus reinforces our conclusion that the concept of “without authorization” does not apply to public websites.

The court again distinguished Power from the HiQ case by saying that Facebook limited data access only to those who were logged in, as opposed to the more public access available on LinkedIn.

In that case, Facebook sued Power Ventures, a social networking website that aggregated social media information from multiple platforms, for accessing Facebook user data and using that data to send mass messages as part of a promotional campaign. Id. at 1062–63. After Facebook sent a cease and desist letter, Power Ventures continued to bypass IP barriers and access password-protected Facebook member profiles. Id. at 1063. We held that after receiving an individualized cease-and-desist letter, Power Ventures accessed Facebook computers “without authorization” and was therefore liable under the CFAA. Id. at 1067–68. But we specifically acknowledged that “Facebook has attempted to limit and control access to its website” as to the purposes for which Power Ventures sought to use it. Id. at 1063. Indeed, Facebook requires its users to register with a unique username and password, and Power Ventures required Facebook users to provide their Facebook username and password to access their Facebook data on the Power Ventures platform. Facebook, Inc. v. Power Ventures, Inc., 844 F. Supp. 2d 1025, 1028 (N.D. Cal. 2012). While Power Ventures collected user data protected by Facebook’s username and password authentication system, the data hiQ scraped was accessible to anyone with a web browser.

So this doesn’t fix the unfortunate precedent in the Power case, but at least it keeps it from getting worse, while clarifying that scraping public web pages isn’t hacking, even if you receive a cease and desist letter.

Filed Under: 9th circuit, cfaa, scraping, web scraping

Companies: hiq, linkedin

