Data crawling must follow rules

2022-07-30

As an important means of data acquisition, data crawler technology is widely used across many fields of the Internet. It has also sparked a growing number of disputes, including unfair-competition cases between Internet companies and even criminal cases. How data should flow between enterprises as they develop has become a focus of legal practice that needs further clarification. Recently, the Yangpu District Procuratorate of Shanghai organized a “Legal Compliance Seminar on Data Crawlers”, at which legal practitioners, university experts and scholars, and enterprise representatives held in-depth discussions on the concepts and technical principles of data crawlers, industry self-regulation norms, legal boundaries, and the application of law.

In the era of big data, data resources are an important foundation for the development of Internet enterprises. At present, data crawling is one of the most common technical means by which enterprises collect public data. Crawler technology can capture text, pictures, audio, video and other Internet information in bulk. So what exactly is a data crawler, and how does it work?

Shao Min, a prosecutor at the Yangpu District Procuratorate, explained that a crawler is an automated web-browsing program. It simulates manual clicks according to preset rules in order to read or collect Internet data automatically and efficiently. The basic principle is to set up, according to the search purpose, a queue of URLs (uniform resource locators) to be crawled. The program takes a URL from the queue, visits the corresponding page, parses the page, extracts all the URLs on it and adds them to the queue. This loop repeats until every URL in the queue has been crawled or a stop condition set by the system is met.

Liu Yuchen, head of digitalization at L’Oréal China, said that from a technical point of view, a data crawler uses a program to simulate a person visiting the Internet through a browser (or app) and efficiently captures the required information. A crawler can capture all of the data on a site, or only the data that meets specified conditions.

Of course, improper use of crawler technology can also have adverse effects. Zeng Xiang, general counsel of Xiaohongshu, noted that improper crawling may not only infringe the rights of individuals and platforms but also disrupt public order on the Internet, leading to a waste of social resources.

According to Shao Min, websites usually take measures to restrict crawler access and prevent excessive data capture, such as the Robots protocol, crawler detection, website hardening and verification codes (CAPTCHAs). Among these, the Robots protocol, being simple and efficient, has become a technical norm widely accepted and observed by the Internet industry in China and abroad. The Robots protocol mainly restricts crawling behavior: the site being crawled places a Robots protocol file (robots.txt) on its website specifying the range of information that may be crawled, and crawlers are permitted to collect data only within that range. Gao Fuping, a professor at East China University of Political Science and Law, believes that the Robots protocol emerged alongside the birth and development of search engines.
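The queue-based crawl loop Shao Min describes, combined with the Robots protocol check that a compliant crawler performs before each request, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a production crawler: the in-memory PAGES dictionary stands in for real HTTP requests, and the robots.txt rules, the example.com URLs, the user-agent name ExampleCrawler and the max_pages stop condition are all hypothetical. The robots.txt rules are parsed with Python’s standard-library urllib.robotparser module.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found on one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl(seed, fetch, robots, user_agent="ExampleCrawler", max_pages=100):
    """Take a URL from the queue, skip it if robots.txt disallows it,
    otherwise fetch and parse the page and enqueue newly found links.
    Stops when the queue is empty or max_pages pages have been crawled."""
    queue = deque([seed])   # URLs waiting to be crawled
    seen = {seed}           # every URL ever queued, so none is queued twice
    crawled = []            # pages actually fetched, in visiting order
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue        # outside the range the site permits: do not fetch
        html = fetch(url)   # hypothetical fetcher: returns None if unavailable
        if html is None:
            continue
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# Hypothetical robots.txt and site content so the sketch runs offline.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGES = {
    "https://example.com/": '<a href="/news">News</a> <a href="/private/users">Users</a>',
    "https://example.com/news": '<a href="/news/1">Story</a> <a href="/">Home</a>',
    "https://example.com/news/1": "no links here",
    "https://example.com/private/users": "member list",
}

if __name__ == "__main__":
    robots = load_robots(ROBOTS_TXT)
    print(crawl("https://example.com/", PAGES.get, robots))
```

In this run, /private/users is discovered on the home page but never fetched, because the hypothetical robots.txt places it outside the permitted range. Note that the check is voluntary: the protocol only restrains crawlers written to honor it.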
In Gao Fuping’s view, the Robots protocol is the outcome of a contest among Internet companies: a compromise that balances commercial interests, users’ personal interests and website security. Its main function is exclusion. When a website does not want some of its data captured by search engines, compliant web robots automatically skip that content. The range that the Robots protocol marks as off-limits is a red line for crawlers, and data beyond that red line must not be crawled.

Gao Fuping regards crawling as a means of supporting the data economy. On this premise, the following factors can be considered when judging the legal boundary of crawling. First, whether the data is open. Whether data is public is not in itself the criterion of legality, since public data is not necessarily open data. Second, whether the means of obtaining the data are legal: whether the crawler technically breaks through data access controls, and whether it circumvents the Robots protocol of the website or app. Third, whether the purpose of use is legal: if the crawler’s purpose is to substantially replace part of the products or services offered by the operator of the crawled platform, the purpose will be considered illegitimate. Fourth, whether damage is caused: whether the crawler substantially hinders the normal operation of the crawled operator, unreasonably increases its operating costs, or disrupts the normal functioning of its systems.

From the perspective of civil law, four types of crawling that cross legal boundaries can be distinguished.
First, crawling of public data. If the data rights holder has set out the permitted crawling scope and other obligations in its Robots protocol or on its web pages, and the crawling party fails to comply with them, the crawling party shall bear corresponding civil liability. Second, breaking through the anti-crawling measures of websites or apps. If a crawler technically breaks through data access controls, for example by circumventing a website’s or app’s Robots protocol, crawler detection, website hardening or other restrictions on crawler access, the conduct may be illegal and the crawling party should bear corresponding civil liability. Third, improper purposes of data use. If crawled data is used to substantially replace part of the products or services offered by the operator of the crawled platform, this infringes the legitimate rights and interests of the rights holder, and corresponding civil liability shall be borne. Finally, conduct that causes damage to the rights holder. If crawling substantially hinders the rights holder’s normal operations, unreasonably increases its operating costs, disrupts the normal operation of its network systems and causes it losses, the rights holder may file an infringement lawsuit against the party doing the crawling.

Shao Min proposed delimiting the boundary of lawful crawling from three aspects. First, lawful crawling should be limited to obtaining open data; a crawler that obtains non-open data may be acting illegally or even criminally. Second, lawful crawling should not be intrusive; intrusiveness is, so to speak, the main manifestation of a crawler’s illegality. Third, crawling should serve a legitimate purpose, and even the acquisition of open data may be illegal if its purpose is not legitimate. Crawling open non-commercial data should serve the fundamental purpose of the public interest, while crawling open commercial data can draw on the fair use principle of copyright law, which requires that the use serve a fair-use purpose.

Criminal regulation of data crawling can be approached from two aspects: the crawling conduct itself and the use of the data. First, knowingly circumventing or forcibly breaking through the technical measures of a website or app without authorization in order to crawl it constitutes “unauthorized access” to obtain data, and the offender shall bear corresponding liability, including criminal liability. Under China’s Criminal Law, breaking through technical barriers to intrude into another’s computer system and obtain its data may constitute the crime of illegally intruding into a computer information system, the crime of illegally obtaining computer information system data, or the crime of sabotaging a computer information system. In addition, using crawler technology to illegally obtain citizens’ personal information may constitute the crime of infringing citizens’ personal information. Second, criminal law also specifically regulates criminal acts carried out with crawled data: disseminating, using or transforming the data obtained may involve crimes such as disseminating obscene materials, infringing trade secrets, and copyright infringement.

Xiao Feng (Author: Yangpu District People’s Procuratorate, Shanghai)