KEYWORDS: Data analysis, Industry, Principal component analysis, Visualization, Electroluminescence, Technology, Data processing, Data conversion, Visual analytics, Education and training
The talent structure has flattened, shifting from a pyramid shape to an olive shape. Jobs have grown more complicated as industries upgrade from low-end to high-end, and the situation graduates face is more complex. Talent specifications are increasingly compound, demanding industry understanding, data thinking, programming ability, innovative thinking, and comprehensive problem-solving ability. With the rapid development of science and technology and the arrival of the big data wave, the demand for talent in society is constantly changing, raising the question of how to match schools' talent training with society's talent demand. This paper uses crawler technology to obtain data from a recruitment website; after data collection, cleaning, and preprocessing, it carries out data analysis and visual display.
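The cleaning and preprocessing stage described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the column names, the tiny hand-made sample standing in for crawled postings, and the "10k-15k" salary format are all assumptions.

```python
import pandas as pd

# Illustrative sample standing in for crawled job postings.
raw = pd.DataFrame({
    "job": ["Data Analyst", "Data Analyst", "ML Engineer", None],
    "city": ["Beijing", "Beijing", "Shanghai", "Shenzhen"],
    "salary": ["10k-15k", "10k-15k", "20k-35k", "8k-12k"],
})

# Cleaning: drop duplicate postings and rows with missing fields.
clean = raw.drop_duplicates().dropna()

# Preprocessing: convert a "10k-15k" salary string into a numeric midpoint.
def salary_mid(s):
    lo, hi = (float(x.rstrip("k")) for x in s.split("-"))
    return (lo + hi) / 2  # thousands of CNY per month

clean = clean.assign(salary_mid=clean["salary"].map(salary_mid))

# Analysis: average salary midpoint per city, ready for visual display
# (e.g. a bar chart via clean.groupby(...).mean().plot.bar()).
summary = clean.groupby("city")["salary_mid"].mean()
print(summary)
```

A real run would replace the inline sample with the crawled dataset loaded from file, but the clean → transform → aggregate sequence is the same.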
In the field of artificial intelligence and machine learning, enterprises can train sufficiently reliable models only when they obtain large amounts of data [1]. How to obtain massive data at low cost has therefore become a key prerequisite for the success of data-intelligence enterprises, and mastering a large amount of data is an important precondition for gaining competitive advantage [2]. Among enterprises that hold massive data, a common view is that if the data underpinning their advantage is collected by peers, that advantage will be weakened or even lost. As a result, more and more owners of massive data adopt various mechanisms to protect their public data in network applications and prevent it from being harvested by crawlers [3]. From the data collector's perspective, this paper introduces several common anti-crawling mechanisms in detail, based on the Scrapy framework and the recruitment website of a well-known internet enterprise, and then presents techniques to circumvent these mechanisms. Finally, it successfully crawls all the job information on the enterprise's recruitment website. The experimental results show that the techniques provided in this paper can effectively bypass the anti-crawling mechanisms of some large websites, helping collectors obtain massive data.
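One of the most common circumvention techniques in the Scrapy setting is rotating the User-Agent header so that requests do not all present the same client signature. The sketch below shows the shape of a Scrapy-style downloader middleware; the pool of agent strings is illustrative, and in a real project the class would be registered under `DOWNLOADER_MIDDLEWARES` in the project settings.

```python
import random

# Illustrative pool of browser User-Agent strings; a real crawler would
# use a larger, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

class RandomUserAgentMiddleware:
    """Scrapy-style downloader middleware: assign a random User-Agent
    to each outgoing request before it is downloaded."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # returning None lets Scrapy continue processing
```

Other mechanisms discussed in such work (request-rate limits, cookies, IP bans) are typically handled the same way: a middleware that rewrites each request (adding delays, cookies, or proxies) before it leaves the crawler.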
A web crawler (also known as a web spider) is a program or script that automatically grabs data from websites according to certain rules. Like a spider crawling along threads of URLs across the internet, it downloads the web page each URL points to, then extracts and analyzes the page's contents. Through a crawler program, the massive data on a target website can be automatically collected and saved in structured files or a database. Crawler operators can therefore obtain large amounts of data with potential economic value at very low time and economic cost. This article takes http://novel.tingroom.com/ as the target and details the general steps for using a Python-based crawler program to obtain massive data (novel content): first, analyze the structure of the target page; second, use the requests module to fetch the target page; third, use the parsel module to extract the valuable parts of the page; finally, save the information into structured files. The empirical research shows that the Python-based crawler program has significant practical value.