Crawler Archives

微图坊 Crawler [Chrome Support]【22.09.04】【Windows】

September 4, 2022, 97 comments

Changelog:

1. Fixed a 404 error caused by a "/" in the URL.
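
The changelog does not say how the "/" led to the 404. As a rough sketch only, assuming the cause was either a doubled slash from a site URL ending in "/" or an unescaped "/" inside a path segment, URL building could be guarded like this. The helper name `build_url` is hypothetical and is not taken from the crawler.

```python
from urllib.parse import quote


def build_url(site: str, *segments: str) -> str:
    """Join a site root and path segments with exactly one "/" between parts.

    Hypothetical helper: the real crawler's URL handling is not shown in the post.
    """
    # Drop any trailing "/" on the site root so the join never produces "//".
    base = site.rstrip("/")
    # Trim slashes at the ends of each segment, then percent-encode what is
    # left (safe="" also encodes an inner "/" as %2F).
    encoded = [quote(seg.strip("/"), safe="") for seg in segments if seg.strip("/")]
    return "/".join([base, *encoded])


print(build_url("https://www.v2ph.com/", "album", "a/b"))
```

Whether the server accepts an encoded `%2F` depends on the site, so this is only one possible way to handle it.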

Hobbies『Favourite』

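National Statistical Division Codes and Urban-Rural Classification Codes [Crawler Code]【JSON + CSV Format】

August 31, 2022, 3 comments

Page URL: http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2021/11/01/01/110101001.html

I recently needed the latest administrative division data. The National Bureau of Statistics publishes it online but does not offer a downloadable data file, so I wrote a crawler to fetch all of it. The output defaults to JSON, and a separate tool is provided to convert the JSON to CSV.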
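
The post does not show the JSON layout or the converter itself. The sketch below illustrates the JSON-to-CSV step under an assumed schema: a nested list of nodes with `code`, `name`, and an optional `children` list. The field names, the `flatten` and `json_to_csv` helpers, and the CSV columns are all assumptions, and the two-row sample is illustrative only.

```python
import csv
import io
import json

FIELDS = ["code", "name", "level", "parent_code"]  # assumed CSV columns


def flatten(nodes, parent_code="", level=1):
    """Yield one flat row per division, walking the assumed tree depth-first."""
    for node in nodes:
        yield {
            "code": node["code"],
            "name": node["name"],
            "level": level,
            "parent_code": parent_code,
        }
        # Recurse into child divisions, if any, one level deeper.
        yield from flatten(node.get("children", []), node["code"], level + 1)


def json_to_csv(json_text: str) -> str:
    """Convert the assumed nested JSON into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten(json.loads(json_text)))
    return out.getvalue()


# Illustrative input only; the real crawler output may be shaped differently.
sample = '[{"code": "11", "name": "北京市", "children": [{"code": "1101", "name": "市辖区"}]}]'
print(json_to_csv(sample))
```

Keeping a `parent_code` column lets the hierarchy be rebuilt from the flat file, which a plain dump of codes and names would lose.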

Hobbies『Favourite』

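微图坊 Crawler [Chrome Support]【22.08.21】【Windows】

August 21, 2022, 10 comments

Changelog:

1. Fixed an issue where dead page links left a directory created but nothing downloaded into it.

2. Fixed download failures in login mode once the view limit is exceeded; the process now exits early.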
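
The first fix above can be pictured as "fetch first, create the directory second". This is only a sketch of that ordering, not the crawler's actual code: `save_album`, the injected `fetch` callable, and the `page.html` file name are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional


def save_album(url: str, target_dir: Path,
               fetch: Callable[[str], Optional[bytes]]) -> bool:
    """Create target_dir only after the page has been fetched successfully.

    `fetch` is a hypothetical callable that returns the page bytes,
    or None when the link is dead.
    """
    data = fetch(url)
    if data is None:
        # Dead link: return before touching the disk, so no empty
        # directory is left behind.
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / "page.html").write_bytes(data)
    return True
```

Passing `fetch` in as a parameter keeps the ordering logic testable without network access.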

Microsoft『Windows』

微图坊 Crawler【22.06.07】【Windows】

June 7, 2022, 77 comments

Change Log:

1. Install the latest version of Chrome before using this program.

2. Open Chrome and log in to v2ph.com.

3. The spider stops automatically after crawling 16 albums.
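
The 16-album cap in item 3 could be enforced with a simple counter in the crawl loop. The sketch below assumes only successful downloads count toward the cap, which the post does not confirm. `crawl_albums` and the `download` callable are hypothetical names.

```python
from typing import Callable, Iterable

MAX_ALBUMS = 16  # cap stated in the change log


def crawl_albums(album_urls: Iterable[str],
                 download: Callable[[str], bool],
                 limit: int = MAX_ALBUMS) -> int:
    """Download albums in order and stop once `limit` albums have succeeded.

    `download` is a hypothetical callable returning True on success.
    """
    done = 0
    for url in album_urls:
        if done >= limit:
            break  # cap reached: stop before requesting another album
        if download(url):
            done += 1
    return done
```

Checking the counter before each request means the crawler never sends a seventeenth album request once the cap is reached.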

Usage:

(venv) PS F:\Pycharm_Projects\meitulu-spider> python .\v2ph.py
Arguments:
         -a <download all site images>
         -q <query the image with keywords>
         -h <display help text, just this>
Option Arguments:
         -p <image download path>
         -r <random index category list>
         -c <single category url>
         -e <early stop, work in site crawl mode only>
         -s <site url eg: https://www.v2ph.com (no last backslash "/")>
****************************************************************************************************
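
For readers who want to see how such a command line might be parsed, here is a minimal `argparse` sketch of the options listed above. The post does not show the real parser, so several details are assumptions: that `-a` and `-q` are mutually exclusive, that `-r` and `-e` are plain flags, the `dest` names and defaults, and the trimming of a trailing "/" from `-s`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the help text; option semantics are assumed."""
    parser = argparse.ArgumentParser(prog="v2ph.py")  # argparse adds -h itself
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-a", dest="all", action="store_true",
                      help="download all site images")
    mode.add_argument("-q", dest="query", metavar="KEYWORDS",
                      help="query the image with keywords")
    parser.add_argument("-p", dest="path", default=".",
                        help="image download path")
    parser.add_argument("-r", dest="random", action="store_true",
                        help="random index category list")
    parser.add_argument("-c", dest="category", metavar="URL",
                        help="single category url")
    parser.add_argument("-e", dest="early_stop", action="store_true",
                        help="early stop, work in site crawl mode only")
    # Strip a trailing "/" so later URL joins do not produce "//".
    parser.add_argument("-s", dest="site", default="https://www.v2ph.com",
                        type=lambda s: s.rstrip("/"), help="site url")
    return parser


args = build_parser().parse_args(
    ["-q", "keyword", "-p", "downloads", "-s", "https://www.v2ph.com/"]
)
print(args.query, args.path, args.site)
```

Normalizing `-s` inside the parser is a design choice made for this sketch. The original tool instead asks the user to omit the trailing "/".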

Microsoft『Windows』

KU138 Crawler【22.05.23】【Windows】

May 23, 2022, 48 comments

****************************************************************************************************
USAGE:
spider -h <help> -a <all> -q <search>
Arguments:
         -a <download all site images>
         -q <query the image with keywords>
         -h <display help text, just this>
Option Arguments:
         -p <image download path>
         -r <random index category list>
         -c <single category url>
         -e <early stop, work in site crawl mode only>
         -s <site url eg: https://www.v2ph.com (no last backslash "/")>
****************************************************************************************************

Microsoft『Windows』

微图坊 Crawler【22.05.16】【Windows】

May 16, 2022, 65 comments

Usage parameters:

****************************************************************************************************
USAGE:
spider -h <help> -a <all> -q <search>
Arguments:
         -a <download all site images>
         -q <query the image with keywords>
         -h <display help text, just this>
Option Arguments:
         -p <image download path>
         -r <random index category list>
         -c <single category url>
         -e <early stop, work in site crawl mode only>
         -s <site url eg: https://www.v2ph.com (no last backslash "/")>
****************************************************************************************************
