1. How to download Kaggle datasets
1.1 Background
I recently came across a Kaggle competition organized by Google and wanted to try it myself, but the dataset is huge: close to 370 GB. Downloading it directly through the browser was too slow and would probably have taken 3-4 days, so I decided to download it from the command line instead.
1.2 Solution
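My first idea was crude: copy the download link from the browser and run wget on it directly on the server. As expected, that did not work.
After some searching I found that Kaggle provides an official command-line tool, which can be used right after installing it with pip:
https://github.com/Kaggle/kaggle-api
Below are the key steps I ended up with.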
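1.2.1 Installing the Kaggle API
Make sure Python and the pip package manager are installed. Run the following command to get command-line access to the Kaggle API:
pip install kaggle
On Mac/Linux you may need to run:
pip install --user kaggle
This is the recommended approach if the installation runs into problems.
Installations done as the root user (i.e. sudo pip install kaggle) will not work correctly unless you understand what you are doing. Even then, they still might not work.
If you get a permissions error, a user install is strongly recommended.
If you get a kaggle: command not found error, make sure your Python binaries are on your PATH.
You can see where kaggle is installed by running pip uninstall kaggle and checking where the binary is (pip lists the files before asking for confirmation, so you can answer n to keep the installation). For a local user install on Linux, the default location is ~/.local/bin. On Windows, the default location is $PYTHON_HOME/Scripts.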
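The PATH check described above can also be done from Python. The following is a minimal sketch using only the standard library; the helper names find_cli and user_scripts_dir are made up for this example, and the directory it reports is only where pip would normally place scripts for a --user install, which may differ on your setup.

```python
import os
import shutil
import sysconfig


def find_cli(name="kaggle"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def user_scripts_dir():
    """Best guess for where `pip install --user` puts console scripts.

    Assumption: the per-user install scheme is named "<os.name>_user",
    i.e. "posix_user" on Linux/macOS and "nt_user" on Windows.
    """
    return sysconfig.get_path("scripts", scheme=f"{os.name}_user")


if __name__ == "__main__":
    location = find_cli("kaggle")
    if location is None:
        print("kaggle is not on PATH; a --user install would normally live in:")
        print(user_scripts_dir())
    else:
        print("kaggle found at", location)
```

If the command is reported as missing, adding the printed directory to PATH is usually the next thing to try.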
I ran it on Windows:
pip install kaggle
The output was:
(PyTorch) F:\kaggle>pip install kaggle
Collecting kaggle
  Downloading kaggle-1.5.16.tar.gz (83 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.6/83.6 kB 130.5 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: six>=1.10 in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (1.16.0)
Requirement already satisfied: certifi in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (2022.12.7)
Requirement already satisfied: python-dateutil in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (2.8.2)
Requirement already satisfied: requests in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (2.31.0)
Requirement already satisfied: tqdm in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (4.65.0)
Collecting python-slugify
  Downloading python_slugify-8.0.1-py2.py3-none-any.whl (9.7 kB)
Requirement already satisfied: urllib3 in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (1.26.12)
Requirement already satisfied: bleach in d:\anaconda\envs\pytorch\lib\site-packages (from kaggle) (5.0.1)
Requirement already satisfied: webencodings in d:\anaconda\envs\pytorch\lib\site-packages (from bleach->kaggle) (0.5.1)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.2/78.2 kB 543.3 kB/s eta 0:00:00
Requirement already satisfied: charset-normalizer<4,>=2 in d:\anaconda\envs\pytorch\lib\site-packages (from requests->kaggle) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in d:\anaconda\envs\pytorch\lib\site-packages (from requests->kaggle) (3.4)
Requirement already satisfied: colorama in d:\anaconda\envs\pytorch\lib\site-packages (from tqdm->kaggle) (0.4.6)
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.16-py3-none-any.whl size=110697 sha256=b988a133c1466dda33402c76755602048d45d3e79d6600b04c67842c464b53ec
  Stored in directory: c:\users\xiaowang\appdata\local\pip\cache\wheels\43\4b\fb\736478af5e8004810081a06259f9aa2f7c3329fc5d03c2c412
Successfully built kaggle
Installing collected packages: text-unidecode, python-slugify, kaggle
Successfully installed kaggle-1.5.16 python-slugify-8.0.1 text-unidecode-1.3
1.2.2 Creating a token
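Log in to Kaggle and open your account page, https://www.kaggle.com/USER_NAME/account. Find the API section and click the Create API Token button to generate a kaggle.json configuration file, which contains your username and token string. Move this file to Kaggle's default location, ~/.kaggle/kaggle.json. On my machine it lives in:
C:\Users\XiaoWang\.kaggle
If there is no .kaggle folder under your user directory, create one yourself.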
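Moving the token into place can be scripted as well. The sketch below is only an illustration: install_token is a hypothetical helper, and it assumes the downloaded file holds the two fields "username" and "key". It creates the .kaggle folder if needed, copies the file, and on Linux/macOS restricts it to the current user.

```python
import json
import os
import shutil
from pathlib import Path


def install_token(downloaded, config_dir):
    """Copy a downloaded kaggle.json into config_dir after a basic sanity check."""
    downloaded = Path(downloaded)
    config_dir = Path(config_dir)

    # Assumption: the token file is JSON with "username" and "key" entries.
    with open(downloaded, encoding="utf-8") as f:
        data = json.load(f)
    missing = [k for k in ("username", "key") if not data.get(k)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))

    # Create the .kaggle folder if it does not exist yet.
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)

    # On Linux/macOS, keep the token readable by the current user only.
    if os.name == "posix":
        os.chmod(target, 0o600)
    return target
```

A call such as install_token(Path.home() / "Downloads" / "kaggle.json", Path.home() / ".kaggle") would cover the common case; the Downloads location is an assumption about where the browser saved the file.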
Note that besides the username and token, kaggle.json can also hold other settings such as proxy. For reference:
usage: kaggle config set [-h] -n NAME -v VALUE

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter
                        (one of competition, path, proxy)
  -v VALUE, --value VALUE
                        Value of the configuration parameter, valid values depending on name
                        - competition: Competition URL suffix (use "kaggle competitions list" to show options)
                        - path: Folder where file(s) will be downloaded, defaults to current working directory
                        - proxy: Proxy for HTTP requests

You can of course also edit kaggle.json directly. Once it is edited, run kaggle config view to check the current configuration.
(PyTorch) F:\kaggle>kaggle config view
Configuration values from C:\Users\XiaoWang\.kaggle
- username: *****
- path: F:/kaggle
- proxy: None
- competition: None
1.2.3 Downloading the data
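Once everything above is in place, open the page of the data you want and you can start downloading. I will use the competition I wanted as an example:
https://www.kaggle.com/competitions/google-research-identify-contrails-reduce-global-warming
On the data page, find the API command for downloading the dataset, kaggle competitions download -c google-research-identify-contrails-reduce-global-warming, and run it. You should see output like this: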
(PyTorch) F:\kaggle>kaggle competitions download -c google-research-identify-contrails-reduce-global-warming
Downloading google-research-identify-contrails-reduce-global-warming.zip to F:/kaggle\competitions\google-research-identify-contrails-reduce-global-warming
 16%|███████████████▋ | 47.4G/302G [1:21:35<6:24:02, 11.9MB/s]
More options for downloading datasets are listed below:
usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
                                [-o] [-q]
                                [dataset]

optional arguments:
  -h, --help            show this help message and exit
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  --unzip               Unzip the downloaded file. Will delete the zip file when completed.
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
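For repeated downloads it can be handy to assemble this command in a script. The following is a rough sketch that maps the options from the help text above onto a Python function. The names build_datasets_download_cmd and download are hypothetical, "owner/dataset-name" is a placeholder rather than a real dataset, and the script simply calls the kaggle executable, so it assumes the CLI is installed and configured as described earlier.

```python
import subprocess


def build_datasets_download_cmd(dataset, file_name=None, path=None,
                                unzip=False, force=False, quiet=False):
    """Build the argument list for `kaggle datasets download`."""
    cmd = ["kaggle", "datasets", "download", dataset]
    if file_name:
        cmd += ["-f", file_name]   # download a single file only
    if path:
        cmd += ["-p", path]        # target folder
    if unzip:
        cmd.append("--unzip")      # extract and delete the zip afterwards
    if force:
        cmd.append("--force")      # skip the up-to-date check
    if quiet:
        cmd.append("--quiet")      # no progress output
    return cmd


def download(dataset, **options):
    """Run the download; raises CalledProcessError if the CLI reports failure."""
    subprocess.run(build_datasets_download_cmd(dataset, **options), check=True)


if __name__ == "__main__":
    # Placeholder dataset name; only prints the command, nothing is downloaded.
    print(" ".join(build_datasets_download_cmd(
        "owner/dataset-name", path="F:/kaggle", unzip=True)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.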