Nougat: Intelligent Parsing of Academic PDF Documents with Optical Neural Networks, Unlocking the Value of Academic Paper PDFs

This is the official repository for Nougat, an academic document PDF parser that understands LaTeX math and tables.

Project page: https://facebookresearch.github.io/nougat/

1. Installation

From pip:

    pip install nougat-ocr

From repository:

    pip install git+https://github.com/facebookresearch/nougat

Note, on Windows: If you want to utilize a GPU, make sure you first install the correct PyTorch version. Follow the PyTorch installation instructions for your setup.

There are extra dependencies if you want to call the model from an API or generate a dataset. Install them via

    pip install "nougat-ocr[api]"

or

    pip install "nougat-ocr[dataset]"

1.2 Getting predictions for a PDF

1.2.1 CLI

To get predictions for a PDF, run

    $ nougat path/to/file.pdf -o output_directory

A path to a directory, or to a file in which each line is a path to a PDF, can also be passed as a positional argument:

    $ nougat path/to/directory -o output_directory

    usage: nougat [-h] [--batchsize BATCHSIZE] [--checkpoint CHECKPOINT] [--model MODEL] [--out OUT]
                  [--recompute] [--markdown] [--no-skipping] pdf [pdf ...]

    positional arguments:
      pdf                   PDF(s) to process.

    options:
      -h, --help            show this help message and exit
      --batchsize BATCHSIZE, -b BATCHSIZE
                            Batch size to use.
      --checkpoint CHECKPOINT, -c CHECKPOINT
                            Path to checkpoint directory.
      --model MODEL_TAG, -m MODEL_TAG
                            Model tag to use.
      --out OUT, -o OUT     Output directory.
      --recompute           Recompute already computed PDF, discarding previous predictions.
      --full-precision      Use float32 instead of bfloat16. Can speed up CPU conversion for some setups.
      --no-markdown         Do not add postprocessing step for markdown compatibility.
      --markdown            Add postprocessing step for markdown compatibility (default).
      --no-skipping         Don't apply failure detection heuristic.
      --pages PAGES, -p PAGES
                            Provide page numbers like '1-4,7' for pages 1 through 4 and page 7. Only works for single PDFs.

The default model tag is 0.1.0-small. If you want to use the base model, use the tag 0.1.0-base.
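The page specification accepted by --pages ("1-4,7" for pages 1 through 4 and page 7) can be illustrated with a small parser. This is only a sketch of how such a string might be expanded, not Nougat's own parsing code; the function name parse_pages and its error handling are assumptions made for illustration.

```python
from typing import List, Set


def parse_pages(spec: str) -> List[int]:
    """Expand a page specification such as "1-4,7" into sorted 1-based page numbers.

    Hypothetical helper for illustration; not part of the nougat package.
    """
    pages: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(first, last + 1))  # both boundaries are included
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1-4,7"))  # prints [1, 2, 3, 4, 7]
```

Collecting the numbers in a set means overlapping entries such as "1-3,2" yield each page only once.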
To run with the base model, pass its tag with -m:

    $ nougat path/to/file.pdf -o output_directory -m 0.1.0-base

In the output directory every PDF will be saved as a .mmd file, a lightweight markup language that is mostly compatible with Mathpix Markdown (we make use of the LaTeX tables).

Note: On some devices the failure detection heuristic is not working properly. If you experience a lot of [MISSING_PAGE] responses, try to run with the --no-skipping flag. Related: #11, #67

1.2.2 API

With the extra dependencies you can use app.py to start an API. Call

    $ nougat_api

To get a prediction for a PDF file, send a POST request to http://127.0.0.1:8503/predict/. The endpoint also accepts the parameters start and stop to limit the computation to a selected range of pages (boundaries are included).

The response is a string with the markup text of the document.

    curl -X 'POST' \
      'http://127.0.0.1:8503/predict/' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'file=@<PDFFILE.pdf>;type=application/pdf'

To limit the conversion to pages 1 to 5, use the start/stop parameters in the request URL: http://127.0.0.1:8503/predict/?start=1&stop=5

2. Dataset

2.1 Generating a dataset

To generate a dataset you need

1. A directory containing the PDFs
2. A directory containing the .html files (processed .tex files by LaTeXML) with the same folder structure
3. A binary file of pdffigures2 and a corresponding environment variable: export PDFFIGURES_PATH="/path/to/binary.jar"

Next run

    python -m nougat.dataset.split_htmls_to_pages --html path/html/root --pdfs path/pdf/root --out path/paired/output --figure path/pdffigures/outputs

Additional arguments include

    Argument               Description
    --recompute            recompute all splits
    --markdown MARKDOWN    Markdown output dir
    --workers WORKERS      How many processes to use
    --dpi DPI              What resolution the pages will be saved at
    --timeout TIMEOUT      max time per paper in seconds
    --tesseract            Tesseract OCR prediction for each page

Finally, create a jsonl file that contains all the image paths, markdown text and meta information.
    python -m nougat.dataset.create_index --dir path/paired/output --out index.jsonl

For each jsonl file you also need to generate a seek map for faster data loading:

    python -m nougat.dataset.gen_seek file.jsonl

The resulting directory structure can look as follows:

    root/
    ├── images
    ├── train.jsonl
    ├── train.seek.map
    ├── test.jsonl
    ├── test.seek.map
    ├── validation.jsonl
    └── validation.seek.map

Note that the .mmd and .json files in path/paired/output (here images) are no longer required. This can be useful for pushing to an S3 bucket by halving the amount of files.

2.2 Training

To train or fine-tune a Nougat model, run

    python train.py --config config/train_nougat.yaml

2.3 Evaluation

Run

    python test.py --checkpoint path/to/checkpoint --dataset path/to/test.jsonl --save_path path/to/results.json

To get the results for the different text modalities, run

    python -m nougat.metrics path/to/results.json

2.4 FAQ

Why am I only getting [MISSING_PAGE]?

Nougat was trained on scientific papers found on arXiv and PMC. Is the document you're processing similar to that? What language is the document in? Nougat works best with English papers; other Latin-based languages might work. Chinese, Russian, Japanese etc. will not work. If these requirements are fulfilled, the cause might be false positives in the failure detection when computing on CPU or older GPUs (#11). Try passing the --no-skipping flag for now.

Where can I download the model checkpoint from?

The checkpoints are uploaded on GitHub in the release section. They can also be downloaded during the first execution of the program. Choose the preferred model by passing --model 0.1.0-{base,small}

Reference link: https://github.com/facebookresearch/nougat
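As background for the seek maps mentioned in section 2.1: the file format written by nougat.dataset.gen_seek is not described in this article, so the following is only a sketch of the general idea, under the assumption that a seek map stores the byte offset of every line so that a single record can be read without scanning the whole jsonl file. The function names and the record fields ("image", "markdown") are made up for illustration.

```python
import json
import os
import tempfile
from typing import List


def build_seek_map(jsonl_path: str) -> List[int]:
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    with open(jsonl_path, "rb") as handle:
        while True:
            position = handle.tell()  # offset of the line about to be read
            line = handle.readline()
            if not line:
                break  # end of file
            offsets.append(position)
    return offsets


def read_record(jsonl_path: str, offsets: List[int], index: int) -> dict:
    """Jump straight to record `index` and decode only that line."""
    with open(jsonl_path, "rb") as handle:
        handle.seek(offsets[index])
        return json.loads(handle.readline().decode("utf-8"))


def demo() -> dict:
    """Write a tiny jsonl file with made-up records and read the last one back."""
    records = [
        {"image": f"images/page_{i}.png", "markdown": f"text {i}"}
        for i in range(3)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.jsonl")
        with open(path, "w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")
        offsets = build_seek_map(path)
        return read_record(path, offsets, 2)


if __name__ == "__main__":
    print(demo())
```

Opening the file in binary mode keeps the offsets exact, because text mode may translate line endings and decode multi-byte characters, both of which would shift positions.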