(Please insert a screenshot of the tool)
Type of tool: Web application
Required skills:
- Elementary User: No programming
- Advanced User: Python, Basic Deep Learning (PyTorch)
Short description of the tool:
- Detailed description: Data Extractor/MatrixDataExtractor_UserGuide.pdf
Disclaimer:
Unfortunately, no support for the table detection model will be provided after project completion. The accuracy of the table detection model depends on several factors, such as the volume and variety of the annotated datasets and the hyperparameters of the model. You can run your own experiments to improve the accuracy of your table detection model.
How to use/download/access it:
Get the code from the GitHub repository https://github.com/cslab-hub/MatrixDataExtractor, copy
it to your computer, prepare your annotated dataset, and start using the tool.
Use case/problem:
Extract tabular data from product technical datasheets (PDF documents).
Description of the problem the tools solves:
Extracting tabular data from PDF documents is a difficult task because PDF templates and table templates are so diverse. Some open-source tools do not support all possible types of PDF templates for tabular data extraction. This tool therefore combines a computer-vision-based table detection approach with the Camelot tool to extract tabular information from PDF documents. Post-processing is still necessary after the tabular data has been extracted.
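The following is a minimal sketch of how a detected table region could be handed to Camelot. It is not the tool's actual implementation. It assumes the detection model returns a pixel bounding box (x_min, y_min, x_max, y_max) with the origin at the top-left of a page image rendered at a known DPI, and it converts that box into the "x1,y1,x2,y2" string that Camelot's table_areas option expects (left-top and right-bottom corners in PDF coordinate space, origin at the bottom-left). The function names, the page height value, and the choice of the "stream" flavor are illustrative assumptions.

```python
from typing import List, Tuple

PDF_POINTS_PER_INCH = 72.0  # PDF user space uses 72 points per inch by default


def pixel_box_to_table_area(
    box_px: Tuple[float, float, float, float],
    page_height_pt: float,
    dpi: float = 72.0,
) -> str:
    """Convert a pixel box (top-left origin) to a Camelot table_areas string.

    box_px is (x_min, y_min, x_max, y_max) in pixels of the rendered page image.
    page_height_pt is the page height in PDF points.
    dpi is the resolution at which the page image was rendered (assumed known).
    """
    x_min, y_min, x_max, y_max = box_px
    scale = PDF_POINTS_PER_INCH / dpi  # pixels -> PDF points

    left = x_min * scale
    right = x_max * scale
    # Flip the vertical axis: image y grows downward, PDF y grows upward.
    top = page_height_pt - y_min * scale
    bottom = page_height_pt - y_max * scale

    return f"{left:.1f},{top:.1f},{right:.1f},{bottom:.1f}"


def extract_tables(pdf_path: str, page: int, table_areas: List[str]):
    """Run Camelot on the given regions and return one DataFrame per table.

    Hypothetical helper; Camelot is imported here so that the coordinate
    conversion above can be used without Camelot being installed.
    """
    import camelot

    tables = camelot.read_pdf(
        pdf_path,
        pages=str(page),
        flavor="stream",  # assumption; "lattice" may suit tables with ruled lines
        table_areas=table_areas,
    )
    return [table.df for table in tables]


if __name__ == "__main__":
    # Example: a box detected on an A4 page (842 pt high) rendered at 144 DPI.
    area = pixel_box_to_table_area((200, 100, 1000, 500), page_height_pt=842, dpi=144)
    print(area)
```

The DataFrames returned by extract_tables would then go through the post-processing step mentioned above, for example cleaning cell text or merging header rows.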
Contact person of the tool: Arnab Ghosh Chowdhury, Osnabrueck University.
Related tools:
- Analyse and visualize your process data with data analytics -> Data Analytics
- Get guidance to set up a working data infrastructure -> Data Infrastructure Wiki
- Find the right sensor to survey your process -> Sensor Tool
- Improve internal information and material flow -> VSM
- Match material requirements with material properties -> Matrix