This page (revision-61) was last changed on 19-Sep-2022 11:15 by Arnab Ghosh Chowdhury

This page was created on 10-May-2022 15:30 by Arnab Ghosh Chowdhury

Page revision history

|| Version || Date Modified || Size || Author || Changes ... || Change note
| 61 | 19-Sep-2022 11:15 | 5 KB | Arnab Ghosh Chowdhury | to previous |

Version management

Difference between version and

At line 3 changed one line
__%%( color: #003399; font-size: 18px;)Type of tool: Web application
__%%( color: #003399; font-size: 30px;)Matrix Data Extractor:__
At line 5 changed one line
__%%( color: #003399; font-size: 18px;)Required skills: __
Extracting tabular data from PDF documents is a challenging task because PDF and table templates vary widely. Some open-source tools do not support every type of PDF template for tabular data extraction. This tool therefore combines a computer-vision-based table detection approach with the Camelot tool to extract tabular information from PDF documents. Post-processing is necessary after the tabular data has been extracted.
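The post-processing step mentioned above is not specified on this page. The following is a minimal sketch of what such a step might look like, assuming the extracted table is available as a list of rows (for instance, a Camelot table's DataFrame converted with `df.values.tolist()`). The function names `clean_cell` and `postprocess_table`, the sample rows, and the choice to treat the first non-empty row as the header are illustrative assumptions, not part of the tool.

```python
import re


def clean_cell(cell):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", str(cell)).strip()


def postprocess_table(rows):
    """Clean every cell, drop empty rows, and key each row by the header.

    Assumption: the first non-empty row is the table header.
    """
    cleaned = [[clean_cell(cell) for cell in row] for row in rows]
    # Remove rows in which every cell is empty after cleaning.
    cleaned = [row for row in cleaned if any(row)]
    if not cleaned:
        return []
    header, *body = cleaned
    return [dict(zip(header, row)) for row in body]


# Hypothetical raw rows, as they might come out of a table extractor.
raw_rows = [
    ["Property ", "Value", "Unit"],
    ["", "", ""],
    ["Melt flow\nrate", " 12 ", "g/10 min"],
    ["Density", "0.95", "g/cm3"],
]

records = postprocess_table(raw_rows)
for record in records:
    print(record)
```

Returning a list of dictionaries keeps each value attached to its column name, which may make later storage or search simpler than working with positional rows.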
At line 7 removed one line
- Elementary User: No programming
At line 9 changed one line
- Advanced User: Python, Basic Deep Learning (PyTorch)
__%%( color: #003399; font-size: 16px;)Type of tool:__ Web application
At line 11 changed one line
__%%( color: #003399; font-size: 18px;)Short description of the tool: __
__%%( color: #003399; font-size: 16px;)Short description of the tool: __ Extract tabular data from product technical datasheets (PDF documents)
At line 13 changed one line
- Detailed description: [Data Extractor/MatrixDataExtractor_UserGuide.pdf]
Matrix Data Extractor (MDE) is a web-based application that identifies table regions in PDF documents using a computer-vision-based deep learning algorithm and extracts the data to text files by applying Optical Character Recognition (OCR). It supports transferring the extracted data to MongoDB database tables. A search function is also provided to retrieve data in the user interface by keyword matching (e.g. Manufacturer Name, Technical Datasheet Name, Keyword for Table Data).
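To illustrate the keyword-matching search described above, here is a minimal sketch that filters an in-memory list of records. It is not the tool's implementation: MDE stores its data in MongoDB, whereas this example uses plain Python dictionaries, and the field names (`manufacturer`, `datasheet`, `table_text`), the function `search_records`, and the sample records are all hypothetical.

```python
def search_records(records, keyword,
                   fields=("manufacturer", "datasheet", "table_text")):
    """Return records in which any listed field contains the keyword.

    Matching is a case-insensitive substring test (an assumption about
    what "keyword matching" means here).
    """
    needle = keyword.casefold()
    return [
        record for record in records
        if any(needle in str(record.get(field, "")).casefold()
               for field in fields)
    ]


# Hypothetical records standing in for data extracted from datasheets.
sample_records = [
    {"manufacturer": "Example Polymers GmbH",
     "datasheet": "EP-100 Technical Datasheet",
     "table_text": "Density 0.95 g/cm3"},
    {"manufacturer": "Sample Plastics Ltd",
     "datasheet": "SP-20 Technical Datasheet",
     "table_text": "Melt flow rate 12 g/10 min"},
]

hits = search_records(sample_records, "density")
for hit in hits:
    print(hit["manufacturer"], "-", hit["datasheet"])
```

Restricting the search to a single field, for example `fields=("manufacturer",)`, would correspond to searching by Manufacturer Name only.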
At line 15 changed one line
__%%( color: #003399; font-size: 18px;)Disclaimer:__
\\__%%( color: #003399; font-size: 16px;)Required skills: __\\
- Elementary User: No programming\\
- Advanced User: Python, Basic Deep Learning (PyTorch)\\
At line 18 added 7 lines
\\__%%( color: #003399; font-size: 16px;)Required programs %%( color: #003000; font-size: 14px;)(step-by-step guide and links provided in the user guideline below): __
\\- Python
\\- Java
\\- Anaconda
\\- Tool files from GitHub (link below)
__%%( color: #003399; font-size: 16px;)Disclaimer:__\\
At line 27 added 7 lines
__%%( color: #003399; font-size: 16px;)Screenshots:__
%%carousel
[Data Analytics/Di-Plast Data Validation Screenshot first page.JPG]
[Data Analytics/Di-Plast Data Validation Screenshot Introduction.JPG]
[Data Analytics/Di-Plast Data Validation Screenshot C1.JPG]
[Data Analytics/Di-Plast Data Validation Screenshot C2.JPG]
/%
At line 20 changed one line
__%%( color: #003399; font-size: 18px;)How to use/download/access it:__
\\__%%( color: #003399; font-size: 24px;)This tool supports you to:__\\
- (Text)
At line 22 removed 2 lines
Get the code from GitHub [https://github.com/cslab-hub/MatrixDataExtractor], copy
it to your computer, prepare your annotated dataset, and start using it\\
At line 39 added 2 lines
\\__%%( color: #003399; font-size: 18px;)Example use case:__\\
- (Text)
At line 26 changed one line
__%%( color: #003399; font-size: 18px;)Use case/problem:__
\\__%%( color: #003399; font-size: 24px;)Tool guideline and access: __
\\ - ⚠️ We recommend opening and saving the user guideline before proceeding: [Data Extractor/MatrixDataExtractor_UserGuide.pdf]
\\ - The tool can be accessed through the following link: [https://share.streamlit.io/cslab-hub/data_validation/main/main.py]
\\- Get the code/installation files from GitHub [https://cslab-hub-data-validation-main-bx6ggw.streamlitapp.com/] and start using the app by browsing through the pages.
At line 28 removed one line
Extract tabular data from product technical datasheets (PDF documents).
At line 30 changed one line
__%%( color: #003399; font-size: 18px;)__Description of the problem the tool solves__:__
Get the code from GitHub [https://github.com/cslab-hub/MatrixDataExtractor], copy
it to your computer, prepare your annotated dataset, and start using it\\
At line 32 removed one line
Extracting tabular data from PDF documents is a challenging task because PDF and table templates vary widely. Some open-source tools do not support every type of PDF template for tabular data extraction. This tool therefore combines a computer-vision-based table detection approach with the Camelot tool to extract tabular information from PDF documents. Post-processing is necessary after the tabular data has been extracted.
At line 34 removed 21 lines
__%%( color: #003399; font-size: 18px;)__Contact person of the tool__: Arnab Ghosh Chowdhury, Osnabrueck University.
__%%( color: #003399; font-size: 18px;)Related tools:__
- Analyse and Visualize your process data with data analytics -> [Data Analytics]
- Get guidance to set up a working data infrastructure -> [Data Infrastructure Wiki]
- Find the right sensor to survey your process -> [Sensor Tool]
- Improve internal information and material flow -> [VSM]
- Match material requirements with material properties -> [Matrix]
!Tool Description
Matrix Data Extractor (MDE) is a web-based application that identifies table regions in PDF documents using a computer-vision-based deep learning algorithm and extracts the data to text files by applying Optical Character Recognition (OCR). It supports transferring the extracted data to MongoDB database tables. A search function is also provided to retrieve data in the user interface by keyword matching (e.g. Manufacturer Name, Technical Datasheet Name, Keyword for Table Data).
! Guidelines
Before getting started, please take a look at [Data Extractor/MatrixDataExtractor_UserGuide.pdf] and make yourself familiar with how to use the tool.
At line 61 added 16 lines
\\__%%( color: #003399; font-size: 16px;)Contact person of the tool: __
Arnab Ghosh Chowdhury, [mailto:arnab.ghosh.chowdhury@uni-osnabrueck.de] from Osnabrueck University.
__%%( color: #003399; font-size: 16px;) Before applying this tool:__
\\We also recommend taking a look at the following Di-Plast tools. They can help you gather the necessary information and data, prepare your data better, and continue working with it afterwards:
\\--> Improve internal information and material flow -> [VSM]
\\--> Get guidance to set up a working data infrastructure -> [Data Infrastructure Wiki]
\\--> Find the right sensor to survey your process -> [Sensor Tool]
\\--> Analyse and Visualize your process data with data analytics -> [Data Analytics]
\\--> Get important insights to enhance your data understanding -> [Exploratory Pattern Analytics]
\\
\\__%%( color: #003399; font-size: 16px;)After applying this tool:__
\\--> Match material requirements with material properties -> [Matrix]