Extraction of Structure of German Organizational Charts

This repository contains the orgxtract package and CLI.

Development

We develop with Pipenv. A guide for setting up a development environment can be found in The Hitchhiker’s Guide to Python.

Installation

To run the project, Python 3 and a package installer/manager that can handle pyproject.toml must be installed. The following installation steps use Pipenv as an example.

Install all dependencies in a virtual environment.

pipenv install -e .

Install the German spaCy model.

pipenv run python -m spacy download de_core_news_md
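To check that the model download succeeded, a small helper like the following can be used. It is a sketch that relies on spaCy installing its pipelines as ordinary Python packages (here `de_core_news_md`); the function name `spacy_model_installed` is hypothetical and not part of orgxtract.

```python
import importlib.util


def spacy_model_installed(name: str = "de_core_news_md") -> bool:
    """Return True if a package with this name can be found on the import path."""
    # spaCy models are installed as regular Python packages, so find_spec
    # can locate them without importing spaCy or loading the model.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if spacy_model_installed():
        print("de_core_news_md is installed")
    else:
        print("Missing model: run `pipenv run python -m spacy download de_core_news_md`")
```

Run it with `pipenv run python` so that it checks the project's virtual environment rather than the system interpreter.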

Usage

The following basic example extracts data from the first page of a PDF.

from orgxtract import Document, TextPipeline
import orgxtract.pdf as pdf

# pdf.open() returns an iterator over the pages; next() takes the first one.
drawing = next(pdf.open("examples/orgchart.pdf"))
document = Document.extract(drawing)

# The with statement is only necessary when using threads.
with TextPipeline() as text_pipeline:
    texts = document.text_contents.values()

    for content in text_pipeline.pipe(texts):
        print(content)

CLI

This package contains a command-line tool. It can be executed by running the package as a script with python -m.

The input can be either a PDF or a directory containing multiple PDFs.

pipenv run python -m orgxtract path/to/input -o path/to/output

Use --help to see all parameters.
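When the CLI should be driven from another Python script (for example, a batch job), it can be invoked through the standard library's subprocess module. This sketch only uses the arguments documented above (input path and -o); the helper names `orgxtract_command` and `run_orgxtract` are hypothetical.

```python
import subprocess
import sys


def orgxtract_command(input_path: str, output_path: str) -> list[str]:
    """Build the documented call: python -m orgxtract INPUT -o OUTPUT."""
    # sys.executable is the interpreter running this script; under
    # `pipenv run` it is the project's virtual environment Python.
    return [sys.executable, "-m", "orgxtract", input_path, "-o", output_path]


def run_orgxtract(input_path: str, output_path: str) -> None:
    """Run the CLI on a single PDF or a directory of PDFs."""
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(orgxtract_command(input_path, output_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.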

Logging

The package uses the Python logging module. Logging is enabled in the CLI, where the log level can be configured.
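When orgxtract is used as a library instead of through the CLI, its log messages only appear if the application configures logging. A minimal sketch with the standard library, assuming the package follows the common `logging.getLogger(__name__)` convention so that all of its loggers sit under the name "orgxtract":

```python
import logging

# Attach a stderr handler to the root logger. Without configuration, the
# root logger's level is WARNING, so INFO and DEBUG messages are dropped.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# Assumption: the package's loggers are children of "orgxtract". Raising
# this logger to DEBUG makes its debug records reach the root handler,
# while other libraries stay at INFO.
logging.getLogger("orgxtract").setLevel(logging.DEBUG)
```

Setting the level on the package logger rather than on the root logger keeps debug output from unrelated libraries out of the log.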

Data Set

We used a subset of the data from these websites.

License

The code is licensed under the MIT license. For distribution, the licenses of the dependencies must be consulted.

FDS-HTW-2024/fds_orgchart