Support export and import knowledge base #490

Open
1 task
sykp241095 opened this issue Dec 10, 2024 · 5 comments

@sykp241095
Member

sykp241095 commented Dec 10, 2024

Description

Provide an import and export feature for knowledge bases so that users can reuse chunks and knowledge graphs across multiple Autoflow instances, avoiding the repeated cost of embedding and knowledge graph extraction.

Design

What kind of files are used to transmit the knowledge base data?

Export the KB data to CSV files?

Export the related uploaded files into a folder named uploads:

migration_kb_data
  - kb.{kb_id}.uploads.csv
  - kb.{kb_id}.documents.csv
  - kb.{kb_id}.chunks.csv
  - kb.{kb_id}.entities.csv
  - kb.{kb_id}.relationships.csv
  - uploads
    - xxxx.md
    - xxxx.pdf
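
A minimal sketch of how an exporter could produce the layout above, assuming the rows of each table are already available as dictionaries. The function name `export_kb` and the parameters `rows_by_table` and `upload_files` are hypothetical and are not part of Autoflow; only the file naming follows the proposal.

```python
import csv
import shutil
from pathlib import Path

# Table names taken from the layout proposed above.
KB_TABLES = ("uploads", "documents", "chunks", "entities", "relationships")


def export_kb(kb_id, rows_by_table, upload_files, out_root):
    """Write one CSV per table and copy every uploaded file.

    rows_by_table: hypothetical dict mapping table name -> list of dict rows.
    upload_files: hypothetical list of paths to the original uploaded files.
    Returns the list of CSV paths that were written.
    """
    out_dir = Path(out_root) / "migration_kb_data"
    uploads_dir = out_dir / "uploads"
    uploads_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for table in KB_TABLES:
        rows = rows_by_table.get(table, [])
        csv_path = out_dir / f"kb.{kb_id}.{table}.csv"
        # The keys of the first row become the header; an empty table yields an empty file.
        fieldnames = list(rows[0].keys()) if rows else []
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            if fieldnames:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(rows)
        written.append(csv_path)

    # Copy the source files next to the CSVs so the import side can re-attach them.
    for src in upload_files:
        shutil.copy2(src, uploads_dir / Path(src).name)

    return written
```

Whether embeddings fit comfortably in a CSV column is left open here; the sketch only illustrates the directory structure and naming.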

Consideration

  • Whether to support importing into an existing knowledge base.
  • The upload / document / user IDs may change on import.
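
One way the ID concern could be handled, sketched under assumptions: on import, each exported document gets a fresh ID and the chunks that reference it are rewritten to match. The column names `id` and `document_id` and the function `remap_ids` are hypothetical; the real schema may differ.

```python
def remap_ids(documents, chunks, first_new_document_id):
    """Assign fresh document IDs and rewrite chunk references to match.

    documents: list of dict rows with an "id" key (assumed column name).
    chunks: list of dict rows with a "document_id" key (assumed column name).
    Returns (new_documents, new_chunks, id_map) without mutating the inputs.
    """
    id_map = {}
    new_documents = []
    for offset, doc in enumerate(documents):
        new_id = first_new_document_id + offset
        id_map[doc["id"]] = new_id
        new_documents.append({**doc, "id": new_id})

    # Every chunk must point at the new ID of its parent document.
    new_chunks = [{**c, "document_id": id_map[c["document_id"]]} for c in chunks]
    return new_documents, new_chunks, id_map
```

The same mapping approach would apply to upload and user IDs; importing into an existing knowledge base would additionally need a rule for rows that already exist there.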

TODO

  • Support export and import knowledge base via CLI
@634750802
Collaborator

Related #398

@Mini256
Member

Mini256 commented Jan 3, 2025

Do we really have such a scenario?

@Mini256 Mini256 removed this from the Release v0.5.0 milestone Jan 3, 2025
@sykp241095
Member Author

Do we really have such a scenario?

Yes. Sometimes users run in a local, private network environment where it is difficult for them to download docs.pingcap.com or other online docs. This feature would let them download an existing knowledge base and import it into their own self-hosted Autoflow easily.

@Mini256
Member

Mini256 commented Jan 3, 2025

help them to download an existing knowledge base

What would the existing knowledge base be: an internal website, or a folder containing a lot of local files? Please provide a detailed description in the issue description.

If the data source is not a common one, we should implement this with a custom script.

import it to their own self-hosted autoflow

Why not use the local file upload data source? Do we have to use the CLI to upload?

@sykp241095
Member Author

sykp241095 commented Jan 3, 2025

What would the existing knowledge base

For example, a TiDB knowledge base, a Redis KB, or a MongoDB KB.

Why not use the local file upload data source

  • Cost:
    If we add TiDB knowledge by crawling docs.pingcap.com, users have to pay again for the LLM to extract knowledge graphs from about a thousand pages.
    If we instead upload a roughly 100 MB tidb-user-guide.pdf, the LLM still needs to extract the whole knowledge graph from that PDF, which may cost somewhere between $50 and $100.

  • LLM performance:
    Users may not have the strongest LLM for knowledge graph extraction. For example, many users run llama3.* 32B or another self-hosted model, and these LLMs do not perform well at extracting and building graphs.

Do we have to use CLI to upload?

The ultimate solution might be a UI-based export/import experience, I think.

@Mini256 Mini256 changed the title from "core(kb): support export and import a knowledge base via cli" to "Support export and import knowledge base" Jan 3, 2025
@sykp241095 sykp241095 assigned Icemap and unassigned sszgwdk Jan 6, 2025
5 participants