diff --git a/docs/docs/integrations/document_loaders/google_alloydb.ipynb b/docs/docs/integrations/document_loaders/google_alloydb.ipynb new file mode 100644 index 0000000000000..de1955a696d0e --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_alloydb.ipynb @@ -0,0 +1,379 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "E_RJy7C1bpCT" + }, + "source": [ + "# Google AlloyDB for PostgreSQL\n", + "\n", + "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's LangChain integrations.\n", + "\n", + "This notebook goes over how to use `AlloyDB for PostgreSQL` to load Documents with the `AlloyDBLoader` class." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xjcxaw6--Xyy" + }, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", + " * [Create an AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n", + " * [Create an AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n", + " * [Add a user to the database.](https://cloud.google.com/alloydb/docs/database-users/about)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `langchain-google-alloydb-pg`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "0ZITIDE160OD", + "outputId": "90e0636e-ff34-4e1e-ad37-d2a6db4a317e" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-alloydb-pg" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench, you can restart the kernel using the button at the top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6o0iGVIdDD6K" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cTXTbj4UltKf" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uj02bMRAc9_c" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wnp1R1PYc9_c", + "outputId": "6502c721-a2fd-451f-b946-9f7b850d5966" + }, + "outputs": [], + "source": [ + "# @title Project { display-mode: \"form\" }\n", + "PROJECT_ID = \"gcp_project_id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "! gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable AlloyDB Admin API\n", + "!gcloud services enable alloydb.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set AlloyDB database variables\n", + "Find your database values in the [AlloyDB Instances page](https://console.cloud.google.com/alloydb/clusters)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "CLUSTER = \"my-cluster\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-primary\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vector_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### AlloyDBEngine Connection Pool\n", + "\n", + "To load documents from AlloyDB, you first need an `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create an `AlloyDBEngine` using `AlloyDBEngine.from_instance()`, you need to provide only 5 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n", + "1. `region` : Region where the AlloyDB instance is located.\n", + "1. 
`cluster` : The name of the AlloyDB cluster.\n", + "1. `instance` : The name of the AlloyDB instance.\n", + "1. `database` : The name of the database to connect to on the AlloyDB instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "Optionally, you can instead use [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) with a username and password to access the AlloyDB database. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()` (or `AlloyDBEngine.afrom_instance()`):\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBEngine\n", + "\n", + "engine = await AlloyDBEngine.afrom_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " cluster=CLUSTER,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Create AlloyDBLoader" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBLoader\n", + "\n", + "# Creating a basic AlloyDBLoader object\n", + "loader = await AlloyDBLoader.create(engine, table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PeOMpftjc9_e" + }, + "source": [ + "### Load documents via default table\n", + "The loader returns a list of Documents from the table, using the first column as `page_content` and all other columns as metadata. The default table has the first column as\n", + "`page_content` and the second column as metadata (JSON). Each row becomes a document."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cwvi_O5Wc9_e" + }, + "outputs": [], + "source": [ + "docs = await loader.aload()\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kSkL9l1Hc9_e" + }, + "source": [ + "### Load documents with custom page content and metadata columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = await AlloyDBLoader.create(\n", + " engine,\n", + " table_name=TABLE_NAME,\n", + " content_columns=[\"product_name\"], # Optional\n", + " metadata_columns=[\"id\"], # Optional\n", + ")\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5R6h0_Cvc9_f" + }, + "source": [ + "### Set page content format\n", + "The loader returns a list of Documents, one document per row, with page content in the specified string format, e.g. text (space-separated concatenation), JSON, YAML, or CSV. 
JSON and YAML formats include field headers, while text and CSV do not.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NGNdS7cqc9_f" + }, + "outputs": [], + "source": [ + "loader = await AlloyDBLoader.create(\n", + " engine,\n", + " table_name=\"products\",\n", + " content_columns=[\"product_name\", \"description\"],\n", + " format=\"YAML\",\n", + ")\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_bigtable.ipynb b/docs/docs/integrations/document_loaders/google_bigtable.ipynb new file mode 100644 index 0000000000000..f6fb2005fd730 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_bigtable.ipynb @@ -0,0 +1,466 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Bigtable\n", + "\n", + "> [Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. 
Extend your database application to build AI-powered experiences leveraging Bigtable's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Bigtable](https://cloud.google.com/bigtable) to [save, load, and delete LangChain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `BigtableLoader` and `BigtableSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n", + "* [Create a Bigtable table](https://cloud.google.com/bigtable/docs/managing-tables)\n", + "* [Create Bigtable access credentials](https://developers.google.com/workspace/guides/create-credentials)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify an instance and a table for demo purposes.\n", + "INSTANCE_ID = \"my_instance\" # @param {type:\"string\"}\n", + "TABLE_ID = \"my_table\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-bigtable` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-bigtable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench, you can restart the kernel using the button at the top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using the saver\n", + "\n", + "Save LangChain documents with `BigtableSaver.add_documents()`. To initialize the `BigtableSaver` class, you need to provide 2 things:\n", + "1. `instance_id` - The ID of the Bigtable instance.\n", + "1. `table_id` - The name of the table within the Bigtable instance in which to store LangChain documents."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_google_bigtable import BigtableSaver\n", + "\n", + "test_docs = [\n", + " Document(\n", + " page_content=\"Apple Granny Smith 150 0.99 1\",\n", + " metadata={\"fruit_id\": 1},\n", + " ),\n", + " Document(\n", + " page_content=\"Banana Cavendish 200 0.59 0\",\n", + " metadata={\"fruit_id\": 2},\n", + " ),\n", + " Document(\n", + " page_content=\"Orange Navel 80 1.29 1\",\n", + " metadata={\"fruit_id\": 3},\n", + " ),\n", + "]\n", + "\n", + "saver = BigtableSaver(\n", + " instance_id=INSTANCE_ID,\n", + " table_id=TABLE_ID,\n", + ")\n", + "\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Querying for Documents from Bigtable\n", + "For more details on connecting to a Bigtable table, please check the [Python SDK documentation](https://cloud.google.com/python/docs/reference/bigtable/latest/client)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Load documents from table\n", + "\n", + "Load LangChain documents with `BigtableLoader.load()` or `BigtableLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `BigtableLoader` class, you need to provide:\n", + "1. `instance_id` - The ID of the Bigtable instance.\n", + "1. `table_id` - The name of the table within the Bigtable instance from which to load LangChain documents."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_bigtable import BigtableLoader\n", + "\n", + "loader = BigtableLoader(\n", + " instance_id=INSTANCE_ID,\n", + " table_id=TABLE_ID,\n", + ")\n", + "\n", + "for doc in loader.lazy_load():\n", + " print(doc)\n", + " break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents\n", + "\n", + "Delete a list of LangChain documents from a Bigtable table with `BigtableSaver.delete()`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_bigtable import BigtableSaver\n", + "\n", + "docs = loader.load()\n", + "print(\"Documents before delete: \", docs)\n", + "\n", + "onedoc = test_docs[0]\n", + "saver.delete([onedoc])\n", + "print(\"Documents after delete: \", loader.load())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Limiting the returned rows\n", + "There are two ways to limit the returned rows:\n", + "1. Using a [filter](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters)\n", + "2. 
Using a [row_set](https://cloud.google.com/python/docs/reference/bigtable/latest/row-set#google.cloud.bigtable.row_set.RowSet)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import google.cloud.bigtable.row_filters as row_filters\n", + "\n", + "filter_loader = BigtableLoader(\n", + " INSTANCE_ID, TABLE_ID, filter=row_filters.ColumnQualifierRegexFilter(b\"os_build\")\n", + ")\n", + "\n", + "\n", + "from google.cloud.bigtable.row_set import RowSet\n", + "\n", + "row_set = RowSet()\n", + "row_set.add_row_range_from_keys(\n", + " start_key=\"phone#4c410523#20190501\", end_key=\"phone#4c410523#201906201\"\n", + ")\n", + "\n", + "row_set_loader = BigtableLoader(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " row_set=row_set,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom client\n", + "By default, the loader creates a default client with only the `admin=True` option. To use a non-default client, pass a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) to the constructor." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import bigtable\n", + "\n", + "custom_client_loader = BigtableLoader(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " client=bigtable.Client(...),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom content\n", + "By default, `BigtableLoader` assumes there is a column family called `langchain` with a column called `content` that contains UTF-8 encoded values. 
You can change these defaults as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_bigtable import Encoding\n", + "\n", + "custom_content_loader = BigtableLoader(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " content_encoding=Encoding.ASCII,\n", + " content_column_family=\"my_content_family\",\n", + " content_column_name=\"my_content_column_name\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Metadata mapping\n", + "By default, the `metadata` map on the `Document` object contains a single key, `rowkey`, whose value is the row's rowkey. To add more items to that map, use the `metadata_mappings` parameter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "from langchain_google_bigtable import MetadataMapping\n", + "\n", + "metadata_mapping_loader = BigtableLoader(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " metadata_mappings=[\n", + " MetadataMapping(\n", + " column_family=\"my_int_family\",\n", + " column_name=\"my_int_column\",\n", + " metadata_key=\"key_in_metadata_map\",\n", + " encoding=Encoding.INT_BIG_ENDIAN,\n", + " ),\n", + " MetadataMapping(\n", + " column_family=\"my_custom_family\",\n", + " column_name=\"my_custom_column\",\n", + " metadata_key=\"custom_key\",\n", + " encoding=Encoding.CUSTOM,\n", + " custom_decoding_func=lambda input: json.loads(input.decode()),\n", + " custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n", + " ),\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Metadata as JSON\n", + "\n", + "If a column in Bigtable contains a JSON string that you want added to the output document's metadata, pass the following parameters to `BigtableLoader`. 
Note that the default value for `metadata_as_json_encoding` is UTF-8." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metadata_as_json_loader = BigtableLoader(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " metadata_as_json_encoding=Encoding.ASCII,\n", + " metadata_as_json_family=\"my_metadata_as_json_family\",\n", + " metadata_as_json_name=\"my_metadata_as_json_column_name\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Customize BigtableSaver\n", + "\n", + "`BigtableSaver` can be customized in the same way as `BigtableLoader`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = BigtableSaver(\n", + " INSTANCE_ID,\n", + " TABLE_ID,\n", + " client=bigtable.Client(...),\n", + " content_encoding=Encoding.ASCII,\n", + " content_column_family=\"my_content_family\",\n", + " content_column_name=\"my_content_column_name\",\n", + " metadata_mappings=[\n", + " MetadataMapping(\n", + " column_family=\"my_int_family\",\n", + " column_name=\"my_int_column\",\n", + " metadata_key=\"key_in_metadata_map\",\n", + " encoding=Encoding.INT_BIG_ENDIAN,\n", + " ),\n", + " MetadataMapping(\n", + " column_family=\"my_custom_family\",\n", + " column_name=\"my_custom_column\",\n", + " metadata_key=\"custom_key\",\n", + " encoding=Encoding.CUSTOM,\n", + " custom_decoding_func=lambda input: json.loads(input.decode()),\n", + " custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n", + " ),\n", + " ],\n", + " metadata_as_json_encoding=Encoding.ASCII,\n", + " metadata_as_json_family=\"my_metadata_as_json_family\",\n", + " metadata_as_json_name=\"my_metadata_as_json_column_name\",\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_cloud_sql_mssql.ipynb b/docs/docs/integrations/document_loaders/google_cloud_sql_mssql.ipynb new file mode 100644 index 0000000000000..22c405014df61 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_cloud_sql_mssql.ipynb @@ -0,0 +1,629 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Cloud SQL for SQL Server\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers [MySQL](https://cloud.google.com/sql/mysql), [PostgreSQL](https://cloud.google.com/sql/postgres), and [SQL Server](https://cloud.google.com/sql/sqlserver) database engines. 
Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Cloud SQL for SQL Server](https://cloud.google.com/sql/sqlserver) to [save, load, and delete LangChain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MSSQLLoader` and `MSSQLDocumentSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mssql-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Cloud SQL for SQL Server instance](https://cloud.google.com/sql/docs/sqlserver/create-instance)\n", + "* [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mssql/create-manage-databases)\n", + "* [Add an IAM database user to the database](https://cloud.google.com/sql/docs/sqlserver/add-manage-iam-users#creating-a-database-user) (Optional)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in both the Google Cloud region and the name of your Cloud SQL instance.\n", + "REGION = \"us-central1\" # @param {type:\"string\"}\n", + "INSTANCE = \"test-instance\" # @param {type:\"string\"}\n", + "\n", + "# @markdown Please fill in the user name and password for your Cloud SQL instance.\n", + "DB_USER = \"sqlserver\" # @param {type:\"string\"}\n", + "DB_PASS = \"password\" # @param {type:\"string\"}\n", + "\n", + "# @markdown Please specify a database and a table for demo purposes.\n", + "DATABASE = \"test\" # @param {type:\"string\"}\n", + "TABLE_NAME = \"test-default\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-cloud-sql-mssql` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-mssql" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench, you can restart the kernel using the button at the top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-cloud-sql-mssql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### MSSQLEngine Connection Pool\n", + "\n", + "Before saving or loading documents from an MSSQL table, we first need to configure a connection pool to the Cloud SQL database. The `MSSQLEngine` configures a [SQLAlchemy connection pool](https://docs.sqlalchemy.org/en/20/core/pooling.html#module-sqlalchemy.pool) to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create an `MSSQLEngine` using `MSSQLEngine.from_instance()`, you need to provide 6 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "1. 
`user` : Database user to use for built-in database authentication and login.\n", + "1. `password` : Database password to use for built-in database authentication and login." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLEngine\n", + "\n", + "engine = MSSQLEngine.from_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + " user=DB_USER,\n", + " password=DB_PASS,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize a table\n", + "\n", + "Initialize a table with the default schema via `MSSQLEngine.init_document_table()`. Table columns:\n", + "- page_content (type: text)\n", + "- langchain_metadata (type: JSON)\n", + "\n", + "The `overwrite_existing=True` flag means the newly initialized table will replace any existing table of the same name." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine.init_document_table(TABLE_NAME, overwrite_existing=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "Save langchain documents with `MSSQLDocumentSaver.add_documents()`. To initialize the `MSSQLDocumentSaver` class, you need to provide 2 things:\n", + "1. `engine` - An instance of `MSSQLEngine`.\n", + "2. `table_name` - The name of the table within the Cloud SQL database in which to store langchain documents."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_google_cloud_sql_mssql import MSSQLDocumentSaver\n", + "\n", + "test_docs = [\n", + " Document(\n", + " page_content=\"Apple Granny Smith 150 0.99 1\",\n", + " metadata={\"fruit_id\": 1},\n", + " ),\n", + " Document(\n", + " page_content=\"Banana Cavendish 200 0.59 0\",\n", + " metadata={\"fruit_id\": 2},\n", + " ),\n", + " Document(\n", + " page_content=\"Orange Navel 80 1.29 1\",\n", + " metadata={\"fruit_id\": 3},\n", + " ),\n", + "]\n", + "saver = MSSQLDocumentSaver(engine=engine, table_name=TABLE_NAME)\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load langchain documents with `MSSQLLoader.load()` or `MSSQLLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `MSSQLLoader` class, you need to provide:\n", + "1. `engine` - An instance of `MSSQLEngine`.\n", + "2. `table_name` - The name of the table within the Cloud SQL database from which to load langchain documents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLLoader\n", + "\n", + "loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.lazy_load()\n", + "for doc in docs:\n", + " print(\"Loaded documents:\", doc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents via query" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Besides loading documents from a table, we can also load documents from a view generated by a SQL query. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLLoader\n", + "\n", + "loader = MSSQLLoader(\n", + " engine=engine,\n", + " query=f\"select * from \\\"{TABLE_NAME}\\\" where JSON_VALUE(langchain_metadata, '$.fruit_id') = 1;\",\n", + ")\n", + "onedoc = loader.load()\n", + "onedoc" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A view generated from a SQL query can have a different schema than the default table. In such cases, `MSSQLLoader` behaves the same as when loading from a table with a non-default schema. Please refer to the section [Load documents with customized document page content & metadata](#Load-documents-with-customized-document-page-content-&-metadata)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Delete a list of langchain documents from an MSSQL table with `MSSQLDocumentSaver.delete()`.\n", + "\n", + "For a table with the default schema (page_content, langchain_metadata), the deletion criteria are:\n", + "\n", + "A `row` should be deleted if there exists a `document` in the list, such that\n", + "- `document.page_content` equals `row[page_content]`\n", + "- `document.metadata` equals `row[langchain_metadata]`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLLoader\n", + "\n", + "loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "saver.delete(onedoc)\n", + "print(\"Documents after delete:\", loader.load())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load 
documents with customized document page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we prepare an example table with a non-default schema and populate it with some arbitrary data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sqlalchemy\n", + "\n", + "with engine.connect() as conn:\n", + " conn.execute(sqlalchemy.text(f'DROP TABLE IF EXISTS \"{TABLE_NAME}\"'))\n", + " conn.commit()\n", + " conn.execute(\n", + " sqlalchemy.text(\n", + " f\"\"\"\n", + " IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[{TABLE_NAME}]') AND type in (N'U'))\n", + " BEGIN\n", + " CREATE TABLE [dbo].[{TABLE_NAME}](\n", + " fruit_id INT IDENTITY(1,1) PRIMARY KEY,\n", + " fruit_name VARCHAR(100) NOT NULL,\n", + " variety VARCHAR(50),\n", + " quantity_in_stock INT NOT NULL,\n", + " price_per_unit DECIMAL(6,2) NOT NULL,\n", + " organic BIT NOT NULL\n", + " )\n", + " END\n", + " \"\"\"\n", + " )\n", + " )\n", + " conn.execute(\n", + " sqlalchemy.text(\n", + " f\"\"\"\n", + " INSERT INTO \"{TABLE_NAME}\" (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n", + " VALUES\n", + " ('Apple', 'Granny Smith', 150, 0.99, 1),\n", + " ('Banana', 'Cavendish', 200, 0.59, 0),\n", + " ('Orange', 'Navel', 80, 1.29, 1);\n", + " \"\"\"\n", + " )\n", + " )\n", + " conn.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we load langchain documents from this example table with the default parameters of `MSSQLLoader`, the `page_content` of the loaded documents will be the first column of the table, and `metadata` will consist of key-value pairs of all the other columns."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MSSQLLoader(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + ")\n", + "loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can specify the content and metadata we want to load by setting `content_columns` and `metadata_columns` when initializing the `MSSQLLoader`.\n", + "1. `content_columns`: The columns to write into the `page_content` of the document.\n", + "2. `metadata_columns`: The columns to write into the `metadata` of the document.\n", + "\n", + "In the example below, the values of the columns in `content_columns` are joined into a space-separated string that becomes the `page_content` of each loaded document, and the `metadata` of each loaded document contains only the key-value pairs of the columns specified in `metadata_columns`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MSSQLLoader(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " content_columns=[\n", + " \"variety\",\n", + " \"quantity_in_stock\",\n", + " \"price_per_unit\",\n", + " \"organic\",\n", + " ],\n", + " metadata_columns=[\"fruit_id\", \"fruit_name\"],\n", + ")\n", + "loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents with customized page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To save langchain documents into a table with customized metadata fields, we first need to create such a table via `MSSQLEngine.init_document_table()` and specify the list of `metadata_columns` we want it to have. 
In this example, the created table will have the following columns:\n", + "- description (type: text): for storing the fruit description.\n", + "- fruit_name (type: text): for storing the fruit name.\n", + "- organic (type: bit): to tell if the fruit is organic.\n", + "- other_metadata (type: JSON): for storing other metadata information of the fruit.\n", + "\n", + "We can use the following parameters with `MSSQLEngine.init_document_table()` to create the table:\n", + "1. `table_name`: The name of the table within the Cloud SQL database in which to store langchain documents.\n", + "2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of metadata columns we need.\n", + "3. `content_column`: The name of the column to store the `page_content` of a langchain document. Default: `page_content`.\n", + "4. `metadata_json_column`: The name of the JSON column to store extra `metadata` of a langchain document. Default: `langchain_metadata`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine.init_document_table(\n", + " TABLE_NAME,\n", + " metadata_columns=[\n", + " sqlalchemy.Column(\n", + " \"fruit_name\",\n", + " sqlalchemy.UnicodeText,\n", + " primary_key=False,\n", + " nullable=True,\n", + " ),\n", + " sqlalchemy.Column(\n", + " \"organic\",\n", + " sqlalchemy.Boolean,\n", + " primary_key=False,\n", + " nullable=True,\n", + " ),\n", + " ],\n", + " content_column=\"description\",\n", + " metadata_json_column=\"other_metadata\",\n", + " overwrite_existing=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Save documents with `MSSQLDocumentSaver.add_documents()`. 
As you can see in this example:\n", + "- `document.page_content` will be saved into the `description` column.\n", + "- `document.metadata.fruit_name` will be saved into the `fruit_name` column.\n", + "- `document.metadata.organic` will be saved into the `organic` column.\n", + "- `document.metadata.fruit_id` will be saved into the `other_metadata` column in JSON format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "test_docs = [\n", + " Document(\n", + " page_content=\"Granny Smith 150 0.99\",\n", + " metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n", + " ),\n", + "]\n", + "saver = MSSQLDocumentSaver(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " content_column=\"description\",\n", + " metadata_json_column=\"other_metadata\",\n", + ")\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with engine.connect() as conn:\n", + " result = conn.execute(sqlalchemy.text(f'select * from \"{TABLE_NAME}\";'))\n", + " print(result.keys())\n", + " print(result.fetchall())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents with customized page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also delete documents from a table with customized metadata columns via `MSSQLDocumentSaver.delete()`. 
The deletion criteria are:\n", + "\n", + "A `row` should be deleted if there exists a `document` in the list, such that\n", + "- `document.page_content` equals `row[page_content]`\n", + "- For every metadata field `k` in `document.metadata`\n", + " - `document.metadata[k]` equals `row[k]` or `document.metadata[k]` equals `row[langchain_metadata][k]`\n", + "- There is no extra metadata field present in `row` that is absent from `document.metadata`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "saver.delete(docs)\n", + "print(\"Documents after delete:\", loader.load())" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_cloud_sql_mysql.ipynb b/docs/docs/integrations/document_loaders/google_cloud_sql_mysql.ipynb new file mode 100644 index 0000000000000..9b7361061b69e --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_cloud_sql_mysql.ipynb @@ -0,0 +1,631 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Cloud SQL for MySQL\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. 
It offers [MySQL](https://cloud.google.com/sql/mysql), [PostgreSQL](https://cloud.google.com/sql/postgres), and [SQL Server](https://cloud.google.com/sql/sqlserver) database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use [Cloud SQL for MySQL](https://cloud.google.com/sql/mysql) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MySQLLoader` and `MySQLDocumentSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Cloud SQL for MySQL instance](https://cloud.google.com/sql/docs/mysql/create-instance)\n", + "* [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n", + "* [Add an IAM database user to the database](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users#creating-a-database-user) (Optional)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# @markdown Please fill in both the Google Cloud region and the name of your Cloud SQL instance.\n", + "REGION = \"us-central1\" # @param {type:\"string\"}\n", + "INSTANCE = \"test-instance\" # @param {type:\"string\"}\n", + "\n", + "# @markdown Please specify a database and a table for demo purposes.\n", + "DATABASE = \"test\" # @param {type:\"string\"}\n", + "TABLE_NAME = \"test-default\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-cloud-sql-mysql` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-mysql" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### API Enablement\n", + "The `langchain-google-cloud-sql-mysql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### MySQLEngine Connection Pool\n", + "\n", + "Before saving or loading documents from a MySQL table, we first need to configure a connection pool to the Cloud SQL database. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `MySQLEngine` using `MySQLEngine.from_instance()`, you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "2. `region` : Region where the Cloud SQL instance is located.\n", + "3. `instance` : The name of the Cloud SQL instance.\n", + "4. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. 
This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "For more information on IAM database authentication, please see:\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) with a username and password can also be used to access the Cloud SQL database. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLEngine\n", + "\n", + "engine = MySQLEngine.from_instance(\n", + " project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize a table\n", + "\n", + "Initialize a table with the default schema via `MySQLEngine.init_document_table()`. Table columns:\n", + "- page_content (type: text)\n", + "- langchain_metadata (type: JSON)\n", + "\n", + "The `overwrite_existing=True` flag means the newly initialized table will replace any existing table of the same name."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine.init_document_table(TABLE_NAME, overwrite_existing=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "Save langchain documents with `MySQLDocumentSaver.add_documents()`. To initialize the `MySQLDocumentSaver` class, you need to provide 2 things:\n", + "1. `engine` - An instance of `MySQLEngine`.\n", + "2. `table_name` - The name of the table within the Cloud SQL database in which to store langchain documents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_google_cloud_sql_mysql import MySQLDocumentSaver\n", + "\n", + "test_docs = [\n", + " Document(\n", + " page_content=\"Apple Granny Smith 150 0.99 1\",\n", + " metadata={\"fruit_id\": 1},\n", + " ),\n", + " Document(\n", + " page_content=\"Banana Cavendish 200 0.59 0\",\n", + " metadata={\"fruit_id\": 2},\n", + " ),\n", + " Document(\n", + " page_content=\"Orange Navel 80 1.29 1\",\n", + " metadata={\"fruit_id\": 3},\n", + " ),\n", + "]\n", + "saver = MySQLDocumentSaver(engine=engine, table_name=TABLE_NAME)\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load langchain documents with `MySQLLoader.load()` or `MySQLLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `MySQLLoader` class, you need to provide:\n", + "1. `engine` - An instance of `MySQLEngine`.\n", + "2. `table_name` - The name of the table within the Cloud SQL database from which to load langchain documents."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLLoader\n", + "\n", + "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.lazy_load()\n", + "for doc in docs:\n", + " print(\"Loaded documents:\", doc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents via query" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Besides loading documents from a table, we can also load documents from a view generated by a SQL query. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLLoader\n", + "\n", + "loader = MySQLLoader(\n", + " engine=engine,\n", + " query=f\"select * from `{TABLE_NAME}` where JSON_EXTRACT(langchain_metadata, '$.fruit_id') = 1;\",\n", + ")\n", + "onedoc = loader.load()\n", + "onedoc" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A view generated from a SQL query can have a different schema than the default table. In such cases, `MySQLLoader` behaves the same as when loading from a table with a non-default schema. Please refer to the section [Load documents with customized document page content & metadata](#Load-documents-with-customized-document-page-content-&-metadata)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Delete a list of langchain documents from a MySQL table with `MySQLDocumentSaver.delete()`.\n", + "\n", + "For a table with the default schema (page_content, langchain_metadata), the deletion criteria are:\n", + "\n", + "A `row` should be deleted if there exists a `document` in the list, such that\n", + "- `document.page_content` equals `row[page_content]`\n", + "- `document.metadata` equals `row[langchain_metadata]`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLLoader\n", + "\n", + "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "saver.delete(onedoc)\n", + "print(\"Documents after delete:\", loader.load())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents with customized document page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we prepare an example table with a non-default schema and populate it with some arbitrary data."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sqlalchemy\n", + "\n", + "with engine.connect() as conn:\n", + " conn.execute(sqlalchemy.text(f\"DROP TABLE IF EXISTS `{TABLE_NAME}`\"))\n", + " conn.commit()\n", + " conn.execute(\n", + " sqlalchemy.text(\n", + " f\"\"\"\n", + " CREATE TABLE IF NOT EXISTS `{TABLE_NAME}`(\n", + " fruit_id INT AUTO_INCREMENT PRIMARY KEY,\n", + " fruit_name VARCHAR(100) NOT NULL,\n", + " variety VARCHAR(50),\n", + " quantity_in_stock INT NOT NULL,\n", + " price_per_unit DECIMAL(6,2) NOT NULL,\n", + " organic TINYINT(1) NOT NULL\n", + " )\n", + " \"\"\"\n", + " )\n", + " )\n", + " conn.execute(\n", + " sqlalchemy.text(\n", + " f\"\"\"\n", + " INSERT INTO `{TABLE_NAME}` (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n", + " VALUES\n", + " ('Apple', 'Granny Smith', 150, 0.99, 1),\n", + " ('Banana', 'Cavendish', 200, 0.59, 0),\n", + " ('Orange', 'Navel', 80, 1.29, 1);\n", + " \"\"\"\n", + " )\n", + " )\n", + " conn.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we load langchain documents from this example table with the default parameters of `MySQLLoader`, the `page_content` of the loaded documents will be the first column of the table, and `metadata` will consist of key-value pairs of all the other columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MySQLLoader(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + ")\n", + "loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can specify the content and metadata we want to load by setting `content_columns` and `metadata_columns` when initializing the `MySQLLoader`.\n", + "1. `content_columns`: The columns to write into the `page_content` of the document.\n", + "2. 
`metadata_columns`: The columns to write into the `metadata` of the document.\n", + "\n", + "In the example below, the values of the columns in `content_columns` are joined into a space-separated string that becomes the `page_content` of each loaded document, and the `metadata` of each loaded document contains only the key-value pairs of the columns specified in `metadata_columns`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MySQLLoader(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " content_columns=[\n", + " \"variety\",\n", + " \"quantity_in_stock\",\n", + " \"price_per_unit\",\n", + " \"organic\",\n", + " ],\n", + " metadata_columns=[\"fruit_id\", \"fruit_name\"],\n", + ")\n", + "loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents with customized page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To save langchain documents into a table with customized metadata fields, we first need to create such a table via `MySQLEngine.init_document_table()` and specify the list of `metadata_columns` we want it to have. In this example, the created table will have the following columns:\n", + "- description (type: text): for storing the fruit description.\n", + "- fruit_name (type: text): for storing the fruit name.\n", + "- organic (type: tinyint(1)): to tell if the fruit is organic.\n", + "- other_metadata (type: JSON): for storing other metadata information of the fruit.\n", + "\n", + "We can use the following parameters with `MySQLEngine.init_document_table()` to create the table:\n", + "1. `table_name`: The name of the table within the Cloud SQL database in which to store langchain documents.\n", + "2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of metadata columns we need.\n", + "3. `content_column`: The name of the column to store the `page_content` of a langchain document. 
Default: `page_content`.\n", + "4. `metadata_json_column`: The name of the JSON column to store extra `metadata` of a langchain document. Default: `langchain_metadata`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine.init_document_table(\n", + " TABLE_NAME,\n", + " metadata_columns=[\n", + " sqlalchemy.Column(\n", + " \"fruit_name\",\n", + " sqlalchemy.UnicodeText,\n", + " primary_key=False,\n", + " nullable=True,\n", + " ),\n", + " sqlalchemy.Column(\n", + " \"organic\",\n", + " sqlalchemy.Boolean,\n", + " primary_key=False,\n", + " nullable=True,\n", + " ),\n", + " ],\n", + " content_column=\"description\",\n", + " metadata_json_column=\"other_metadata\",\n", + " overwrite_existing=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Save documents with `MySQLDocumentSaver.add_documents()`. As you can see in this example:\n", + "- `document.page_content` will be saved into the `description` column.\n", + "- `document.metadata.fruit_name` will be saved into the `fruit_name` column.\n", + "- `document.metadata.organic` will be saved into the `organic` column.\n", + "- `document.metadata.fruit_id` will be saved into the `other_metadata` column in JSON format."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "test_docs = [\n", + " Document(\n", + " page_content=\"Granny Smith 150 0.99\",\n", + " metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n", + " ),\n", + "]\n", + "saver = MySQLDocumentSaver(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " content_column=\"description\",\n", + " metadata_json_column=\"other_metadata\",\n", + ")\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with engine.connect() as conn:\n", + " result = conn.execute(sqlalchemy.text(f\"select * from `{TABLE_NAME}`;\"))\n", + " print(result.keys())\n", + " print(result.fetchall())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents with customized page content & metadata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also delete documents from a table with customized metadata columns via `MySQLDocumentSaver.delete()`.
The deletion criteria are:\n", + "\n", + "A `row` is deleted if there exists a `document` in the list such that:\n", + "- `document.page_content` equals `row[page_content]`\n", + "- For every metadata field `k` in `document.metadata`\n", + " - `document.metadata[k]` equals `row[k]` or `document.metadata[k]` equals `row[langchain_metadata][k]`\n", + "- There is no extra metadata field present in `row` that is not in `document.metadata`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n", + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "saver.delete(docs)\n", + "print(\"Documents after delete:\", loader.load())" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_cloud_sql_pg.ipynb b/docs/docs/integrations/document_loaders/google_cloud_sql_pg.ipynb new file mode 100644 index 0000000000000..d0da8921440fd --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_cloud_sql_pg.ipynb @@ -0,0 +1,381 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "E_RJy7C1bpCT" + }, + "source": [ + "# Google Cloud SQL for PostgreSQL\n", + "\n", + "> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) is a fully managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud Platform.
Extend your database application to build AI-powered experiences leveraging Cloud SQL for PostgreSQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for PostgreSQL` to load Documents with the `PostgreSQLLoader` class." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xjcxaw6--Xyy" + }, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/marketplace/product/google/sqladmin.googleapis.com)\n", + " * [Create a Cloud SQL for PostgreSQL instance.](https://cloud.google.com/sql/docs/postgres/create-instance)\n", + " * [Create a Cloud SQL for PostgreSQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n", + " * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `langchain-google-cloud-sql-pg`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "0ZITIDE160OD", + "outputId": "90e0636e-ff34-4e1e-ad37-d2a6db4a317e" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-pg" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6o0iGVIdDD6K" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cTXTbj4UltKf" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uj02bMRAc9_c" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wnp1R1PYc9_c", + "outputId": "6502c721-a2fd-451f-b946-9f7b850d5966" + }, + "outputs": [], + "source": [ + "# @title Project { display-mode: \"form\" }\n", + "PROJECT_ID = \"gcp_project_id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "! 
gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain_google_cloud_sql_pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values on the [Cloud SQL Instances page](https://console.cloud.google.com/sql/instances)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-primary\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vector_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### Cloud SQL Engine\n", + "\n", + "A `PostgreSQLEngine` object is a required argument for creating a PostgreSQL document loader.
The `PostgreSQLEngine` configures a connection pool to your Cloud SQL for PostgreSQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `PostgreSQLEngine` using `PostgreSQLEngine.from_instance()` you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/users) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgreSQLEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLEngine\n", + "\n", + "engine = await PostgreSQLEngine.afrom_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Create PostgreSQLLoader" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLLoader\n", + "\n", + "# Create a basic PostgreSQLLoader object\n", + "loader = await PostgreSQLLoader.create(engine, table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PeOMpftjc9_e" + }, + "source": [ + "### Load Documents via default table\n", + "The loader returns a list of Documents from the table, using the first column as `page_content` and all other columns as metadata. The default table has the first column as\n", + "`page_content` and the second column as metadata (JSON). Each row becomes a document. Note that if you want your documents to have IDs, you need to add them yourself."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cwvi_O5Wc9_e" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLLoader\n", + "\n", + "# Create a basic PostgreSQLLoader object\n", + "loader = await PostgreSQLLoader.create(engine, table_name=TABLE_NAME)\n", + "\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kSkL9l1Hc9_e" + }, + "source": [ + "### Load documents via custom table/metadata or custom page content columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = await PostgreSQLLoader.create(\n", + " engine,\n", + " table_name=TABLE_NAME,\n", + " content_columns=[\"product_name\"], # Optional\n", + " metadata_columns=[\"id\"], # Optional\n", + ")\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5R6h0_Cvc9_f" + }, + "source": [ + "### Set page content format\n", + "The loader returns a list of Documents, one per row, with page content in the specified string format, such as text (space-separated concatenation), JSON, YAML, or CSV.
JSON and YAML formats include headers, while text and CSV do not include field headers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NGNdS7cqc9_f" + }, + "outputs": [], + "source": [ + "loader = await PostgreSQLLoader.create(\n", + " engine,\n", + " table_name=\"products\",\n", + " content_columns=[\"product_name\", \"description\"],\n", + " format=\"YAML\",\n", + ")\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_datastore.ipynb b/docs/docs/integrations/document_loaders/google_datastore.ipynb new file mode 100644 index 0000000000000..71cc49cbf43e1 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_datastore.ipynb @@ -0,0 +1,410 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Firestore in Datastore mode\n", + "\n", + "> [Firestore in Datastore mode](https://cloud.google.com/datastore) is a serverless document-oriented database that scales to meet any demand. 
Extend your database application to build AI-powered experiences leveraging Datastore's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Firestore in Datastore mode](https://cloud.google.com/datastore) to [save, load and delete LangChain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `DatastoreLoader` and `DatastoreSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Datastore database](https://cloud.google.com/datastore/docs/manage-databases)\n", + "\n", + "After confirming that you can access the database from this notebook's runtime environment, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify a source for demo purposes.\n", + "SOURCE = \"test\" # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-datastore` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-datastore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### API Enablement\n", + "The `langchain-google-datastore` package requires that you [enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Datastore API\n", + "!gcloud services enable datastore.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "`DatastoreSaver` can store Documents in Datastore. By default, it tries to extract the Document reference from the metadata.\n", + "\n", + "Save LangChain documents with `DatastoreSaver.upsert_documents()`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_google_datastore import DatastoreSaver\n", + "\n", + "data = [Document(page_content=\"Hello, World!\")]\n", + "saver = DatastoreSaver()\n", + "saver.upsert_documents(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Save documents without reference\n", + "\n", + "If a collection is specified, the documents are stored with an auto-generated ID." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = DatastoreSaver(\"Collection\")\n", + "\n", + "saver.upsert_documents(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Save documents with other references" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n", + "saver = DatastoreSaver()\n", + "\n", + "saver.upsert_documents(documents=data, document_ids=doc_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load from Collection or SubCollection" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load LangChain documents with `DatastoreLoader.load()` or `DatastoreLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `DatastoreLoader` class, you need to provide:\n", + "1. `source` - An instance of a Query, CollectionGroup, or DocumentReference, or a single `/`-delimited path to a Datastore collection."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_datastore import DatastoreLoader\n", + "\n", + "loader_collection = DatastoreLoader(\"Collection\")\n", + "loader_subcollection = DatastoreLoader(\"Collection/doc/SubCollection\")\n", + "\n", + "\n", + "data_collection = loader_collection.load()\n", + "data_subcollection = loader_subcollection.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load a single Document" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import datastore\n", + "\n", + "client = datastore.Client()\n", + "doc_ref = client.collection(\"foo\").document(\"bar\")\n", + "\n", + "loader_document = DatastoreLoader(doc_ref)\n", + "\n", + "data = loader_document.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load from CollectionGroup or Query" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud.datastore import CollectionGroup, FieldFilter, Query\n", + "\n", + "col_ref = client.collection(\"col_group\")\n", + "collection_group = CollectionGroup(col_ref)\n", + "\n", + "loader_group = DatastoreLoader(collection_group)\n", + "\n", + "col_ref = client.collection(\"collection\")\n", + "query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n", + "\n", + "loader_query = DatastoreLoader(query)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents\n", + "\n", + "Delete a list of LangChain documents from a Datastore collection with `DatastoreSaver.delete_documents()`.\n", + "\n", + "If document IDs are provided, the Documents are ignored."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = DatastoreSaver()\n", + "\n", + "saver.delete_documents(data)\n", + "\n", + "# The Documents will be ignored and only the document ids will be used.\n", + "saver.delete_documents(data, doc_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents with customized page content & metadata\n", + "\n", + "The `page_content_fields` and `metadata_fields` arguments specify the Datastore Document fields to be written into the LangChain Document's `page_content` and `metadata`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = DatastoreLoader(\n", + " source=\"foo/bar/subcol\",\n", + " page_content_fields=[\"data_field\"],\n", + " metadata_fields=[\"metadata_field\"],\n", + ")\n", + "\n", + "data = loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Customize Page Content Format\n", + "\n", + "When `page_content` contains only one field, it holds just that field's value. Otherwise, `page_content` is formatted as JSON."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Customize Connection & Authentication" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.auth import compute_engine\n", + "from google.cloud.datastore import Client\n", + "\n", + "client = Client(database=\"non-default-db\", credentials=compute_engine.Credentials())\n", + "loader = DatastoreLoader(\n", + " source=\"foo\",\n", + " client=client,\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_firestore.ipynb b/docs/docs/integrations/document_loaders/google_firestore.ipynb new file mode 100644 index 0000000000000..6bc247ce7ee38 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_firestore.ipynb @@ -0,0 +1,412 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Firestore (Native Mode)\n", + "\n", + "> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand.
Extend your database application to build AI-powered experiences leveraging Firestore's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to [save, load and delete LangChain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `FirestoreLoader` and `FirestoreSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-firestore-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)\n", + "\n", + "After confirming that you can access the database from this notebook's runtime environment, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify a source for demo purposes.\n", + "SOURCE = \"test\" # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-firestore` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-firestore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### API Enablement\n", + "The `langchain-google-firestore` package requires that you [enable the Firestore Admin API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Firestore Admin API\n", + "!gcloud services enable firestore.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "`FirestoreSaver` can store Documents in Firestore. By default, it tries to extract the Document reference from the metadata.\n", + "\n", + "Save LangChain documents with `FirestoreSaver.upsert_documents()`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents.base import Document\n", + "from langchain_google_firestore import FirestoreSaver\n", + "\n", + "saver = FirestoreSaver()\n", + "\n", + "data = [Document(page_content=\"Hello, World!\")]\n", + "\n", + "saver.upsert_documents(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Save documents without reference\n", + "\n", + "If a collection is specified, the documents are stored with an auto-generated ID." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = FirestoreSaver(\"Collection\")\n", + "\n", + "saver.upsert_documents(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Save documents with other references" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n", + "saver = FirestoreSaver()\n", + "\n", + "saver.upsert_documents(documents=data, document_ids=doc_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load from Collection or SubCollection" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load LangChain documents with `FirestoreLoader.load()` or `FirestoreLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `FirestoreLoader` class, you need to provide:\n", + "1. `source` - An instance of a Query, CollectionGroup, or DocumentReference, or a single `/`-delimited path to a Firestore collection."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_firestore import FirestoreLoader\n", + "\n", + "loader_collection = FirestoreLoader(\"Collection\")\n", + "loader_subcollection = FirestoreLoader(\"Collection/doc/SubCollection\")\n", + "\n", + "\n", + "data_collection = loader_collection.load()\n", + "data_subcollection = loader_subcollection.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load a single Document" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import firestore\n", + "\n", + "client = firestore.Client()\n", + "doc_ref = client.collection(\"foo\").document(\"bar\")\n", + "\n", + "loader_document = FirestoreLoader(doc_ref)\n", + "\n", + "data = loader_document.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load from CollectionGroup or Query" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud.firestore import CollectionGroup, FieldFilter, Query\n", + "\n", + "col_ref = client.collection(\"col_group\")\n", + "collection_group = CollectionGroup(col_ref)\n", + "\n", + "loader_group = FirestoreLoader(collection_group)\n", + "\n", + "col_ref = client.collection(\"collection\")\n", + "query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n", + "\n", + "loader_query = FirestoreLoader(query)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents\n", + "\n", + "Delete a list of langchain documents from a Firestore collection with `FirestoreSaver.delete_documents()`.\n", + "\n", + "If document IDs are provided, the Documents will be ignored."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = FirestoreSaver()\n", + "\n", + "saver.delete_documents(data)\n", + "\n", + "# The Documents will be ignored and only the document ids will be used.\n", + "saver.delete_documents(data, doc_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents with customized page content & metadata\n", + "\n", + "The `page_content_fields` and `metadata_fields` arguments specify the Firestore Document fields to be written into the LangChain Document's `page_content` and `metadata`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "loader = FirestoreLoader(\n", + " source=\"foo/bar/subcol\",\n", + " page_content_fields=[\"data_field\"],\n", + " metadata_fields=[\"metadata_field\"],\n", + ")\n", + "\n", + "data = loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Customize Page Content Format\n", + "\n", + "When `page_content` is built from only one field, it contains just that field's value. Otherwise, `page_content` is in JSON format."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Customize Connection & Authentication" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.auth import compute_engine\n", + "from google.cloud.firestore import Client\n", + "\n", + "client = Client(database=\"non-default-db\", credentials=compute_engine.Credentials())\n", + "loader = FirestoreLoader(\n", + " source=\"foo\",\n", + " client=client,\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/document_loaders/google_memorystore_redis.ipynb b/docs/docs/integrations/document_loaders/google_memorystore_redis.ipynb new file mode 100644 index 0000000000000..31548263cc530 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_memorystore_redis.ipynb @@ -0,0 +1,316 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "6-0_o3DxsFGi" + }, + "source": [ + "# Google Memorystore for Redis\n", + "\n", + "> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access.
Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.\n", + "\n", + "This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MemorystoreDocumentLoader` and `MemorystoreDocumentSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is 5.0 or later.\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify an endpoint associated with the instance and a key prefix for demo purposes.\n", + "ENDPOINT = \"redis://127.0.0.1:6379\" # @param {type:\"string\"}\n", + "KEY_PREFIX = \"doc:\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-memorystore-redis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2L7kMu__sFGl" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "Save langchain documents with `MemorystoreDocumentSaver.add_documents()`. To initialize the `MemorystoreDocumentSaver` class, you need to provide 2 things:\n", + "1. `client` - A `redis.Redis` client object.\n", + "1. `key_prefix` - A prefix for the keys to store Documents in Redis.\n", + "\n", + "The Documents will be stored under randomly generated keys prefixed with `key_prefix`. Alternatively, you can designate the suffixes of the keys by specifying `ids` in the `add_documents` method."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_core.documents.base import Document\n", + "from langchain_google_memorystore_redis import MemorystoreDocumentSaver\n", + "\n", + "test_docs = [\n", + " Document(\n", + " page_content=\"Apple Granny Smith 150 0.99 1\",\n", + " metadata={\"fruit_id\": 1},\n", + " ),\n", + " Document(\n", + " page_content=\"Banana Cavendish 200 0.59 0\",\n", + " metadata={\"fruit_id\": 2},\n", + " ),\n", + " Document(\n", + " page_content=\"Orange Navel 80 1.29 1\",\n", + " metadata={\"fruit_id\": 3},\n", + " ),\n", + "]\n", + "doc_ids = [f\"{i}\" for i in range(len(test_docs))]\n", + "\n", + "redis_client = redis.from_url(ENDPOINT)\n", + "saver = MemorystoreDocumentSaver(\n", + " client=redis_client,\n", + " key_prefix=KEY_PREFIX,\n", + " content_field=\"page_content\",\n", + ")\n", + "saver.add_documents(test_docs, ids=doc_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2fT1iEhsFGl" + }, + "source": [ + "### Load documents\n", + "\n", + "Initialize a loader that loads all documents stored in the Memorystore for Redis instance with a specific prefix.\n", + "\n", + "Load langchain documents with `MemorystoreDocumentLoader.load()` or `MemorystoreDocumentLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `MemorystoreDocumentLoader` class, you need to provide:\n", + "1. `client` - A `redis.Redis` client object.\n", + "1. `key_prefix` - The prefix of the keys under which the Documents are stored in Redis."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YEDKWR6asFGl" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import MemorystoreDocumentLoader\n", + "\n", + "redis_client = redis.from_url(ENDPOINT)\n", + "loader = MemorystoreDocumentLoader(\n", + " client=redis_client,\n", + " key_prefix=KEY_PREFIX,\n", + " content_fields=set([\"page_content\"]),\n", + ")\n", + "for doc in loader.lazy_load():\n", + " print(\"Loaded document:\", doc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents\n", + "\n", + "Delete all keys with the specified prefix in the Memorystore for Redis instance with `MemorystoreDocumentSaver.delete()`. If you know the key suffixes, you can also pass them to delete only those keys." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "\n", + "saver.delete(ids=[0])\n", + "print(\"Documents after delete:\", loader.load())\n", + "\n", + "saver.delete()\n", + "print(\"Documents after delete all:\", loader.load())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "02xxvmzTsFGm" + }, + "source": [ + "### Customize Document Page Content & Metadata\n", + "\n", + "When initializing a loader with more than 1 content field, the `page_content` of the loaded Documents will contain a JSON-encoded string with top level fields equal to the specified fields in `content_fields`.\n", + "\n", + "If the `metadata_fields` are specified, the `metadata` field of the loaded Documents will only have the top level fields equal to the specified `metadata_fields`. If any of the values of the metadata fields is stored as a JSON-encoded string, it will be decoded prior to being loaded to the metadata fields."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BvS3UFsysFGm" + }, + "outputs": [], + "source": [ + "loader = MemorystoreDocumentLoader(\n", + " client=redis_client,\n", + " key_prefix=KEY_PREFIX,\n", + " content_fields=set([\"content_field_1\", \"content_field_2\"]),\n", + " metadata_fields=set([\"title\", \"author\"]),\n", + ")" + ] + } + ], + "metadata": { + "colab": { + "provenance": [ + { + "file_id": "1kuFhDfyzOdzS1apxQ--1efXB1pJ79yVY", + "timestamp": 1708033015250 + } + ] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/docs/integrations/document_loaders/google_spanner.ipynb b/docs/docs/integrations/document_loaders/google_spanner.ipynb new file mode 100644 index 0000000000000..d9767e382f019 --- /dev/null +++ b/docs/docs/integrations/document_loaders/google_spanner.ipynb @@ -0,0 +1,510 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Spanner\n", + "\n", + "> [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution. 
Extend your database application to build AI-powered experiences leveraging Spanner's Langchain integrations.\n", + "\n", + "This notebook goes over how to use [Spanner](https://cloud.google.com/spanner) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `SpannerLoader` and `SpannerDocumentSaver`.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-spanner-python/blob/main/docs/document_loader.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n", + "* [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)\n", + "* [Create a Spanner table](https://cloud.google.com/spanner/docs/create-query-database-console#create-schema)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify an instance ID, a database, and a table for demo purposes.\n", + "INSTANCE_ID = \"test_instance\" # @param {type:\"string\"}\n", + "DATABASE_ID = \"test_database\" # @param {type:\"string\"}\n", + "TABLE_NAME = \"test_table\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-spanner` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-spanner" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save documents\n", + "\n", + "Save langchain documents with `SpannerDocumentSaver.add_documents()`. To initialize the `SpannerDocumentSaver` class, you need to provide 3 things:\n", + "1. `instance_id` - The ID of the Spanner instance to save data to.\n", + "1. `database_id` - The ID of the Spanner database to save data to.\n", + "1. `table_name` - The name of the table within the Spanner database in which to store langchain documents."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_google_spanner import SpannerDocumentSaver\n", + "\n", + "test_docs = [\n", + " Document(\n", + " page_content=\"Apple Granny Smith 150 0.99 1\",\n", + " metadata={\"fruit_id\": 1},\n", + " ),\n", + " Document(\n", + " page_content=\"Banana Cavendish 200 0.59 0\",\n", + " metadata={\"fruit_id\": 2},\n", + " ),\n", + " Document(\n", + " page_content=\"Orange Navel 80 1.29 1\",\n", + " metadata={\"fruit_id\": 3},\n", + " ),\n", + "]\n", + "\n", + "saver = SpannerDocumentSaver(\n", + " instance_id=INSTANCE_ID,\n", + " database_id=DATABASE_ID,\n", + " table_name=TABLE_NAME,\n", + ")\n", + "saver.add_documents(test_docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Querying for Documents from Spanner\n", + "\n", + "For more details on connecting to a Spanner table, please check the [Python SDK documentation](https://cloud.google.com/python/docs/reference/spanner/latest)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Load documents from table\n", + "\n", + "Load langchain documents with `SpannerLoader.load()` or `SpannerLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `SpannerLoader` class, you need to provide:\n", + "1. `instance_id` - The ID of the Spanner instance to load data from.\n", + "1. `database_id` - The ID of the Spanner database to load data from.\n", + "1. `query` - A query in the database's SQL dialect."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_spanner import SpannerLoader\n", + "\n", + "query = f\"SELECT * from {TABLE_NAME}\"\n", + "loader = SpannerLoader(\n", + " instance_id=INSTANCE_ID,\n", + " database_id=DATABASE_ID,\n", + " query=query,\n", + ")\n", + "\n", + "for doc in loader.lazy_load():\n", + " print(doc)\n", + " break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete documents\n", + "\n", + "Delete a list of langchain documents from the table with `SpannerDocumentSaver.delete()`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "docs = loader.load()\n", + "print(\"Documents before delete:\", docs)\n", + "\n", + "doc = test_docs[0]\n", + "saver.delete([doc])\n", + "print(\"Documents after delete:\", loader.load())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Customize Document Page Content & Metadata\n", + "\n", + "The loader returns a list of Documents with page content from specific data columns. All other data columns will be added to metadata. Each row becomes a document." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Customize page content format\n", + "\n", + "The SpannerLoader assumes there is a column called `page_content`.
These defaults can be changed like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_content_loader = SpannerLoader(\n", + " INSTANCE_ID, DATABASE_ID, query, content_columns=[\"custom_content\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If multiple columns are specified, the page content's string format defaults to `text` (space-separated string concatenation). Supported formats include `text`, `JSON`, `YAML`, and `CSV`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Customize metadata format\n", + "\n", + "The SpannerLoader assumes there is a metadata column called `langchain_metadata` that stores JSON data. The metadata column will be used as the base dictionary. By default, all other column data will be added and may overwrite the original value. These defaults can be changed like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_metadata_loader = SpannerLoader(\n", + " INSTANCE_ID, DATABASE_ID, query, metadata_columns=[\"column1\", \"column2\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Customize JSON metadata column name\n", + "\n", + "By default, the loader uses `langchain_metadata` as the base dictionary. This can be customized to select a different JSON column to use as the base dictionary for the Document's metadata."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_metadata_json_loader = SpannerLoader(\n", + " INSTANCE_ID, DATABASE_ID, query, metadata_json_column=\"another_json_column\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom staleness\n", + "\n", + "The default [staleness](https://cloud.google.com/python/docs/reference/spanner/latest/snapshot-usage#beginning-a-snapshot) is 15s. This can be customized by specifying a weaker bound, which performs all reads either as of a given timestamp or as of a given duration in the past." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import datetime\n", + "\n", + "timestamp = datetime.datetime.utcnow()\n", + "custom_timestamp_loader = SpannerLoader(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " query,\n", + " staleness=timestamp,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "duration = 20.0\n", + "custom_duration_loader = SpannerLoader(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " query,\n", + " staleness=duration,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Turn on data boost\n", + "\n", + "By default, the loader does not use [data boost](https://cloud.google.com/spanner/docs/databoost/databoost-overview) because it incurs additional costs and requires additional IAM permissions. However, you can choose to turn it on."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_databoost_loader = SpannerLoader(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " query,\n", + " databoost=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom client\n", + "\n", + "By default, a default Spanner client is created. To pass `credentials` and `project` explicitly, pass a custom client to the constructor." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import spanner\n", + "from google.oauth2 import service_account\n", + "\n", + "creds = service_account.Credentials.from_service_account_file(\"/path/to/key.json\")\n", + "custom_client = spanner.Client(project=\"my-project\", credentials=creds)\n", + "saver = SpannerDocumentSaver(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " TABLE_NAME,\n", + " client=custom_client,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom initialization for SpannerDocumentSaver\n", + "\n", + "The SpannerDocumentSaver supports custom initialization, which lets you specify how the Document is saved into the table.\n", + "\n", + "\n", + "content_column: The column name for the Document's page content. Defaults to `page_content`.\n", + "\n", + "metadata_columns: These metadata keys will be saved into their own columns if the key exists in the Document's metadata.\n", + "\n", + "metadata_json_column: The column name for the special JSON column. Defaults to `langchain_metadata`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "custom_saver = SpannerDocumentSaver(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " TABLE_NAME,\n", + " content_column=\"my_content\",\n", + " metadata_columns=[\"foo\"],\n", + " metadata_json_column=\"my_special_json_column\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize custom schema for Spanner\n", + "\n", + "The SpannerDocumentSaver has an `init_document_table` method that creates a new table for storing docs with a custom schema." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_spanner import Column\n", + "\n", + "new_table_name = \"my_new_table\"\n", + "\n", + "SpannerDocumentSaver.init_document_table(\n", + " INSTANCE_ID,\n", + " DATABASE_ID,\n", + " new_table_name,\n", + " content_column=\"my_page_content\",\n", + " metadata_columns=[\n", + " Column(\"category\", \"STRING(36)\", True),\n", + " Column(\"price\", \"FLOAT64\", False),\n", + " ],\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/memory/firestore_chat_message_history.ipynb b/docs/docs/integrations/memory/firestore_chat_message_history.ipynb deleted file mode 100644 index 8bdd586cd4896..0000000000000 --- a/docs/docs/integrations/memory/firestore_chat_message_history.ipynb +++ /dev/null @@ -1,147 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "91c6a7ef", - "metadata": {}, - "source": [ - "# Google
Cloud Firestore\n", - "\n", - "> [`Cloud Firestore`](https://cloud.google.com/firestore) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.\n", - "\n", - "This notebook goes over how to use Firestore to store chat message history." - ] - }, - { - "cell_type": "markdown", - "id": "2d6ed3c8-b70a-498c-bc9e-41b91797d3b7", - "metadata": {}, - "source": [ - "## Setting up" - ] - }, - { - "cell_type": "markdown", - "id": "b8eca282", - "metadata": {}, - "source": [ - "To run this notebook, you will need a Google Cloud Project, a Firestore database instance in Native Mode, and Google credentials, see [Firestore Quickstarts](https://cloud.google.com/firestore/docs/quickstarts)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5a7f3b3f-d9b8-4577-a7ef-bdd8ecaedb70", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install firebase-admin" - ] - }, - { - "cell_type": "markdown", - "id": "a8e63850-3e14-46fe-a59e-be6d6bf8fe61", - "metadata": {}, - "source": [ - "## Basic Usage" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "d15e3302", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.chat_message_histories.firestore import (\n", - " FirestoreChatMessageHistory,\n", - ")\n", - "\n", - "message_history = FirestoreChatMessageHistory(\n", - " collection_name=\"langchain-chat-history\",\n", - " session_id=\"user-session-id\",\n", - " user_id=\"user-id\",\n", - ")\n", - "\n", - "message_history.add_user_message(\"hi!\")\n", - "message_history.add_ai_message(\"whats up?\")" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "64fc465e", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[HumanMessage(content='hi!'),\n", - " HumanMessage(content='hi!'),\n", - " AIMessage(content='whats up?')]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - 
"message_history.messages" - ] - }, - { - "cell_type": "markdown", - "id": "4be8576e", - "metadata": {}, - "source": [ - "## Custom Firestore Client" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "12999273", - "metadata": {}, - "outputs": [], - "source": [ - "import firebase_admin\n", - "from firebase_admin import credentials, firestore\n", - "\n", - "# Use a service account.\n", - "cred = credentials.Certificate(\"path/to/serviceAccount.json\")\n", - "\n", - "app = firebase_admin.initialize_app(cred)\n", - "client = firestore.client(app=app)\n", - "\n", - "message_history = FirestoreChatMessageHistory(\n", - " collection_name=\"langchain-chat-history\",\n", - " session_id=\"user-session-id\",\n", - " user_id=\"user-id\",\n", - " firestore_client=client,\n", - ")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.5" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/docs/integrations/memory/google_alloydb.ipynb b/docs/docs/integrations/memory/google_alloydb.ipynb new file mode 100644 index 0000000000000..22dd7eef3cded --- /dev/null +++ b/docs/docs/integrations/memory/google_alloydb.ipynb @@ -0,0 +1,394 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google AlloyDB for PostgreSQL\n", + "\n", + "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed PostgreSQL compatible database service for your most demanding enterprise workloads. AlloyDB combines the best of Google with PostgreSQL, for superior performance, scale, and availability. 
Extend your database application to build AI-powered experiences leveraging AlloyDB's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `AlloyDB for PostgreSQL` to store chat message history with the `AlloyDBChatMessageHistory` class." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create an AlloyDB instance](https://cloud.google.com/alloydb/docs/instance-primary-create)\n", + " * [Create an AlloyDB database](https://cloud.google.com/alloydb/docs/database-create)\n", + " * [Add an IAM database user to the database](https://cloud.google.com/alloydb/docs/manage-iam-authn) (Optional)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-alloydb-pg` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-alloydb-pg langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable AlloyDB API\n", + "!gcloud services enable alloydb.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set AlloyDB database values\n", + "Find your database values in the [AlloyDB cluster page](https://console.cloud.google.com/alloydb?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "CLUSTER = \"my-alloydb-cluster\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-alloydb-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"message_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### AlloyDBEngine Connection Pool\n", + "\n", + "One of the requirements for using AlloyDB as a ChatMessageHistory memory store is an `AlloyDBEngine` object.
The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create an `AlloyDBEngine` using `AlloyDBEngine.from_instance()` you need to provide only 5 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n", + "1. `region` : Region where the AlloyDB instance is located.\n", + "1. `cluster` : The name of the AlloyDB cluster.\n", + "1. `instance` : The name of the AlloyDB instance.\n", + "1. `database` : The name of the database to connect to on the AlloyDB instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/manage-iam-authn) is used to authenticate to the database. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) with a username and password can be used instead.
Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBEngine\n", + "\n", + "engine = AlloyDBEngine.from_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " cluster=CLUSTER,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize a table\n", + "The `AlloyDBChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n", + "\n", + "The `AlloyDBEngine` class has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine.init_chat_history_table(table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### AlloyDBChatMessageHistory\n", + "\n", + "To initialize the `AlloyDBChatMessageHistory` class you need to provide only 3 things:\n", + "\n", + "1. `engine` : An `AlloyDBEngine` instance.\n", + "1. `session_id` : A unique identifier string that specifies an id for the session.\n", + "1. `table_name` : The name of the table within the AlloyDB database to store the chat message history."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBChatMessageHistory\n", + "\n", + "history = AlloyDBChatMessageHistory.create_sync(\n", + " engine, session_id=\"test_session\", table_name=TABLE_NAME\n", + ")\n", + "history.add_user_message(\"hi!\")\n", + "history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleaning up\n", + "When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in AlloyDB and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "history.clear()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🔗 Chaining\n", + "\n", + "We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n", + "\n", + "To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_core.runnables.history import RunnableWithMessageHistory\n", + "from 
langchain_google_vertexai import ChatVertexAI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", \"You are a helpful assistant.\"),\n", + " MessagesPlaceholder(variable_name=\"history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "chain = prompt | ChatVertexAI(project=PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chain_with_history = RunnableWithMessageHistory(\n", + " chain,\n", + " lambda session_id: AlloyDBChatMessageHistory.create_sync(\n", + " engine,\n", + " session_id=session_id,\n", + " table_name=TABLE_NAME,\n", + " ),\n", + " input_messages_key=\"question\",\n", + " history_messages_key=\"history\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# This is where we configure the session id\n", + "config = {\"configurable\": {\"session_id\": \"test_session\"}}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chain_with_history.invoke({\"question\": \"Hi! 
I'm bob\"}, config=config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/memory/google_bigtable.ipynb b/docs/docs/integrations/memory/google_bigtable.ipynb new file mode 100644 index 0000000000000..7e57dcf5a4398 --- /dev/null +++ b/docs/docs/integrations/memory/google_bigtable.ipynb @@ -0,0 +1,288 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Bigtable\n", + "\n", + "> [Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. 
Extend your database application to build AI-powered experiences leveraging Bigtable's Langchain integrations.\n", + "\n", + "This notebook goes over how to use [Bigtable](https://cloud.google.com/bigtable) to store chat message history with the `BigtableChatMessageHistory` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/chat_message_history.ipynb)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n", + "* [Create a Bigtable table](https://cloud.google.com/bigtable/docs/managing-tables)\n", + "* [Create Bigtable access credentials](https://developers.google.com/workspace/guides/create-credentials)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-bigtable` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-bigtable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Bigtable schema\n", + "\n", + "The schema for BigtableChatMessageHistory requires the instance and table to exist, and have a column family called `langchain`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify an instance and a table for demo purpose.\n", + "INSTANCE_ID = \"my_instance\" # @param {type:\"string\"}\n", + "TABLE_ID = \"my_table\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the table or the column family do not exist, you can use the following function to create them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import bigtable\n", + "from langchain_google_bigtable import create_chat_history_table\n", + "\n", + "create_chat_history_table(\n", + " instance_id=INSTANCE_ID,\n", + " table_id=TABLE_ID,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### BigtableChatMessageHistory\n", + "\n", + "To initialize the `BigtableChatMessageHistory` class you need to provide only 3 things:\n", + "\n", + "1. `instance_id` - The Bigtable instance to use for chat message history.\n", + "1. `table_id` : The Bigtable table to store the chat message history.\n", + "1. `session_id` - A unique identifier string that specifies an id for the session." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_bigtable import BigtableChatMessageHistory\n", + "\n", + "message_history = BigtableChatMessageHistory(\n", + " instance_id=INSTANCE_ID,\n", + " table_id=TABLE_ID,\n", + " session_id=\"user-session-id\",\n", + ")\n", + "\n", + "message_history.add_user_message(\"hi!\")\n", + "message_history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "message_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleaning up\n", + "\n", + "When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Bigtable and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "message_history.clear()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom client\n", + "By default, a default client is created with only the `admin=True` option set. To use a different configuration, pass a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) to the constructor."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import bigtable\n", + "\n", + "client = bigtable.Client(...)\n", + "create_chat_history_table(\n", + " instance_id=\"my-instance\",\n", + " table_id=\"my-table\",\n", + " client=client,\n", + ")\n", + "\n", + "custom_client_message_history = BigtableChatMessageHistory(\n", + " instance_id=\"my-instance\",\n", + " table_id=\"my-table\",\n", + " session_id=\"user-session-id\",\n", + " client=client,\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/memory/google_cloud_sql_mssql.ipynb b/docs/docs/integrations/memory/google_cloud_sql_mssql.ipynb new file mode 100644 index 0000000000000..15dc621bdbc84 --- /dev/null +++ b/docs/docs/integrations/memory/google_cloud_sql_mssql.ipynb @@ -0,0 +1,553 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f22eab3f84cbeb37", + "metadata": { + "id": "f22eab3f84cbeb37" + }, + "source": [ + "# Google Cloud SQL for SQL Server\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for SQL Server` to store chat message history with the `MSSQLChatMessageHistory` class."
+ ] + }, + { + "cell_type": "markdown", + "id": "da400c79-a360-43e2-be60-401fd02b2819", + "metadata": { + "id": "da400c79-a360-43e2-be60-401fd02b2819" + }, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create a Cloud SQL for SQL Server instance](https://cloud.google.com/sql/docs/sqlserver/create-instance)\n", + " * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/sqlserver/create-manage-databases)\n", + " * [Create a database user](https://cloud.google.com/sql/docs/sqlserver/create-manage-users) (Optional if you choose to use the `sqlserver` user)" + ] + }, + { + "cell_type": "markdown", + "id": "Mm7-fG_LltD7", + "metadata": { + "id": "Mm7-fG_LltD7" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-cloud-sql-mssql` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1VELXvcj8AId", + "metadata": { + "id": "1VELXvcj8AId" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-mssql langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "id": "98TVoM3MNDHu", + "metadata": { + "id": "98TVoM3MNDHu" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v6jBDnYnNM08", + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-cloud-sql-mssql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-mssql-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "DB_USER = \"my-username\" # @param {type: \"string\"}\n", + "DB_PASS = \"my-password\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"message_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### MSSQLEngine Connection Pool\n", + "\n", + "One of the requirements for using Cloud SQL as a ChatMessageHistory memory store is an `MSSQLEngine` object. The `MSSQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create an `MSSQLEngine` using `MSSQLEngine.from_instance()` you need to provide only 6 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "1. `user` : Database user to use for built-in database authentication and login.\n", + "1.
`password` : Database password to use for built-in database authentication and login.\n", + "\n", + "By default, [built-in database authentication](https://cloud.google.com/sql/docs/sqlserver/users) using a username and password to access the Cloud SQL database is used for database authentication.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4576e914a866fb40", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.077748Z", + "start_time": "2023-08-28T10:04:36.105894Z" + }, + "id": "4576e914a866fb40", + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLEngine\n", + "\n", + "engine = MSSQLEngine.from_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + " user=DB_USER,\n", + " password=DB_PASS,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "qPV8WfWr7O54", + "metadata": { + "id": "qPV8WfWr7O54" + }, + "source": [ + "### Initialize a table\n", + "The `MSSQLChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n", + "\n", + "The `MSSQLEngine` engine has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "TEu4VHArRttE", + "metadata": { + "id": "TEu4VHArRttE" + }, + "outputs": [], + "source": [ + "engine.init_chat_history_table(table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "id": "zSYQTYf3UfOi", + "metadata": { + "id": "zSYQTYf3UfOi" + }, + "source": [ + "### MSSQLChatMessageHistory\n", + "\n", + "To initialize the `MSSQLChatMessageHistory` class you need to provide only 3 things:\n", + "\n", + "1. `engine` - An instance of a `MSSQLEngine` engine.\n", + "1. `session_id` - A unique identifier string that specifies an id for the session.\n", + "1. 
`table_name` : The name of the table within the Cloud SQL database to store the chat message history." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "Kq7RLtfOq0wi", + "metadata": { + "id": "Kq7RLtfOq0wi" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mssql import MSSQLChatMessageHistory\n", + "\n", + "history = MSSQLChatMessageHistory(\n", + " engine, session_id=\"test_session\", table_name=TABLE_NAME\n", + ")\n", + "history.add_user_message(\"hi!\")\n", + "history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "b476688cbb32ba90", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.929396Z", + "start_time": "2023-08-28T10:04:38.915727Z" + }, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "b476688cbb32ba90", + "jupyter": { + "outputs_hidden": false + }, + "outputId": "f8c170e8-ea9d-4905-a9f4-bc83f9726ac5" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[HumanMessage(content='hi!'), AIMessage(content='whats up?')]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "history.messages" + ] + }, + { + "cell_type": "markdown", + "id": "ss6CbqcTTedr", + "metadata": { + "id": "ss6CbqcTTedr" + }, + "source": [ + "#### Cleaning up\n", + "When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3khxzFxYO7x6", + "metadata": { + "id": "3khxzFxYO7x6" + }, + "outputs": [], + "source": [ + "history.clear()" + ] + }, + { + "cell_type": "markdown", + "id": "2e5337719d5614fd", + "metadata": { + "id": "2e5337719d5614fd" + }, + "source": [ + "## 🔗 Chaining\n", + "\n", + "We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n", + "\n", + "To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "hYtHM3-TOMCe", + "metadata": { + "id": "hYtHM3-TOMCe" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "6558418b-0ece-4d01-9661-56d562d78f7a", + "metadata": { + "id": "6558418b-0ece-4d01-9661-56d562d78f7a" + }, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_core.runnables.history import RunnableWithMessageHistory\n", + "from langchain_google_vertexai import ChatVertexAI" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1", + "metadata": { + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1" + }, + "outputs": [], + "source": [ + "prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", \"You are a helpful assistant.\"),\n", + " MessagesPlaceholder(variable_name=\"history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "chain = prompt | ChatVertexAI(project=PROJECT_ID)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 12, + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c", + "metadata": { + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c" + }, + "outputs": [], + "source": [ + "chain_with_history = RunnableWithMessageHistory(\n", + " chain,\n", + " lambda session_id: MSSQLChatMessageHistory(\n", + " engine,\n", + " session_id=session_id,\n", + " table_name=TABLE_NAME,\n", + " ),\n", + " input_messages_key=\"question\",\n", + " history_messages_key=\"history\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b", + "metadata": { + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b" + }, + "outputs": [], + "source": [ + "# This is where we configure the session id\n", + "config = {\"configurable\": {\"session_id\": \"test_session\"}}" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "outputId": "750fcff4-6374-4978-defd-e30ee9bce05f" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Hello Bob, how can I help you today?')" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Hi! 
I'm bob\"}, config=config)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "outputId": "01fdc638-81f3-4350-edb4-7609c586d3a7" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Your name is Bob.')" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/memory/google_cloud_sql_mysql.ipynb b/docs/docs/integrations/memory/google_cloud_sql_mysql.ipynb new file mode 100644 index 0000000000000..6b30f5b30bab4 --- /dev/null +++ b/docs/docs/integrations/memory/google_cloud_sql_mysql.ipynb @@ -0,0 +1,554 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f22eab3f84cbeb37", + "metadata": { + "id": "f22eab3f84cbeb37" + }, + "source": [ + "# Google Cloud SQL for MySQL\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. 
Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for MySQL` to store chat message history with the `MySQLChatMessageHistory` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/chat_message_history.ipynb)" + ] + }, + { + "cell_type": "markdown", + "id": "da400c79-a360-43e2-be60-401fd02b2819", + "metadata": { + "id": "da400c79-a360-43e2-be60-401fd02b2819" + }, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create a Cloud SQL for MySQL instance](https://cloud.google.com/sql/docs/mysql/create-instance)\n", + " * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n", + " * [Add an IAM database user to the database](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users#creating-a-database-user) (Optional)" + ] + }, + { + "cell_type": "markdown", + "id": "Mm7-fG_LltD7", + "metadata": { + "id": "Mm7-fG_LltD7" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-cloud-sql-mysql` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1VELXvcj8AId", + "metadata": { + "id": "1VELXvcj8AId" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-mysql langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "id": "98TVoM3MNDHu", + "metadata": { + "id": "98TVoM3MNDHu" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. 
For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v6jBDnYnNM08", + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-cloud-sql-mysql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-mysql-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"message_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### MySQLEngine Connection Pool\n", + "\n", + "To use Cloud SQL as a ChatMessageHistory memory store, you need a `MySQLEngine` object. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `MySQLEngine` using `MySQLEngine.from_instance()`, you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication.
This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "For more information on IAM database authentication, see:\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n", + "\n", + "Optionally, you can use [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) with a username and password to access the Cloud SQL database instead. To do so, provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4576e914a866fb40", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.077748Z", + "start_time": "2023-08-28T10:04:36.105894Z" + }, + "id": "4576e914a866fb40", + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLEngine\n", + "\n", + "engine = MySQLEngine.from_instance(\n", + " project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "qPV8WfWr7O54", + "metadata": { + "id": "qPV8WfWr7O54" + }, + "source": [ + "### Initialize a table\n", + "The `MySQLChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n", + "\n", + "The `MySQLEngine` has a helper method, `init_chat_history_table()`, that can be used to create a table with the proper schema for you."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "TEu4VHArRttE", + "metadata": { + "id": "TEu4VHArRttE" + }, + "outputs": [], + "source": [ + "engine.init_chat_history_table(table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "id": "zSYQTYf3UfOi", + "metadata": { + "id": "zSYQTYf3UfOi" + }, + "source": [ + "### MySQLChatMessageHistory\n", + "\n", + "To initialize the `MySQLChatMessageHistory` class, you need to provide only 3 things:\n", + "\n", + "1. `engine` : A `MySQLEngine` instance.\n", + "1. `session_id` : A unique identifier string for the session.\n", + "1. `table_name` : The name of the table within the Cloud SQL database to store the chat message history." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "Kq7RLtfOq0wi", + "metadata": { + "id": "Kq7RLtfOq0wi" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_mysql import MySQLChatMessageHistory\n", + "\n", + "history = MySQLChatMessageHistory(\n", + " engine, session_id=\"test_session\", table_name=TABLE_NAME\n", + ")\n", + "history.add_user_message(\"hi!\")\n", + "history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b476688cbb32ba90", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.929396Z", + "start_time": "2023-08-28T10:04:38.915727Z" + }, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "b476688cbb32ba90", + "jupyter": { + "outputs_hidden": false + }, + "outputId": "a19e5cd8-4225-476a-d28d-e870c6b838bb" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[HumanMessage(content='hi!'), AIMessage(content='whats up?')]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "history.messages" + ] + }, + { + "cell_type": "markdown", + "id": "ss6CbqcTTedr", + "metadata": { + "id": "ss6CbqcTTedr" + }, + "source": [ + "#### Cleaning up\n", + "When
the history of a specific session is obsolete, you can delete it as follows.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3khxzFxYO7x6", + "metadata": { + "id": "3khxzFxYO7x6" + }, + "outputs": [], + "source": [ + "history.clear()" + ] + }, + { + "cell_type": "markdown", + "id": "2e5337719d5614fd", + "metadata": { + "id": "2e5337719d5614fd" + }, + "source": [ + "## 🔗 Chaining\n", + "\n", + "We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history).\n", + "\n", + "To do this, we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm), which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "hYtHM3-TOMCe", + "metadata": { + "id": "hYtHM3-TOMCe" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "6558418b-0ece-4d01-9661-56d562d78f7a", + "metadata": { + "id": "6558418b-0ece-4d01-9661-56d562d78f7a" + }, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_core.runnables.history import RunnableWithMessageHistory\n", + "from langchain_google_vertexai import ChatVertexAI" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1", + "metadata": { + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1" + }, + "outputs": [], + "source": [ + "prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", \"You are a helpful assistant.\"),\n", + "
MessagesPlaceholder(variable_name=\"history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "chain = prompt | ChatVertexAI(project=PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c", + "metadata": { + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c" + }, + "outputs": [], + "source": [ + "chain_with_history = RunnableWithMessageHistory(\n", + " chain,\n", + " lambda session_id: MySQLChatMessageHistory(\n", + " engine,\n", + " session_id=session_id,\n", + " table_name=TABLE_NAME,\n", + " ),\n", + " input_messages_key=\"question\",\n", + " history_messages_key=\"history\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b", + "metadata": { + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b" + }, + "outputs": [], + "source": [ + "# This is where we configure the session id\n", + "config = {\"configurable\": {\"session_id\": \"test_session\"}}" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "outputId": "d5c93570-4b0b-4fe8-d19c-4b361fe74291" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Hello Bob, how can I help you today?')" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Hi! 
I'm bob\"}, config=config)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "outputId": "288fe388-3f60-41b8-8edb-37cfbec18981" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Your name is Bob.')" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/memory/google_cloud_sql_pg.ipynb b/docs/docs/integrations/memory/google_cloud_sql_pg.ipynb new file mode 100644 index 0000000000000..0cc8e133a81ec --- /dev/null +++ b/docs/docs/integrations/memory/google_cloud_sql_pg.ipynb @@ -0,0 +1,552 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f22eab3f84cbeb37", + "metadata": { + "id": "f22eab3f84cbeb37" + }, + "source": [ + "# Google Cloud SQL for PostgreSQL\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. 
Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for PostgreSQL` to store chat message history with the `PostgreSQLChatMessageHistory` class." + ] + }, + { + "cell_type": "markdown", + "id": "da400c79-a360-43e2-be60-401fd02b2819", + "metadata": { + "id": "da400c79-a360-43e2-be60-401fd02b2819" + }, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create a Cloud SQL for PostgreSQL instance](https://cloud.google.com/sql/docs/postgres/create-instance)\n", + " * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n", + " * [Add an IAM database user to the database](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users#creating-a-database-user) (Optional)" + ] + }, + { + "cell_type": "markdown", + "id": "Mm7-fG_LltD7", + "metadata": { + "id": "Mm7-fG_LltD7" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-cloud-sql-pg` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1VELXvcj8AId", + "metadata": { + "id": "1VELXvcj8AId" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-pg langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "id": "98TVoM3MNDHu", + "metadata": { + "id": "98TVoM3MNDHu" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v6jBDnYnNM08", + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-cloud-sql-pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-postgresql-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"message_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### PostgreSQLEngine Connection Pool\n", + "\n", + "To use Cloud SQL as a ChatMessageHistory memory store, you need a `PostgreSQLEngine` object. The `PostgreSQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `PostgreSQLEngine` using `PostgreSQLEngine.from_instance()`, you need to provide only 4 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) will be used as the method of database authentication.
This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "For more information on IAM database authentication, see:\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n", + "\n", + "Optionally, you can use [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) with a username and password to access the Cloud SQL database instead. To do so, provide the optional `user` and `password` arguments to `PostgreSQLEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4576e914a866fb40", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.077748Z", + "start_time": "2023-08-28T10:04:36.105894Z" + }, + "id": "4576e914a866fb40", + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLEngine\n", + "\n", + "engine = PostgreSQLEngine.from_instance(\n", + " project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "qPV8WfWr7O54", + "metadata": { + "id": "qPV8WfWr7O54" + }, + "source": [ + "### Initialize a table\n", + "The `PostgreSQLChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n", + "\n", + "The `PostgreSQLEngine` has a helper method, `init_chat_history_table()`, that can be used to create a table with the proper
schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "TEu4VHArRttE", + "metadata": { + "id": "TEu4VHArRttE" + }, + "outputs": [], + "source": [ + "engine.init_chat_history_table(table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "id": "zSYQTYf3UfOi", + "metadata": { + "id": "zSYQTYf3UfOi" + }, + "source": [ + "### PostgreSQLChatMessageHistory\n", + "\n", + "To initialize the `PostgreSQLChatMessageHistory` class, you need to provide only 3 things:\n", + "\n", + "1. `engine` : A `PostgreSQLEngine` instance.\n", + "1. `session_id` : A unique identifier string for the session.\n", + "1. `table_name` : The name of the table within the Cloud SQL database to store the chat message history." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "Kq7RLtfOq0wi", + "metadata": { + "id": "Kq7RLtfOq0wi" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLChatMessageHistory\n", + "\n", + "history = PostgreSQLChatMessageHistory.create_sync(\n", + " engine, session_id=\"test_session\", table_name=TABLE_NAME\n", + ")\n", + "history.add_user_message(\"hi!\")\n", + "history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b476688cbb32ba90", + "metadata": { + "ExecuteTime": { + "end_time": "2023-08-28T10:04:38.929396Z", + "start_time": "2023-08-28T10:04:38.915727Z" + }, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "b476688cbb32ba90", + "jupyter": { + "outputs_hidden": false + }, + "outputId": "a19e5cd8-4225-476a-d28d-e870c6b838bb" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[HumanMessage(content='hi!'), AIMessage(content='whats up?')]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "history.messages" + ] + }, + { + "cell_type": "markdown", + "id": "ss6CbqcTTedr", + "metadata": { + "id": "ss6CbqcTTedr" +
}, + "source": [ + "#### Cleaning up\n", + "When the history of a specific session is obsolete, you can delete it as follows.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3khxzFxYO7x6", + "metadata": { + "id": "3khxzFxYO7x6" + }, + "outputs": [], + "source": [ + "history.clear()" + ] + }, + { + "cell_type": "markdown", + "id": "2e5337719d5614fd", + "metadata": { + "id": "2e5337719d5614fd" + }, + "source": [ + "## 🔗 Chaining\n", + "\n", + "We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history).\n", + "\n", + "To do this, we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm), which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "hYtHM3-TOMCe", + "metadata": { + "id": "hYtHM3-TOMCe" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "6558418b-0ece-4d01-9661-56d562d78f7a", + "metadata": { + "id": "6558418b-0ece-4d01-9661-56d562d78f7a" + }, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_core.runnables.history import RunnableWithMessageHistory\n", + "from langchain_google_vertexai import ChatVertexAI" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1", + "metadata": { + "id": "82149122-61d3-490d-9bdb-bb98606e8ba1" + }, + "outputs": [], + "source": [ + "prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", \"You are a
helpful assistant.\"),\n", + " MessagesPlaceholder(variable_name=\"history\"),\n", + " (\"human\", \"{question}\"),\n", + " ]\n", + ")\n", + "\n", + "chain = prompt | ChatVertexAI(project=PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c", + "metadata": { + "id": "2df90853-b67c-490f-b7f8-b69d69270b9c" + }, + "outputs": [], + "source": [ + "chain_with_history = RunnableWithMessageHistory(\n", + " chain,\n", + " lambda session_id: PostgreSQLChatMessageHistory.create_sync(\n", + " engine,\n", + " session_id=session_id,\n", + " table_name=TABLE_NAME,\n", + " ),\n", + " input_messages_key=\"question\",\n", + " history_messages_key=\"history\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b", + "metadata": { + "id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b" + }, + "outputs": [], + "source": [ + "# This is where we configure the session id\n", + "config = {\"configurable\": {\"session_id\": \"test_session\"}}" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "38e1423b-ba86-4496-9151-25932fab1a8b", + "outputId": "d5c93570-4b0b-4fe8-d19c-4b361fe74291" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Hello Bob, how can I help you today?')" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Hi! 
I'm bob\"}, config=config)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16", + "outputId": "288fe388-3f60-41b8-8edb-37cfbec18981" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=' Your name is Bob.')" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/memory/google_datastore.ipynb b/docs/docs/integrations/memory/google_datastore.ipynb new file mode 100644 index 0000000000000..512673c5beb4d --- /dev/null +++ b/docs/docs/integrations/memory/google_datastore.ipynb @@ -0,0 +1,259 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Firestore in Datastore\n", + "\n", + "> [Firestore in Datastore](https://cloud.google.com/datastore) is a serverless document-oriented database that scales to meet any demand. 
Extend your database application to build AI-powered experiences leveraging Datastore's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Firestore in Datastore](https://cloud.google.com/datastore) to store chat message history with the `DatastoreChatMessageHistory` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/chat_message_history.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Datastore database](https://cloud.google.com/datastore/docs/manage-databases)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-datastore` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-datastore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### API Enablement\n", + "The `langchain-google-datastore` package requires that you [enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Datastore API\n", + "!gcloud services enable datastore.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### DatastoreChatMessageHistory\n", + "\n", + "To initialize the `DatastoreChatMessageHistory` class, you need to provide only 2 things:\n", + "\n", + "1. `session_id` - A unique identifier string that specifies an id for the session.\n", + "1. `collection` - The single `/`-delimited path to a Datastore collection."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_datastore import DatastoreChatMessageHistory\n", + "\n", + "chat_history = DatastoreChatMessageHistory(\n", + " session_id=\"user-session-id\", collection=\"HistoryMessages\"\n", + ")\n", + "\n", + "chat_history.add_user_message(\"Hi!\")\n", + "chat_history.add_ai_message(\"How can I help you?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleaning up\n", + "When the history of a specific session is obsolete, you can delete it from both the database and memory as follows.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Datastore and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat_history.clear()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Client\n", + "\n", + "The client is created by default using the available environment variables. A [custom client](https://cloud.google.com/python/docs/reference/datastore/latest/client) can be passed to the constructor."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.auth import compute_engine\n", + "from google.cloud import datastore\n", + "\n", + "client = datastore.Client(\n", + " project=\"project-custom\",\n", + " database=\"non-default-database\",\n", + " credentials=compute_engine.Credentials(),\n", + ")\n", + "\n", + "history = DatastoreChatMessageHistory(\n", + " session_id=\"session-id\", collection=\"History\", client=client\n", + ")\n", + "\n", + "history.add_user_message(\"New message\")\n", + "\n", + "history.messages\n", + "\n", + "history.clear()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/memory/google_firestore.ipynb b/docs/docs/integrations/memory/google_firestore.ipynb new file mode 100644 index 0000000000000..82a725a87f23a --- /dev/null +++ b/docs/docs/integrations/memory/google_firestore.ipynb @@ -0,0 +1,259 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Firestore (Native Mode)\n", + "\n", + "> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand. 
Extend your database application to build AI-powered experiences leveraging Firestore's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to store chat message history with the `FirestoreChatMessageHistory` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-firestore-python/blob/main/docs/chat_message_history.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-firestore` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-firestore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "- If you are using Colab to run this notebook, use the cell below and continue.\n", + "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### API Enablement\n", + "The `langchain-google-firestore` package requires that you [enable the Firestore Admin API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Firestore Admin API\n", + "!gcloud services enable firestore.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### FirestoreChatMessageHistory\n", + "\n", + "To initialize the `FirestoreChatMessageHistory` class, you need to provide only 2 things:\n", + "\n", + "1. `session_id` - A unique identifier string that specifies an id for the session.\n", + "1. `collection` - The single `/`-delimited path to a Firestore collection."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_firestore import FirestoreChatMessageHistory\n", + "\n", + "chat_history = FirestoreChatMessageHistory(\n", + " session_id=\"user-session-id\", collection=\"HistoryMessages\"\n", + ")\n", + "\n", + "chat_history.add_user_message(\"Hi!\")\n", + "chat_history.add_ai_message(\"How can I help you?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleaning up\n", + "When the history of a specific session is obsolete, you can delete it from both the database and memory as follows.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Firestore and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat_history.clear()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Client\n", + "\n", + "The client is created by default using the available environment variables. A [custom client](https://cloud.google.com/python/docs/reference/firestore/latest/client) can be passed to the constructor."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.auth import compute_engine\n", + "from google.cloud import firestore\n", + "\n", + "client = firestore.Client(\n", + " project=\"project-custom\",\n", + " database=\"non-default-database\",\n", + " credentials=compute_engine.Credentials(),\n", + ")\n", + "\n", + "history = FirestoreChatMessageHistory(\n", + " session_id=\"session-id\", collection=\"History\", client=client\n", + ")\n", + "\n", + "history.add_user_message(\"New message\")\n", + "\n", + "history.messages\n", + "\n", + "history.clear()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/memory/google_memorystore_redis.ipynb b/docs/docs/integrations/memory/google_memorystore_redis.ipynb new file mode 100644 index 0000000000000..b247997ca590e --- /dev/null +++ b/docs/docs/integrations/memory/google_memorystore_redis.ipynb @@ -0,0 +1,233 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "6-0_o3DxsFGi" + }, + "source": [ + "# Google Memorystore for Redis\n", + "\n", + "> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. 
Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store chat message history with the `MemorystoreChatMessageHistory` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/chat_message_history.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 5.0.\n", + "\n", + "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please specify the endpoint associated with your instance, or keep the local default for demo purposes.\n", + "ENDPOINT = \"redis://127.0.0.1:6379\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iLwVMVkYsFGk", + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-memorystore-redis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2L7kMu__sFGl" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2fT1iEhsFGl" + }, + "source": [ + "### MemorystoreChatMessageHistory\n", + "\n", + "To initialize the `MemorystoreChatMessageHistory` class, you need to provide only 2 things:\n", + "\n", + "1. `redis_client` - A Redis client connected to a Memorystore for Redis instance.\n", + "1. `session_id` - Each chat message history object must have a unique session ID. If the session ID already has messages stored in Redis, they can be retrieved."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YEDKWR6asFGl" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import MemorystoreChatMessageHistory\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(ENDPOINT)\n", + "\n", + "message_history = MemorystoreChatMessageHistory(redis_client, session_id=\"session1\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BvS3UFsysFGm" + }, + "outputs": [], + "source": [ + "message_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFJdt3ubsFGo" + }, + "source": [ + "#### Cleaning up\n", + "\n", + "When the history of a specific session is obsolete, you can delete it as follows.\n", + "\n", + "**Note:** Once deleted, the data is no longer stored in Memorystore for Redis and is gone forever." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "H5I7K3MTsFGo" + }, + "outputs": [], + "source": [ + "message_history.clear()" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/docs/integrations/memory/google_spanner.ipynb b/docs/docs/integrations/memory/google_spanner.ipynb new file mode 100644 index 0000000000000..df83d84b22f83 --- /dev/null +++ b/docs/docs/integrations/memory/google_spanner.ipynb @@ -0,0 +1,332 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "# 
Google Spanner\n", + "> [Cloud Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing 99.999% availability in one easy solution.\n", + "\n", + "This notebook goes over how to use `Spanner` to store chat message history with the `SpannerChatMessageHistory` class." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n", + " * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-spanner` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-spanner" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Spanner API\n", + "!gcloud services enable spanner.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Spanner database values\n", + "Find your database values in the [Spanner Instances page](https://console.cloud.google.com/spanner/instances)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "INSTANCE = \"my-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"message_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "qPV8WfWr7O54", + "metadata": { + "id": "qPV8WfWr7O54" + }, + "source": [ + "### Initialize a table\n", + "The `SpannerChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n", + "\n", + "The helper method `init_chat_history_table()` can be used to create a table with the proper schema for you." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "TEu4VHArRttE", + "metadata": { + "id": "TEu4VHArRttE" + }, + "outputs": [], + "source": [ + "from langchain_google_spanner import (\n", + " SpannerChatMessageHistory,\n", + ")\n", + "\n", + "SpannerChatMessageHistory.init_chat_history_table(table_name=TABLE_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### SpannerChatMessageHistory\n", + "\n", + "To initialize the `SpannerChatMessageHistory` class, you need to provide only 4 things:\n", + "\n", + "1. `instance_id` - The name of the Spanner instance.\n", + "1. `database_id` - The name of the Spanner database.\n", + "1. `session_id` - A unique identifier string that specifies an id for the session.\n", + "1. `table_name` - The name of the table within the database to store the chat message history."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "message_history = SpannerChatMessageHistory(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " session_id=\"user-session-id\",\n", + ")\n", + "\n", + "message_history.add_user_message(\"hi!\")\n", + "message_history.add_ai_message(\"whats up?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "message_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Custom client\n", + "By default, the class creates a default Spanner client. To use a non-default client, pass a [custom client](https://cloud.google.com/spanner/docs/samples/spanner-create-client-with-query-options#spanner_create_client_with_query_options-python) to the constructor." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import spanner\n", + "\n", + "custom_client_message_history = SpannerChatMessageHistory(\n", + " instance_id=\"my-instance\",\n", + " database_id=\"my-database\",\n", + " client=spanner.Client(...),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleaning up\n", + "\n", + "When the history of a specific session is obsolete, you can delete it as follows.\n", + "**Note:** Once deleted, the data is no longer stored in Cloud Spanner and is gone forever."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "message_history = SpannerChatMessageHistory(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " session_id=\"user-session-id\",\n", + ")\n", + "\n", + "message_history.clear()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/platforms/google.mdx b/docs/docs/integrations/platforms/google.mdx index 6e05e143d9c22..95bfeccedd00c 100644 --- a/docs/docs/integrations/platforms/google.mdx +++ b/docs/docs/integrations/platforms/google.mdx @@ -121,6 +121,67 @@ from langchain_google_vertexai import VertexAIModelGarden ## Vector Stores +### AlloyDB for PostgreSQL + +> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL. + +Install the python package: + +```bash +pip install langchain-google-alloydb-pg +``` + +See [usage example](/docs/integrations/vectorstores/google_alloydb). + +```python +from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBVectorStore +``` + +### Cloud SQL for PostgreSQL + +> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud. 
+Install the python package: + +```bash +pip install langchain-google-cloud-sql-pg +``` + +See [usage example](/docs/integrations/vectorstores/google_cloud_sql_pg). + +```python +from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgresVectorStore +``` + +### Spanner + +> [Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL. +Install the python package: + +```bash +pip install langchain-google-spanner +``` + +See [usage example](/docs/integrations/vectorstores/google_spanner). + +```python +from langchain_google_spanner import SpannerVectorStore +``` + +### Memorystore for Redis + +> [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments. +Install the python package: + +```bash +pip install langchain-google-memorystore-redis +``` + +See [usage example](/docs/integrations/vectorstores/google_memorystore_redis). + +```python +from langchain_google_memorystore_redis import RedisVectorStore +``` + ### Vertex AI Vector Search > [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) from Google Cloud, @@ -239,6 +300,143 @@ documents = docai_wh_retriever.get_relevant_documents( ## Document Loaders +### AlloyDB for PostgreSQL + +> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL.
+ +Install the python package: + +```bash +pip install langchain-google-alloydb-pg +``` + +See [usage example](/docs/integrations/document_loaders/google_alloydb). + +```python +from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBLoader +``` + +### Cloud SQL for PostgreSQL + +> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud. +Install the python package: + +```bash +pip install langchain-google-cloud-sql-pg +``` + +See [usage example](/docs/integrations/document_loaders/google_cloud_sql_pg). + +```python +from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLLoader +``` + +### Cloud SQL for MySQL + +> [Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud. +Install the python package: + +```bash +pip install langchain-google-cloud-sql-mysql +``` + +See [usage example](/docs/integrations/document_loaders/google_cloud_sql_mysql). + +```python +from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLDocumentLoader +``` + +### Cloud SQL for SQL Server + +> [Cloud SQL for SQL Server](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your SQL Server databases on Google Cloud. +Install the python package: + +```bash +pip install langchain-google-cloud-sql-mssql +``` + +See [usage example](/docs/integrations/document_loaders/google_cloud_sql_mssql). + +```python +from langchain_google_cloud_sql_mssql import MSSQLEngine, MSSQLLoader +``` + +### Bigtable + +> [Bigtable](https://cloud.google.com/bigtable/docs) is Google's fully managed NoSQL Big Data database service in Google Cloud.
+Install the python package: + +```bash +pip install langchain-google-bigtable +``` + +See [usage example](/docs/integrations/document_loaders/google_bigtable). + +```python +from langchain_google_bigtable import BigtableLoader +``` + +### Spanner + +> [Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL. +Install the python package: + +```bash +pip install langchain-google-spanner +``` + +See [usage example](/docs/integrations/document_loaders/google_spanner). + +```python +from langchain_google_spanner import SpannerLoader +``` + +### Memorystore for Redis + +> [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments. +Install the python package: + +```bash +pip install langchain-google-memorystore-redis +``` + +See [usage example](/docs/integrations/document_loaders/google_memorystore_redis). + +```python +from langchain_google_memorystore_redis import MemorystoreLoader +``` + +### Firestore (Native Mode) + +> [Firestore](https://cloud.google.com/firestore/docs/) is a NoSQL document database built for automatic scaling, high performance, and ease of application development. +Install the python package: + +```bash +pip install langchain-google-firestore +``` + +See [usage example](/docs/integrations/document_loaders/google_firestore).
+ +```python +from langchain_google_firestore import FirestoreLoader +``` + +### Firestore in Datastore Mode + +> [Firestore in Datastore mode](https://cloud.google.com/datastore/docs) is a NoSQL document database built for automatic scaling, high performance, and ease of application development. +> Firestore is the newest version of Datastore and introduces several improvements over Datastore. +Install the python package: + +```bash +pip install langchain-google-datastore +``` + +See [usage example](/docs/integrations/document_loaders/google_datastore). + +```python +from langchain_google_datastore import DatastoreLoader +``` + ### BigQuery > [BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data in Google Cloud. @@ -521,20 +719,142 @@ from langchain_community.agent_toolkits import GmailToolkit ## Memory -### Firestore +### AlloyDB for PostgreSQL -> [`Firestore`](https://cloud.google.com/firestore) is a NoSQL document database built for automatic scaling, high performance, and ease of application development in Google Cloud. +> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL. -First, we need to install the python package. +Install the python package: + +```bash +pip install langchain-google-alloydb-pg +``` + +See [usage example](/docs/integrations/memory/google_alloydb). + +```python +from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBChatMessageHistory +``` + +### Cloud SQL for PostgreSQL + +> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
+Install the python package: + +```bash +pip install langchain-google-cloud-sql-pg +``` + +See [usage example](/docs/integrations/memory/google_cloud_sql_pg). + + +```python +from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLChatMessageHistory +``` + +### Cloud SQL for MySQL + +> [Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud. +Install the python package: + +```bash +pip install langchain-google-cloud-sql-mysql +``` + +See [usage example](/docs/integrations/memory/google_cloud_sql_mysql). + +```python +from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLChatMessageHistory +``` + +### Cloud SQL for SQL Server + +> [Cloud SQL for SQL Server](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your SQL Server databases on Google Cloud. +Install the python package: + +```bash +pip install langchain-google-cloud-sql-mssql +``` + +See [usage example](/docs/integrations/memory/google_cloud_sql_mssql). + +```python +from langchain_google_cloud_sql_mssql import MSSQLEngine, MSSQLChatMessageHistory +``` + +### Spanner + +> [Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL. +Install the python package: + +```bash +pip install langchain-google-spanner +``` + +See [usage example](/docs/integrations/memory/google_spanner).
+ +```python +from langchain_google_spanner import SpannerChatMessageHistory +``` + +### Memorystore for Redis + +> [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments. +Install the python package: + +```bash +pip install langchain-google-memorystore-redis +``` + +See [usage example](/docs/integrations/memory/google_memorystore_redis). + +```python +from langchain_google_memorystore_redis import MemorystoreChatMessageHistory +``` + +### Bigtable + +> [Bigtable](https://cloud.google.com/bigtable/docs) is Google's fully managed NoSQL Big Data database service in Google Cloud. +Install the python package: + +```bash +pip install langchain-google-bigtable +``` + +See [usage example](/docs/integrations/memory/google_bigtable). + +```python +from langchain_google_bigtable import BigtableChatMessageHistory +``` + +### Firestore (Native Mode) + +> [Firestore](https://cloud.google.com/firestore/docs/) is a NoSQL document database built for automatic scaling, high performance, and ease of application development. +Install the python package: + +```bash +pip install langchain-google-firestore +``` + +See [usage example](/docs/integrations/memory/google_firestore). + +```python +from langchain_google_firestore import FirestoreChatMessageHistory +``` + +### Firestore in Datastore Mode + +> [Firestore in Datastore mode](https://cloud.google.com/datastore/docs) is a NoSQL document database built for automatic scaling, high performance, and ease of application development. +> Firestore is the newest version of Datastore and introduces several improvements over Datastore.
+Install the python package: ```bash -pip install firebase-admin +pip install langchain-google-datastore ``` -See a [usage example and authorization instructions](/docs/integrations/memory/firestore_chat_message_history). +See [usage example](/docs/integrations/memory/google_datastore). ```python -from langchain_community.chat_message_histories.firestore import FirestoreChatMessageHistory +from langchain_google_datastore import DatastoreChatMessageHistory ``` ## Chat Loaders diff --git a/docs/docs/integrations/vectorstores/google_alloydb.ipynb b/docs/docs/integrations/vectorstores/google_alloydb.ipynb new file mode 100644 index 0000000000000..f13417004f4ae --- /dev/null +++ b/docs/docs/integrations/vectorstores/google_alloydb.ipynb @@ -0,0 +1,566 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google AlloyDB for PostgreSQL\n", + "\n", + "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's LangChain integrations.\n", + "\n", + "This notebook goes over how to use `AlloyDB for PostgreSQL` to store vector embeddings with the `AlloyDBVectorStore` class."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", + " * [Create an AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n", + " * [Create an AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n", + " * [Add a User to the database.](https://cloud.google.com/alloydb/docs/database-users/about)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `langchain-google-alloydb-pg`, and the library for the embedding service, `langchain-google-vertexai`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "0ZITIDE160OD", + "outputId": "e184bc0d-6541-4e0a-82d2-1e216db00a2d" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-alloydb-pg langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench, you can restart the terminal using the button on top."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v6jBDnYnNM08", + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable AlloyDB Admin API\n", + "!gcloud services enable alloydb.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set AlloyDB database values\n", + "Find your database values in the [AlloyDB Instances page](https://console.cloud.google.com/alloydb/clusters)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "CLUSTER = \"my-cluster\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-primary\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vector_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### AlloyDBEngine Connection Pool\n", + "\n", + "One of the requirements and arguments to establish AlloyDB as a vector store is an `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create an `AlloyDBEngine` using `AlloyDBEngine.from_instance()` you need to provide only 5 things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n", + "1. `region` : Region where the AlloyDB instance is located.\n", + "1. `cluster`: The name of the AlloyDB cluster.\n", + "1. `instance` : The name of the AlloyDB instance.\n", + "1. `database` : The name of the database to connect to on the AlloyDB instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication.
This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** This tutorial demonstrates the async interface. All async methods have corresponding sync methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBEngine\n", + "\n", + "engine = await AlloyDBEngine.afrom_instance(\n", + " project_id=PROJECT_ID,\n", + " region=REGION,\n", + " cluster=CLUSTER,\n", + " instance=INSTANCE,\n", + " database=DATABASE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D9Xs2qhm6X56" + }, + "source": [ + "### Initialize a table\n", + "The `AlloyDBVectorStore` class requires a database table. The `AlloyDBEngine` engine has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "id": "avlyHEMn6gzU" + }, + "outputs": [], + "source": [ + "await engine.ainit_vectorstore_table(\n", + " table_name=TABLE_NAME,\n", + " vector_size=768, # Vector size for the Vertex AI model (textembedding-gecko@latest)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an embedding class instance\n", + "\n", + "You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n", + "You may need to enable the Vertex AI API to use `VertexAIEmbeddings`. We recommend pinning the embedding model's version for production. Learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Vb2RJocV9_LQ", + "outputId": "37f5dc74-2512-47b2-c135-f34c10afdcf4" + }, + "outputs": [], + "source": [ + "from langchain_google_vertexai import VertexAIEmbeddings\n", + "\n", + "embedding = VertexAIEmbeddings(\n", + " model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Initialize a default AlloyDBVectorStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import AlloyDBVectorStore\n", + "\n", + "store = await AlloyDBVectorStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embedding,\n", + ")"
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add texts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n", + "metadatas = [{\"len\": len(t)} for t in all_texts]\n", + "ids = [str(uuid.uuid4()) for _ in all_texts]\n", + "\n", + "await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete texts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.adelete([ids[1]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"I'd like a fruit.\"\n", + "docs = await store.asimilarity_search(query)\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents by vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query_vector = embedding.embed_query(query)\n", + "docs = await store.asimilarity_search_by_vector(query_vector, k=2)\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Add an Index\n", + "Speed up vector search queries by applying a vector index. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg.indexes import IVFFlatIndex\n", + "\n", + "index = IVFFlatIndex()\n", + "await store.aapply_vector_index(index)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Re-index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.areindex() # Re-index using default index name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Remove an index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.adrop_vector_index() # Delete index using default name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a custom Vector Store\n", + "A Vector Store can take advantage of relational data to filter similarity searches.\n", + "\n", + "Create a table with custom metadata columns." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_alloydb_pg import Column\n", + "\n", + "# Set table name\n", + "TABLE_NAME = \"vectorstore_custom\"\n", + "\n", + "await engine.ainit_vectorstore_table(\n", + " table_name=TABLE_NAME,\n", + " vector_size=768, # VertexAI model: textembedding-gecko@latest\n", + " metadata_columns=[Column(\"len\", \"INTEGER\")],\n", + ")\n", + "\n", + "\n", + "# Initialize AlloyDBVectorStore\n", + "custom_store = await AlloyDBVectorStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embedding,\n", + " metadata_columns=[\"len\"],\n", + " # Connect to an existing VectorStore by customizing the table schema:\n", + " # id_column=\"uuid\",\n", + " # content_column=\"documents\",\n", + " # embedding_column=\"vectors\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents with metadata filter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "# Add texts to the custom Vector Store\n", + "all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n", + "metadatas = [{\"len\": len(t)} for t in all_texts]\n", + "ids = [str(uuid.uuid4()) for _ in all_texts]\n", + "await custom_store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)\n", + "\n", + "# Use filter on search\n", + "docs = await custom_store.asimilarity_search_by_vector(query_vector, filter=\"len >= 6\")\n", + "\n", + "print(docs)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", +
 "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/docs/integrations/vectorstores/google_cloud_sql_pg.ipynb b/docs/docs/integrations/vectorstores/google_cloud_sql_pg.ipynb new file mode 100644 index 0000000000000..509b1b985a93f --- /dev/null +++ b/docs/docs/integrations/vectorstores/google_cloud_sql_pg.ipynb @@ -0,0 +1,566 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Cloud SQL for PostgreSQL\n", + "\n", + "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n", + "\n", + "This notebook goes over how to use `Cloud SQL for PostgreSQL` to store vector embeddings with the `PostgresVectorStore` class."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before you begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n", + " * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy#create-instance)\n", + " * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n", + " * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IR54BmgvdHT_" + }, + "source": [ + "### 🦜🔗 Library Installation\n", + "Install the integration library, `langchain-google-cloud-sql-pg`, and the library for the embedding service, `langchain-google-vertexai`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "0ZITIDE160OD", + "outputId": "e184bc0d-6541-4e0a-82d2-1e216db00a2d" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-cloud-sql-pg langchain-google-vertexai" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v40bB_GMcr9f" + }, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v6jBDnYnNM08", + "metadata": { + "id": "v6jBDnYnNM08" + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "yygMe6rPWxHS", + "metadata": { + "id": "yygMe6rPWxHS" + }, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "PTXN1_DSXj2b", + "metadata": { + "id": "PTXN1_DSXj2b" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "NEvB9BoLEulY", + "metadata": { + "id": "NEvB9BoLEulY" + }, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gfkS3yVRE4_W", + "metadata": { + "cellView": "form", + "id": "gfkS3yVRE4_W" + }, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "id": "rEWWNoNnKOgq", + "metadata": { + "id": "rEWWNoNnKOgq" + }, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-cloud-sql-pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Cloud SQL Admin API\n", + "!gcloud services enable sqladmin.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "id": "f8f2830ee9ca1e01", + "metadata": { + "id": "f8f2830ee9ca1e01" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "id": "OMvzMWRrR6n7", + "metadata": { + "id": "OMvzMWRrR6n7" + }, + "source": [ + "### Set Cloud SQL database values\n", + "Find your database values in the [Cloud SQL Instances page](https://console.cloud.google.com/sql)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "irl7eMFnSPZr", + "metadata": { + "id": "irl7eMFnSPZr" + }, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "REGION = \"us-central1\" # @param {type: \"string\"}\n", + "INSTANCE = \"my-pg-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vector_store\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "id": "QuQigs4UoFQ2", + "metadata": { + "id": "QuQigs4UoFQ2" + }, + "source": [ + "### PostgreSQLEngine Connection Pool\n", + "\n", + "To use Cloud SQL as a vector store, you need a `PostgreSQLEngine` object. The `PostgreSQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n", + "\n", + "To create a `PostgreSQLEngine` using `PostgreSQLEngine.from_instance()`, you need to provide only four things:\n", + "\n", + "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n", + "1. `region` : Region where the Cloud SQL instance is located.\n", + "1. `instance` : The name of the Cloud SQL instance.\n", + "1. `database` : The name of the database to connect to on the Cloud SQL instance.\n", + "\n", + "By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) is used as the method of database authentication. 
This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n", + "\n", + "For more information on IAM database authentication, see:\n", + "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n", + "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n", + "\n", + "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgreSQLEngine.from_instance()`:\n", + "* `user` : Database user to use for built-in database authentication and login.\n", + "* `password` : Database password to use for built-in database authentication and login.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLEngine\n", + "\n", + "engine = await PostgreSQLEngine.afrom_instance(\n", + " project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D9Xs2qhm6X56" + }, + "source": [ + "### Initialize a table\n", + "The `PostgresVectorStore` class requires a database table. The `PostgreSQLEngine` class has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you."
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "id": "avlyHEMn6gzU" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgreSQLEngine\n", + "\n", + "await engine.ainit_vectorstore_table(\n", + " table_name=TABLE_NAME,\n", + " vector_size=768, # Vector size for the VertexAI model (textembedding-gecko@latest)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an embedding class instance\n", + "\n", + "You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n", + "You may need to enable the Vertex AI API to use `VertexAIEmbeddings`. We recommend pinning the embedding model's version for production. Learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5utKIdq7KYi5", + "metadata": { + "id": "5utKIdq7KYi5" + }, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Vb2RJocV9_LQ", + "outputId": "37f5dc74-2512-47b2-c135-f34c10afdcf4" + }, + "outputs": [], + "source": [ + "from langchain_google_vertexai import VertexAIEmbeddings\n", + "\n", + "embedding = VertexAIEmbeddings(\n", + " model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e1tl0aNx7SWy" + }, + "source": [ + "### Initialize a default PostgresVectorStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z-AZyzAQ7bsf" + }, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgresVectorStore\n", + "\n", + "store = await PostgresVectorStore.create( # Use .create() to 
initialize an async vector store\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embedding,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add texts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n", + "metadatas = [{\"len\": len(t)} for t in all_texts]\n", + "ids = [str(uuid.uuid4()) for _ in all_texts]\n", + "\n", + "await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete texts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.adelete([ids[1]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"I'd like a fruit.\"\n", + "docs = await store.asimilarity_search(query)\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents by vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query_vector = embedding.embed_query(query)\n", + "docs = await store.asimilarity_search_by_vector(query_vector, k=2)\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Add an Index\n", + "Speed up vector search queries by applying a vector index. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex\n", + "\n", + "index = IVFFlatIndex()\n", + "await store.aapply_vector_index(index)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Re-index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.areindex() # Re-index using default index name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Remove an index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "await store.adrop_vector_index() # Delete index using default name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a custom Vector Store\n", + "A Vector Store can take advantage of relational data to filter similarity searches.\n", + "\n", + "Create a table with custom metadata columns."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import Column\n", + "\n", + "# Set table name\n", + "TABLE_NAME = \"vectorstore_custom\"\n", + "\n", + "await engine.ainit_vectorstore_table(\n", + " table_name=TABLE_NAME,\n", + " vector_size=768, # VertexAI model: textembedding-gecko@latest\n", + " metadata_columns=[Column(\"len\", \"INTEGER\")],\n", + ")\n", + "\n", + "\n", + "# Initialize PostgresVectorStore\n", + "custom_store = await PostgresVectorStore.create(\n", + " engine=engine,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embedding,\n", + " metadata_columns=[\"len\"],\n", + " # Connect to an existing VectorStore by customizing the table schema:\n", + " # id_column=\"uuid\",\n", + " # content_column=\"documents\",\n", + " # embedding_column=\"vectors\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents with metadata filter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "# Add texts to the custom Vector Store\n", + "all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n", + "metadatas = [{\"len\": len(t)} for t in all_texts]\n", + "ids = [str(uuid.uuid4()) for _ in all_texts]\n", + "await custom_store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)\n", + "\n", + "# Filter the search on the len metadata column\n", + "docs = await custom_store.asimilarity_search_by_vector(query_vector, filter=\"len >= 6\")\n", + "\n", + "print(docs)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/docs/integrations/vectorstores/google_memorystore_redis.ipynb b/docs/docs/integrations/vectorstores/google_memorystore_redis.ipynb new file mode 100644 index 0000000000000..bd1419e299840 --- /dev/null +++ b/docs/docs/integrations/vectorstores/google_memorystore_redis.ipynb @@ -0,0 +1,422 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Memorystore for Redis\n", + "\n", + "> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's LangChain integrations.\n", + "\n", + "This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store vector embeddings with the `RedisVectorStore` class.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/vector_store.ipynb)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + "* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 7.2."
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "\n", + "The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-google-memorystore-redis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize a Vector Index" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import (\n", + " DistanceStrategy,\n", + " HNSWConfig,\n", + " RedisVectorStore,\n", + ")\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "\n", + "# Configure HNSW index with descriptive parameters\n", + "index_config = HNSWConfig(\n", + " name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n", + ")\n", + "\n", + "# Initialize/create the vector store index\n", + "RedisVectorStore.init_index(client=redis_client, index_config=index_config)" + ] + }, + { 
+ "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare Documents\n", + "\n", + "Before text can be stored in a vector store, it needs to be loaded and split into chunks that an embedding model can turn into vectors. This involves:\n", + "\n", + "* Loading Text: The `TextLoader` obtains text data from a file (e.g., \"state_of_the_union.txt\").\n", + "* Text Splitting: The `CharacterTextSplitter` breaks the text into smaller chunks for embedding models." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain_community.document_loaders import TextLoader\n", + "\n", + "loader = TextLoader(\"./state_of_the_union.txt\")\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add Documents to the Vector Store\n", + "\n", + "After the text is prepared, either of the following methods generates the embeddings and inserts the documents into the Redis vector store."
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Method 1: Classmethod for Direct Insertion\n", + "\n", + "This approach combines embedding creation and insertion into a single step using the `from_documents` classmethod. The example uses `FakeEmbeddings`, which returns random 128-dimensional vectors matching the index's `vector_size`; replace it with a real embedding model to get semantically meaningful search results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.embeddings.fake import FakeEmbeddings\n", + "\n", + "embeddings = FakeEmbeddings(size=128)\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "rvs = RedisVectorStore.from_documents(\n", + " docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Method 2: Instance-Based Insertion\n", + "This approach offers flexibility when working with a new or existing `RedisVectorStore`:\n", + "\n", + "* [Optional] Create a RedisVectorStore Instance: Instantiate a `RedisVectorStore` object for customization. If you already have an instance, proceed to the next step.\n", + "* Add Text with Metadata: Provide raw text and metadata to the instance. Embedding generation and insertion into the vector store are handled automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rvs = RedisVectorStore(\n", + " client=redis_client, index_name=\"my_vector_index\", embeddings=embeddings\n", + ")\n", + "ids = rvs.add_texts(\n", + " texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perform a Similarity Search (KNN)\n", + "\n", + "With the vector store populated, it's possible to search for text semantically similar to a query. 
Here's how to use KNN (K-Nearest Neighbors) with default settings:\n", + "\n", + "* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n", + "* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pprint\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "knn_results = rvs.similarity_search(query=query)\n", + "pprint.pprint(knn_results)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perform a Range-Based Similarity Search\n", + "\n", + "Range queries provide more control by specifying a desired similarity threshold along with the query text:\n", + "\n", + "* Formulate the Query: A natural language question defines the search intent.\n", + "* Set Similarity Threshold: The `distance_threshold` parameter determines how close a match must be to be considered relevant.\n", + "* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n", + "pprint.pprint(rq_results)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perform a Maximal Marginal Relevance (MMR) Search\n", + "\n", + "MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n", + "\n", + "* Formulate the Query: A natural language question defines the search intent.\n", + "* Balance Relevance and Diversity: The `lambda_mult` parameter, a number between 0 and 1, controls the trade-off between strict relevance and variety in the results; values closer to 0 favor diversity and values closer to 1 favor relevance.\n", + "* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the `lambda_mult` setting."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "retriever = rvs.as_retriever()\n", + "results = retriever.invoke(query)\n", + "pprint.pprint(results)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean up" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete Documents from the Vector Store\n", + "\n", + "Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store. The `delete` method provides this functionality." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rvs.delete(ids)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete a Vector Index\n", + "\n", + "There might be circumstances where the deletion of an existing vector index is necessary. Common reasons include:\n", + "\n", + "* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n", + "* Storage Management: Removing unused indices can help free up space within the Redis instance.\n", + "\n", + "Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "# Delete the vector index\n", + "RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/vectorstores/google_spanner.ipynb b/docs/docs/integrations/vectorstores/google_spanner.ipynb new file mode 100644 index 0000000000000..1f585f49ac2a9 --- /dev/null +++ b/docs/docs/integrations/vectorstores/google_spanner.ipynb @@ -0,0 +1,380 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Google Spanner\n", + "> [Cloud Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing 99.999% availability in one easy solution.\n", + "\n", + "This notebook goes over how to use `Spanner` for vector search with the `SpannerVectorStore` class."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before You Begin\n", + "\n", + "To run this notebook, you will need to do the following:\n", + "\n", + " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n", + " * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🦜🔗 Library Installation\n", + "The integration lives in its own `langchain-google-spanner` package, so we need to install it." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install --upgrade --quiet langchain-google-spanner" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# # Automatically restart kernel after installs so that your environment can access the new packages\n", + "# import IPython\n", + "\n", + "# app = IPython.Application.instance()\n", + "# app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🔐 Authentication\n", + "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n", + "\n", + "* If you are using Colab to run this notebook, use the cell below and continue.\n", + "* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ☁ Set Your Google Cloud Project\n", + "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n", + "\n", + "If you don't know your project ID, try the following:\n", + "\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n", + "\n", + "PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id\n", + "!gcloud config set project {PROJECT_ID}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 💡 API Enablement\n", + "The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Spanner API\n", + "!gcloud services enable spanner.googleapis.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set Cloud Spanner database values\n", + "Find your database values in the [Cloud Spanner Instances page](https://console.cloud.google.com/spanner?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# @title Set Your Values Here { display-mode: \"form\" }\n", + "INSTANCE = \"my-instance\" # @param {type: \"string\"}\n", + "DATABASE = \"my-database\" # @param {type: \"string\"}\n", + "TABLE_NAME = \"vectors_search_data\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize a table\n", + "The `SpannerVectorStore` class instance requires a database table with id, content, and embeddings columns.\n", + "\n", + "The helper method `init_vector_store_table()` can be used to create a table with the proper schema for you."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_spanner import SecondaryIndex, SpannerVectorStore, TableColumn\n", + "\n", + "SpannerVectorStore.init_vector_store_table(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " id_column=\"row_id\",\n", + " metadata_columns=[\n", + " TableColumn(name=\"metadata\", type=\"JSON\", is_null=True),\n", + " TableColumn(name=\"title\", type=\"STRING(MAX)\", is_null=False),\n", + " ],\n", + " secondary_indexes=[\n", + " SecondaryIndex(index_name=\"row_id_and_title\", columns=[\"row_id\", \"title\"])\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an embedding class instance\n", + "\n", + "You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n", + "You may need to enable the Vertex AI API to use `VertexAIEmbeddings`. We recommend pinning the embedding model's version for production. Learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enable Vertex AI API\n", + "!gcloud services enable aiplatform.googleapis.com" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_vertexai import VertexAIEmbeddings\n", + "\n", + "embeddings = VertexAIEmbeddings(\n", + " model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### SpannerVectorStore\n", + "\n", + "To initialize the `SpannerVectorStore` class, you need to provide four required arguments; the other arguments are optional and only need to be passed if they differ from the defaults:\n", + "\n", + "1. `instance_id` - The name of the Spanner instance.\n", + "1. `database_id` - The name of the Spanner database.\n", + "1. `table_name` - The name of the table within the database that stores the documents and their embeddings.\n", + "1. `embedding_service` - The Embeddings implementation used to generate the embeddings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db = SpannerVectorStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " ignore_metadata_columns=[],\n", + " embedding_service=embeddings,\n", + " metadata_json_column=\"metadata\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 🔐 Add Documents\n", + "To add documents to the vector store." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "from langchain_community.document_loaders import HNLoader\n", + "\n", + "loader = HNLoader(\"https://news.ycombinator.com/item?id=34817881\")\n", + "documents = loader.load()\n", + "ids = [str(uuid.uuid4()) for _ in range(len(documents))]\n", + "db.add_documents(documents=documents, ids=ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 🔐 Search Documents\n", + "To search for documents in the vector store using similarity search." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db.similarity_search(query=\"What is a vector store?\", k=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 🔐 Search Documents with Max Marginal Relevance\n", + "To search for documents in the vector store using max marginal relevance (MMR) search, which balances relevance to the query against diversity among the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db.max_marginal_relevance_search(\"Testing the langchain integration with spanner\", k=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 🔐 Delete Documents by ID\n", + "To remove documents from the vector store by ID, pass the values stored in the `row_id` column, which was configured as the ID column when the table was initialized." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db.delete(ids=[\"id1\", \"id2\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 🔐 Delete Documents by Content\n", + "You can also remove documents from the vector store by passing the documents themselves. The content and metadata columns provided during VectorStore initialization are used to find the rows that correspond to each document, and any matching rows are deleted."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db.delete(documents=[documents[0], documents[1]])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}