RAG Collections

What is a RAG Collection?

Retrieval-Augmented Generation (RAG) collections empower designers to efficiently manage and organize data from various sources, including files and URLs. By leveraging AI-powered semantic analysis, these collections enhance the context of stored data, making information retrieval more relevant and accurate.

These collections serve as a data source in the FlowGenie Studio, functioning as an organized database of information for the Genies. By connecting external information sources to FlowGenie, RAG collections provide a way to manage both structured and unstructured data within a process.

RAG Collections offer the following key benefits:

  • Improved Efficiency: Designers can create and configure RAG collections within the Designer tab, and add documents and URLs as data sources.

  • AI Analysis and Data Extraction: ProcessMaker AI analyzes the data sources and extracts metadata such as highlights and keywords.

  • Enhanced Contextual Insights: AI-driven semantic analysis refines question-answering capabilities, delivering precise and contextually relevant responses that boost productivity and user experience.

  • Integration with FlowGenie: Genies can utilize RAG collections for enhanced question answering. Users can configure match thresholds and retrieval limits.

  • Flexibility for External Systems: API and data connector support enables external systems to interact with RAG collections, ensuring smooth data flow and updates.

  • Seamless Integration: Files added through processes are automatically incorporated into a RAG collection, streamlining processes and enhancing workflow efficiency.

Creating and using a RAG Collection is a three-step process:

  1. Create a RAG Collection

  2. Train a Genie using the RAG Collection

  3. Add the Genie into a process


Create a RAG Collection

Follow these steps to create a RAG collection:

  1. Navigate to the Designer tab.

  2. Hover over the Collections icon, and select New Collection.

  3. In the Name setting, enter the name of the Collection. This name must be unique among all Collections. This is a required setting.

  4. In the Description setting, enter the description of the Collection. This is a required setting.

  5. From the Type setting, select RAG Collection.

  6. Click Save to create an empty RAG Collection.

  7. Click the +Record button to add a data source. Select from one of the following options:

    1. Use the Upload File option to add a file as a data source.

    2. Use the Web URL option to retrieve data from a publicly accessible URL.

  8. Click Add Source to add it to the collection. The Status column of the added source shows one of the following values, ending at Completed once the data has been extracted:

    1. Pending: Indicates that the file is waiting to be processed.

    2. Processing: Indicates that the file is currently being processed.

    3. Completed: Indicates that the file was successfully processed and the metadata has been extracted.

  9. After a source is successfully processed, it appears as a collection record.

  10. Hover over the collection record, and select the View icon to review the highlights and keywords extracted from the data source.
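The status values described in step 8 can be read as a simple one-way lifecycle. The following is a minimal sketch of that reading only; the names `SourceStatus`, `next_status`, and `is_viewable` are hypothetical and are not part of any ProcessMaker API, and the rule that a record becomes viewable only at Completed is an assumption drawn from steps 9 and 10.

```python
from enum import Enum
from typing import Optional


class SourceStatus(Enum):
    """Status values shown in the Status column (names are hypothetical)."""
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"


# Assumed one-way progression: Pending -> Processing -> Completed.
_NEXT = {
    SourceStatus.PENDING: SourceStatus.PROCESSING,
    SourceStatus.PROCESSING: SourceStatus.COMPLETED,
}


def next_status(status: SourceStatus) -> Optional[SourceStatus]:
    """Return the following status, or None when processing has finished."""
    return _NEXT.get(status)


def is_viewable(status: SourceStatus) -> bool:
    """Assumption: highlights and keywords can be reviewed only after completion."""
    return status is SourceStatus.COMPLETED
```

For example, `next_status(SourceStatus.PENDING)` yields `SourceStatus.PROCESSING`, and `is_viewable` is true only for `SourceStatus.COMPLETED`.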

Semantic Analysis in a RAG Collection

  • When a source is added to a RAG Collection, the system generates a contextual summary.

  • For file uploads, the following file types are supported: pdf, docx, txt, md, html, htm, py, js, jsx, ts, tsx, vue, php, java, cpp, c, h, cs, rb, go, rs, swift, kt, scala, xml, xls, xlsx, pptx, csv, json
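When files are gathered by a script before upload, it can help to check their extensions against the supported list first. The following is a minimal sketch under that assumption; `is_supported_source` is a hypothetical helper, not a ProcessMaker function, and comparing extensions case-insensitively is also an assumption.

```python
from pathlib import Path

# File types listed above as supported for RAG Collection uploads.
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "docx", "txt", "md", "html", "htm", "py", "js", "jsx", "ts",
    "tsx", "vue", "php", "java", "cpp", "c", "h", "cs", "rb", "go", "rs",
    "swift", "kt", "scala", "xml", "xls", "xlsx", "pptx", "csv", "json",
})


def is_supported_source(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list.

    Assumption: extensions are compared case-insensitively.
    """
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

With this sketch, `is_supported_source("handbook.PDF")` is true, while `is_supported_source("archive.zip")` and a file with no extension are rejected.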


Train a Genie with a RAG Collection

Follow these steps to use RAG Collections in a Genie. For detailed information on FlowGenie, see the FlowGenie documentation.

  1. Create a new Genie or edit an existing one.

  2. From the Sources setting in the left menu, select a RAG Collection.

    Only fully analyzed RAG Collections are shown in the list.

  3. Configure the Match Threshold, which defines the minimum similarity score a document must reach to be considered relevant. Select a value closer to 0 for looser matching and a value closer to 1 for near-exact matching.

  4. Set the Limit, which is the number of relevant chunks to retrieve (1-25). Select a higher value to retrieve a larger volume of data.

  5. Once configured, the Genie can provide answers based on the RAG collection’s context.
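To illustrate how the Match Threshold and Limit settings interact, the following sketch filters scored chunks by a minimum similarity and then keeps the top results. It is an illustration of the concept only, not FlowGenie's implementation: `ScoredChunk` and `select_chunks` are hypothetical names, the scores are assumed to lie between 0 and 1, and treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScoredChunk:
    """A retrieved chunk with its similarity score (hypothetical structure)."""
    text: str
    score: float  # assumed to be a similarity in the range 0 to 1


def select_chunks(
    chunks: Iterable[ScoredChunk], match_threshold: float, limit: int
) -> List[ScoredChunk]:
    """Keep chunks scoring at or above the threshold, best first, up to limit."""
    if not 0.0 <= match_threshold <= 1.0:
        raise ValueError("match_threshold must be between 0 and 1")
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")
    # Assumption: the threshold is inclusive.
    relevant = [c for c in chunks if c.score >= match_threshold]
    relevant.sort(key=lambda c: c.score, reverse=True)
    return relevant[:limit]
```

In this sketch, raising the threshold shrinks the pool of candidate chunks, while raising the limit only matters when enough chunks pass the threshold.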


Add the Genie to a Process

Once the Genie has been trained to use a RAG Collection, the next step is to add the Genie to a Process. For complete details on using a Genie in a process, see Add a Genie to a Process.