Overview

This workflow enables you to automatically extract text content from documents (PDFs, images, etc.) stored in Snowflake stages using Snowflake’s AI_PARSE_DOCUMENT capability with OCR mode and Elementum’s Automation System. The Snowflake AI OCR workflow consists of nine main steps:
  1. Create a Snowflake stage for document files
  2. Create an AI OCR stored procedure in Snowflake
  3. Create a Snowflake view for stage files
  4. Import the stored procedure into Elementum via CloudLink
  5. Import the view as an Elementum table
  6. Build a Data Mine to monitor for new or changed documents
  7. Create an automation triggered by the Data Mine
  8. Process documents using the Run Function action to call your OCR procedure
  9. Add additional actions to work with the extracted text
This workflow leverages Snowflake AI capabilities to extract text from documents without moving your files outside of your data environment. Elementum orchestrates the processing, while the OCR itself runs entirely within Snowflake, keeping your data secure and centralized.

Prerequisites

Before starting this workflow, ensure you have:
  • Snowflake access with permissions to create stages, views, and stored procedures
  • Elementum CloudLink configured and connected to your Snowflake instance
  • Documents uploaded to a Snowflake stage (e.g., PDFs, images)
  • Directory Table enabled on your Snowflake stage for file listing and metadata access
  • Snowflake AI features enabled in your account for AI_PARSE_DOCUMENT functionality
  • Understanding of Elementum Tables, Data Mining, and Automation System

Step 1: Create Snowflake Stage

First, create a Snowflake stage for your documents with encryption enabled and directory table enabled. Execute this SQL in your Snowflake environment:
USE DATABASE YOUR_DATABASE;
USE SCHEMA YOUR_SCHEMA;

-- Create internal stage with directory table and encryption enabled
CREATE OR REPLACE STAGE DOCUMENT_STAGE 
  DIRECTORY = (ENABLE = TRUE) 
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
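To confirm the stage is ready, you can refresh its directory table and query it directly. It will return no rows until files are uploaded:

```sql
-- Refresh the directory table metadata, then list stage contents
ALTER STAGE DOCUMENT_STAGE REFRESH;
SELECT * FROM DIRECTORY(@DOCUMENT_STAGE);
```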

Step 2: Create AI OCR Stored Procedure

Create a stored procedure that uses Snowflake’s AI_PARSE_DOCUMENT function to extract text from documents.
CREATE OR REPLACE PROCEDURE AI_OCR_FROM_STAGE_SP(FILE_PATH STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS OWNER
AS
$$
  var sql = `
    SELECT TO_VARCHAR(
      AI_PARSE_DOCUMENT(
        TO_FILE('@YOUR_DATABASE.YOUR_SCHEMA.DOCUMENT_STAGE', ?),
        OBJECT_CONSTRUCT('mode', 'OCR')
      )
    ) AS response
  `;
  var stmt = snowflake.createStatement({
    sqlText: sql,
    binds: [FILE_PATH]
  });
  var rs = stmt.execute();
  if (rs.next()) {
    return rs.getColumnValue(1); // response
  } else {
    return null;
  }
$$;
  • FILE_PATH: Takes the relative path of the file within the stage
  • TO_FILE(): References the file in the Snowflake stage
  • AI_PARSE_DOCUMENT(): Snowflake’s AI function that processes the document
  • mode: 'OCR': Specifies OCR mode for text extraction
  • Returns: JSON string with extracted content and metadata
The response structure looks like this:
{
  "content": "Extracted text content from the document...",
  "metadata": {
    "pageCount": 1
  }
}
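Downstream steps only need the `content` and `metadata.pageCount` fields. As a minimal sketch (outside any Elementum context, using a hard-coded sample response), extracting them from the returned JSON string looks like this:

```javascript
// Sample of the JSON string the stored procedure returns
const response = '{"content": "Extracted text content from the document...", "metadata": {"pageCount": 1}}';

// Parse the string and pull out the fields used downstream
const parsed = JSON.parse(response);
const text = parsed.content;              // the extracted document text
const pages = parsed.metadata.pageCount;  // number of pages processed

console.log(`Extracted ${pages} page(s): ${text.slice(0, 30)}...`);
```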

Step 3: Create Snowflake View from Stage

Create a Snowflake view that provides access to your stage files with metadata. Execute this SQL in your Snowflake environment:
CREATE OR REPLACE VIEW DOCUMENT_STAGE_VIEW AS 
SELECT RELATIVE_PATH,
       SIZE,
       LAST_MODIFIED,
       MD5
FROM DIRECTORY(@DOCUMENT_STAGE);
  • RELATIVE_PATH: File path within the stage (used to identify files for OCR processing)
  • SIZE: File size in bytes
  • LAST_MODIFIED: Timestamp of last file modification
  • MD5: File hash for integrity checking
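For example, to preview which files the workflow would pick up, you can query the view directly (the `.pdf` filter here is illustrative):

```sql
-- List PDF files in the stage, newest first
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED
FROM DOCUMENT_STAGE_VIEW
WHERE RELATIVE_PATH ILIKE '%.pdf'
ORDER BY LAST_MODIFIED DESC;
```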
Step 4: Import Stored Procedure via CloudLink

Before building your automation, import the stored procedure into Elementum through CloudLink to make it available for use.
  1. Navigate to your CloudLink connection settings
  2. Click on “Functions”
  3. Select the database and schema where your stored procedure is located
  4. Find your AI_OCR_FROM_STAGE_SP stored procedure in the list
  5. Optionally rename it for easier identification in automations
  6. Click “Save” to make it available for use in automations
Once saved, the stored procedure will appear in the Run Function action dropdown when building automations.

Step 5: Import View as Elementum Table

Once your Snowflake view is created, import it into Elementum as a table.
  1. Navigate to Tables → Explore Data → CloudLink
  2. Select your Snowflake connection and choose the view you created
  3. Click “Create Table” and fill out the details

Step 6: Build Data Mine for Document Monitoring

Create a Data Mine to automatically detect when new documents arrive or existing documents change.
  1. In your table, go to Data Mining → Create Data Mine → Logic-Based Rules Mining
  2. Identifying Columns: Select RELATIVE_PATH, LAST_MODIFIED, and MD5
These columns work together to track individual files across Data Mine runs, detect when files are modified or replaced, and ensure accurate state management (ON/OFF transitions).
  3. Matching Criteria: Set filters for file types or conditions (optional; e.g., only .pdf files)
  4. Name and Schedule: Give the Data Mine a name and set its check frequency

Step 7: Create Automation with Data Mine Trigger

Build an automation that processes documents when the Data Mine detects them.
Your automation will follow this logical flow: Data Mine Trigger → Run OCR Function → Process Extracted Text (e.g., store content in a record, trigger AI analysis)
  1. Navigate to Automations → Create Automation
  2. Add Data Mine Trigger and select your Data Mine
  3. Set trigger option to “Trigger when data meets requirement”

Step 8: Process Documents Using Run Function Action

Add a Run Function action to your automation to run OCR on documents using the stored procedure.

Run Function action details:
  • Function: Select your AI_OCR_FROM_STAGE_SP stored procedure from CloudLink
  • Parameters:
    • FILE_PATH: $RELATIVE_PATH (from the Data Mine trigger)
Variable Reference: The $RELATIVE_PATH variable comes from the Data Mine trigger, providing access to all fields from the matching stage file record.
The Run Function action will return a JSON response containing:
  • content: The extracted text content from the document
  • metadata.pageCount: Number of pages processed

Step 9: Work with the OCR Results

After the Run Function action completes, subsequent actions in your automation will have access to the OCR results.
  • Update Record / Create Record: Store the extracted text in an Elementum record for future reference and searchability
  • AI Action: Analyze, summarize, or categorize the extracted text content using your configured AI provider
  • Conditional Logic: Route documents based on extracted content (e.g., if certain keywords are detected, assign to specific team members)
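Conditional Logic is configured in the Elementum UI rather than in code, but the routing rule it expresses amounts to a keyword check. A sketch, with hypothetical keywords and team names:

```javascript
// Illustrative only: route a document to a team based on keywords in its OCR text
function routeDocument(extractedText) {
  const rules = [
    { keyword: 'invoice', assignTo: 'Accounts Payable' },
    { keyword: 'contract', assignTo: 'Legal' },
  ];
  const lower = extractedText.toLowerCase();
  const match = rules.find((r) => lower.includes(r.keyword));
  return match ? match.assignTo : 'General Queue'; // fallback when no rule matches
}
```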

Summary

This workflow provides a powerful way to automatically extract text from documents stored in Snowflake stages:
  1. Snowflake Stage stores your document files with encryption and directory tracking
  2. AI OCR Stored Procedure leverages Snowflake’s AI_PARSE_DOCUMENT for text extraction
  3. Snowflake View makes stage files accessible with metadata
  4. CloudLink Functions imports the stored procedure for use in automations
  5. Elementum Table brings stage file information into your workspace
  6. Data Mine automatically detects new or changed documents
  7. Automation orchestrates the OCR processing workflow
  8. Run Function Action executes the OCR procedure on each document
  9. Additional Actions enable text analysis, storage, and intelligent workflow automation
By following this guide, you can build a robust, automated document processing system that turns your Snowflake stage into an intelligent OCR pipeline, extracting and processing text from documents as they arrive.

Appendix: Complete Quick Setup

Use the following SQL to create a complete OCR processing setup in Snowflake. Replace the ALL_CAPS placeholders with your actual values.

Complete Setup Script

USE DATABASE DATABASE_NAME;
USE SCHEMA SCHEMA_NAME;

-- Create internal stage with directory table and encryption enabled
CREATE OR REPLACE STAGE DOCUMENT_STAGE 
  DIRECTORY = (ENABLE = TRUE) 
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Create AI OCR stored procedure
CREATE OR REPLACE PROCEDURE AI_OCR_FROM_STAGE_SP(FILE_PATH STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS OWNER
AS
$$
  var sql = `
    SELECT TO_VARCHAR(
      AI_PARSE_DOCUMENT(
        TO_FILE('@DATABASE_NAME.SCHEMA_NAME.DOCUMENT_STAGE', ?),
        OBJECT_CONSTRUCT('mode', 'OCR')
      )
    ) AS response
  `;
  var stmt = snowflake.createStatement({
    sqlText: sql,
    binds: [FILE_PATH]
  });
  var rs = stmt.execute();
  if (rs.next()) {
    return rs.getColumnValue(1);
  } else {
    return null;
  }
$$;

-- Create view for stage files
CREATE OR REPLACE VIEW DOCUMENT_STAGE_VIEW AS 
SELECT RELATIVE_PATH,
       SIZE,
       LAST_MODIFIED,
       MD5
FROM DIRECTORY(@DOCUMENT_STAGE);
Upload a test document to verify the stage and OCR processing are working correctly:
-- Using SnowSQL CLI
PUT file://path/to/test-document.pdf @DATABASE_NAME.SCHEMA_NAME.DOCUMENT_STAGE 
    OVERWRITE=TRUE 
    AUTO_COMPRESS=FALSE;
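Directory tables on internal stages are not refreshed automatically after a PUT, so refresh the directory before expecting the new file to appear in DOCUMENT_STAGE_VIEW:

```sql
-- Make the uploaded file visible to DIRECTORY(@DOCUMENT_STAGE) and the view
ALTER STAGE DOCUMENT_STAGE REFRESH;
```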
You can also upload files through the Snowflake web interface by navigating to your stage and using the “Upload Files” option.
Test your stored procedure directly in Snowflake:
CALL AI_OCR_FROM_STAGE_SP('test-document.pdf');
You should receive a JSON response with the extracted text content and metadata.

Additional Resources