Overview
This workflow enables you to automatically extract text content from documents (PDFs, images, etc.) stored in Snowflake stages using Snowflake’s AI_PARSE_DOCUMENT capability with OCR mode and Elementum’s Automation System. The Snowflake AI OCR workflow consists of nine main steps:- Create a Snowflake stage for document files
- Create an AI OCR stored procedure in Snowflake
- Create a Snowflake view for stage files
- Import the stored procedure into Elementum via CloudLink
- Import the view as an Elementum table
- Build a Data Mine to monitor for new or changed documents
- Create an automation triggered by the Data Mine
- Process documents using the Run Function action to call your OCR procedure
- Add additional actions to work with the extracted text
This workflow leverages Snowflake AI capabilities to extract text from documents without moving your files outside of your data environment. The OCR processing is orchestrated through Elementum within your Snowflake environment, keeping your data secure and centralized.
Prerequisites
Before starting this workflow, ensure you have:- Snowflake access with permissions to create stages, views, and stored procedures
- Elementum CloudLink configured and connected to your Snowflake instance
- Documents uploaded to a Snowflake stage (e.g., PDFs, images)
- Directory Table enabled on your Snowflake stage for file listing and metadata access
- Snowflake AI features enabled in your account for AI_PARSE_DOCUMENT functionality
- Understanding of Elementum Tables, Data Mining, and Automation System
Step 1: Create Snowflake Stage
First, create a Snowflake stage for your documents with encryption enabled and directory table enabled. Execute this SQL in your Snowflake environment:Step 2: Create AI OCR Stored Procedure
Create a stored procedure that uses Snowflake’s AI_PARSE_DOCUMENT function to extract text from documents.Understanding the Stored Procedure
Understanding the Stored Procedure
FILE_PATH: Takes the relative path of the file within the stageTO_FILE(): References the file in the Snowflake stageAI_PARSE_DOCUMENT(): Snowflake’s AI function that processes the documentmode: 'OCR': Specifies OCR mode for text extraction- Returns: JSON string with extracted content and metadata
Grant CloudLink Access to Stored Procedure
Grant CloudLink Access to Stored Procedure
Ensure your Elementum CloudLink role has permission to execute the stored procedure:
Step 3: Create Snowflake View from Stage
Create a Snowflake view that provides access to your stage files with metadata. Execute this SQL in your Snowflake environment:Understanding the View Components
Understanding the View Components
RELATIVE_PATH: File path within the stage (used to identify files for OCR processing)SIZE: File size in bytesLAST_MODIFIED: Timestamp of last file modificationMD5: File hash for integrity checking
Step 4: Import Stored Procedure into CloudLink
Before building your automation, import the stored procedure into Elementum through CloudLink to make it available for use.- Navigate to your CloudLink connection settings
- Click on “Functions”
- Select the database and schema where your stored procedure is located
- Find your
AI_OCR_FROM_STAGE_SPstored procedure in the list - Optionally rename it for easier identification in automations
- Click “Save” to make it available for use in automations
Once saved, the stored procedure will appear in the Run Function action dropdown when building automations.
Step 5: Import View as Elementum Table
Once your Snowflake view is created, import it into Elementum as a table.- Navigate to Tables → Explore Data → CloudLink
- Select your Snowflake connection and choose the view you created
- Click “Create Table” and fill out the details
Step 6: Build Data Mine for Document Monitoring
Create a Data Mine to automatically detect when new documents arrive or existing documents change.- In your table, go to Data Mining → Create Data Mine → Logic-Based Rules Mining
- Identifying Columns: Select
RELATIVE_PATH,LAST_MODIFIED, andMD5
- Matching Criteria: Set filters for file types or conditions (optional - e.g., only
.pdffiles) - Name and Schedule: Give it a name and set check frequency
Step 7: Create Automation with Data Mine Trigger
Build an automation that processes documents when the Data Mine detects them.Your automation will follow this logical flow: Data Mine Trigger → Run OCR Function → Process Extracted Text (e.g., store content in a record, trigger AI analysis)
- Navigate to Automations → Create Automation
- Add Data Mine Trigger and select your Data Mine
- Set trigger option to “Trigger when data meets requirement”
Step 8: Process Documents Using Run Function Action
Add a Run Function action to your automation to OCR documents using the stored procedure.Run Function action details:
- Function: Select your
AI_OCR_FROM_STAGE_SPstored procedure from CloudLink - Parameters:
FILE_PATH:$RELATIVE_PATH(from the Data Mine trigger)
content: The extracted text content from the documentmetadata.pageCount: Number of pages processed
Step 9: Work with the OCR Results
After the Run Function action completes, subsequent actions in your automation will have access to the OCR results.Example: Storing OCR Results
Example: Storing OCR Results
Add an Update Record or Create Record action to store the extracted text in an Elementum record for future reference and searchability.
Example: AI Analysis of Extracted Text
Example: AI Analysis of Extracted Text
Add an AI Action to analyze, summarize, or categorize the extracted text content using your configured AI provider.
Example: Conditional Processing
Example: Conditional Processing
Add Conditional Logic to route documents based on extracted content (e.g., if certain keywords are detected, assign to specific team members).
Summary
This workflow provides a powerful way to automatically extract text from documents stored in Snowflake stages:- Snowflake Stage stores your document files with encryption and directory tracking
- AI OCR Stored Procedure leverages Snowflake’s AI_PARSE_DOCUMENT for text extraction
- Snowflake View makes stage files accessible with metadata
- CloudLink Functions imports the stored procedure for use in automations
- Elementum Table brings stage file information into your workspace
- Data Mine automatically detects new or changed documents
- Automation orchestrates the OCR processing workflow
- Run Function Action executes the OCR procedure on each document
- Additional Actions enable text analysis, storage, and intelligent workflow automation
Appendix: Complete Quick Setup
Use the following SQL to create a complete OCR processing setup in Snowflake. Replace theALL_CAPS placeholders with your actual values.
Complete Setup Script
Complete Setup Script
Grant CloudLink Permissions
Grant CloudLink Permissions
Ensure your Elementum CloudLink role has the necessary permissions to access the stage, view, and stored procedure.
Upload Test Document
Upload Test Document
Upload a test document to verify the stage and OCR processing are working correctly:
You can also upload files through the Snowflake web interface by navigating to your stage and using the “Upload Files” option.
Test the OCR Procedure
Test the OCR Procedure
Test your stored procedure directly in Snowflake:You should receive a JSON response with the extracted text content and metadata.
Additional Resources
- Snowflake AI_PARSE_DOCUMENT Documentation
- Accessing Files from Snowflake Stages - For workflows that need to download files
- Automation System - Learn more about building automations
- Data Mining - Deep dive into Data Mine capabilities