e6data

docs.e6data.com

e6data is a lakehouse compute engine, which runs high concurrency SQL analytics & AI workloads at 5-10x faster speed and over 50% lower TCO.

llms.txt

Product Documentation

Welcome to e6data: e6data is a lakehouse compute engine built to run high concurrency, complex SQL analytics and AI workloads—10x faster, 60% cheaper, zero data movement.
Introduction to e6data: e6data is a lakehouse compute engine built to run high concurrency, complex SQL analytics and AI workloads—10x faster, 60% cheaper, zero data movement.
Concepts: This section introduces the set of fundamental concepts and terminology used by e6data. Understanding these will help to use e6data effectively.
Architecture: Understand how e6data is structured
e6data in VPC Deployment Model
Connect to e6data serverless compute
Hybrid Data Lakehouse
Get Started: Get ready to start querying!
Sign Up: This article will help you to create your e6data account
Setup: Setting up your cloud for e6data
AWS Setup: The page provides setup guides for deploying e6data on AWS.
In VPC Deployment (AWS)
Prerequisite Infrastructure
Infrastructure & Permissions for e6data
Setup Kubernetes Components
Setup using Terraform in AWS
Update an AWS Terraform for your Workspace
AWS PrivateLink and e6data
VPC Peering | e6data on AWS: Manage and deploy AWS resources using the AWS provider in Terraform. Ensure correct setup using the official configuration guidelines.
Connect to e6data serverless compute (AWS)
Configuring Secure Access
Overview
Deployment Guide
CloudFormation Script
Workspace Creation: This page outlines the process of creating a workspace for connecting to e6data serverless compute.
Catalog Creation: The page outlines the process of creating a catalog for connecting to e6data serverless compute
Glue Metastore
Hive Metastore
Unity Catalog
Cluster Creation: This page outlines the process for creating an e6data serverless compute cluster.
GCP Setup: The page provides setup guides for deploying e6data on GCP.
In VPC Deployment (GCP): This page outlines the steps and prerequisites for in-VPC deployment on GCP.
Prerequisite Infrastructure
Infrastructure & Permissions for e6data
Setup Kubernetes Components
Setup using Terraform in GCP: Deploying an e6data Workspace in GCP using Terraform
Update a GCP Terraform for your Workspace
Configure GCS Access for Serverless Compute (GCP)
Prerequisites
FAQs and Troubleshooting
Azure Setup
In VPC Deployment (Azure): Deploying e6data Workspace in Microsoft Azure using Terraform
Prerequisite Infrastructure
Infrastructure & Permissions for e6data
Setup Kubernetes Components
Setup using Terraform in AZURE: Deploying e6data Workspace in Microsoft Azure using Terraform
Update an AZURE Terraform for your Workspace
Configure Azure Storage Access for Serverless Compute (Azure)
Steps to be Performed by Customer Account
FAQs and Troubleshooting
Workspaces: Understanding Workspaces in e6data
Create Workspaces
Enable/Disable Workspaces
Update a Workspace
Delete a Workspace
Catalogs: Understanding Catalogs in e6data
Create Catalogs
Hive Metastore: Managing connections to Hive Metastore from e6data
Connect to a Hive Metastore
Edit a Hive Metastore Connection
Delete a Hive Metastore Connection
Glue Metastore: Creating e6data catalog using Glue Metastore
Connect to a Glue Metastore
Edit a Glue Metastore Connection
Delete a Glue Metastore Connection
Unity Catalog: Creating e6data catalog using Unity catalog
Connect to Unity Catalog
Edit Unity Catalog
Delete Unity Catalog
Apache Polaris
Connect to Apache Polaris
Edit Polaris Catalog
Delete Polaris Catalog
Cross-account Catalog Access
Configure Cross-account Catalog to Access AWS Hive Metastore
Configure Cross-account Catalog to Access Unity Catalog
Configure Cross-account Catalog to Access AWS Glue
Configure Cross-account Catalog to Access GCP Hive Metastore
Manage Catalogs
Privileges: Understanding Privileges in Catalog
Access Control
Column Masking
Row Filter
Table Formats
Delta Lake
Connect to Catalog
Apache Iceberg
Connect to Catalog
Apache Hudi
Connect to Catalog
Clusters: Configure a cluster
Edit & Delete Clusters: Change the configuration of cluster
Suspend & Resume Clusters
Cluster Size: Understand the cluster sizes offered during cluster creation.
Load Based Sizing: This page describes load-based sizing in the cluster.
Auto Suspension: This page describes the auto-suspension feature in the cluster.
Query Timeout: This page describes the query timeout feature in the cluster.
Monitoring
Connection Info
Pools: This page gives an overview of Pools.
Delete Pools: This page outlines Pool deletion and Pool permissions.
Query Editor: Run queries using the e6data Query Editor
Editor Pane: This page explains the features and functionalities of the Editor Pane in e6data's Query Editor.
Results Pane: This page explains how to view and interpret query results in the e6data Query Editor.
Schema Explorer: This page explains how to navigate and interact with the Schema Explorer in the Query Editor.
Data Preview: This page explains how to preview dataset samples within the Query Editor.
Notebook: Run queries using the e6data Query Notebook
Editor Pane: This page explains the features and functionalities of the Notebook Editor Pane in e6data.
Results Pane: This page explains how to view and interpret query results in the e6data Notebook.
Schema Explorer: This page explains how to navigate and interact with the Schema Explorer in the Notebook.
Data Preview: This page explains how to preview dataset samples within the Notebook
Query History: This page explains how to view, manage, and track past queries executed in the e6data Query Editor.
Query Count API: Get insights into query execution volume over time in e6data.
Connectivity: This page explains the network and integration options available for connecting to e6data.
IP Sets: This page explains how to manage allowed IP ranges for secure access to e6data.
Endpoints: This page explains how to configure and manage network endpoints for connecting to e6data.
Cloud Resources: This page explains how to configure and manage cloud-based resources for e6data connectivity and storage.
Network Firewall: The e6data Network Firewall feature allows users to manage IP whitelisting, enabling or restricting access to e6data at the cluster level.
Access Control: Overview of Access Control mechanisms in e6data.
Users: Managing user accounts.
Groups: Grouping users for easier management.
Roles: Assign roles and permissions to users.
Permissions: Manage user access efficiently by grouping permissions in e6data.
Policies: Define and enforce access control rules.
Single Sign-On (SSO): Enable seamless and secure authentication for e6data using SSO.
AWS SSO: Configure AWS Single Sign-On for secure authentication and access management.
Okta: Set up Okta for seamless Single Sign-On (SSO)
AWS Cognito Integration (OAuth 2.0)
MICROSOFT ENTRA ID: Configure Single Sign-On using Microsoft My Apps for streamlined authentication.
Icons for IdP: Customize identity provider (IdP) icons for better user recognition.
Service Accounts: Manage automated access with dedicated service accounts.
Multi-Factor Authentication (Beta): Enhance security with an additional verification layer.
Usage and Cost Management: Usage and Cost Management tracks resource usage and optimizes costs for efficient operations.
Audit Log: The Audit Log feature of e6data helps you track administrative actions related to workspaces, catalogs, and clusters.
User Settings: Manage profile details,
Profile: This page describes the user profile.
Personal Access Tokens (PAT): This page describes authentication using e6data personal access tokens.
Advanced Features: Access extended settings and configurations
Cross-Catalog & Cross-Schema Querying: Execute queries across multiple catalogs and schemas seamlessly.
Supported Data Types: This document contains the datatypes supported by e6data
SQL Command Reference: e6data supports the following categories of functions:
Query Syntax: Guidelines and structure for writing queries effectively.
General functions: Commonly used functions for data processing.
Aggregate Functions: Aggregate functions operate on multiple sets of values and return a single value.
Mathematical Functions & Operators: This page contains the Mathematical functions and operators supported by e6data.
Arithmetic Operators: Perform mathematical operations in queries.
Rounding and Truncation Functions: Adjust numerical values by rounding or truncating.
Exponential and Root Functions: Perform exponential calculations and extract roots of numbers.
Trigonometric Functions: Compute sine, cosine, tangent, and other trigonometric values.
Logarithmic Functions: Calculate logarithms using various bases for numerical analysis.
String Functions: This document contains the string functions supported by e6data.
Date-Time Functions: This document contains the date-time functions supported by e6data
Constant Functions: Return fixed values that remain unchanged in computations.
Conversion Functions: Transform data types and formats for compatibility.
Date Truncate Function: Trim date and time values to a specified precision.
Addition and Subtraction Functions: Perform arithmetic operations on numerical and date values.
Extraction Functions: Retrieve specific components from dates, times, and strings.
Format Functions: Modify the appearance of dates, numbers, and text values
Timezone Functions: Convert and manipulate timestamps across different time zones.
Conditional Expressions: Execute logic-based operations based on specified conditions.
Conversion Functions: This page contains the explicit conversion functions supported by e6data.
Window Functions: This page contains window functions supported by e6data.
Comparison Operators & Functions: This page contains the Comparison operators supported by e6data.
Logical Operators: This page contains logical operators supported by e6data.
Statistical Functions: Uncategorized additional functions supported by e6data
Bitwise Functions: Bitwise functions supported by e6data
Array Functions: Array functions supported by e6data
Regular Expression Functions: Perform pattern matching and text manipulation using regex.
Generate Functions: Create sequences, arrays, or structured data dynamically.
Cardinality Estimation Functions: Approximate the number of unique elements in a dataset.
JSON Functions: Parse, manipulate, and extract data from JSON structures.
Checksum Functions: Generate and verify hash values for data integrity.
Unload Function (Copy into): Export query results to external storage efficiently.
Struct Functions: Work with structured data by creating and manipulating nested fields.
Geospatial Functions
Equivalent Functions & Operators: Compare values and expressions for equality and similarity.
Connectors & Drivers: Integrate with external systems using supported connectors and drivers.
DBeaver: Connect and interact with databases using the DBeaver SQL client.
DbVisualizer: Access and manage databases using the DbVisualizer tool.
Apache Superset: Visualize and explore data with interactive dashboards and charts.
Jupyter Notebook: Execute queries and analyze data within an interactive notebook environment.
Tableau Cloud: Connect and visualize data using Tableau’s cloud-based analytics platform.
Tableau Desktop: Analyze and visualize data locally with Tableau’s desktop application.
Power BI: Power BI is a Microsoft business intelligence platform that allows users to visualize and analyze data from various sources, including SQL databases.
Setting up Power BI on-premises Gateway: Set up Power BI Gateway to connect Power BI with e6data for secure and seamless reporting.
Metabase: Explore and visualize data using an open-source business intelligence tool.
Zeppelin: Perform interactive data analytics with Apache Zeppelin notebooks.
Python Connector: Integrate and interact with data using the Python API.
Performance and Integration Guide
Code Samples: Python code snippets to carry out common operations on e6data
JDBC Driver: Connect to databases using the Java Database Connectivity (JDBC) standard.
Code Samples: Java code snippets to carry out common operations on e6data via JDBC Driver
API Support: Access and interact with data programmatically using REST APIs.
Configure Cluster Ingress: Securely enabling ingress to e6data clusters for external services
ALB Ingress in Kubernetes: Configuring ALB Ingress in Kubernetes
GCE Ingress in Kubernetes: Configuring GCE Ingress In Kubernetes
Ingress-Nginx in Kubernetes: Configuring Ingress-Nginx in Kubernetes
PySpark Compatibility
Getting started
Code samples
DataFrame Operations
SQL Functions
Security & Trust: Ensure data protection, compliance, and secure access controls.
Best Practices: Best practices to manage your e6data deployment
AWS Best Practices
Features & Responsibilities Matrix: Define roles and access levels for various features.
Data Protection Addendum(DPA): DATA PROTECTION ADDENDUM
Tutorials and Best Practices: This page helps you understand how to use the e6data platform and
How to configure HIVE metastore if you don't have one?: This article will guide you on how to set up a HIVE metastore in case you don't have a metastore.
How-To Videos: Tutorial videos on how to carry out common operations in the e6data platform.
Known Limitations: Identify current constraints and restrictions in the system.
SQL Limitations: Understand constraints and unsupported features in SQL execution.
Other Limitations: Recognize additional constraints affecting functionality and performance.
Restart Triggers: Manage and configure conditions for automatic process restarts.
Cloud Provider Limitations: Understand restrictions imposed by different cloud platforms.
Error Codes: This page consists of all the errors that are displayed on the screen.
General Errors: Identify and resolve common system and user-related issues.
User Account Errors: Troubleshoot authentication, access, and profile-related issues.
Workspace Errors: Diagnose and resolve issues related to workspace setup and usage.
Catalog Errors: Address issues encountered while managing catalogs and metadata.
Cluster Errors: Troubleshoot failures and performance issues in cluster operations.
Data Governance Errors: Resolve issues related to access control, policies, and compliance.
Query History Errors: Address issues with logging, tracking, and retrieving query history.
Query Editor Errors: Troubleshoot issues related to query execution and interface functionality.
Pool Errors: Identify and resolve issues affecting resource pooling and allocation.
Connectivity Errors: Troubleshoot network and connection-related issues.
Terms & Condition: Understand the rules and policies governing usage and access.
Privacy Policy: Learn how data is collected, stored, and protected.
Cookie Policy: Understand how cookies are used for functionality and analytics.
FAQs: Find answers to common questions about features, usage, and troubleshooting.
Workspace Setup: Frequently Asked Questions about Workspace
Security: Frequently Asked Questions about Security
Catalog Privileges: Frequently Asked Questions about Catalog Privileges
Services Utilised for e6data Deployment: This page outlines the key services and resources required for deploying e6data on various cloud platforms.
AWS supported regions: This page lists the AWS regions where e6data can be deployed, ensuring optimal performance and compliance with regional requirements.
GCP supported regions: This article lists the regions supported by e6data in GCP
AZURE supported regions: This article lists the regions supported by e6data in AZURE.
Release Notes & Updates: New features, announcements & bug fixes
6th August 2025
23rd July 2025
6th Sept 2024: Enhanced Data Analyst role
6th June 2024: Latest update: MFA and service accounts.
18th April 2024: This page covers the latest updates to e6data, including improved connectivity, security, and query editing features.
9th April 2024: This page explains the latest updates to e6data, focusing on schema behavior within the Schema Explorer for improved navigation and management.
30th March 2024: This page covers recent e6data updates, including catalog privileges for column masking and row filtering, query history enhancements, and support for deletion vectors in Iceberg tables.
16th March 2024: This page highlights the new feature for exporting query history to CSV, enhancing data analysis and reporting capabilities on the e6data platform.
14th March 2024: This page covers the latest updates to e6data, including the new DataExport role, improved resource selection in catalog privileges, and the addition of client-perceived time in query history.
12th March 2024: Covers recent e6data updates, including enhanced cluster connection info, improved connectivity, and catalog refresh capabilities for Data Analyst roles.
2nd March 2024: Covers the latest e6data updates, including catalog privileges (Beta), support for liquid clustering, new functions, and the impersonation feature for the Metabase BI tool via Apache Ranger.
10th February 2024: This page introduces the new gateway connectivity feature, enabling seamless external client connections through endpoints on the e6data platform.
3rd February 2024: This page covers the new connectivity and notebook features on the e6data platform.
17th January 2024: This page highlights the introduction of new functionalities on the e6data platform.
9th January 2024: This page covers the latest e6data enhancements, including new functionalities, bug fixes, and performance optimizations.
3rd January 2024: This page outlines the platform enhancement and catalog auto-refresh feature
18th December 2023: This page covers backend and platform enhancements, including improved security for monitoring components and enhanced access control.
12th December 2023: This page introduces the platform enhancement, including the new Find and Replace feature in the query editor.
9th December 2023: Plugin, platform, engine enhancements.
4th December 2023: Improvements to query Editor session handling.
27th November 2023: UI enhancements.
8th September 2023: User roles and privileges.
4th September 2023: Covers the new features, including the ability to enable and disable workspaces, along with UI enhancements.
26th August 2023: Covers pod metrics, workspace updates, editable query history views, and UI enhancements.
21st August 2023: This page covers the new query editor roles, query resume, customizable views, catalog editing, and external access to clusters.
19th July 2023: This page highlights query editor enhancements, schema search, an enhanced query run button, and visual/UI improvements.
23rd May 2023: Covers the introduction of the Workspace Admin role, multi-catalog support, SQL optimizations, bug fixes, and known limitations.
5th May 2023: This page covers tab management, cluster tag improvements, SQL optimizations, bug fixes, and known limitations.
28th April 2023: This page highlights data preview and query editor UI improvements.
19th April 2023: This page covers IP allowlisting for external access.
15th April 2023: This page highlights cross-account catalog access, execution planner, and query editor enhancements.
10th April 2023: Covers auto save in query editor and SSO login via e6data portal and Google IdP.
30th March 2023: This page covers SSO support, AWS S3 Gateway Endpoints, and parquet data pruning.

Agent Instructions: Querying This Documentation

If you need additional information, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on a page URL with the ask query parameter:

GET https://docs.e6data.com/product-documentation/welcome-to-e6data.md?ask=<question>

The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
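
As a minimal sketch of the request described above, the following Python snippet builds the page URL with the ask query parameter and, optionally, fetches the response. The helper names build_ask_url and ask_docs are hypothetical, and percent-encoding the question is an assumption based on standard URL rules rather than something this documentation specifies. The response is returned as raw text because its exact format is not described here.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def build_ask_url(page_url: str, question: str) -> str:
    """Return page_url with `question` attached as the `ask` query parameter.

    Hypothetical helper: the question is percent-encoded (spaces become %20),
    which is assumed to be what the documentation server expects.
    """
    parts = urlsplit(page_url)
    query = urlencode({"ask": question}, quote_via=quote)
    # Any existing query string or fragment on page_url is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def ask_docs(page_url: str, question: str, timeout: float = 30.0) -> str:
    """Perform the HTTP GET and return the response body as text (hypothetical helper)."""
    with urlopen(build_ask_url(page_url, question), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Build (but do not send) a request URL for a specific, self-contained question.
example_url = build_ask_url(
    "https://docs.e6data.com/product-documentation/welcome-to-e6data.md",
    "How do I create a cluster?",
)
print(example_url)
```

Calling ask_docs with the same two arguments would send the request. Keeping URL construction separate from the network call makes it possible to inspect or log the exact URL before anything is sent.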