Galileo
docs.galileo.ai
llms.txt
Galileo
Docs
- Apply Bulk Annotation
- Create Annotation Rating
- Create Annotation Template
- Create Log Record Annotation Rating
- Delete Annotation Rating
- Delete Annotation Template
- Delete Log Record Annotation Rating
- Get Annotation Rating
- Get Annotation Template
- Get Log Record Annotation Rating
- List Annotation Templates
- Reorder Annotation Templates
- Update Annotation Template
- Create Api Key
- Delete Api Key
- Get Api Keys
- Get Token
- Login Api Key
- Login Email
- Login Social
- Refresh Token
- Saml Acs
- Saml Login
- Saml Metadata
- Verify Email
- Autogen Llm Scorer: Autogenerate an LLM scorer configuration.
- Compute Health Score Endpoint: Compute the health score metric for a metrics testing run.
- Create
- Create Code Scorer Version
- Create Llm Scorer Version
- Create Luna Scorer Version
- Create Preset Scorer Version: Create a preset scorer version.
- Delete Scorer
- Get Scorer
- Get Scorer Health Scores: Return all persisted health scores for a scorer against a dataset, ordered by version ASC.
- Get Scorer Version Code
- Get Scorer Version Or Latest
- Get Validate Code Scorer Task Result: Poll for a code-scorer validation task result (returns status/result).
- List All Versions For Scorer
- List Projects For Scorer Route: List all projects associated with a specific scorer.
- List Projects For Scorer Version Route: List all projects associated with a specific scorer version.
- List Scorers With Filters
- List Tags
- Manual Llm Validate
- Restore Scorer Version
- Update
- Validate Code Scorer: Validate a code scorer with optional simple input/output test.
- Validate Code Scorer Dataset: Validate a code scorer against dataset rows.
- Validate Code Scorer Log Record: Validate a code scorer using actual log records.
- Validate Llm Scorer Dataset
- Validate Llm Scorer Log Record
- Write Scorer Version Health Score: Persist the health score for a scorer version against a dataset.
- Bulk Delete Datasets: Delete multiple datasets in bulk.
- Create Dataset: Creates a standalone dataset.
- Create Group Dataset Collaborators: Share a dataset with groups.
- Create User Dataset Collaborators
- Delete Dataset
- Delete Group Dataset Collaborator: Remove a group's access to a dataset.
- Delete User Dataset Collaborator: Remove a user's access to a dataset.
- Download Dataset
- Extend Dataset Content: Extends the dataset content
- Get Dataset
- Get Dataset Content
- Get Dataset Synthetic Extend Status
- Get Dataset Version Content
- List Dataset Projects
- List Datasets
- List Group Dataset Collaborators: List the groups with which the dataset has been shared.
- List User Dataset Collaborators: List the users with whom the dataset has been shared.
- Preview Dataset
- Query Dataset Content
- Query Dataset Versions
- Query Datasets
- Update Dataset
- Update Dataset Content: Update the content of a dataset.
- Update Dataset Version
- Update Group Dataset Collaborator: Update the sharing permissions of a group on a dataset.
- Update User Dataset Collaborator: Update the sharing permissions of a user on a dataset.
- Upsert Dataset Content: Roll back the content of a dataset to a previous version.
- Create Experiment: Create a new experiment for a project.
- Delete Experiment: Delete a specific experiment.
- Experiments Available Columns: Retrieves the column information for experiments.
- Get Experiment: Retrieve a specific experiment.
- Get Experiment Metrics: Retrieve metrics for a specific experiment.
- Get Experiments Metrics: Retrieve metrics for all experiments in a project.
- Get Metric Settings
- List Experiments: Retrieve all experiments for a project.
- List Experiments Paginated: Retrieve all experiments for a project with pagination.
- Search Experiments: Search experiments for a project.
- Update Experiment: Update a specific experiment.
- Update Metric Settings
- Apply Bulk Feedback V2
- Create Feedback Rating V2
- Create Feedback Template V2
- Delete Feedback Rating V2
- Delete Feedback Template
- Get Feedback Rating V2
- Get Feedback Template V2
- List Feedback Templates V2
- Reorder Feedback Templates
- Update Feedback Template
- Add User To Group
- Create Group
- Delete Group
- Delete Group Member
- Get Group
- Get Group Roles
- List Current User Groups
- List Group Members
- List Groups
- Update Group
- Update Group Member
- Healthcheck
- Create Group Integration Collaborators: Share an integration with groups.
- Create or update a named custom integration
- Create or update Anthropic integration: Create or update an Anthropic integration for this user from Galileo.
- Create or update AWS Bedrock integration: Create or update an AWS integration for this user from Galileo.
- Create or update AWS SageMaker integration: Create or update an AWS integration for this user from Galileo.
- Create or update Azure integration: Create or update an Azure integration for this user from Galileo.
- Create or update custom integration
- Create or update Databricks integration: Create or update a Databricks integration for this user from Galileo.
- Create or update Databricks integration (legacy): Create or update a Databricks integration for this user from Galileo.
- Create Or Update Integration Selection: Create or update an integration selection for this user from Galileo.
- Create or update Mistral integration: Create or update a Mistral integration for this user from Galileo.
- Create or update NVIDIA integration: Create or update an NVIDIA integration for this user from Galileo.
- Create or update OpenAI integration: Create or update an OpenAI integration for this user from Galileo.
- Create or update Vegas Gateway integration: Create or update a Vegas Gateway integration for this user from Galileo.
- Create or update Vertex AI integration: Create or update a Google Vertex AI integration for a user.
- Create or update Writer integration: Create or update a Writer integration for a user.
- Create User Integration Collaborators
- Delete Group Integration Collaborator: Remove a group's access to an integration.
- Delete User Integration Collaborator: Remove a user's access to an integration.
- Get Databases For Cluster
- Get Databricks Catalogs
- Get Integration: Gets the integration data formatted for the specified integration.
- Get Integration Status: Checks if the integration status is active or not.
- List Available Integrations: List all of the available integrations to be created in Galileo.
- List Group Integration Collaborators: List the groups with which the integration has been shared.
- List User Integration Collaborators: List the users with whom the integration has been shared.
- Update Group Integration Collaborator: Update the sharing permissions of a group on an integration.
- Update User Integration Collaborator: Update the sharing permissions of a user on an integration.
- Create Log Stream: Create a new log stream for a project.
- Delete Log Stream: Delete a specific log stream.
- Get Log Stream: Retrieve a specific log stream.
- Get Metric Settings
- List Log Streams: Retrieve all log streams for a project.
- List Log Streams Paginated: Retrieve all log streams for a project with pagination.
- Search Log Streams: Search log streams for a project.
- Update Log Stream: Update a specific log stream.
- Update Metric Settings
- Get Logstream Insights Token Usages
- Delete By Metadata: Delete traces/sessions across all projects in the organization by metadata filters.
- Get Org Job Status: Get the status of an organization-level job.
- Create Group Project Collaborators: Share a project with groups.
- Create Project: Create a new project.
- Create User Project Collaborators: Share a project with users.
- Delete Group Project Collaborator: Remove a group's access to a project.
- Delete Project: Deletes a project and all associated runs and objects.
- Delete User Project Collaborator: Remove a user's access to a project.
- Get Collaborator Roles
- Get Project
- Get Projects V2: Gets projects optimized for V2 with pagination and server-side run counts.
- List Group Project Collaborators: List the groups with which the project has been shared.
- List User Project Collaborators: List the users with whom the project has been shared.
- Update Group Project Collaborator: Update the sharing permissions of a group on a project.
- Update Project
- Update User Project Collaborator: Update the sharing permissions of a user on a project.
- Invoke
- Get Settings
- Upsert Insights Config
- Create Or Verify User: Create a new system user with an email and password.
- Create Or Verify User Social: Create a user using a social login provider.
- Count Sessions
- Count Spans
- Count Traces: This endpoint may return a slightly inaccurate count due to the way records are filtered before deduplication.
- Create Session
- Delete Sessions: Delete all session records that match the provided filters.
- Delete Spans: Delete all span records that match the provided filters.
- Delete Traces: Delete all trace records that match the provided filters.
- Export Records
- Get Aggregated Trace View
- Get Session
- Get Span
- Get Trace
- Log Spans
- Log Traces
- Metrics Testing Available Columns
- Query Custom Metrics
- Query Metrics
- Query Metrics V2: Same as /metrics/search but returns metrics with node-type counts: trace (requests_count), session_count, and span_count in aggregate_metrics and in each bucket, similar to /metrics/custom_search.
- Query Partial Sessions
- Query Partial Spans
- Query Partial Traces
- Query Sessions
- Query Spans
- Query Traces
- Recompute Metrics
- Sessions Available Columns
- Spans Available Columns
- Traces Available Columns
- Update Span: Update a span with the given ID.
- Update Trace: Update a trace with the given ID.
- Create Section
- Create Widget
- Delete Dashboard
- Delete Section: Delete section. If ungroup=True, keep widgets by moving them to dashboard top-level (clear section_id).
- Delete Widget
- Duplicate Dashboard
- Favorite Dashboard
- Get Trends
- List Dashboards
- Unfavorite Dashboard
- Update Section
- Update Trends
- Update Widget
- Current User
- Delete User
- Get User
- Get User Roles: Get all user roles.
- Invite Users
- List Users Paginated
- Update User
- Experiments API Guide: Learn how to set up projects, metrics, datasets, and experiments using Galileo's REST API
- Overview: Learn how to get started with the Galileo REST API
- Access Control: Control access to projects via role-based access control and groups in Galileo
- Annotations Overview
- Integration Costs
- Model Pricing Settings
- Compare Experiments: Learn how to compare multiple experiment runs in Galileo
- Run Experiments in Playgrounds: Learn about running experiments in the Galileo console using playgrounds and datasets
- Log Stream Metrics: Learn how to configure metrics for Log streams, including managing sampling rates
- Multimodal Observability: Log, inspect, and evaluate images, audio, and documents alongside text in your traces
- Overview: Core Observability concepts in Galileo
- Sessions Overview: Learn about log sessions in Galileo
- Create and Use Sessions: Learn to create and use Sessions in Galileo
- Fine-Tuning Luna-2 Models: Understand the requirements and process for fine-tuning Luna-2 models based on your real-world scenarios
- Luna-2 Overview: Discover Galileo's Luna-2 Evaluation model, reducing the latency and cost for metric evaluations
- Action Advancement: Understand how to measure and optimize the effectiveness of your AI agent's actions
- Action Completion: Understand how to measure whether your agent actually accomplished a user's goals
- Agent Efficiency: Learn how to measure the efficiency of your agentic workflows
- Agent Flow: Learn how to measure the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests
- Agentic Metrics: Understand and evaluate the performance of AI agents using Galileo's agentic metrics
- Conversation Quality: Learn how to measure the quality of a conversation that a user has with a chatbot
- User Intent Change: Learn how to measure whether users are using your agent system for different intents across a multi-turn conversation
- Reasoning Coherence: Evaluate whether an agent’s reasoning steps are logically consistent and aligned with its plan
- Tool Error: Detect and analyze tool execution errors in AI agents using Galileo Guardrail Metrics to ensure reliable tool usage in agentic workflows
- Tool Selection Quality: Evaluate tool selection quality in AI agents using Galileo Guardrail Metrics to ensure agents choose appropriate tools with correct parameters
- Improve LLM-as-a-Judge Metrics with Autotune: Use Autotune to turn feedback into prompt improvements that make LLM-as-a-judge metrics more accurate for your use case.
- Composite Metrics: Learn how to create composite metrics that leverage other metrics to perform advanced evaluations
- Custom Code-Based Metrics: Learn how to create, register, and use custom code-based metrics to evaluate your LLM applications
- Custom LLM-as-a-Judge Metrics: Learn how to create evaluation metrics using LLMs to judge the quality of responses
- LLM-as-a-Judge Prompt Engineering Guide: Learn best practices for prompt engineering with custom LLM-as-a-judge metrics
- BLEU and ROUGE: Evaluate sequence-to-sequence model performance using BLEU and ROUGE metrics to measure n-gram overlap between generated and target outputs
- Expression and Readability Metrics: Assess the style, tone, and clarity of your AI's generated content using Galileo's expression and readability metrics
- Tone: Analyze and optimize the emotional tone of AI responses using Galileo's Tone Metric to ensure appropriate emotional context and user engagement
- How LLM-as-a-judge is Calculated: Understand how Galileo uses LLM judges to calculate metrics for AI system performance assessment
- Metrics Comparison: Explore Galileo's comprehensive out-of-the-box metrics for evaluating and improving AI system performance across multiple dimensions
- Model Confidence Metrics: Understand your AI's certainty in its responses with Galileo's model confidence metrics
- Prompt Perplexity: Measure and optimize prompt quality using Galileo's Prompt Perplexity Metric to improve model performance and response generation
- Uncertainty: Measure and analyze model confidence in AI outputs using Galileo's Uncertainty Metric to identify potential hallucinations and improve response quality
- Interruption Detection: Detect turn-taking violations in audio-based agent conversations, including agent overlap, premature agent barge-in, and user barge-in
- Multimodal Quality Metrics: Understand and evaluate the quality of multimodal elements (visual and audio) using Galileo's multimodal quality metrics
- Visual Fidelity: Evaluate whether a generated image in an LLM span complies with every applicable provided brand rule based on visible evidence
- Visual Quality: Evaluate whether the quality of an input / PDF in an LLM span is sufficient for the associated text prompt task to be reliably performed
- Metrics Overview: Explore Galileo's comprehensive metrics framework for evaluating and improving AI system performance across multiple dimensions
- Chunk Attribution Utilization: Understand how to measure and optimize the impact of retrieved chunks in your RAG pipeline
- Completeness: Measure how thoroughly your model's response covers the relevant information available in the provided context
- Context Adherence: Understand Galileo's Context Adherence Metric
- RAG Metrics: Evaluate retrieval and generation quality in your RAG pipeline using Galileo's RAG metrics
- Chunk Relevance: Understand how to measure and optimize the relevance of retrieved chunks to user queries in your RAG pipeline
- Chunk Relevance (Deprecated): Understand how to measure and optimize the relevance of retrieved chunks to user queries in your RAG pipeline (deprecated - use Chunk Relevance instead)
- Context Precision: Measure the percentage of relevant chunks in your retrieved context to evaluate retrieval quality in RAG systems
- Context Relevance (Query Adherence): Understand how to measure the relevance of context provided to user queries
- Precision @ K: Evaluate the precision of your retrieval system at specific rank positions to optimize Top K and ranking strategies
- Correctness: Evaluate factual accuracy in AI outputs using Galileo Guardrail Metrics to detect and prevent hallucinations in your AI systems
- Ground Truth Adherence: Measure semantic equivalence between model outputs and reference answers using Galileo's Guardrail Metrics to ensure alignment with expected responses
- Instruction Adherence: Assess instruction adherence in AI outputs using Galileo Guardrail Metrics to ensure prompt-driven models generate precise and actionable results
- Response Quality Metrics: Evaluate how correctly, consistently, and in line with ground truth your AI follows instructions and answers user queries
- PII: Detect and protect personally identifiable information in AI systems using Galileo's PII Metric to identify sensitive data and implement appropriate safeguards
- Prompt Injection: Detect and prevent security vulnerabilities in AI systems using Galileo's Prompt Injection Metric to identify malicious inputs and protect your applications
- Safety and Compliance Metrics: Identify risks, harmful content, and compliance issues in your AI with Galileo's safety and compliance metrics
- Sexism: Detect and prevent sexist content in AI systems using Galileo's Sexism Metric to identify and mitigate biased responses
- Toxicity: Detect and prevent toxic content in AI systems using Galileo's Toxicity Metric to identify and mitigate harmful responses
- SQL Adherence: Evaluate whether generated SQL queries semantically align with the user's natural language intent
- SQL Correctness: Evaluate whether generated SQL queries are syntactically valid and adhere to the provided database schema
- SQL Efficiency: Evaluate whether generated SQL queries are structured efficiently and avoid performance anti-patterns
- SQL Injection: Detect SQL injection attacks and security vulnerabilities in generated SQL queries
- Text-to-SQL Metrics: Evaluate the quality, safety, and efficiency of AI-generated SQL queries using Galileo's Text-to-SQL metrics
- Projects: Organizational Units in Galileo
- Runtime Protection: Learn the concepts behind runtime protection to safeguard applications with runtime monitoring using Luna-2 or custom code-based metrics
- Instrument LangGraph Agents with OpenTelemetry: Learn how to add comprehensive observability to your LangGraph agents using OpenTelemetry and Galileo
- Overview
- Monitor LangChain Agents with Galileo: Learn how to build and monitor a LangChain AI Agent using Galileo for tracing and observability
- Weather Vibes Agent: Learn how to build an Agentic System for a smart weather application in a Python-based tech stack
- Add Domain-Specific Custom Metrics to your Application: Learn how to create custom LLM-as-a-Judge metrics to evaluate domain-specific applications within Galileo
- Add Evaluations to a Multi-Agent LangGraph Application: Learn how to add evaluations to a multi-agent LangGraph chat bot using Galileo
- Build a RAG Application with Elasticsearch, LangGraph, and Galileo: Guide to using Elasticsearch with LangGraph for the Chatbot RAG app, logging to Galileo
- MongoDB Atlas Integration for Retrieval-Augmented Generation (RAG): Guide to using MongoDB Atlas Vector Search with LangGraph agents logging to Galileo
- Build a Stripe AI Agent with Galileo Agent Reliability: Learn how to create a complete AI agent that integrates with Stripe's payment processing API while using Galileo for AI Agent Reliability
- Evaluate Your Traces: Learn how to evaluate metrics for your logged trace with Galileo, and improve your application
- Run an Experiment: Learn how to run your first experiment using prompts and datasets
- Galileo MCP Server: Learn how to integrate Galileo's Model Context Protocol (MCP) server with AI-enabled IDEs like Cursor and VS Code
- Log Your First Trace: Learn how to log your first trace with Galileo
- Multi-agent banking chatbot sample: Get started with the multi-agent banking chatbot sample project powered by LangGraph, with RAG using Pinecone
- Preset Metric Examples: Explore curated Log Streams that show Galileo’s out-of-the-box metrics in action
- Overview: Learn how to get started with the Galileo sample projects that are included in every new account
- Simple Chatbot: Get started with the simple chatbot sample project
- Basic Agentic AI Example: Learn how to implement a basic agentic AI system using Galileo and OpenAI
- Log Using the OpenAI Wrapper: Learn how to integrate and use OpenAI's API with Galileo's wrapper client
- Log Using the Log Decorator: Learn how to use the Galileo log decorator to log functions to traces
- Log MCP Server Tool Calls: Learn how to log tool calls when calling MCP servers from your AI application
- Create Traces and Spans: Learn how to create log traces and spans manually in your AI apps
- Set Up Alerts on Logs: Learn how to set up alerts and be automatically notified when things go wrong
- Fixing Hallucinations and Factual Errors: Learn how to identify and address hallucinations and factual errors in your AI models
- Handling Ignored Instructions: Learn how to handle ignored instructions and ensure that your AI models follow your instructions
- Reducing Hesitation and Uncertainty: Learn how to reduce hesitation and uncertainty in your AI models
- Run experiments with OpenTelemetry: Learn how to run experiments using OpenTelemetry-instrumented frameworks with Galileo
- Run an experiment against a RAG app: Learn how to run experiments against RAG applications
- Evaluate Metrics with the Luna-2 Model: Learn how to evaluate metrics faster and at lower cost using the Luna-2 model
- Use Luna-2 in Your Experiments: Learn how to use Luna-2 metrics when running experiments in code
- Create a Local Metric: Learn how to create a local metric in Python to use in your experiments
- Add runtime protection to a simple chatbot: Learn how to add runtime protection to a simple chatbot application
- Basic RAG Example: Learn how to implement a basic Retrieval-Augmented Generation (RAG) system using Galileo and OpenAI
- Completeness in RAG Systems: Learn how to ensure that your RAG systems provide complete answers using the Galileo completeness metric
- Maximizing Chunk Utilization: Learn how to boost your AI model's performance by fully leveraging retrieved text chunks
- Preventing Out of Context Information: Learn how to prevent out of context information from being generated by your AI models
- Add Galileo to a CrewAI Application: Learn how to add logging and evaluations with Galileo to an existing CrewAI application
- Integrate NVIDIA NIM with Galileo: Learn how to connect your self-hosted NVIDIA NIM (NVIDIA Inference Microservices) to Galileo for comprehensive LLM performance assessment, playground experimentation, and enhanced generative AI model capabilities
- OpenAI Agent Integration: Get hands-on experience integrating Galileo into an agentic app using the OpenAI Agents SDK
- Log with OpenTelemetry, LangGraph, and OpenAI: Learn how to integrate Galileo with OpenTelemetry and OpenInference for comprehensive observability and tracing.
- Eval Engineering for AI Developers: Learn about eval engineering in our free 5-part course
- Complete Mastering Gen-AI Series: Download our complete Mastering Gen-AI e-book series
- Luna Studio: Choose the Luna Studio UI or SDK path for building, training, and deploying custom Luna metrics.
- Overview: Deploy trained Luna metrics from the Luna Studio SDK to Galileo.
- Register metric in Galileo: Register your trained metric in the Galileo platform.
- Config file: Complete reference for the data generation section of the config.
- Metric Input Types: Understanding the different input types of metrics supported by Luna Studio
- Metric Output Types: Understanding the different output types of metrics supported by Luna Studio
- Overview: Generate a labeled training dataset for your metric.
- Run Labelling only: Label an existing training dataset using your metric definition.
- Overview: End-to-end guide for generating data and training your Luna metric with the Luna Studio SDK.
- Prerequisites: Dataset preparation to get optimal results from Luna fine-tuning
- Config file: Complete reference for the training section.
- Metric Input Types: Understanding the different input types of metrics supported by Luna Studio
- Metric Output Types: Understanding the different output types of metrics supported by Luna Studio
- Evaluate: Run evaluation on a trained Luna metric.
- Post Training artifacts: What training produces and which artifacts are saved.
- Overview: Fine-tune a Luna metric from a labelled dataset.
- Understanding Config: Understand the high-level structure of the run config file.
- Installation: Install the Luna Studio SDK and optional extras.
- Luna Studio SDK: Use the Luna Studio SDK to generate data, fine-tune custom Luna metrics, and deploy them to Galileo.
- Full Sessions: Work with full session-level Luna metrics across multi-turn conversations.
- Full traces: Work with full trace-level Luna metrics in advanced or label-only workflows.
- List of Trace inputs / outputs only: Model session-like metrics from repeated input/output pairs.
- LLM spans with RAG: Train a document-grounded Luna metric using retrieved context.
- LLM spans with tools (Agentic): Train a Luna metric that evaluates tool-aware assistant behavior.
- LLM spans without RAG: Build a span-level Luna metric from multi-field examples without retrieved documents.
- Overview: Hands-on Luna Studio SDK guides for each supported metric input category.
- Using a preset metric: Use a packaged preset metric for a single-field trace-style input.
- Retriever spans: Train a Luna metric that evaluates retrieval quality or document sufficiency.
- Trace input / output only: Train a trace-style Luna metric from a single serialized field.
- Availability and deployment: Luna Studio is part of the enterprise tier of Galileo and is deployed by Galileo into your own cluster or cloud.
- Core concepts: How projects, runs, metrics, datasets, and base models fit together in Luna Studio.
- Add a dataset: Reference for the three dataset sources: Upload, Fetch from URL, and Import from Galileo.
- Datasets overview: Manage test sets and training sets across all your projects.
- Test sets: Hand-labelled datasets used to evaluate fine-tuned metrics.
- Training sets: The dataset used to fine-tune the base model during a run.
- Dataset validation: Schema and content checks Luna Studio runs on every dataset, and how to fix common errors.
- Luna Studio UI: Use the Luna Studio web app to create, train, and register custom Luna metrics.
- Galileo integration: The single API key that unlocks Galileo platform features in Luna Studio.
- LLM providers: Choose and configure the right LLM integration for the models you want to use in Luna Studio.
- Integrations overview: Connect model providers and Galileo so your whole org can use them in Luna Studio.
- Custom metrics: Define a metric with a custom LLM-as-judge prompt inside the run creation flow.
- Metrics overview: Browse and register custom metrics across your project.
- Prerequisites: Dataset preparation to get optimal results from Luna fine-tuning
- Projects overview: Top-level containers for training runs tied to a GenAI use case.
- Training runs home: The main project workspace: status summary, filters, and the runs table.
- Quickstart: Sign up, configure your first integration, run a training, and register the resulting metric — end-to-end in about 15 minutes.
- FAQ: Common questions about Luna Studio — datasets, models, integrations, and more.
- Troubleshooting: Common errors and how to recover.
- Run lifecycle: The five statuses every training run flows through, and what each one means.
- Creating a new run: Four steps to pick a metric and launch a fine-tuning job.
- Step 1 — Metric: Pick a predefined metric template or write a custom LLM-as-judge prompt.
- Step 2 — Test set: Pick or upload the labelled dataset Luna Studio will evaluate the fine-tuned metric against.
- Step 3 — Training set: Generate a training set from your test set or add your own training logs.
- Step 4 — Config and launch: Review the run summary and launch training.
- Register a metric: Publish a fine-tuned metric to the Galileo metrics store.
- Common Errors Guide
- Error Catalog: Complete reference of Galileo platform error codes, messages, and recommended actions.
- FAQ
- Where Do I Find My Project Keys?
- Troubleshooting
- Release Notes: Recent updates and enhancements to Galileo
- Datasets: Learn how to create and manage datasets for use in your experiments with our SDKs
- Experiment Groups: Organize related experiment runs in the Galileo console and with the SDK
- Experiments Basics: Learn how to use datasets and experiments in code or the Galileo console to evaluate and improve your application
- Prompts: Learn how to create and use prompt templates in experiments
- Run Experiments in Code: Learn how to run experiments in Galileo using the Galileo SDKs
- Run Experiments in Unit Tests: Learn how to run experiments in unit tests that you can use during development, or in your CI/CD pipelines
- Distributed Tracing (Beta): Log traces across multiple services
- Distributed Tracing with OpenTelemetry: Stitch traces across two agents in different processes using OTel and the GalileoSpanProcessor.
- Galileo Context: Manage trace context and control logging behavior with the Galileo Context Manager
- Galileo Logger: Get granular control over logging with the GalileoLogger class
- Log Decorator: Easily capture function inputs and outputs as spans in your traces
- Instrumentation: Learn the basics of instrumenting your application with Galileo using the Galileo SDKs
- Tags and Metadata: Learn to use tags and metadata in Galileo logging and monitoring
- Experiment Metrics: Learn how to use metrics in your experiments
- Overview: An overview of the Galileo SDKs
- Invoke runtime protection: Learn how to invoke runtime protection in code using the Galileo SDK
- Rules: Learn about defining rules for runtime protection
- Rulesets: Learn about defining rulesets for runtime protection
- Stages: Learn about defining stages for runtime protection to be used during different stages in your application workflow
- agent_control
- collaborator
- configuration
- dataset
- datasets
- decorator
- exceptions
- experiment
- experiment_tags
- experiments
- export
- bridge
- base_async_handler
- base_handler
- handler
- async_handler
- handler
- tool
- handler
- integration
- job_progress
- log_stream
- log_streams
- logger
- task_handler
- utils
- metric
- metrics
- tracing
- model
- extractors
- response_generator
- otel
- project
- projects
- prompt
- prompts
- protect
- provider
- runs
- scorers
- search
- base
- column
- exceptions
- experiment_result
- filter
- project_resolver
- query_result
- sort
- utils
- stages
- traces
- tracing
- types
- datasets
- exception_handling
- telemetry_toggle
- env_helpers
- exceptions
- headers_data
- log_config
- metrics
- projects
- prompts
- retrievers
- serialization
- singleton
- span_utils
- uuid_utils
- validations
- Overview: Get started using the Galileo Python SDK
- A2A: Trace A2A (Agent2Agent) protocol interactions with Galileo for multi-agent observability
- Logging: Learn about using the Galileo CrewAI logging integration to log agent traces
- Experiments: Learn how to use CrewAI in an experiment
- Google ADK (Native): Learn how to use the native galileo-adk package for automatic tracing of Google ADK agents
- Command and Send: Learn how Galileo logs LangGraph Command and Send types for advanced control flow
- Experiments: Learn how to use LangChain or LangGraph in an experiment
- Logging: Learn about using the Galileo LangChain and LangGraph logging integration to log agent traces
- Middleware: Learn about using GalileoMiddleware for automatic logging of LangChain agents
- Runtime protection: Learn about using Galileo runtime protection with LangChain and LangGraph
- AWS Bedrock Inference Profiles: Learn how to use AWS Bedrock Inference Profiles with Galileo in a custom deployment
- Custom Model Integrations: Learn about custom integrations, and how to set them up in Galileo
- OpenAI Agents SDK: Learn how to send traces from the OpenAI Agents SDK to Galileo for evaluation
- OpenAI SDK: Learn about the Galileo OpenAI integration
- Overview: Learn how to integrate Galileo with OpenTelemetry and OpenInference for comprehensive observability and tracing
- Google ADK (OpenTelemetry): Learn how to integrate a Google ADK project with Galileo using OpenTelemetry and OpenInference
- OpenTelemetry Integration Recommendations: Guidelines for instrumenting your spans to ensure they are valid and properly processed by Galileo's OTLP provider
- Mastra: Learn how to integrate a Mastra project with Galileo using OpenTelemetry
- Microsoft Agent Framework: Learn how to integrate a Microsoft Agent Framework project with Galileo using OpenTelemetry
- Pydantic AI: Learn how to integrate a Pydantic AI project with Galileo using OpenTelemetry
- Custom Spans: Create OpenTelemetry spans enriched with Galileo-specific attributes using the start_galileo_span context manager
- Strands Agents: Learn how to integrate a Strands Agents project with Galileo using OpenTelemetry
- Vercel AI SDK: Learn how to integrate a Vercel AI SDK project with Galileo using OpenTelemetry
- Integrations Overview: Learn about the Galileo integrations with third-party SDKs to automatically log your applications
- Dataset
- Datasets
- GalileoApiClient
- GalileoCallback
- GalileoEvaluateApiClient
- GalileoEvaluateWorkflow
- GalileoLogger
- Jobs
- Projects
- ScorerSettings
- Scorers
- RecordType
- addRowsToDataset
- convertDatasetRowToRecord
- createCodeScorerVersion
- createCustomCodeMetric
- createCustomLlmMetric
- createDataset
- createExperiment
- createLlmScorerVersion
- createLogStream
- createMetricConfigs
- createProject
- createPrompt
- createPromptTemplate
- deleteDataset
- deleteMetric
- deleteProject
- deletePrompt
- deleteScorer
- deserializeInputFromString
- enableMetrics
- exportRecords
- extendDataset
- flush
- flushAll
- getAllLoggers
- getDataset
- getDatasetContent
- getDatasetMetadata
- getDatasetVersion
- getDatasetVersionHistory
- getDatasets
- getExperiment
- getExperiments
- getJob
- getJobProgress
- getLogStream
- getLogStreams
- getLogger
- getMetrics
- getProject
- getProjectWithEnvFallbacks
- getProjects
- getPrompt
- getPromptTemplate
- getPromptTemplates
- getPrompts
- init
- listDatasetProjects
- listProjectUserCollaborators
- log
- logScorerJobsStatus
- renderPrompt
- reset
- resetAll
- runExperiment
- shareProjectWithUser
- unshareProjectWithUser
- updateProjectUserCollaborator
- updatePrompt
- validateCodeScorer
- wrapOpenAI
- DatasetContent
- DatasetFormat
- DatasetRow
- JobProgress
- ListDatasetResponse
- LogRecordsMetricsQueryRequest
- LogRecordsQueryFilter
- LogRecordsSortClause
- SyntheticDatasetExtensionRequest
- AgentSpan
- BaseSpan
- BaseStep
- Document
- LlmMetrics
- LlmSpan
- RetrieverSpan
- StepWithChildSpans
- ToolSpan
- Trace
- WorkflowSpan
- InputType
- Models
- OutputType
- ScorerTypes
- isDocument
- isLlmSpanAllowedInputType
- isLlmSpanAllowedOutputType
- isMessage
- isRetrieverSpanAllowedOutputType
- isStepAllowedInputType
- isStepAllowedOutputType
- AgentSpanOptions
- BaseSpanOptions
- BaseStepOptions
- CreateCustomLlmMetricParams
- CreateLlmScorerVersionParams
- CustomizedScorer
- DatasetRecord
- DatasetRecordOptions
- DeleteMetricParams
- LlmMetricsOptions
- LlmSpanOptions
- LocalMetricConfig
- Metric
- MetricsOptions
- PromptRunSettings
- RegisteredScorer
- RetrieverSpanOptions
- ScorersConfiguration
- StepWithChildSpansOptions
- ToolSpanOptions
- TraceOptions
- WorkflowSpanOptions
- ChainPollTemplate
- ChunkMetaDataValueType
- CreateJobResponse
- ExperimentDatasetRequest
- ListPromptTemplateResponse
- LlmSpanAllowedInputType
- LlmSpanAllowedOutputType
- MetricValueType
- ModelType
- Project
- ProjectAction
- ProjectActionOpenAPI
- ProjectCollectionParams
- ProjectCollectionParamsOpenAPI
- ProjectCreate
- ProjectCreateOpenAPI
- ProjectCreateOptions
- ProjectCreateResponse
- ProjectCreateResponseOpenAPI
- ProjectDeleteResponse
- ProjectDeleteResponseOpenAPI
- ProjectOpenAPI
- ProjectPaginatedResponse
- ProjectPaginatedResponseOpenAPI
- ProjectScopeOptions
- ProjectUpdate
- ProjectUpdateOpenAPI
- ProjectUpdateResponse
- ProjectUpdateResponseOpenAPI
- PromptListOptions
- PromptTemplate
- PromptTemplateOpenAPI
- PromptTemplateVersion
- PromptTemplateVersionOpenAPI
- RecomputeLogRecordsMetricsRequest
- RecomputeLogRecordsMetricsRequestOpenAPI
- RenderPromptTemplateOptions
- RenderTemplateRequest
- RenderTemplateRequestOpenAPI
- RenderTemplateResponse
- RenderTemplateResponseOpenAPI
- RetrieverSpanAllowedOutputType
- Scorer
- ScorerConfig
- ScorerConfigOpenAPI
- ScorerDefaults
- ScorerResponse
- ScorerResponseOpenAPI
- ScorerVersion
- SessionCreateRequest
- SessionCreateRequestOpenAPI
- SessionCreateResponse
- SessionCreateResponseOpenAPI
- SingleMetricValue
- Span
- StepAllowedInputType
- StepAllowedOutputType
- StringData
- StringDataOpenAPI
- SyntheticDatasetExtensionRequestOpenAPI
- SyntheticDatasetExtensionResponse
- SyntheticDatasetExtensionResponseOpenAPI
- ToolCall
- ToolCallFunction
- UpdatePromptOptions
- UserCollaborator
- UserCollaboratorCreate
- UserCollaboratorCreateOpenAPI
- UserCollaboratorOpenAPI
- createScorerOptions
- AgentType
- GalileoMetrics
- MessageRole
- ProjectTypes
- StepType
- Overview: Get started using the Galileo TypeScript SDK
- SSO Integration: Learn how to set up SSO for your Galileo cluster
- What Is Galileo?
OpenAPI Specs