Transform Business Questions Into Code
The Text2SQL & Python feature lowers the technical barrier to data analysis. Instead of writing SQL or Python by hand, anyone can describe what they want in plain language and get database queries and statistical scripts that answer the question.
Code Generation Capabilities
SQL Query Generation
Transform business questions into sophisticated database queries:
- Complex Joins: Multi-table relationships handled automatically
- Window Functions: Advanced analytics like running totals and rankings
- Aggregations: Group-by operations with intelligent filtering
- Performance Optimization: Queries structured to use available indexes and run efficiently
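To make the window-function bullet concrete, here is a small, self-contained sketch of a running total and a ranking, run against an in-memory SQLite database from Python. The `sales` table and its rows are hypothetical sample data, not output of the feature; SQLite supports window functions from version 3.25.

```python
import sqlite3

# Hypothetical sample data: one row per day with a sales amount
rows = [("2024-01-01", 100), ("2024-01-02", 50), ("2024-01-03", 75)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

query = """
SELECT day,
       amount,
       SUM(amount) OVER (ORDER BY day) AS running_total,   -- cumulative sum by day
       RANK() OVER (ORDER BY amount DESC) AS amount_rank   -- 1 = largest amount
FROM sales
ORDER BY day
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Each row keeps its own amount while the window columns add context from neighbouring rows, which is what distinguishes window functions from a plain `GROUP BY`.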
MongoDB Aggregation Generation
Transform business questions into MongoDB aggregation pipelines.
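As an illustration only, a pipeline answering "what share of users have a .edu email address?" could look like the sketch below, written as Python dictionaries in the form accepted by pymongo's `Collection.aggregate(pipeline)`. The `email` field name and the `build_edu_share_pipeline` helper are hypothetical, and `$regexMatch` requires MongoDB 4.2 or later.

```python
from typing import Any, Dict, List


def build_edu_share_pipeline(email_field: str = "email") -> List[Dict[str, Any]]:
    """Build a hypothetical pipeline counting users whose email ends in .edu."""
    is_edu = {
        "$regexMatch": {
            # Treat missing/null emails as empty strings so they never match
            "input": {"$ifNull": [f"${email_field}", ""]},
            "regex": r"\.edu$",
            "options": "i",  # case-insensitive
        }
    }
    return [
        # Collapse all documents into one group and count totals
        {"$group": {
            "_id": None,
            "total_users": {"$sum": 1},
            "edu_users": {"$sum": {"$cond": [is_edu, 1, 0]}},
        }},
        # Derive the percentage, guarding against an empty collection
        {"$project": {
            "_id": 0,
            "total_users": 1,
            "edu_users": 1,
            "edu_percentage": {"$cond": [
                {"$eq": ["$total_users", 0]},
                0,
                {"$multiply": [{"$divide": ["$edu_users", "$total_users"]}, 100]},
            ]},
        }},
    ]


pipeline = build_edu_share_pipeline()
# With a live connection (assumption): list(db.users.aggregate(pipeline))
print(pipeline)
```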
Python Script Creation
Generate statistical analysis and data manipulation code:
- Data Cleaning: Handle missing values, outliers, and transformations
- Statistical Modeling: Regression, classification, and clustering algorithms
- Visualization: Professional charts with matplotlib and seaborn
- Time Series: Forecasting and trend analysis
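As an example of the data-cleaning step, the sketch below fills missing values with the median and then clips outliers to Tukey's interquartile-range fences (k = 1.5 by default). This is one common convention chosen for illustration, not necessarily what the generator emits; the `clean_numeric` name is hypothetical.

```python
import pandas as pd


def clean_numeric(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Median-impute missing values, then clip values outside the IQR fences."""
    # Replace NaN/None with the median of the observed values
    filled = series.fillna(series.median())
    # Interquartile range of the imputed data
    q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
    iqr = q3 - q1
    # Pull extreme values back to the fences instead of dropping rows
    return filled.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


raw = pd.Series([1, 2, 3, 4, None, 100])
cleaned = clean_numeric(raw)
print(cleaned.tolist())
```

Clipping keeps the row count stable, which matters when the cleaned column is later joined back to other data; dropping outliers is the usual alternative when they are known to be errors.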
Code Quality
Real-World Examples
Business Intelligence Queries
Question: "Show me monthly revenue by product category for the past year, alongside each category's revenue in the previous month"
Generated SQL:
SELECT
    DATE_TRUNC('month', order_date) AS month,
    product_category,
    SUM(revenue) AS monthly_revenue,
    LAG(SUM(revenue)) OVER (
        PARTITION BY product_category
        ORDER BY DATE_TRUNC('month', order_date)
    ) AS previous_month_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY month, product_category
ORDER BY month DESC, monthly_revenue DESC;
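For comparison, roughly the same month-over-month calculation can be expressed in pandas. This sketch assumes the orders and products tables have already been joined into one DataFrame with `order_date`, `product_category`, and `revenue` columns; the function name and the optional `start` cutoff (standing in for the one-year filter) are hypothetical. Like `LAG`, `shift(1)` returns the previous row within each category, which is the previous month only if that month had orders.

```python
import pandas as pd


def monthly_revenue_with_lag(orders: pd.DataFrame, start=None) -> pd.DataFrame:
    """Monthly revenue per category plus the prior month's revenue (like SQL LAG)."""
    df = orders.copy()
    df["order_date"] = pd.to_datetime(df["order_date"])
    if start is not None:
        # Optional cutoff, similar to the WHERE clause in the SQL version
        df = df[df["order_date"] >= pd.Timestamp(start)]
    # Truncate each date to the first day of its month
    df = df.assign(month=df["order_date"].dt.to_period("M").dt.to_timestamp())
    monthly = (
        df.groupby(["month", "product_category"], as_index=False)["revenue"].sum()
        .rename(columns={"revenue": "monthly_revenue"})
        .sort_values(["product_category", "month"])
    )
    # Previous row within each category = PARTITION BY ... ORDER BY month
    monthly["previous_month_revenue"] = (
        monthly.groupby("product_category")["monthly_revenue"].shift(1)
    )
    return monthly.sort_values(
        ["month", "monthly_revenue"], ascending=[False, False]
    ).reset_index(drop=True)


sample = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"],
    "product_category": ["A", "A", "A", "B"],
    "revenue": [10, 5, 20, 7],
})
report = monthly_revenue_with_lag(sample)
print(report)
```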
Statistical Analysis
Question: "How many users in our database have a .edu extension in their emails?"
Generated Python:
import pandas as pd
# Get the users dataset (`existing_datasets` is assumed to be provided by the execution environment)
dataset = [ds for ds in existing_datasets if ds.dataset_id == '68b819eee96e2fa1686a319e'][0]
df = dataset.data
# Filter for users whose email ends in .edu (case-insensitive; missing emails count as non-matches)
edu_users = df[df['email'].str.contains(r'\.edu$', case=False, na=False)]
# Count .edu users and total users
edu_count = len(edu_users)
total_users = len(df)
# Avoid division by zero on an empty dataset
edu_pct = (edu_count / total_users) * 100 if total_users else 0.0
# Create summary dataframe
summary_df = pd.DataFrame({
    'Category': ['Users with .edu emails', 'Total users', 'Percentage with .edu'],
    'Value': [edu_count, total_users, f"{edu_pct:.2f}%"]
})
print(f"Users with .edu email extensions: {edu_count}")
print(f"Total users: {total_users}")
print(f"Percentage: {edu_pct:.2f}%")
# Show some examples of .edu emails if any exist
if edu_count > 0:
    print("\nExamples of .edu email addresses:")
    print(edu_users[['display_name', 'email']].head(10))
# Hand the summary back (`computed_dataframe` is assumed to be provided by the execution environment)
computed_dataframe.append(summary_df)
Advanced Features
Multi-Language Support
The AI can generate code in multiple languages and formats:
- SQL Dialects: MySQL, PostgreSQL, Snowflake, BigQuery
- Python Libraries: pandas, NumPy, scikit-learn, matplotlib
- MongoDB Aggregations: Aggregation pipelines for MongoDB databases
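Dialect support matters because the same idea is spelled differently per database. The sketch below shows how the month bucketing used in the revenue query varies; the `MONTH_TRUNC` mapping and `month_trunc` helper are hypothetical illustrations, not the feature's internals. Note that the BigQuery form applies to `DATE` columns and the MySQL form returns a string rather than a date.

```python
# Hypothetical per-dialect templates for truncating a date column to its month
MONTH_TRUNC = {
    "postgresql": "DATE_TRUNC('month', {col})",
    "snowflake": "DATE_TRUNC('month', {col})",
    "bigquery": "DATE_TRUNC({col}, MONTH)",
    "mysql": "DATE_FORMAT({col}, '%Y-%m-01')",
}


def month_trunc(dialect: str, col: str) -> str:
    """Return the month-truncation expression for `col` in the given dialect."""
    try:
        template = MONTH_TRUNC[dialect.lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=col)


for name in ("PostgreSQL", "BigQuery", "MySQL"):
    print(name, "->", month_trunc(name, "order_date"))
```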
Code Optimization
Generated queries and scripts are written with performance in mind:
- Query Planning: Efficient execution paths and index usage
- Memory Management: Optimized data loading and processing
- Vectorization: Efficient operations on large datasets
- Error Handling: Robust code that handles edge cases
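The vectorization and error-handling points can be combined in one small sketch: computing percentages for a whole array at once with NumPy, while returning 0 instead of dividing by zero. The `safe_pct` name and the zero fallback are illustrative assumptions.

```python
import numpy as np


def safe_pct(part, whole) -> np.ndarray:
    """Element-wise 100 * part / whole, with 0 wherever whole is 0."""
    part = np.asarray(part, dtype=float)
    whole = np.asarray(whole, dtype=float)
    out = np.zeros_like(part)
    # `where` skips the division for zero denominators; those slots keep 0.0
    np.divide(part, whole, out=out, where=whole != 0)
    return out * 100


print(safe_pct([1, 5, 3], [4, 0, 12]))
```

A single `np.divide` call replaces a Python loop with an `if` per element, and the `where` mask avoids both the exception and the runtime warning a naive division would raise.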