MER-Factory
Your automated factory for constructing Multimodal Emotion Recognition and Reasoning (MERR) datasets
Project Roadmap
MER-Factory is under active development, with new features added regularly. Check our roadmap; contributions are welcome!
Quick Overview
MER-Factory is a Python-based, open-source framework designed for the Affective Computing community. It automates the creation of unified datasets for training Multimodal Large Language Models (MLLMs) by extracting multimodal features and leveraging LLMs to generate detailed analyses and emotional reasoning summaries.
Key Features
- Multi-Pipeline Architecture: Support for AU, Audio, Video, Image, and full MER processing.
- Flexible Analysis Tasks: Choose between MERR and Sentiment Analysis.
- Flexible Model Integration: Works with OpenAI, Google Gemini, Ollama, and Hugging Face models.
- Scalable Processing: Async/concurrent processing for large datasets.
- Scientific Foundation: Based on the Facial Action Coding System (FACS) and recent research.
- Easy CLI Interface: Simple command-line usage with comprehensive options.
- Interactive Tools: Web-based dashboard for data curation and configuration management.
Processing Types
Pipeline | Description | Use Case |
---|---|---|
AU | Facial Action Unit extraction and description. | Facial expression analysis. |
Audio | Speech transcription and tonal analysis. | Audio emotion analysis. |
Video | Comprehensive video content description. | Video emotion analysis. |
Image | Static image emotion recognition. | Image-based emotion analysis. |
MER | Complete multimodal pipeline. | Full emotion reasoning datasets. |
Analysis Task Types
The --task argument allows you to specify the analysis goal.
Task | --task argument | Description |
---|---|---|
MERR | "MERR" | (Default) Performs detailed analysis with MER. |
Sentiment Analysis | "Sentiment Analysis" | Performs sentiment-focused analysis (positive, negative, neutral). |
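As a rough illustration of how the two documented task values might be validated before a pipeline starts, here is a minimal sketch. The `AnalysisTask` enum and `parse_task` helper are hypothetical names made up for this example; they are not taken from the MER-Factory codebase, and only the two string values and the default come from the table above.

```python
from enum import Enum


class AnalysisTask(str, Enum):
    """Hypothetical enum mirroring the two documented --task values."""

    MERR = "MERR"
    SENTIMENT = "Sentiment Analysis"


def parse_task(value: str = "MERR") -> AnalysisTask:
    """Return the matching task, or raise ValueError listing the valid choices.

    The default of "MERR" follows the table above; everything else
    here is an illustrative assumption.
    """
    try:
        return AnalysisTask(value)
    except ValueError:
        choices = ", ".join(repr(t.value) for t in AnalysisTask)
        raise ValueError(
            f"unknown task {value!r}; expected one of: {choices}"
        ) from None
```

Because the second value contains a space, it needs to be quoted when passed on a shell command line, which is why the table shows both values in quotation marks.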
Example Outputs
Check out real examples of what MER-Factory produces:
Architecture Overview
- CLI Framework: Utilizes Typer for a robust and user-friendly command-line interface.
- Workflow Management: Employs LangGraph to enable stateful and dynamic processing pipelines.
- Facial Analysis: Integrates OpenFace for precise Facial Action Unit extraction.
- Media Processing: Leverages FFmpeg for advanced audio and video manipulation tasks.
- AI Integration: Features a pluggable architecture supporting multiple LLM providers.
- Concurrency: Uses asyncio for efficient, scalable concurrent processing.
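The concurrency point can be sketched with the standard library alone. The example below is a generic bounded-concurrency pattern, not MER-Factory's actual implementation: `process_file`, `process_all`, and `max_concurrency` are hypothetical names, and the sleep stands in for real work such as a model API call.

```python
import asyncio


async def process_file(path: str) -> str:
    """Placeholder for per-file work (hypothetical stand-in for a real pipeline)."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work, e.g. a model API call
    return f"processed:{path}"


async def process_all(paths: list[str], max_concurrency: int = 4) -> list[str]:
    """Process all paths concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path: str) -> str:
        # The semaphore caps how many files are being processed at once.
        async with semaphore:
            return await process_file(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))


if __name__ == "__main__":
    results = asyncio.run(process_all(["a.mp4", "b.mp4", "c.mp4"]))
    print(results)
```

Capping the number of in-flight tasks with a semaphore is a common way to stay within provider rate limits while still overlapping I/O waits across a large dataset.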
Getting Started
Ready to dive in? Here's what you need to know:
- Prerequisites - Install FFmpeg and OpenFace
- Installation Guide - Set up MER-Factory
- Basic Usage - Your first emotion recognition pipeline
- Model Configuration - Choose and configure your AI models
- Advanced Features - Explore all capabilities
Community & Support
- Technical Documentation - Deep dive into system architecture
- API Reference - Complete function and class documentation
- Examples - Real-world usage examples and tutorials
- Issues & Bug Reports - GitHub Issues
- Discussions - GitHub Discussions
Advancing together with the Affective Computing community.