Job Description
Are you ready to architect the future?
Nexus Horizon Systems is seeking a visionary Senior 2026-Ready AI/ML Infrastructure Engineer to lead our next generation of intelligent systems. As we prepare for the technological leap into 2026, we need an expert who can build resilient, scalable, and secure infrastructure for the AI models of tomorrow.
In this role, you will bridge the gap between cutting-edge research and production-grade engineering, ensuring our platforms are not just functional today, but future-proof for the decade ahead.
Why join us?
- Work on projects that define the AI landscape of 2026.
- Competitive compensation and equity packages.
- Flexible remote-first culture with state-of-the-art equipment.
- Opportunity to mentor the next generation of engineering talent.
Responsibilities
- Architect Scalable MLOps Pipelines: Design and implement end-to-end machine learning infrastructure, including data ingestion, model training, and deployment pipelines optimized for 2026 standards.
- Infrastructure Modernization: Lead the migration of legacy systems to modern cloud-native architectures (Kubernetes, serverless) and optimize them to support high-throughput AI workloads.
- Performance Tuning: Continuously monitor, optimize, and scale GPU clusters and compute resources to ensure sub-millisecond latency for critical AI inference.
- Security & Compliance: Implement rigorous security protocols and data governance frameworks to protect sensitive AI models and training data.
- Cross-Functional Leadership: Collaborate with data scientists, researchers, and product managers to translate complex requirements into technical solutions.
- Disaster Recovery: Develop and maintain robust backup and recovery strategies to ensure business continuity.
Qualifications
- Education: Bachelor’s degree in Computer Science, Software Engineering, or a related field; Master’s degree preferred.
- Experience: 5+ years in backend engineering or MLOps, with a focus on AI infrastructure.
- Programming: Proficiency in Python and in deep learning frameworks such as PyTorch, TensorFlow, or similar.
- Cloud Expertise: Deep knowledge of cloud platforms (AWS, GCP, or Azure) and containerization technologies (Docker, Kubernetes).
- System Design: Strong understanding of distributed systems, microservices, and high-availability architectures.
- Problem Solving: Demonstrated ability to work under pressure while troubleshooting complex technical issues and optimizing system performance.