Job Description
We are defining the technological landscape for the year 2026 and beyond. Nexus Horizon is seeking a visionary Senior AI Infrastructure Engineer to architect the resilient backbone of our next-generation artificial intelligence systems. You will be responsible for designing scalable, high-performance infrastructure that supports rapid model deployment and data processing at an unprecedented scale.
In this role, you will bridge the gap between cutting-edge AI research and robust engineering practices, ensuring our systems are ready for the demands of the future.
Responsibilities
- Architect and maintain high-performance GPU clusters and distributed computing environments optimized for Large Language Models (LLMs).
- Design fault-tolerant infrastructure capable of handling exabyte-scale data growth and traffic spikes.
- Implement advanced auto-scaling strategies and edge computing solutions for global latency reduction.
- Collaborate with ML researchers to optimize model inference latency and resource utilization.
- Oversee cloud migrations and hybrid cloud architectures on AWS, GCP, or Azure.
- Ensure end-to-end security, compliance, and data sovereignty for all 2026 roadmap initiatives.
Qualifications
- 8+ years of experience in systems engineering, DevOps, or infrastructure architecture with a focus on AI/ML.
- Deep expertise in Python, Kubernetes, Docker, Terraform, and CI/CD pipelines.
- Proven experience managing production environments of 10,000+ nodes.
- Strong understanding of AI/ML frameworks (PyTorch, TensorFlow) and hardware acceleration (NVIDIA GPUs).
- Experience with high-availability systems, database optimization, and cloud-native technologies.
- Master's degree in Computer Science, Engineering, or a related technical field.