NexGen Cloud is a rapidly growing IaaS company focused on providing innovative cloud solutions and infrastructure services. Our GPU cloud infrastructure solutions accelerate development in industries such as Artificial Intelligence & Machine Learning, VFX & Rendering, Data Science & IoT, and Computer-Aided Engineering & MDO.
We are dedicated to helping our clients navigate the complexities of the digital world and achieve success through cutting-edge, scalable, secure and affordable solutions.
At the company's heart stands a group of very talented, experienced, and motivated individuals who want to make a positive change and a lasting impact on the tech world.
Position summary
We are seeking a highly skilled and experienced Senior HPC Engineer to join our growing team at NexGen Cloud. The successful candidate will be responsible for designing, implementing, and maintaining complex High Performance Computing infrastructure solutions. This is a fantastic opportunity for a talented engineer with a passion for cloud technologies to contribute to the ongoing success and growth of NexGen Cloud.
Key Responsibilities:
- Design, deploy, and manage highly available, scalable, and secure HPC infrastructure solutions, including our HPC-aaS offering.
- Design, develop, and support the Hyperstack and InfraHub APIs as they relate to our HPC services.
- Develop and maintain infrastructure-as-code (IaC) templates and scripts to automate deployment, management, and monitoring processes.
- Collaborate closely with cross-functional teams, including architects, developers, and product managers, to develop and implement innovative infrastructure strategies that meet business requirements.
- Troubleshoot and resolve complex infrastructure issues, ensuring optimal performance and reliability.
- Lead and mentor junior infrastructure engineers, fostering a culture of continuous learning and improvement.
- Stay up to date with the latest cloud technologies, tools, and best practices to drive innovation and efficiency within the team.
Essential Skills:
- 7+ years of experience in HPC infrastructure, administration and service
- Understanding of data center infrastructure
- Experience with GPU technologies through AI/ML, mining, rendering, or other applications.
- Hands-on experience with High Performance Computing administration, upgrades, and scale-out operations.
- Hands-on experience with high-performance parallel/distributed storage and networking for HPC.
- Experience in using Ansible to manage routine operations.
- Experience working as part of a team spanning multiple time zones and disciplines.
- Fluent written English and at least conversational spoken English
- In-depth understanding of cloud architecture principles, networking, security, and performance optimization.
- Strong knowledge of Linux/Unix systems, virtualization technologies, and containerization platforms like Docker and Kubernetes.
- Excellent problem-solving, analytical, and communication skills.
- Strong teamwork and collaboration abilities, with a commitment to fostering a positive work environment.
Desirable skills:
- Experience with OpenStack cloud, Ceph, and WEKA storage
- Exposure to Jira and Confluence
- Capability to write technical documentation.
- Experience writing Ansible playbooks.
- Ability to write scripts and tools in Python
- Git and CI/CD pipeline management
What We Offer:
- Competitive salary
- Opportunity to work with a diverse team of talented professionals who are passionate about technology and innovation.
- A collaborative and supportive work environment that encourages professional growth and development.
- Exposure to cutting-edge technologies and the opportunity to make a significant impact on the future of cloud computing.
Join the NexGen Cloud team, where innovation, collaboration, and growth are at the heart of everything we do. If you are a passionate, talented, and motivated individual looking to make a difference, apply now!