Operational lead/coordinator
Spectraforce
Cambridge, Massachusetts
a month ago
Job Description
Job Title: Operational lead/coordinator
Job Location: Cambridge, MA (02139)
Duration: 12 months
Job description:
Operational lead/coordinator for BR compute operational management.
Acting as the lead/coordinator for BR compute, including NVIDIA DGX for large AI model operations. Responsibilities include but are not limited to:
- Provide users with access to the platform, ensuring completion of mandatory training
- Scheduling deployments
- Product operations, including adherence to security compliance practices and documentation
- Platform usage and reporting, collaborating with corporate IT to determine which metrics to implement, and reporting back to the governance board
- Platform operations and monitoring, managing user-facing communications about issues and working with internal and external partners on their resolution
- Support users who face issues with AI model operations, either by helping them navigate IT landscape services or by facilitating support from the vendor platform provider
- Support environment setup and cleanup, ensuring users off-board project models and data according to platform usage expectations
- Create and publicize platform-related training and keep materials updated
To support these activities, the following skills are required:
- Leadership & problem solving: Co-lead the operationalization of the environment, collaborating to establish SOPs and guidelines, navigating ambiguity, and adapting to evolving systems
- Technical Knowledge: Familiarity with data warehouse interfaces and services. Proficiency in Docker, Kubernetes, and SSH to assist users with container setup, port forwarding, and interactive access.
- Strong knowledge of cloud platforms, with a preference for NVIDIA (DGX) and scheduling tools, including RunAI.
- Resource Management: Ability to monitor and manage GPU and storage resources, ensuring efficient usage and addressing any underutilization.
- Data Management: Familiarity with data warehousing; ability to perform data upload and cleanup on the computing platform.
- User Support and Training: Experience in providing technical training and support, particularly in using Docker and SSH, to help users manage their code and data independently.
- Coordination and Documentation: Skills in creating detailed documentation and knowledge base articles, and coordinating with DDIT to streamline the onboarding process.
- Operational tasks: Capability to handle technical operations tasks such as deleting containers and images from the registry and assigning resources on the cluster.
- 5 years' experience, given the ambiguity of getting a new platform off the ground, and an educational background of a BA/BS in a technical field (or a scientific field with significant technical experience).
Applicant Notices & Disclaimers
- For information on benefits, equal opportunity employment, and location-specific applicant notices, click here
At SPECTRAFORCE, we are committed to maintaining a workplace that ensures fair compensation and wage transparency in adherence with all applicable state and local laws. This position’s starting pay is: $67.89/hr.