Node Maintenance

What to do if a node goes down

We work to keep your experience with our GPU clusters seamless, but sometimes a node needs maintenance or replacement. If you are experiencing a node or cluster outage, please do the following:

If you have a shared Slack channel with Together AI, please let our team know there and create a ticket on the Slack message. If 50-75% of your cluster capacity is degraded, please select Sev0 or Sev1 on your ticket form so that we can begin troubleshooting quickly. If only one or two nodes are affected, let us know in Slack and select Sev2.

If you do not have a shared Slack channel, please fill out the priority support form that has been provided to you by our team and follow the same severity guidelines as above.
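
As a quick reference, the severity guidelines above can be sketched as a small helper. This is only an illustration, not an official Together AI tool: the function name `suggest_severity` and its parameters are hypothetical, and treating anything above 75% degraded the same as the 50-75% range is an assumption. Cases the guidelines do not cover return `None`, in which case please ask our team which severity to choose.

```python
from typing import Optional


def suggest_severity(total_nodes: int, degraded_nodes: int) -> Optional[str]:
    """Suggest a ticket severity from the guidelines in this article.

    Hypothetical helper for illustration only; not part of any Together AI API.
    """
    if total_nodes <= 0 or not 0 <= degraded_nodes <= total_nodes:
        raise ValueError("degraded_nodes must be between 0 and total_nodes")

    if degraded_nodes == 0:
        return None  # nothing is degraded, so there is nothing to report

    fraction = degraded_nodes / total_nodes

    # The article names 50-75% degraded capacity as Sev0 or Sev1.
    # Assumption: anything above 75% is treated at least as seriously.
    if fraction >= 0.5:
        return "Sev0 or Sev1"

    # One or two affected nodes: Sev2.
    if degraded_nodes <= 2:
        return "Sev2"

    # More than two nodes but under 50%: not covered by the guidelines.
    return None


print(suggest_severity(total_nodes=16, degraded_nodes=8))
print(suggest_severity(total_nodes=16, degraded_nodes=1))
```

The capacity check runs before the node-count check, so on a very small cluster (for example, one of two nodes down) the helper suggests the higher severity.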