Learn more about the parameters you can configure when running inference.
We set `max_tokens` to `1`, such that the output contains only one token. However, if we have a more complicated query (e.g., information extraction), we may want more than one token while still keeping the output short and relevant. In this case we can use stop words, as seen in the example below:
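A minimal sketch of such a request, assuming a generic HTTP completion endpoint that accepts `max_tokens` and `stop` in its JSON body (the URL, API key, model name, and prompt below are placeholders, not values from this guide):

```python
import requests

# Placeholder endpoint, key, and model name -- substitute your provider's values.
API_URL = "https://api.example.com/v1/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "example-model",
    # Hypothetical prompt asking for a short description of a product review.
    "prompt": "Describe the following review in a few words:\n\n<review text>",
    "max_tokens": 100,   # allow up to 100 tokens for the description
    "stop": "\n\n",      # stop as soon as the model emits a blank line
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(response.json())  # the generated text is expected in the "output" field
```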
Here we set `max_tokens` to `100`, as we expect several words describing this review, and at the same time we set `stop` to `\n\n`, so that the model stops generating as soon as it sees this stop word; semantically, this means the sentence is over. In this case, we will receive a response like the following (only the `output` field is shown; other fields are omitted):