NVIDIA Nemotron Chat Model node#

Use the NVIDIA Nemotron Chat Model node to access NVIDIA Nemotron models with conversational agents. The node works with Nemotron models hosted on build.nvidia.com and with self-hosted NVIDIA Inference Microservices (NIM).

On this page, you'll find the node parameters for the NVIDIA Nemotron Chat Model node and links to more resources.

Credentials

You can find authentication information for this node here.

Parameter resolution in sub-nodes

Sub-nodes behave differently to other nodes when processing multiple items using an expression.

Most nodes, including root nodes, take any number of items as input, process these items, and output the results. You can use expressions to refer to input items, and the node resolves the expression for each item in turn. For example, given an input of five name values, the expression {{ $json.name }} resolves to each name in turn.

In sub-nodes, the expression always resolves to the first item. For example, given an input of five name values, the expression {{ $json.name }} always resolves to the first name.

Node parameters#

Model#

Select the Nemotron model to use to generate the completion.

n8n dynamically loads Nemotron models from the endpoint configured in your credential. If n8n can't reach the endpoint, it falls back to a curated list of well-known Nemotron model IDs.

Node options#

Use these options to further refine the node's behavior.

Frequency Penalty#

Use this option to control the chances of the model repeating itself. Higher values reduce the chance of the model repeating itself.

Maximum Number of Tokens#

Enter the maximum number of tokens used, which sets the completion length. Use -1 for the model default.

Response Format#

Choose Text or JSON. JSON ensures the model returns valid JSON. When you choose JSON, include the word json in your prompt in the chain or agent.

Presence Penalty#

Use this option to control the chances of the model talking about new topics. Higher values increase the chance of the model talking about new topics.

Sampling Temperature#

Use this option to control the randomness of the sampling process. A higher temperature creates more diverse sampling, but increases the risk of hallucinations.

Timeout#

Enter the maximum request time in milliseconds.

Max Retries#

Enter the maximum number of times to retry a request.

Top P#

Use this option to set the probability the completion should use. Use a lower value to ignore less probable options.

Templates and examples#

Browse NVIDIA Nemotron Chat Model integration templates, or search all templates

Refer to NVIDIA's build catalogue for the list of Nemotron models and to the NIM documentation for guidance on self-hosting. As the NVIDIA API is OpenAI-spec compatible, you can refer to LangChain's OpenAI documentation for more information about the underlying client.

View n8n's Advanced AI documentation.

This page was