CSS (Closed-Source Software) Inference Service

This service serves closed source models via a CSSInferenceWorkflow object, which encapsulates the backend, preprocessing, and postprocessing logic

Infernet Configuration

The service can be configured as part of the overall Infernet configuration in config.json. For documentation on the overall configuration, consult the Infernet Node documentation

{
    "log_path": "infernet_node.log",
    //...... contents abbreviated
    "containers": [
        {
            "id": "css_inference_service",
            "image": "ritualnetwork/css_inference_service:latest",
            "external": true,
            "port": "3000",
            "allowed_delegate_addresses": [],
            "allowed_addresses": [],
            "allowed_ips": [],
            "command": "--bind=0.0.0.0:3000 --workers=2",
            "env": {
                "OPENAI_API_KEY": "...",
                "PERPLEXITYAI_API_KEY": "...",
                "GOOSEAI_API_KEY": "...",
                "RETRY_PARAMS": "{\"tries\": 3, \"delay\": 3, \"backoff\": 2, \"max_delay\": null, \"jitter\": [0.5, 1.5]}"
            }
        }
    ]
}

Supported Providers

The service supports three providers, each requiring an API key specified as an environment variable:

PERPLEXITYAI_API_KEY - API key for PerplexityAI
GOOSEAI_API_KEY - API key for GooseAI
OPENAI_API_KEY - API key for OpenAI

Environment Variables

OPENAI_API_KEY

Description: The API key for OpenAI (optional)

PERPLEXITYAI_API_KEY

Description: The API key for PerplexityAI (optional)

GOOSEAI_API_KEY

Description: The API key for GooseAI (optional)

RETRY_PARAMS

Description: The retry parameters for the inference workflow.

tries

Description: The number of retries for the inference workflow.
Default: 3

delay

Description: The delay (in seconds) between retries.
Default: 3

max_delay

Description: The maximum delay (in seconds) between retries.
Default: null

backoff

Description: The backoff (in seconds) between retries.
Default: 2

jitter

Description: The jitter (in seconds) to add to requests.
Default: [0.5, 1.5]

Usage

Offchain requests to the service can be initiated with python or cli by utilizing the infernet_client library, as well as with HTTP requests against the Infernet Node directly (using a client like cURL).

The schema format of an infernet_client JobRequest looks like the following:

class JobRequest(TypedDict):
    """Job request.

    Attributes:
        containers: The list of container names.
        data: The data to pass to the containers.
    """

    containers: list[str]
    data: dict[str, Any]
    requires_proof: NotRequired[bool]

Also, the schema format of a infernet_client JobResult looks like the following:

class JobResult(TypedDict):
    """Job result.

    Attributes:
        id: The job ID.
        status: The job status.
        result: The job result.
        intermediate: Job result from intermediate containers.
    """

    id: str
    status: JobStatus
    result: Optional[ContainerOutput]
    intermediate: NotRequired[list[ContainerOutput]]


class ContainerOutput(TypedDict):
    """Container output.

    Attributes:
        container: The container name.
        output: The output of the container.
    """

    container: str
    output: Any

Offchain (web2) Request

Please Note: The examples below assume that you have an Infernet Node running locally on port 4000.

PythonCLIcURL

from infernet_client.node import NodeClient

client = NodeClient("http://127.0.0.1:4000")
job_id = await client.request_job(
    "css_inference_service",
    {
        "provider": "OPENAI",
        "endpoint": "completions",
        "model": "gpt-4",
        "params": {
            "endpoint": "completions",
            "messages": [
                {
                    "role": "user",
                    "content": "give me an essay about cats"
                }
            ]
        },
        "extra_args": {
            "max_tokens": 10,
            "temperature": 0.5
        }
    }
)

result: str = (await client.get_job_result_sync(job_id))["result"]["output"]

# Note that the sync flag is optional and will wait for the job to complete.
# If you do not pass the sync flag, the job will be submitted and you will receive a job id, which you can use to get the result later.
infernet-client job -c css_inference_service -i input.json --sync

where input.json looks like this:

{
    "provider": "OPENAI",
    "endpoint": "completions",
    "model": "gpt-4",
    "params": {
        "endpoint": "completions",
        "messages": [
            {
                "role": "user",
                "content": "give me an essay about cats"
            }
        ]
    },
    "extra_args": {
        "max_tokens": 10,
        "temperature": 0.5
    }
}

curl -X POST http://127.0.0.1:4000/api/jobs \
    -H "Content-Type: application/json" \
    -d '{"containers": ["css_inference_service"], "data": {"model": "gpt-4", "endpoint": "completions", "provider": "OPENAI", "params": {"endpoint": "completions", "messages": [{"role": "user", "content": "give me an essay about cats"}]}}}'

Onchain (web3) Subscription

You will need to import the infernet-sdk in your requesting contract. In this example we showcase the Callback pattern, which is an example of a one-off subscription. Please refer to the infernet-sdk documentation for further details.

Input requests should be passed in as an encoded byte string. Here is an example of how to generate this for a CSS inference request:

from infernet_ml.utils.css_mux import ConvoMessage
from infernet_ml.utils.codec.css import (
    CSSEndpoint,
    CSSProvider,
    encode_css_completion_request,
)

provider = CSSProvider.OPENAI
endpoint = CSSEndpoint.completions
model = "gpt-3.5-turbo-16k"
messages = [
    ConvoMessage(role="user", content="give me an essay about cats")
]

encoded = encode_css_completion_request(provider, endpoint, model, messages)

You then can pass this encoded byte string as an input to the contract function, let's say your contract has a function called getLLMResponse(bytes calldata input):

function getLLMResponse(bytes calldata input) public {
    redundancy = 1;
    paymentToken = address(0);
    paymentAmount = 0;
    wallet = address(0);
    verifier = address(0);
    _requestCompute(
        "my-css-inference-service",
        input,
        redundancy,
        paymentToken,
        paymentAmount,
        wallet,
        verifier
    );
}

You can call this function with the encoded bytestring from python like so:

from web3 import Web3

# Assuming you have a contract instance
contract = w3.eth.contract(address=contract_address, abi=contract_abi)

# Call the function, `encoded` here is the same as the one generated above
tx_hash = contract.functions.getLLMResponse.call(encoded).transact()

Delegated Subscription Request

Please note: The examples below assume that you have an Infernet Node running locally on port 4000.

PythonCLI

from infernet_client.node import NodeClient
from infernet_client.chain_utils import Subscription, RPC

sub = Subscription(
    owner="0x...",
    active_at=int(time()),
    period=0,
    frequency=1,
    redundancy=1,
    containers=["css_inference_service"],
    lazy=False,
    verifier=ZERO_ADDRESS,
    payment_amount=0,
    payment_token=ZERO_ADDRESS,
    wallet=ZERO_ADDRESS,
)

client = NodeClient("http://127.0.0.1:4000")
nonce = random.randint(0, 2**32 - 1)
await client.request_delegated_subscription(
    sub=sub,
    rpc=RPC("http://127.0.0.1:8545")
    coordinator_address=global_config.coordinator_address,
    expiry=int(time() + 10),
    nonce=nonce,
    private_key="0x...",
    data={
        "provider": "OPENAI",
        "endpoint": "completions",
        "model": "gpt-4",
        "params": {
            "endpoint": "completions",
            "messages": [
                {
                    "role": "user",
                    "content": "give me an essay about cats"
                }
            ]
        },
        "extra_args": {
            "max_tokens": 10,
            "temperature": 0.5
        }
    },
)

infernet-client sub --rpc_url http://some-rpc-url.com --address 0x19f...xJ7 --expiry 1713376164 --key key-file.txt \
    --params params.json --input input.json
# Success: Subscription created.

where params.json looks like this:

and where input.json looks like this:

href="#__codelineno-12-1">{ "owner": "0x00Bd138aBD7....................", // Subscription Owner "active_at": 0, // Instantly active "period": 3, // 3 seconds between intervals "frequency": 2, // Process 2 times "redundancy": 2, // 2 nodes respond each time "containers": ["css_inference_service"], // comma-separated list of containers "lazy": false, "verifier": "0x0000000000000000000000000000000000000000", "payment_amount": 0, "payment_token": "0x0000000000000000000000000000000000000000", "wallet": "0x0000000000000000000000000000000000000000", class="p">} href="#__codelineno-13-1">{ "provider": "OPENAI", "endpoint": "completions", "model": "gpt-4", "params": { "endpoint": "completions", "messages": [ {"role": "user", "content": "give me an essay about cats"} ], }, "extra_args": { "max_tokens": 10, "temperature": 0.5, }, class="p">}