CSS (Closed-Source Software) Inference Service
This service serves closed-source models via a CSSInferenceWorkflow object, which encapsulates the backend, preprocessing, and postprocessing logic.
Infernet Configuration
The service can be configured as part of the overall Infernet configuration in config.json. For documentation on the overall configuration, consult the Infernet Node documentation.
{
"log_path": "infernet_node.log",
//...... contents abbreviated
"containers": [
{
"id": "css_inference_service",
"image": "ritualnetwork/css_inference_service:latest",
"external": true,
"port": "3000",
"allowed_delegate_addresses": [],
"allowed_addresses": [],
"allowed_ips": [],
"command": "--bind=0.0.0.0:3000 --workers=2",
"env": {
"OPENAI_API_KEY": "...",
"PERPLEXITYAI_API_KEY": "...",
"GOOSEAI_API_KEY": "...",
"RETRY_PARAMS": "{\"tries\": 3, \"delay\": 3, \"backoff\": 2, \"max_delay\": null, \"jitter\": [0.5, 1.5]}"
}
}
]
}
Supported Providers
The service supports three providers, each requiring an API key specified as an environment variable:
PERPLEXITYAI_API_KEY - API key for PerplexityAI
GOOSEAI_API_KEY - API key for GooseAI
OPENAI_API_KEY - API key for OpenAI
Environment Variables
OPENAI_API_KEY
- Description: The API key for OpenAI (optional)
PERPLEXITYAI_API_KEY
- Description: The API key for PerplexityAI (optional)
GOOSEAI_API_KEY
- Description: The API key for GooseAI (optional)
RETRY_PARAMS
- Description: The retry parameters for the inference workflow, passed as a JSON string with the following fields. A sketch of how these fields combine follows the list.
tries
- Description: The number of retries for the inference workflow.
- Default: 3
delay
- Description: The delay (in seconds) between retries.
- Default: 3
max_delay
- Description: The maximum delay (in seconds) between retries; null means no cap.
- Default: null
backoff
- Description: The multiplier applied to the delay after each retry.
- Default: 2
jitter
- Description: Extra seconds added to the delay between retries; with a [min, max] range, a value is drawn uniformly from that range.
- Default: [0.5, 1.5]
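For intuition, here is a minimal sketch of how these fields combine into a retry schedule, assuming the common convention where backoff multiplies the delay after each failed attempt and jitter is drawn uniformly from the given range (this mirrors the defaults above, not the service's internal code):
import json
import random

# RETRY_PARAMS as it would appear in the container's "env" block
params = json.loads(
    '{"tries": 3, "delay": 3, "backoff": 2, "max_delay": null, "jitter": [0.5, 1.5]}'
)

delay = params["delay"]
for attempt in range(1, params["tries"] + 1):
    # Add jitter drawn uniformly from the [min, max] range
    wait = delay + random.uniform(*params["jitter"])
    if params["max_delay"] is not None:
        wait = min(wait, params["max_delay"])
    print(f"attempt {attempt}: wait ~{wait:.1f}s before retrying")
    # Backoff multiplies the base delay for the next attempt
    delay *= params["backoff"]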
Usage
Offchain requests to the service can be initiated with Python or the CLI by utilizing the infernet_client library, as well as with HTTP requests against the Infernet Node directly (using a client like cURL).
The schema format of an infernet_client JobRequest looks like the following:
class JobRequest(TypedDict):
"""Job request.
Attributes:
containers: The list of container names.
data: The data to pass to the containers.
"""
containers: list[str]
data: dict[str, Any]
requires_proof: NotRequired[bool]
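For illustration, a concrete JobRequest for this service (the data payload mirrors the request bodies used in the examples below) might look like:
# A JobRequest targeting the CSS inference service container
request: JobRequest = {
    "containers": ["css_inference_service"],
    "data": {
        "provider": "OPENAI",
        "endpoint": "completions",
        "model": "gpt-4",
        "params": {
            "endpoint": "completions",
            "messages": [{"role": "user", "content": "give me an essay about cats"}],
        },
    },
}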
Also, the schema format of an infernet_client JobResult looks like the following:
class JobResult(TypedDict):
"""Job result.
Attributes:
id: The job ID.
status: The job status.
result: The job result.
intermediate: Job result from intermediate containers.
"""
id: str
status: JobStatus
result: Optional[ContainerOutput]
intermediate: NotRequired[list[ContainerOutput]]
class ContainerOutput(TypedDict):
"""Container output.
Attributes:
container: The container name.
output: The output of the container.
"""
container: str
output: Any
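As a sketch, a consumer of these types might unpack a finished result like so; the "success" status literal is an assumption about the JobStatus values, so verify it against your infernet_client version:
def extract_output(res: JobResult) -> Any:
    # Only successfully completed jobs carry a usable result
    if res["status"] != "success":
        raise RuntimeError(f"job {res['id']} finished with status {res['status']}")
    assert res["result"] is not None
    return res["result"]["output"]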
Offchain (web2) Request
Please note: The examples below assume that you have an Infernet Node running locally on port 4000.
from infernet_client.node import NodeClient
client = NodeClient("http://127.0.0.1:4000")
job_id = await client.request_job(
"css_inference_service",
{
"provider": "OPENAI",
"endpoint": "completions",
"model": "gpt-4",
"params": {
"endpoint": "completions",
"messages": [
{
"role": "user",
"content": "give me an essay about cats"
}
]
},
"extra_args": {
"max_tokens": 10,
"temperature": 0.5
}
}
)
result: str = (await client.get_job_result_sync(job_id))["result"]["output"]
# Note: get_job_result_sync blocks until the job completes. Alternatively,
# you can skip it, keep the job ID returned by request_job, and fetch the
# result later.
infernet-client job -c css_inference_service -i input.json --sync
where input.json looks like this:
{
    "provider": "OPENAI",
    "endpoint": "completions",
    "model": "gpt-4",
    "params": {
        "endpoint": "completions",
        "messages": [
            {
                "role": "user",
                "content": "give me an essay about cats"
            }
        ]
    },
    "extra_args": {
        "max_tokens": 10,
        "temperature": 0.5
    }
}
Alternatively, you can issue the same request directly against the node with cURL:
curl -X POST http://127.0.0.1:4000/api/jobs \
-H "Content-Type: application/json" \
-d '{"containers": ["css_inference_service"], "data": {"model": "gpt-4", "endpoint": "completions", "provider": "OPENAI", "params": {"endpoint": "completions", "messages": [{"role": "user", "content": "give me an essay about cats"}]}}}'
Onchain (web3) Subscription
You will need to import the infernet-sdk in your requesting contract. In this example we showcase the Callback pattern, which is an example of a one-off subscription. Please refer to the infernet-sdk documentation for further details.
Input requests should be passed in as an encoded byte string. Here is an example of how to generate this for a CSS inference request:
from infernet_ml.utils.css_mux import ConvoMessage
from infernet_ml.utils.codec.css import (
CSSEndpoint,
CSSProvider,
encode_css_completion_request,
)
provider = CSSProvider.OPENAI
endpoint = CSSEndpoint.completions
model = "gpt-3.5-turbo-16k"
messages = [
ConvoMessage(role="user", content="give me an essay about cats")
]
encoded = encode_css_completion_request(provider, endpoint, model, messages)
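If the tool you use to submit the request expects a hex string rather than raw bytes, the payload can be hex-encoded with plain Python (this is not part of the codec API):
# 0x-prefixed hex representation of the encoded request
print("0x" + encoded.hex())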
You can then pass this encoded byte string as input to a contract function. Say your contract has a function called getLLMResponse(bytes calldata input):
function getLLMResponse(bytes calldata input) public {
    uint16 redundancy = 1;
    address paymentToken = address(0);
    uint256 paymentAmount = 0;
    address wallet = address(0);
    address verifier = address(0);
    _requestCompute(
        "my-css-inference-service",
        input,
        redundancy,
        paymentToken,
        paymentAmount,
        wallet,
        verifier
    );
}
You can call this function with the encoded bytestring from Python like so:
from web3 import Web3

# Assuming you have a connected Web3 instance `w3` and a contract instance
contract = w3.eth.contract(address=contract_address, abi=contract_abi)

# Call the function; `encoded` is the byte string generated above
tx_hash = contract.functions.getLLMResponse(encoded).transact()
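To block until the transaction is mined, you can wait for the receipt; this is standard web3.py, not Infernet-specific:
# Wait for inclusion in a block and check that the call did not revert
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
assert receipt["status"] == 1, "transaction reverted"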
Delegated Subscription Request
Please note: The examples below assume that you have an Infernet Node running locally on port 4000.
import random
from time import time

from infernet_client.node import NodeClient
from infernet_client.chain_utils import Subscription, RPC

# Zero address used for the unset payment fields
ZERO_ADDRESS = "0x0000000000000000000000000000000000000000"

sub = Subscription(
    owner="0x...",
    active_at=int(time()),
    period=0,
    frequency=1,
    redundancy=1,
    containers=["css_inference_service"],
    lazy=False,
    verifier=ZERO_ADDRESS,
    payment_amount=0,
    payment_token=ZERO_ADDRESS,
    wallet=ZERO_ADDRESS,
)

client = NodeClient("http://127.0.0.1:4000")
nonce = random.randint(0, 2**32 - 1)

await client.request_delegated_subscription(
    sub=sub,
    rpc=RPC("http://127.0.0.1:8545"),
    coordinator_address=global_config.coordinator_address,  # address of the coordinator contract
    expiry=int(time() + 10),
    nonce=nonce,
    private_key="0x...",
data={
"provider": "OPENAI",
"endpoint": "completions",
"model": "gpt-4",
"params": {
"endpoint": "completions",
"messages": [
{
"role": "user",
"content": "give me an essay about cats"
}
]
},
"extra_args": {
"max_tokens": 10,
"temperature": 0.5
}
},
)
infernet-client sub --rpc_url http://some-rpc-url.com --address 0x19f...xJ7 --expiry 1713376164 --key key-file.txt \
--params params.json --input input.json
# Success: Subscription created.
where params.json looks like this:
{
"owner": "0x00Bd138aBD7....................", // Subscription Owner
"active_at": 0, // Instantly active
"period": 3, // 3 seconds between intervals
"frequency": 2, // Process 2 times
"redundancy": 2, // 2 nodes respond each time
"containers": ["css_inference_service"], // comma-separated list of containers
"lazy": false,
"verifier": "0x0000000000000000000000000000000000000000",
"payment_amount": 0,
"payment_token": "0x0000000000000000000000000000000000000000",
"wallet": "0x0000000000000000000000000000000000000000",
}
and input.json looks like this:
{
    "provider": "OPENAI",
    "endpoint": "completions",
    "model": "gpt-4",
    "params": {
        "endpoint": "completions",
        "messages": [
            {
                "role": "user",
                "content": "give me an essay about cats"
            }
        ]
    },
    "extra_args": {
        "max_tokens": 10,
        "temperature": 0.5
    }
}