HF Inference Client Service

This service serves models via an HFInferenceClientWorkflow object, which encapsulates the backend, preprocessing, and postprocessing logic.

Infernet Configuration

The service can be configured as part of the overall Infernet configuration in config.json.

{
    "log_path": "infernet_node.log",
    //...... contents abbreviated
    "containers": [
        {
            "id": "hf_inference_client_service",
            "image": "ritualnetwork/hf_inference_client_service:latest",
            "external": true,
            "port": "3000",
            "allowed_delegate_addresses": [],
            "allowed_addresses": [],
            "allowed_ips": [],
            "command": "--bind=0.0.0.0:3000 --workers=2",
            "env": {
                "HF_TOKEN": "hf_token_goes_here"
            }
        }
    ]
}

Supported Tasks

This workflow supports the following Hugging Face task types:

class HFTaskId(IntEnum):
    """Hugging Face task types"""

    UNSET = 0
    TEXT_GENERATION = 1
    TEXT_CLASSIFICATION = 2
    TOKEN_CLASSIFICATION = 3
    SUMMARIZATION = 4

Environment Variables

HF_TOKEN - the token to use for authentication with the Hugging Face API

Usage

Inference requests that originate offchain can be initiated from Python or the CLI using the infernet_client package, or with HTTP requests sent directly to the Infernet node (using a client such as cURL).

The schema of an infernet_client job request looks like the following:

class JobRequest(TypedDict):
    """Job request.

    Attributes:
        containers: The list of container names.
        data: The data to pass to the containers.
        requires_proof: Whether the job requires a proof (optional).
    """

    containers: list[str]
    data: dict[str, Any]
    requires_proof: NotRequired[bool]

The schema of an infernet_client job result looks like the following:

class JobResult(TypedDict):
    """Job result.

    Attributes:
        id: The job ID.
        status: The job status.
        result: The job result.
        intermediate: Job result from intermediate containers.
    """

    id: str
    status: JobStatus
    result: Optional[ContainerOutput]
    intermediate: NotRequired[list[ContainerOutput]]


class ContainerOutput(TypedDict):
    """Container output.

    Attributes:
        container: The container name.
        output: The output of the container.
    """

    container: str
    output: Any
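
As a minimal sketch of how a caller might read such a result, the helper below returns the final container's output, or None when the job has no result yet. The function extract_output is hypothetical and not part of infernet_client. The TypedDicts are restated locally so the snippet runs on its own, JobStatus is assumed to be a plain string, and the status values shown are illustrative only.

```python
from typing import Any, Optional, TypedDict


class ContainerOutput(TypedDict):
    container: str
    output: Any


class JobResult(TypedDict, total=False):
    id: str
    status: str  # assumption: JobStatus is treated as a plain string here
    result: Optional[ContainerOutput]
    intermediate: list[ContainerOutput]


def extract_output(job: JobResult) -> Optional[Any]:
    """Hypothetical helper: return the final output, or None if there is none."""
    result = job.get("result")
    if result is None:
        return None
    return result["output"]


# Illustrative job results; the IDs and status strings are made up.
finished: JobResult = {
    "id": "job-1",
    "status": "success",
    "result": {"container": "SERVICE_NAME", "output": "4"},
}
pending: JobResult = {"id": "job-2", "status": "running", "result": None}

print(extract_output(finished))
print(extract_output(pending))
```

Checking for a missing result before indexing into it avoids a TypeError when the job has not produced output.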

Web2 Request

Please note: the examples below assume that you have an infernet node running locally on port 4000.

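import asyncio

from infernet_client.node import NodeClient


async def main() -> None:
    client = NodeClient("http://127.0.0.1:4000")
    job_id = await client.request_job(
        "SERVICE_NAME",
        {
            # HFTaskId.TEXT_GENERATION
            "task_id": 1,
            "prompt": "What is 2+2?",
        },
    )

    # The result should be "4"
    result: str = (await client.get_job_result_sync(job_id))["result"]["output"]
    print(result)


# The client methods are coroutines, so they must run inside an event loop.
asyncio.run(main())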

# Note that the sync flag is optional; when passed, the command waits for the job to complete.
# Without the sync flag, the job is submitted and you receive a job ID, which you can use to get the result later.
infernet-client job -c SERVICE_NAME -i input.json --sync

where input.json looks like this:

{
    "task_id": 1,
    "prompt": "What is 2+2?"
}

curl -X POST http://127.0.0.1:4000/api/jobs \
    -H "Content-Type: application/json" \
    -d '{"containers": ["SERVICE_NAME"], "data": {"task_id": 1, "prompt": "What is 2+2?"}}'
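
For readers who prefer not to shell out to cURL, the same request can be sketched with only the Python standard library. The helper build_job_request is hypothetical; the URL, container name, and body are taken from the cURL example above, and the enum is restated from the Supported Tasks section. The snippet only constructs the request, so it runs without a node.

```python
import json
import urllib.request
from enum import IntEnum


class HFTaskId(IntEnum):
    """Restated from the Supported Tasks section."""

    UNSET = 0
    TEXT_GENERATION = 1
    TEXT_CLASSIFICATION = 2
    TOKEN_CLASSIFICATION = 3
    SUMMARIZATION = 4


def build_job_request(
    node_url: str, container: str, prompt: str
) -> urllib.request.Request:
    """Hypothetical helper: build the POST request that the cURL example sends."""
    body = {
        "containers": [container],
        "data": {"task_id": int(HFTaskId.TEXT_GENERATION), "prompt": prompt},
    }
    return urllib.request.Request(
        url=f"{node_url}/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_job_request("http://127.0.0.1:4000", "SERVICE_NAME", "What is 2+2?")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a running node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```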

Web3 Request (Onchain Subscription)

You will need to import the infernet-sdk in your requesting contract. In this example we showcase the Callback pattern, which is an example of a one-off subscription. Please refer to the infernet-sdk documentation for further details.

Input requests should be passed in as an ABI-encoded byte string. Here is an example of how to generate one for a Hugging Face task. In this example we use the TEXT_GENERATION task without specifying a model (Hugging Face will use the default model) and prompt the model with a simple math question.

from infernet_ml.utils.hf_types import HFTaskId
from eth_abi.abi import encode

# The first item is the task id, the second item is the model id, and the third item is a prompt.
input_bytes = encode(
    ["uint8", "string", "string"],
    [HFTaskId.TEXT_GENERATION, "", "What's 2 + 2?"],
)
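
To make the layout of those bytes concrete, here is a dependency-free sketch that reproduces the standard ABI encoding for the (uint8, string, string) tuple by hand. The function encode_hf_input and its helpers are hypothetical teaching aids, not part of infernet_ml or eth_abi; in practice, use eth_abi's encode as shown above.

```python
def _word(value: int) -> bytes:
    """Encode an unsigned integer as one 32-byte big-endian ABI word."""
    return value.to_bytes(32, "big")


def _encode_string(text: str) -> bytes:
    """Encode a dynamic string: a length word, then data zero-padded to 32 bytes."""
    raw = text.encode("utf-8")
    padding = (-len(raw)) % 32
    return _word(len(raw)) + raw + b"\x00" * padding


def encode_hf_input(task_id: int, model_id: str, prompt: str) -> bytes:
    """Hypothetical helper: hand-rolled ABI encoding of (uint8, string, string)."""
    if not 0 <= task_id <= 255:
        raise ValueError("task_id must fit in a uint8")
    head_size = 3 * 32  # one head word per tuple element
    model_tail = _encode_string(model_id)
    prompt_tail = _encode_string(prompt)
    head = (
        _word(task_id)  # static value stored inline
        + _word(head_size)  # offset of the model id data
        + _word(head_size + len(model_tail))  # offset of the prompt data
    )
    return head + model_tail + prompt_tail


encoded = encode_hf_input(1, "", "What's 2 + 2?")
words = [encoded[i : i + 32] for i in range(0, len(encoded), 32)]
for index, word in enumerate(words):
    print(index, word.hex())
```

The head holds the task id followed by two byte offsets; each string then appears in the tail as a length word followed by its padded UTF-8 data.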

Assuming your contract inherits from the CallbackConsumer provided by infernet-sdk, you can use the following functions to request and receive compute:

import {CallbackConsumer} from "infernet-sdk/contracts/CallbackConsumer.sol";

contract MyContract is CallbackConsumer {
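    function doMath(bytes calldata input) public returns (bytes32) {
        _requestCompute(
            containerId, // placeholder: the ID of the container to run
            input, // same encoded input as above
            1, // redundancy
            address(0), // paymentToken
            0, // paymentAmount
            address(0), // wallet
            address(0) // verifier
        );
        return generatedTaskId; // placeholder: an identifier for the request, defined by your contract
    }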

    function _receiveCompute(
        uint32 subscriptionId,
        uint32 interval,
        uint16 redundancy,
        address node,
        bytes calldata input,
        bytes calldata output,
        bytes calldata proof,
        bytes32 containerId,
        uint256 index
    ) internal override {
        console2.log("received output!");
        console2.logBytes(output);
    }
}

Or, you can call your container directly from your contract:

import {ContainerLookup} from "infernet-sdk/contracts/ContainerLookup.sol";

contract MyContract {
    function doMath() public returns (bytes32) {
        container.requestCompute(
            "my-container-id",
            abi.encode(uint8(1), "", "What's 2 + 2?"), // same encoded input as above.
            // Here, 1 corresponds to the task id: TEXT_GENERATION
            1,
            address(0), // paymentToken
            0, // paymentAmount
            address(0), // wallet
            address(0) // verifier
        );
    }
}

Delegated Subscription Request

Please note: the examples below assume that you have an infernet node running locally on port 4000.

import asyncio
import random
from time import time

from infernet_client.node import NodeClient
from infernet_client.chain_utils import Subscription, RPC

ZERO_ADDRESS = "0x0000000000000000000000000000000000000000"


async def main() -> None:
    sub = Subscription(
        owner="0x...",
        active_at=int(time()),
        period=0,
        frequency=1,
        redundancy=1,
        containers=["SERVICE_NAME"],
        lazy=False,
        verifier=ZERO_ADDRESS,
        payment_amount=0,
        payment_token=ZERO_ADDRESS,
        wallet=ZERO_ADDRESS,
    )

    client = NodeClient("http://127.0.0.1:4000")
    nonce = random.randint(0, 2**32 - 1)
    await client.request_delegated_subscription(
        sub=sub,
        rpc=RPC("http://127.0.0.1:8545"),
        # global_config is a placeholder for your own configuration object
        # that holds the coordinator contract address.
        coordinator_address=global_config.coordinator_address,
        expiry=int(time() + 10),
        nonce=nonce,
        private_key="0x...",
        data={
            "task_id": 1,
            "prompt": "What is 2+2?",
        },
    )


asyncio.run(main())

infernet-client sub --rpc_url http://some-rpc-url.com --address 0x19f...xJ7 --expiry 1713376164 --key key-file.txt \
    --params params.json --input input.json
# Success: Subscription created.

where params.json looks like this (the // comments are annotations only; standard JSON does not allow comments, so omit them from the actual file):

{
    "owner": "0x00Bd138aBD7....................", // Subscription Owner
    "active_at": 0, // Instantly active
    "period": 3, // 3 seconds between intervals
    "frequency": 2, // Process 2 times
    "redundancy": 2, // 2 nodes respond each time
    "containers": ["SERVICE_NAME"], // list of containers
    "lazy": false,
    "verifier": "0x0000000000000000000000000000000000000000",
    "payment_amount": 0,
    "payment_token": "0x0000000000000000000000000000000000000000",
    "wallet": "0x0000000000000000000000000000000000000000"
}

and where input.json looks like this:

{
    "task_id": 1,
    "prompt": "What is 2+2?"
}
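
Hand-written JSON is easy to get subtly wrong (a stray trailing comma is enough to make a strict parser reject the file). As a minimal sketch, the two files can instead be generated with Python's json module, which always emits valid JSON. The helper write_subscription_files and the placeholder owner address are hypothetical; the field names and values mirror the examples above.

```python
import json
import tempfile
from pathlib import Path

ZERO_ADDRESS = "0x" + "0" * 40


def write_subscription_files(
    directory: Path, owner: str, container: str, prompt: str
) -> tuple[Path, Path]:
    """Hypothetical helper: write params.json and input.json as valid JSON."""
    params = {
        "owner": owner,
        "active_at": 0,  # instantly active
        "period": 3,  # 3 seconds between intervals
        "frequency": 2,  # process 2 times
        "redundancy": 2,  # 2 nodes respond each time
        "containers": [container],
        "lazy": False,
        "verifier": ZERO_ADDRESS,
        "payment_amount": 0,
        "payment_token": ZERO_ADDRESS,
        "wallet": ZERO_ADDRESS,
    }
    job_input = {"task_id": 1, "prompt": prompt}

    params_path = directory / "params.json"
    input_path = directory / "input.json"
    params_path.write_text(json.dumps(params, indent=4))
    input_path.write_text(json.dumps(job_input, indent=4))
    return params_path, input_path


# A temporary directory and a made-up owner address keep the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    params_path, input_path = write_subscription_files(
        Path(tmp), "0x" + "11" * 20, "SERVICE_NAME", "What is 2+2?"
    )
    loaded_params = json.loads(params_path.read_text())
    loaded_input = json.loads(input_path.read_text())

print(loaded_params["containers"], loaded_input)
```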