Environment

Keramik can be set up in a local or EKS environment.

Local Environment

Requires

  • Docker
  • kind
  • kubectl

When using a local environment, you will need to create a cluster (see Creating a Cluster below).

EKS

Requires

  • kubectl
  • AWS CLI

Once these are installed, you will need to log in with the AWS CLI via SSO:

aws configure sso

You will need to use https://3box.awsapps.com/start/ for the SSO URL with region us-east-2. Use the account Benchmarking with the role AWSAdministratorAccess. It is recommended to rename the profile to keramik or benchmarking.

You can now configure kubectl to connect to the cluster:

aws eks update-kubeconfig --region=us-east-1 --profile=keramik --name=benchmarking-ceramic
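
You can then list the cluster's namespaces with:

kubectl get namespaces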

When using an EKS environment, you do not need to create a cluster, but you will need to set up a network.

Creating a Cluster

Kind (Kubernetes in Docker) runs a local k8s cluster. Create and initialize a new kind cluster using this configuration:

# kind.yaml
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  MaxUnavailableStatefulSet: true

This configuration enables a feature that allows stateful sets to redeploy pods more rapidly on changes. While not required to use Keramik, it makes deploying and mutating networks significantly faster.

# Create a new kind cluster (i.e. local k8s)
kind create cluster --config kind.yaml
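
You can verify the cluster is up with:

kubectl cluster-info --context kind-kind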

Now you will need to deploy Keramik to the cluster.

Deploy Keramik

To deploy keramik, we will need to deploy custom resource definitions (CRDs) and apply the Keramik operator.

Deploy CRDs

Custom resource definitions tell k8s about our network and simulation resources. You need to apply them when deploying a new cluster and any time they change:

cargo run --bin crdgen | kubectl apply -f -
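
You can verify the CRDs were registered with:

kubectl get crds | grep keramik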

Deploy Keramik Operator

The last piece to running Keramik is the operator itself. Apply the operator into the keramik namespace:

# Create keramik namespace
kubectl create namespace keramik
# Apply the keramik operator
kubectl apply -k ./k8s/operator/
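
You can confirm the operator is running with:

kubectl get pods -n keramik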

Once that is complete, you can now set up a network.

Setting Up a Network

With the operator running we can now define a Ceramic network.

Place the following network definition into the file small.yaml.

# small.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: <unique-name>-small
spec:
  replicas: 2
  # Required if you plan to run a simulation
  monitoring:
    namespaced: true

The <unique-name> can be any unique string; your initials are a good default if you are deploying the network to a cloud cluster.

Apply this network definition to the k8s cluster:

kubectl apply -f small.yaml

After a minute or two you should have a functioning Ceramic network.

Checking the status of the network

Check the status of the network:

export NETWORK_NAME=<unique-name>-small
kubectl describe network $NETWORK_NAME

Keramik places each network into its own namespace, named keramik- followed by the network name. You can default your context to this namespace using:

kubectl config set-context --current --namespace=keramik-$NETWORK_NAME

Inspect the pods within the network using:

kubectl get pods

HINT: Use tools like k9s to interactively manage your network.

When your pods are ready, you can run a simulation. If you are running locally, be patient: the first time you set up a network you will need to download several images.

HINT: Use tools like kubectx or kubie to work with multiple namespaces and contexts.

When you're finished, you can tear down your network with the following command:

kubectl delete network $NETWORK_NAME

Simulation

To run a simulation, first define one. The available simulation types are:

  • ipfs-rpc - A simple simulation that writes and reads to IPFS
  • ceramic-simple - A simple simulation that writes and reads events to two different streams, a small and large model
  • ceramic-write-only - A simulation that only performs updates on two different streams
  • ceramic-new-streams - A simulation that only creates new streams
  • ceramic-model-reuse - A simulation that reuses the same model and queries instances across workers
  • recon-event-sync - A simulation that creates events for Recon to sync at a fixed rate (~300/s by default). Designed for a 2-node network but should work on any number of nodes.
  • cas-benchmark - A simulation that benchmarks the CAS network.
  • cas-anchoring-benchmark - A simulation that benchmarks Ceramic with anchoring enabled.

Using one of these scenarios, we can then define the configuration for that scenario:

# basic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-<unique-name>-small
spec:
  scenario: ceramic-simple
  devMode: true # optional; removes container resource limits and requests for local benchmarking
  users: 10
  runTime: 4

If you want to run it against a defined network, set the namespace to match that network; in this example the namespace matches the network applied during setup. You can also define the scenario to run, the number of users to run on each node, and the run time in minutes.

Before running the simulation make sure the network is ready and has monitoring enabled.

kubectl describe network <unique-name>-small

You should see that the number of Ready Replicas is the same as the Replicas. Example simplified output of a ready network:

Name:         nc-small
...
  Ready Replicas:  2
  Replicas:        2
...

Once ready, apply this simulation definition to the k8s cluster:

kubectl apply -f basic.yaml

Keramik will first start all the metrics and tracing resources. Once those are ready, it will start the simulation by launching the simulation manager and then all the workers. The manager and workers stop once the simulation is complete.

You can then analyze the results of the simulation.

If you want to rerun a simulation with no changes, delete the simulation and reapply it:

kubectl delete -f basic.yaml
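kubectl apply -f basic.yaml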

Simulating Specific Versions

Often you will want to run a simulation against a specific version of the software. To do this, build the image and configure your network to run that image.

Example Custom JS-Ceramic Image

Use this example network definition with a custom js-ceramic image.

# custom-js-ceramic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: custom-js-ceramic
spec:
  replicas: 2
  monitoring:
    namespaced: true
  ceramic:
    - image: ceramicnetwork/composedb:dev
      imagePullPolicy: IfNotPresent

Apply it with:

kubectl apply -f custom-js-ceramic.yaml

You can also run mixed networks and various other advanced configurations.

Example Custom IPFS Image

Use this example network definition with a custom IPFS image.

# custom-ipfs.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: custom-ipfs
spec:
  replicas: 2
  monitoring:
    namespaced: true
  ceramic:
    - ipfs:
        rust:
          image: ceramicnetwork/rust-ceramic:dev
          imagePullPolicy: IfNotPresent

Apply it with:

kubectl apply -f custom-ipfs.yaml

Example Custom CAS API URL Network Spec

Use this example in the network definition when using cas-benchmark or cas-anchoring-benchmark. This is specifically for testing against the CAS dev network.

# custom-cas-api.yaml
---
apiVersion: keramik.3box.io/v1alpha1
kind: Network
metadata:
  name: ceramic-benchmark
spec:
  ceramic:
    - env:
        CERAMIC_RECON_MODE: "true"
      ipfs:
        rust:
          env:
            CERAMIC_ONE_RECON: "true"
  casApiUrl: https://cas-dev-direct.3boxlabs.com
  networkType: dev-unstable
  privateKeySecret: ceramic-v4-dev
  ethRpcUrl: ""

Apply it with:

kubectl apply -f custom-cas-api.yaml

Example Custom Simulation for Ceramic Anchoring Benchmark

Use this example to run a simulation which uses the CAS API defined in the network spec.

  • anchorWaitTime - How long to wait, in seconds, after streams have been created before checking whether they have been anchored. This should be a high number, like 30-40 minutes.
  • throttleRequests - Number of requests to send per second.

# ceramic-anchoring-benchmark.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-ceramic-benchmark
spec:
  scenario: ceramic-anchoring-benchmark
  users: 16
  runTime: 60
  throttleRequests: 100
  anchorWaitTime: 2400

Apply it with:

kubectl apply -f ceramic-anchoring-benchmark.yaml

Example Custom Simulation for cas-benchmark

Use this example to run a simulation where you can pass in the CAS API URL, the network type, and the private secret key in the spec. By default, casNetwork and casController are set to run against the CAS dev-direct API.

casNetwork: The url of the CAS network to run the simulation against.

casController: The private key of the controller DID to use for the simulation.

# cas-benchmark.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-ceramic-benchmark
spec:
  scenario: cas-benchmark
  users: 16
  runTime: 60
  throttleRequests: 100
  casNetwork: "https://cas-dev-direct.3boxlabs.com"
  casController: "did:key:<secret>"

Apply it with:

kubectl apply -f cas-benchmark.yaml

Analysis

Analysis of Keramik results depends on the purpose of the simulation. You may want to just see average latencies, or dive deeper into reported metrics. For profiling, you will want to use Datadog.

Quick Log Analysis

The simulation manager provides a very quick way to analyze the logs of a simulation run, but you will need to know the name of the manager pod. First, check whether the simulate-manager pod has completed by running:

kubectl get pods

If the pod has completed and is no longer in that list, you can see recently terminated pods using:

kubectl get event -o custom-columns=NAME:.metadata.name | cut -d "." -f1

Once you have the name of the manager pod, you can retrieve its logs:

kubectl logs simulate-manager-<id>

If the simulate-manager pod is not in your pod list, you may need to get logs with the --previous flag:

kubectl logs --previous simulate-manager-<id>

Analysis with DuckDB or Jupyter

First you will need to install a few things:

pip install duckdb duckdb-engine pandas jupyter jupysql matplotlib

To analyze the results of a simulation, first copy the metrics-TIMESTAMP.parquet file from the opentelemetry-0 pod. Restart the pod first so that it writes out the parquet file footer:

kubectl delete pod opentelemetry-0
kubectl wait --for=condition=Ready pod/opentelemetry-0 # make sure pod has restarted
kubectl exec opentelemetry-0 -- ls -la /data # List files in the directory to find the TIMESTAMP you need
kubectl cp opentelemetry-0:data/metrics-TIMESTAMP.parquet ./analyze/metrics.parquet
cd analyze

Use duckdb to examine the data:

duckdb
> SELECT * FROM 'metrics.parquet' LIMIT 10;
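> -- Inspect the schema to see which columns are available
> DESCRIBE SELECT * FROM 'metrics.parquet';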

Alternatively start a jupyter notebook using analyze/sim.ipynb:

jupyter notebook

Comparing Simulation Runs

How do we conclude one simulation run is better or worse than another?

Each simulation will likely target a specific result; however, there are common results we should expect to see.

Changes should not make correctness worse. Correctness is defined using two metrics:

  • Percentage of events successfully persisted on the node that accepted the initial write.
  • Percentage of events successfully replicated on nodes that observed the writes via the Ceramic protocol.

Changes should not make performance worse. Performance is defined using these metrics:

  • Writes/sec across all nodes in the cluster and by node
  • p50, p90, p95, p99, and p99.9 of the duration of writes across all nodes in the cluster and by node
  • Success/failure ratio of write requests across all nodes in the cluster and by node
  • p50, p90, p95, p99, and p99.9 of the duration of time to become replicated, i.e. the time from when one node accepts the write to when another node has the same write available for read.

For any simulation of the Ceramic protocol these metrics should apply. Any report about the results of a simulation should include these metrics, and we compare them against an established baseline.
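
As a rough sketch, such percentile metrics can be computed from the collected parquet data with DuckDB. The column names below (name, value) are assumptions; inspect the actual schema first with DESCRIBE:

SELECT
  quantile_cont(value, 0.50) AS p50,
  quantile_cont(value, 0.95) AS p95,
  quantile_cont(value, 0.99) AS p99
FROM 'metrics.parquet'
WHERE name LIKE '%write%'; -- hypothetical metric name filter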

Performance Analysis

In addition to the above, we can also use Datadog to dive further into performance.

Datadog

Keramik can also be configured to send metrics and telemetry data to Datadog.

You will first need to set up a barebones network into which the Datadog operator can be installed. An example barebones network from the above setup:

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: <name of network>
spec:
  replicas: 1
  datadog:
    enabled: true
    version: "unique_value"
    profilingEnabled: true

You will need to install the Datadog k8s operator into the network. This requires installing helm; there doesn't seem to be any way to install the operator without it. However, once the Datadog operator is installed, helm is no longer needed.

helm repo add datadog https://helm.datadoghq.com
helm install my-datadog-operator datadog/datadog-operator

Now we will use that barebones network to set up secrets for Datadog and the Datadog agent. Adjust the previously defined network definition to look like the following:

# Network setup
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  datadog:
    enabled: true
    version: "unique_value"
    profilingEnabled: true
    
# Secrets Setup
---
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
type: Opaque
stringData:
  api-key: <Datadog API Key Secret>
  app-key: <Datadog Application Key Secret>
    
# Datadog Agent setup
---
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    kubelet:
      tlsVerify: false
    site: us3.datadoghq.com
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  override:
    clusterAgent:
      image:
        name: gcr.io/datadoghq/cluster-agent:latest
    nodeAgent:
      image:
        name: gcr.io/datadoghq/agent:latest
  features:
    npm:
      enabled: true
    apm:
      enabled: true
      hostPortConfig:
        enabled: true

The Datadog API Key is found at the organization level, and should be the secret associated with the API Key. The Datadog application key can be found at the organization or user level, and should be the secret associated with the application key.

You can now apply this with

kubectl apply -f network.yaml

Note: If you are running locally, you will need to restart your CAS and Ceramic pods using

kubectl delete pod ceramic-0 ceramic-1 cas-0

where the set of ceramic pods will depend on the replicas used. Make sure you delete all Ceramic and CAS pods. This only needs to be done the first time.

Any time you need to change the network, change this file and reapply it with:

kubectl apply -f network.yaml

Telemetry data sent to Datadog will have two properties that uniquely identify the data from other keramik networks:

  • env - this is set based on the namespace of the keramik network.
  • version - specified in the datadog config, may be any unique value.

Cleanup

kubectl delete -f network.yaml
helm delete my-datadog-operator

Developing Keramik

When you need to add features to Keramik networks or simulations you will need to run local builds of the operator and the runner.

  • Operator - long lived process that manages the network custom resource.
  • Runner - short lived process that performs various tasks within the network (e.g. bootstrapping)

Operator

The operator automates creating and manipulating networks via the network custom resource definition. Any changes to the operator require that you rebuild it and load it into kind again:

docker buildx build --load -t keramik/operator:dev --target operator .
kind load docker-image keramik/operator:dev

Now we need to update the k8s operator definition to use our new image:

Edit ./k8s/operator/kustomization.yaml to use the dev tag

images:
  - name: keramik/operator
    newTag: dev

Edit ./k8s/operator/manifests/operator.yaml to use IfNotPresent for the imagePullPolicy.

# ...
      containers:
      - name: keramik-operator
        image: "keramik/operator"
        imagePullPolicy: IfNotPresent
# ...

Update the CRD definitions and apply the Keramik operator:

cargo run --bin crdgen | kubectl apply -f -
kubectl apply -k ./k8s/operator/

See the operator background for details on certain design patterns of the operator.

Runner

The runner is a utility for running various jobs to initialize the network and run workloads against it. Currently the runner provides two utilities:

  • Bootstrap nodes
  • Run simulations

If you intend to develop either of these features you will need to build the runner image and configure your network or simulation to use your local image.

Build and Load the Runner Image

Any changes to the runner require that you rebuild it and load it into kind again:

docker buildx build --load -t keramik/runner:dev --target runner .
kind load docker-image keramik/runner:dev

Setup network with Runner Image

To use a custom runner image when you set up your network, adjust the network yaml to specify the custom bootstrap image:

# small.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  # Use custom runner image for bootstrapping
  bootstrap:
    image: keramik/runner:dev
    imagePullPolicy: IfNotPresent

Setup simulation with Runner Image

You will also need to specify the image in your simulation yaml.

# Custom runner
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  namespace: keramik-small
spec:
  scenario: ceramic-simple
  users: 10
  runTime: 4
  image: keramik/runner:dev
  imagePullPolicy: IfNotPresent

Setup Load Generator with Runner Image

If you run a load generator, specify the runner image in its spec as well.

# Custom load generator
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: LoadGenerator
metadata:
  name: load-gen
  namespace: keramik-lgen-demo
spec:
  scenario: "CreateModelInstancesSynced"
  runTime: 3
  image: "keramik/runner:dev"
  imagePullPolicy: "IfNotPresent"
  throttleRequests: 20
  tasks: 2

Advanced Topics

For more advanced usage of Keramik, see the topics below.

Advanced CAS and Ceramic Configuration

By default, Keramik will instantiate all the resources required for a functional CAS service, including a Ganache blockchain.

You can configure the Ceramic nodes to use an external instance of the CAS instead of one inside the cluster. If using a CAS running in 3Box Labs infrastructure, you will also need to specify the Ceramic network type associated with the node, e.g. dev-unstable.

You may also specify an Ethereum RPC endpoint for the Ceramic nodes to be able to verify anchors, or set it to an empty string to clear it from the Ceramic configuration. In the latter case, the Ceramic nodes will come up but will not be able to verify anchors.

If left unspecified, networkType will default to local, ethRpcUrl to http://ganache:8545, and casApiUrl to http://cas:8081. These defaults point to an internal CAS using a local pubsub topic in a fully isolated network.

Additionally, IPFS can be configured with custom images and resources for both CAS and Ceramic.

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  privateKeySecret: "small"
  networkType: "dev-unstable"
  ethRpcUrl: ""
  casApiUrl: "https://some-anchor-service.com"

Adjusting Ceramic Environment

Ceramic environment can be adjusted by specifying environment variables in the network configuration

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - env:
        CERAMIC_PUBSUB_QPS_LIMIT: "500"

Disabling AWS Functionality

Certain functionality in CAS depends on AWS services. If you are running Keramik in a non-AWS environment, you can disable this functionality by editing the statefulset for CAS:

kubectl edit statefulsets cas

and adding the following environment variables to the spec/template/spec/containers/env config:

- name: SQS_QUEUE_URL
  value: ""
- name: MERKLE_CAR_STORAGE_MODE
  value: disabled

Note: statefulsets must be edited every time the network is recreated.

Image Resources

Storage

Nearly all containers (monitoring excepted) allow configuring the persistent storage size and class. The storage class must be created out of band, but can be referenced. The storage configuration has two keys (size and class) and can be used like so:

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  bootstrap: 
    image: keramik/runner:dev
    imagePullPolicy: IfNotPresent
  cas:
    casStorage:
      size: "3Gi"
      class: "fastDisk" # typically not set
    ipfs:
      go:
        storage:
          size: "1Gi"
    ganacheStorage:
      size: "1Gi"
    postgresStorage:
      size: "3Gi"
    localstackStorage:
      size: "5Gi"
  ceramic:
    - ipfs:
        rust: 
          storage:
            size: "3Gi"

Requests / Limits

During local benchmarking, you may not have enough resources to run the cluster. A simple "fix" is to use the devMode flag on the network and simulation specs. This overrides the resource requests and limits to none, meaning pods do not need available resources to be scheduled and can consume as much as they desire. This would be problematic in production and should only be used for testing purposes.

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  devMode: true # ceramic will require specified resources but all other containers will be unconstrained
  ceramic:
    - resourceLimits:
        cpu: "1"
        memory: "1Gi"
        storage: "1Gi"

Resource limits can also be set explicitly:

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - resourceLimits:
        cpu: "4"
        memory: "8Gi"
        storage: "2Gi"

The above yaml will provide each ceramic pod with 4 CPU cores, 8GB of memory, and 2GB of storage. Depending on the system you are running on, you may run out of resources. You can check your resource usage with:

kubectl describe nodes

You can also set resources for IPFS within ceramic similarly.

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - ipfs:
       go:
         resourceLimits:
           cpu: "4"
           memory: "8Gi"
           storage: "2Gi"
         storageClass: "fastDisk"

Additionally the storage class can be set. The storage class must be created out of band but can be referenced as above.

Setting resources for CAS is slightly different, using casResourceLimits to set the CAS resources:

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  cas:
    image: ceramicnetwork/ceramic-anchor-service:latest
    casResourceLimits:
      cpu: "250m"
      memory: "1Gi"

CAS API Configuration

The CAS API environment variables can be set or overridden through the network configuration.

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 0
  cas:
    api:
      env:
        APP_PORT: "8080"

Enabling Recon

You can also use Recon for reconciliation by setting the CERAMIC_ONE_RECON environment variable to true.

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - ipfs:
        rust:
          env:
            CERAMIC_ONE_RECON: "true"

Monitoring

You can enable monitoring on a network to deploy Jaeger, Prometheus, and an OpenTelemetry collector into the network namespace. This is not the only way to monitor network resources, but it is built in.

Metrics from all pods in the network will be collected.

Sample network resource with monitoring enabled.

# basic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: network-with-monitoring
spec:
  replicas: 2
  monitoring:
    namespaced: true
    podMonitor: true

To view the metrics and traces, port-forward the services:

kubectl port-forward prometheus-0 9090
kubectl port-forward jaeger-0 16686

Then navigate to http://localhost:9090 for metrics and http://localhost:16686 for traces.

Exposed Metrics

The opentelemetry collector exposes metrics on two different ports under the otel service:

  • otel:9464 - All metrics collected
  • otel:9465 - Only simulation metrics

Simulations publish specific summary metrics about the simulation run. This is typically a small collection of metrics per simulation run and is much lighter weight than all metrics from all pods in the network.

Scrape the otel:9465 endpoint if you want only the simulation metrics.
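
For example, a minimal Prometheus scrape config for just the simulation metrics might look like this (the job name is a hypothetical choice):

scrape_configs:
  - job_name: keramik-simulation # hypothetical job name
    static_configs:
      - targets: ['otel:9465']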

NOTE: The prometheus-0 pod will scrape all metrics so you can easily inspect all activity on the network.

Pod Monitoring

This option expects the PodMonitor custom resource definition to already be installed in the network namespace.

If podMonitor is enabled, the operator will create podmonitors.monitoring.coreos.com resources for collecting the metrics from the pods in the network.

If you're using something like the Grafana cloud agent or prometheus-operator, the podmonitors.monitoring.coreos.com CRD will already be installed.

You can install the CRD directly from the operator:

    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml

IPFS

The IPFS behavior used by CAS and Ceramic can be customized using the same IPFS spec.

Rust IPFS

Ceramic

Example network config that uses Rust based IPFS (i.e. ceramic-one) with its defaults for Ceramic.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-ceramic-one
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust: {}

Example network config that uses Rust based IPFS (i.e. ceramic-one) with a specific image for Ceramic.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-ceramic-one
spec:
  replicas: 5
  ceramic:
    - ipfs:
       rust:
         image: rust-ceramic/ceramic-one:dev
         imagePullPolicy: IfNotPresent

CAS

Example network config that uses Rust based IPFS (i.e. ceramic-one) with its defaults for CAS.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-ceramic-one
spec:
  replicas: 5
  cas:
    ipfs:
      rust: {}

Example network config that uses Rust based IPFS (i.e. ceramic-one) with a specific image for CAS.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-ceramic-one
spec:
  replicas: 5
  cas:
    ipfs:
     rust:
       image: rust-ceramic/ceramic-one:dev
       imagePullPolicy: IfNotPresent

Kubo IPFS

Ceramic

Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for Ceramic.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go: {}

Example network config that uses Go based IPFS (i.e. Kubo) with a specific image for Ceramic.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
       go:
         image: ceramicnetwork/go-ipfs-daemon:develop
         imagePullPolicy: IfNotPresent

Example network config that uses Go based IPFS (i.e. Kubo) with extra configuration commands for Ceramic.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
       go:
         image: ceramicnetwork/go-ipfs-daemon:develop
         imagePullPolicy: IfNotPresent
         commands:
           - ipfs config --json Swarm.RelayClient.Enabled false

CAS

Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for CAS.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-kubo
spec:
  replicas: 5
  cas:
    ipfs:
      go: {}

Example network config that uses Go based IPFS (i.e. Kubo) with a specific image for CAS.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  cas:
    ipfs:
     go:
       image: ceramicnetwork/go-ipfs-daemon:develop
       imagePullPolicy: IfNotPresent

Example network config that uses Go based IPFS (i.e. Kubo) with extra configuration commands for CAS.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  cas:
    ipfs:
     go:
       image: ceramicnetwork/go-ipfs-daemon:develop
       imagePullPolicy: IfNotPresent
       commands:
         - ipfs config --json Swarm.RelayClient.Enabled false

Migration from Kubo to Ceramic One

A Kubo blockstore can be migrated to Ceramic One by specifying the migration command in the IPFS configuration.

Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for Ceramic (including a default blockstore path of /data/ipfs) and the Ceramic network set to dev-unstable.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: basic-network
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go: {}
  networkType: dev-unstable

Example network config that uses Ceramic One and specifies what migration command to run before starting up the node.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: basic-network
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust:
            migrationCmd:
                - from-ipfs
                - -i
                - /data/ipfs/blocks
                - -o
                - /data/ipfs/
                - --network
                - dev-unstable

Mixed Networks

It is possible to configure multiple sets of Ceramic nodes that differ from one another, for example a network where half of the nodes are running a different version of js-ceramic or IPFS.

Examples

Mixed IPFS

The following config creates a network with half of the nodes running Rust based IPFS and the other half Go.

---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust: {}
    - ipfs:
        go: {}

Mixed js-ceramic

The following config creates a network with half of the nodes running the dev-0 image of js-ceramic and the other half dev-1.

---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 5
  ceramic:
    - image: ceramicnetwork/composedb:dev-0
    - image: ceramicnetwork/composedb:dev-1

Weights

Weights can be used to determine how many replicas of each Ceramic spec are created. The total network replicas are spread across each Ceramic spec according to its relative weight.

The default weight is 1. The simplest way to get exact replica counts is to have the weights sum to the replica count; then each Ceramic spec will have a number of replicas equal to its weight. However, it can be tedious to ensure weights always add up to the replica count, so this is not required.

The total replicas across all Ceramic specs will always sum to the configured replica count. As such, some rounding is applied to get a good approximation of the relative weights.

Examples

Create 2/3rds of the nodes with dev-0 and 1/3rd with dev-1.

---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 3
  ceramic:
    - weight: 2
      image: ceramicnetwork/composedb:dev-0 # 2 replicas
    - image: ceramicnetwork/composedb:dev-1 # 1 replica

Create 3/4ths of the nodes with dev-0 and 1/4th with dev-1.

---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 24
  ceramic:
    - weight: 3
      image: ceramicnetwork/composedb:dev-0 # 18 replicas
    - image: ceramicnetwork/composedb:dev-1 #  6 replicas

Create three different versions, each having half the replicas of the previous. In this case the weights do not divide evenly, so a close approximation is achieved.

---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 16
  ceramic:
    - weight: 4
      image: ceramicnetwork/composedb:dev-0 # 10 replicas
    - weight: 2
      image: ceramicnetwork/composedb:dev-1 # 4 replicas
    - weight: 1
      image: ceramicnetwork/composedb:dev-2 # 2 replicas

Specifying a Ceramic admin secret

You can choose to specify a private key for the Ceramic nodes to use as their admin secret. This will allow you to set up the corresponding DID with CAS Auth.

Leaving the private key unspecified will cause a new key to be randomly generated. This can be fine for simulation runs against CAS/Ganache running locally within the cluster but not for simulations that hit CAS running behind the AWS API Gateway. Using an unauthorized DID in that case will prevent the Ceramic nodes from starting up.

apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  privateKeySecret: "small"

Note that privateKeySecret is the name of another k8s secret in the keramik namespace that has already been populated beforehand with the desired hex-encoded private key. This source secret MUST exist before it can be used to populate the Ceramic admin secret.

kubectl create secret generic small --from-literal=private-key=0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b

The secret can also be created from a file containing the private key.

kubectl create secret generic small --from-file=private-key=./my_secret

Here's an example of the contents of the my_secret file. Please make sure that there are no newlines at the end of the file.

0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b
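
You can read the secret back to double-check what was stored (run in the namespace where the secret was created):

kubectl get secret small -o jsonpath='{.data.private-key}' | base64 -d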

Alternatively, you can use a kustomization.yml file to create the secret from a file before creating the network, then use the name of the new secret in the network configuration.

---
namespace: keramik

secretGenerator:
- name: small
  envs:
  - .env.secret

Here's an example of the contents of the .env.secret file.

private-key=0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b
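
Apply the kustomization from the directory containing kustomization.yml:

kubectl apply -k .

Note that secretGenerator appends a content hash to the generated secret name, so check the resulting name with kubectl get secrets and reference that in privateKeySecret.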

Operator Patterns

This document discusses some of the designs patterns of the operator.

Specs, Statuses, and Configs

The operator is responsible for managing many resources and controlling how those resources can be customized. As a result, the operator adopts a pattern of specs, statuses, and configs.

  • Specs - Defines the desired state.
  • Statuses - Reports the current state.
  • Configs - Custom configuration to control creating a Spec.

Both specs and statuses are native concepts to Kubernetes. A spec provides the user facing API for defining their desired state. A status reports on the actual state. This code base introduces the concept of a config.

Naturally, operators wrap existing specs and hide some of their details. However, some of those details should be exposed to the user. A config defines how the parts of a spec owned by the operator can be exposed. In turn, the configs themselves have their own specs, i.e. the API for customizing the internal specs of the operator.

For example, the bootstrap job requires a JobSpec to run. The bootstrap job is responsible for telling new peers in the network about existing peers. Exposing the JobSpec directly puts too much onus on the user to create a functional job. Instead we define a BootstrapSpec, a BootstrapConfig, and a function that can create the necessary JobSpec from a BootstrapConfig. The BootstrapSpec is the user API for controlling the bootstrap job. The BootstrapConfig controls which properties of the JobSpec can be customized and provides sane defaults.

Let's see how this plays out in the code. Here is a simplified example of the bootstrap job that allows customizing only the image and bootstrap method:


#![allow(unused)]
fn main() {
// BootstrapSpec defines how the network bootstrap process should proceed.
#[derive(Serialize, Deserialize, Debug, PartialEq, Clone, JsonSchema)]
pub struct BootstrapSpec {
    // Note, both image and method are optional as the user
    // may want to specify only one or the other or both.
    pub image: Option<String>,
    pub method: Option<String>,
}
// BootstrapConfig defines which properties of the JobSpec can be customized.
pub struct BootstrapConfig {
    // Note, neither image nor method are optional as we need
    // valid values in order to build the JobSpec.
    pub image: String,
    pub method: String,
}
// Define clear defaults for the config.
impl Default for BootstrapConfig {
    fn default() -> Self {
        Self {
            image: "public.ecr.aws/r5b3e0r5/3box/keramik-runner".to_owned(),
            method: "ring".to_owned(),
        }
    }
}
// Implement a conversion from the spec to the config applying defaults.
impl From<BootstrapSpec> for BootstrapConfig {
    fn from(value: BootstrapSpec) -> Self {
        let default = Self::default();
        Self {
            image: value.image.unwrap_or(default.image),
            method: value.method.unwrap_or(default.method),
        }
    }
}
// Additionally implement the conversion for the case where the entire spec was left undefined.
impl From<Option<BootstrapSpec>> for BootstrapConfig {
    fn from(value: Option<BootstrapSpec>) -> Self {
        match value {
            Some(spec) => spec.into(),
            None => BootstrapConfig::default(),
        }
    }
}
// Define a function that can produce a JobSpec from a config.
pub fn bootstrap_job_spec(config: impl Into<BootstrapConfig>) -> JobSpec {
    let config: BootstrapConfig = config.into();
    // Define the JobSpec using the config, implementation elided.
}
}

Now for the operator reconcile loop we can simply add the BootstrapSpec spec to the top level NetworkSpec and construct a JobSpec to apply.


#![allow(unused)]
fn main() {
pub struct NetworkSpec {
    pub replicas: i32,
    pub bootstrap: Option<BootstrapSpec>,
    // ...
}

pub async fn reconcile(network: Arc<Network>, cx: Arc<ContextData>) -> Result<Action, Error> {
    // ...

    // Now with a single line we go from user defined spec to complete JobSpec
    let spec: JobSpec = bootstrap_job_spec(network.spec().bootstrap);
    apply_job(cx.clone(), ns, network.clone(), BOOTSTRAP_JOB_NAME, spec).await?;

    // ...
}
}

With this pattern it now becomes easy to add more functionality to the operator by adding a new field to the config and mapping it to the spec. Additionally, by defining the defaults on the config type there is one clear location where defaults are defined and applied, instead of scattering them through the implementation of the spec construction function or elsewhere.

Assembled Nodes

Another pattern the operator leverages is to assemble the set of nodes explicitly instead of relying on deterministic behaviors to assume information about nodes. Assembly is more robust because it is explicit about node information.

In practice this means the operator produces a keramik-peers config map for each network. The config map contains a key peers.json, which is the JSON serialization of all ready peers with their p2p and RPC addresses. Other systems are expected to consume that config map in order to learn about peers in the network. The runner does exactly this in order to bootstrap the network.
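
For example, you can inspect the recorded peers for a network (from within its namespace) with:

kubectl get configmap keramik-peers -o jsonpath='{.data.peers\.json}'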

Migration Tests

The Rust Ceramic migration tests can be executed against a Keramik network by applying the configuration from /k8s/tests. The network and tests run in the keramik-migration-tests namespace but this can be easily changed.

The URLs of the Ceramic nodes in the network are injected into the test environment so that tests are able to hit the Ceramic API endpoints.

These tests are intended to cover things like Kubo vs. Rust Ceramic API correctness/compatibility, mixed network operation, longevity tests across updates and releases, etc. Eventually, they can be used to run smoke tests, additional e2e tests, etc.

Advanced Bootstrap Configuration

Disable Bootstrap

By default, Keramik will connect all IPFS peers to each other. This can be disabled using specific bootstrap configuration:

# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  bootstrap:
    enabled: false