Environment
Keramik can be set up in a local or EKS environment.
Local Environment
Requires
- docker
- kind
- kubectl
When using a local environment, you will need to create a cluster (see Creating a Cluster below).
EKS
Requires
- eksctl - https://eksctl.io/introduction/#installation
- aws cli - https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
Once these are installed, you will need to log in with the AWS CLI via SSO:
aws configure sso
You will need to use https://3box.awsapps.com/start/ for the SSO URL with region us-east-2. Use account Benchmarking with role AWSAdministratorAccess. It is recommended to rename the profile to keramik or benchmarking.
You can now configure kubectl to use the benchmarking cluster:
aws eks update-kubeconfig --region=us-east-1 --profile=keramik --name=benchmarking-ceramic
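To verify access and find existing network namespaces:
kubectl get namespaces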
When using an EKS environment, you do not need to create a cluster; you can proceed directly to setting up a network.
Creating a Cluster
Kind (Kubernetes in Docker) runs a local k8s cluster. Create and initialize a new kind cluster using this configuration:
# kind.yaml
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  MaxUnavailableStatefulSet: true
This configuration enables a feature that allows stateful sets to redeploy pods more rapidly on changes. While not required to use Keramik, it makes deploying and mutating networks significantly faster.
# Create a new kind cluster (i.e. local k8s)
kind create cluster --config kind.yaml
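To confirm the cluster is up (kind names the kubectl context kind-kind by default):
kubectl cluster-info --context kind-kind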
Now you will need to deploy Keramik to the cluster.
Deploy Keramik
To deploy keramik, we will need to deploy custom resource definitions (CRDs) and apply the Keramik operator.
Deploy CRDs
Custom resource definitions tell k8s about our network and simulation resources. Apply them when deploying a new cluster and any time they change:
cargo run --bin crdgen | kubectl apply -f -
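You can verify the CRDs were installed; you should see entries such as networks.keramik.3box.io and simulations.keramik.3box.io:
kubectl get crds | grep keramik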
Deploy Keramik Operator
The last piece to running Keramik is the operator itself. Apply the operator into the keramik namespace.
# Create keramik namespace
kubectl create namespace keramik
# Apply the keramik operator
kubectl apply -k ./k8s/operator/
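Verify that the operator pod is running (the pod name will have a generated suffix):
kubectl get pods -n keramik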
Once that is complete, you can now set up a network.
Setting Up a Network
With the operator running we can now define a Ceramic network.
Place the following network definition into the file small.yaml.
# small.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: <unique-name>-small
spec:
  replicas: 2
  # Required if you plan to run a simulation
  monitoring:
    namespaced: true
The <unique-name> can be any unique string; your initials are a good default if you are deploying the network to a cloud cluster.
Apply this network definition to the k8s cluster:
kubectl apply -f small.yaml
After a minute or two you should have a functioning Ceramic network.
Checking the status of the network
Check the status of the network:
export NETWORK_NAME=<unique-name>-small
kubectl describe network $NETWORK_NAME
Keramik places each network into its own namespace, named after the network. You can default your context to this namespace using:
kubectl config set-context --current --namespace=keramik-$NETWORK_NAME
Inspect the pods within the network using:
kubectl get pods
HINT: Use tools like k9s to interactively manage your network.
When your pods are ready, you can run a simulation. If you are running locally, be patient; the first time you set up a network, several images must be downloaded.
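To watch the pods come up:
kubectl get pods -w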
HINT: Use tools like kubectx or kubie to work with multiple namespaces and contexts.
When you're finished, you can tear down your network with the following command:
kubectl delete network $NETWORK_NAME
Simulation
To run a simulation, first define a simulation. Available simulation types are:
- ipfs-rpc - A simple simulation that writes and reads to IPFS
- ceramic-simple - A simple simulation that writes and reads events to two different streams, a small and a large model
- ceramic-write-only - A simulation that only performs updates on two different streams
- ceramic-new-streams - A simulation that only creates new streams
- ceramic-model-reuse - A simulation that reuses the same model and queries instances across workers
- recon-event-sync - A simulation that creates events for Recon to sync at a fixed rate (~300/s by default). Designed for a 2 node network but should work on any.
- cas-benchmark - A simulation that benchmarks the CAS network.
- cas-anchoring-benchmark - A simulation that benchmarks Ceramic with anchoring enabled.
Using one of these scenarios, we can then define the configuration for that scenario:
# basic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-<unique-name>-small
spec:
  scenario: ceramic-simple
  devMode: true # optional: removes container resource limits and requests for local benchmarking
  users: 10
  runTime: 4
The simulation must run in the same namespace as the network under test; in this example the namespace matches the network applied above. You can also configure the scenario to run, the number of users per node, and the run time in minutes.
Before running the simulation, make sure the network is ready and has monitoring enabled.
kubectl describe network <unique-name>-small
You should see that the number of Ready Replicas is the same as the Replicas.
Example simplified output of a ready network:
Name: nc-small
...
Ready Replicas: 2
Replicas: 2
...
Once ready, apply this simulation definition to the k8s cluster:
kubectl apply -f basic.yaml
Keramik will first start all the metrics and tracing resources. Once those are ready, it starts the simulation: first the simulation manager, then all the workers. The manager and workers stop once the simulation is complete.
You can then analyze the results of the simulation.
If you want to rerun a simulation with no changes, you can delete the simulation and reapply it.
kubectl delete -f basic.yaml
Simulating Specific Versions
Often you will want to run a simulation against a specific version of software. To do this you will need to build the image and configure your network to run that image.
Example Custom JS-Ceramic Image
Use this example network definition with a custom js-ceramic image.
# custom-js-ceramic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: custom-js-ceramic
spec:
  replicas: 2
  monitoring:
    namespaced: true
  ceramic:
    - image: ceramicnetwork/composedb:dev
      imagePullPolicy: IfNotPresent
kubectl apply -f custom-js-ceramic.yaml
You can also run mixed networks and various other advanced configurations.
Example Custom IPFS Image
Use this example network definition with a custom IPFS image.
# custom-ipfs.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: custom-ipfs
spec:
  replicas: 2
  monitoring:
    namespaced: true
  ceramic:
    - ipfs:
        rust:
          image: ceramicnetwork/rust-ceramic:dev
          imagePullPolicy: IfNotPresent
kubectl apply -f custom-ipfs.yaml
Example Custom CAS API URL Network Spec
Use this example in the network definition when using cas-benchmark or cas-anchoring-benchmark. This is specifically for testing against the CAS dev network.
# custom-cas-api.yaml
---
apiVersion: keramik.3box.io/v1alpha1
kind: Network
metadata:
  name: ceramic-benchmark
spec:
  ceramic:
    - env:
        CERAMIC_RECON_MODE: "true"
      ipfs:
        rust:
          env:
            CERAMIC_ONE_RECON: "true"
  casApiUrl: https://cas-dev-direct.3boxlabs.com
  networkType: dev-unstable
  privateKeySecret: ceramic-v4-dev
  ethRpcUrl: ""
kubectl apply -f custom-cas-api.yaml
Example Custom Simulation for Ceramic Anchoring Benchmark
Use this example to run a simulation which uses the CAS Api defined in the network spec.
- anchorWaitTime: Wait time in seconds after streams have been created before checking whether they have been anchored. This should be a high value, such as 30-40 minutes.
- throttleRequests: Number of requests to send per second.
# ceramic-anchoring-benchmark.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-ceramic-benchmark
spec:
  scenario: ceramic-anchoring-benchmark
  users: 16
  runTime: 60
  throttleRequests: 100
  anchorWaitTime: 2400
kubectl apply -f ceramic-anchoring-benchmark.yaml
Example Custom Simulation for cas-benchmark
Use this example to run a simulation against CAS directly. You can pass the CAS API URL, the network type, and the private key in the spec. By default, casNetwork and casController are set to run against the CAS dev-direct API.
- casNetwork: The URL of the CAS network to run the simulation against.
- casController: The private key of the controller DID to use for the simulation.
# cas-benchmark.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  # Must be the same namespace as the network to test
  namespace: keramik-ceramic-benchmark
spec:
  scenario: cas-benchmark
  users: 16
  runTime: 60
  throttleRequests: 100
  casNetwork: "https://cas-dev-direct.3boxlabs.com"
  casController: "did:key:<secret>"
kubectl apply -f cas-benchmark.yaml
Analysis
Analysis of Keramik results depends on the purpose of the simulation. You may just want to see average latencies, or dive deeper into reported metrics. For profiling, you will want to use Datadog.
Quick Log Analysis
The simulation manager provides a very quick way to analyze the logs of a simulation run. You will need to know the name of the manager pod. First check whether the simulate-manager pod has completed:
kubectl get pods
If the pod has completed and is no longer in that list, you can see recently terminated pods using:
kubectl get event -o custom-columns=NAME:.metadata.name | cut -d "." -f1
Once you have the name of the manager pod, you can retrieve its logs:
kubectl logs simulate-manager-<id>
If the simulate-manager pod is not in your pod list, you may need to get logs with the --previous flag:
kubectl logs --previous simulate-manager-<id>
Analysis with DuckDB or Jupyter
First you will need to install a few things:
pip install duckdb duckdb-engine pandas jupyter jupysql matplotlib
To analyze the results of a simulation, copy the metrics-TIMESTAMP.parquet file from the opentelemetry-0 pod. First restart the pod so it writes out the parquet file footer:
kubectl delete pod opentelemetry-0
kubectl wait --for=condition=Ready pod/opentelemetry-0 # make sure pod has restarted
kubectl exec opentelemetry-0 -- ls -la /data # List files in the directory to find the TIMESTAMP you need
kubectl cp opentelemetry-0:data/metrics-TIMESTAMP.parquet ./analyze/metrics.parquet
cd analyze
Use duckdb to examine the data:
duckdb
> SELECT * FROM 'metrics.parquet' LIMIT 10;
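You can also inspect the schema first, since the exact columns depend on the metrics collected:
> DESCRIBE SELECT * FROM 'metrics.parquet';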
Alternatively, start a jupyter notebook using analyze/sim.ipynb:
jupyter notebook
Comparing Simulation Runs
How do we conclude a simulation is better or worse than another run?
Each simulation will likely be targeting a specific result; however, there are common results we should expect to see.
Changes should not make correctness worse. Correctness is defined using two metrics:
- Percentage of events successfully persisted on the node that accepted the initial write.
- Percentage of events successfully replicated on nodes that observed the writes via the Ceramic protocol.
Changes should not make performance worse. Performance is defined using these metrics:
- Writes/sec across all nodes in the cluster and by node
- p50, p90, p95, p99 and p99.9 of the duration of writes across all nodes in the cluster and by node
- Success/failure ratio of write requests across all nodes in the cluster and by node
- p50, p90, p95, p99 and p99.9 of the duration of time to become replicated, i.e. the time from when one node accepts the write to when another node has the same write available for read
For any simulation of the Ceramic protocol these metrics should apply. Any report about the results of a simulation should include these metrics, and we compare them against an established baseline.
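As a sketch, percentiles like these can be computed from the collected parquet data with DuckDB. The column and metric names below are hypothetical; inspect your schema first:
> SELECT quantile_cont(value, [0.50, 0.90, 0.95, 0.99, 0.999])
  FROM 'metrics.parquet'
  WHERE name = 'write_duration'; -- hypothetical metric name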
Performance Analysis
In addition to the above, we can also use datadog to dive further into performance.
Datadog
Keramik can also be configured to send metrics and telemetry data to datadog.
You will first need to setup a barebones network that we can install the datadog operator into. An example barebones network from the above setup:
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: <name of network>
spec:
  replicas: 1
  datadog:
    enabled: true
    version: "unique_value"
    profilingEnabled: true
You will need to install the Datadog k8s operator into the network. This requires installing helm; there does not seem to be a way to install the Datadog operator without it. However, once the Datadog operator is installed, helm is no longer needed.
helm repo add datadog https://helm.datadoghq.com
helm install my-datadog-operator datadog/datadog-operator
Now we will use that barebones network to setup secrets for datadog, and the datadog agent. Adjust the previously defined network definition to look like the following:
# Network setup
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  datadog:
    enabled: true
    version: "unique_value"
    profilingEnabled: true
# Secrets Setup
---
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
type: Opaque
stringData:
  api-key: <Datadog API Key Secret>
  app-key: <Datadog Application Key Secret>
# Datadog Agent setup
---
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    kubelet:
      tlsVerify: false
    site: us3.datadoghq.com
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  override:
    clusterAgent:
      image:
        name: gcr.io/datadoghq/cluster-agent:latest
    nodeAgent:
      image:
        name: gcr.io/datadoghq/agent:latest
  features:
    npm:
      enabled: true
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
The Datadog API Key is found at the organization level, and should be the secret associated with the API Key. The Datadog application key can be found at the organization or user level, and should be the secret associated with the application key.
You can now apply this with
kubectl apply -f network.yaml
Note: If you are running locally, you will need to restart your CAS and Ceramic pods using
kubectl delete pod ceramic-0 ceramic-1 cas-0
where the ceramic pods to delete depend on the number of replicas used. Make sure you delete all Ceramic and CAS pods. This only needs to be done the first time.
Anytime you need to change the network, change this file, then reapply it with
kubectl apply -f network.yaml
Telemetry data sent to Datadog will have two properties that uniquely identify the data from other Keramik networks:
- env - set based on the namespace of the Keramik network.
- version - specified in the datadog config; may be any unique value.
Cleanup
kubectl delete -f network.yaml
helm delete my-datadog-operator
Developing Keramik
When you need to add features to Keramik networks or simulations you will need to run local builds of the operator and the runner.
- Operator - long lived process that manages the network custom resource.
- Runner - short lived process that performs various tasks within the network (i.e. bootstrapping)
Operator
The operator automates creating and manipulating networks via the custom resource definition.
Any changes to the operator require that you rebuild it and load it into kind again.
docker buildx build --load -t keramik/operator:dev --target operator .
kind load docker-image keramik/operator:dev
Now we need to update the k8s operator definition to use our new image:
Edit ./k8s/operator/kustomization.yaml to use the dev tag:
images:
  - name: keramik/operator
    newTag: dev
Edit ./k8s/operator/manifests/operator.yaml to use IfNotPresent for the imagePullPolicy.
# ...
containers:
  - name: keramik-operator
    image: "keramik/operator"
    imagePullPolicy: IfNotPresent
# ...
Update the CRD definitions and apply the Keramik operator:
cargo run --bin crdgen | kubectl apply -f -
kubectl apply -k ./k8s/operator/
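If the operator is already running, restart it so the new image is picked up. The deployment name here is an assumption; check it with kubectl -n keramik get deployments:
kubectl -n keramik rollout restart deployment keramik-operator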
See the operator background for details on certain design patterns of the operator.
Runner
The runner is a utility for running various jobs to initialize the network and run workloads against it. Currently the runner provides two utilities:
- Bootstrap nodes
- Run simulations
If you intend to develop either of these features you will need to build the runner image and configure your network or simulation to use your local image.
Build and Load the Runner Image
Any changes to the runner require that you rebuild it and load it into kind again.
docker buildx build --load -t keramik/runner:dev --target runner .
kind load docker-image keramik/runner:dev
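To confirm the image is available inside the kind node (assuming the default cluster name, whose node container is kind-control-plane):
docker exec kind-control-plane crictl images | grep keramik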
Setup network with Runner Image
To use a custom runner image when you set up your network, you will need to adjust the yaml you use to specify how to bootstrap the runner.
# small.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  # Use custom runner image for bootstrapping
  bootstrap:
    image: keramik/runner:dev
    imagePullPolicy: IfNotPresent
Setup simulation with Runner Image
You will also need to specify the image in your simulation yaml.
# Custom runner
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  namespace: keramik-small
spec:
  scenario: ceramic-simple
  users: 10
  runTime: 4
  image: keramik/runner:dev
  imagePullPolicy: IfNotPresent
Setup Load Generator with the runner image
Likewise, specify the runner image in your load generator yaml.
# Custom load generator
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: LoadGenerator
metadata:
  name: load-gen
  namespace: keramik-lgen-demo
spec:
  scenario: "CreateModelInstancesSynced"
  runTime: 3
  image: "keramik/runner:dev"
  imagePullPolicy: "IfNotPresent"
  throttleRequests: 20
  tasks: 2
Advanced Topics
For more advanced usage of keramik, please see
- Advanced CAS and Ceramic Configuration
- Monitoring
- IPFS
- Mixed Networks
- Secrets
- Operator Design
- Migration Tests
- Custom Bootstrap Configuration
Advanced CAS and Ceramic Configuration
By default, Keramik will instantiate all the resources required for a functional CAS service, including a Ganache blockchain.
You can configure the Ceramic nodes to use an external instance of the CAS instead of one inside the cluster. If using a CAS running in 3Box Labs infrastructure, you will also need to specify the Ceramic network type associated with the node, e.g. dev-unstable.
You may also specify an Ethereum RPC endpoint for the Ceramic nodes to be able to verify anchors, or set it to an empty string to clear it from the Ceramic configuration. In the latter case, the Ceramic nodes will come up but will not be able to verify anchors.
If left unspecified, networkType will default to local, ethRpcUrl to http://ganache:8545, and casApiUrl to http://cas:8081. These defaults point to an internal CAS using a local pubsub topic in a fully isolated network.
Additionally, IPFS can be configured with custom images and resources for both CAS and Ceramic.
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  privateKeySecret: "small"
  networkType: "dev-unstable"
  ethRpcUrl: ""
  casApiUrl: "https://some-anchor-service.com"
Adjusting Ceramic Environment
The Ceramic environment can be adjusted by specifying environment variables in the network configuration:
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - env:
        CERAMIC_PUBSUB_QPS_LIMIT: "500"
Disabling AWS Functionality
Certain functionality in CAS depends on AWS services. If you are running Keramik in a non-AWS environment, you can disable this functionality by editing the statefulset for CAS:
kubectl edit statefulsets cas
and adding the following environment variables to the spec/template/spec/containers/env
config:
- name: SQS_QUEUE_URL
  value: ""
- name: MERKLE_CAR_STORAGE_MODE
  value: disabled
Note: statefulsets must be edited every time the network is recreated.
Image Resources
Storage
Nearly all containers (monitoring being the exception) allow configuring the persistent storage size and class. The storage class must be created out of band, but can be referenced. The storage configuration has two keys (size and class) and can be used like so:
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  bootstrap:
    image: keramik/runner:dev
    imagePullPolicy: IfNotPresent
  cas:
    casStorage:
      size: "3Gi"
      class: "fastDisk" # typically not set
    ipfs:
      go:
        storage:
          size: "1Gi"
    ganacheStorage:
      size: "1Gi"
    postgresStorage:
      size: "3Gi"
    localstackStorage:
      size: "5Gi"
  ceramic:
    - ipfs:
        rust:
          storage:
            size: "3Gi"
Requests / Limits
During local benchmarking, you may not have enough resources to run the cluster. A simple "fix" is to use the devMode flag on the network and simulation specs. This overrides the resource requests and limits to be none, which means a pod does not need available resources to deploy and can consume as much as it desires. This would be problematic in production and should only be used for testing purposes.
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  devMode: true # ceramic will require specified resources but all other containers will be unconstrained
  ceramic:
    - resourceLimits:
        cpu: "1"
        memory: "1Gi"
        storage: "1Gi"
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - resourceLimits:
        cpu: "4"
        memory: "8Gi"
        storage: "2Gi"
The above yaml will provide each ceramic pod with 4 cpu cores, 8GB of memory, and 2GB of storage. Depending on the system you are running on, you may run out of resources. You can check your resource usage with:
kubectl describe nodes
You can also set resources for IPFS within ceramic similarly.
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - ipfs:
        go:
          resourceLimits:
            cpu: "4"
            memory: "8Gi"
            storage: "2Gi"
          storageClass: "fastDisk"
Additionally the storage class can be set. The storage class must be created out of band but can be referenced as above.
Setting resources for CAS is slightly different, using casResourceLimits to set CAS resources:
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  cas:
    image: ceramicnetwork/ceramic-anchor-service:latest
    casResourceLimits:
      cpu: "250m"
      memory: "1Gi"
CAS API Configuration
The CAS API environment variables can be set or overridden through the network configuration.
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 0
  cas:
    api:
      env:
        APP_PORT: "8080"
Enabling Recon
You can also use Recon for reconciliation by setting the CERAMIC_ONE_RECON env variable to true:
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  ceramic:
    - ipfs:
        rust:
          env:
            CERAMIC_ONE_RECON: "true"
Monitoring
You can enable monitoring on a network to deploy jaeger, prometheus and an opentelemetry collector into the network namespace. This is not the only way to monitor network resources but it is built in.
Metrics from all pods in the network will be collected.
Sample network resource with monitoring enabled.
# basic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: network-with-monitoring
spec:
  replicas: 2
  monitoring:
    namespaced: true
    podMonitor: true
To view the metrics and traces, port-forward the services:
kubectl port-forward prometheus-0 9090
kubectl port-forward jaeger-0 16686
Then navigate to http://localhost:9090 for metrics and http://localhost:16686 for traces.
Exposed Metrics
The opentelemetry collector exposes metrics on two different ports under the otel service:
- otel:9464 - All metrics collected
- otel:9465 - Only simulation metrics
Simulations will publish specific summary metrics about the simulation run. This is typically a collection of metrics per simulation run and is much lighter weight than all metrics from all pods in the network.
Scrape the otel:9465 endpoint if you want only the simulation metrics.
NOTE: The prometheus-0 pod will scrape all metrics so you can easily inspect all activity on the network.
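If you run your own Prometheus outside the network, a minimal scrape config for the simulation metrics might look like this (the job name is arbitrary):
scrape_configs:
  - job_name: keramik-simulation
    static_configs:
      - targets: ["otel:9465"]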
Pod Monitoring
This option expects the PodMonitor custom resource definition to already be installed in the network namespace. If podMonitor is enabled, the operator will create podmonitors.monitoring.coreos.com resources for collecting the metrics from the pods in the network. If you're using something like the grafana cloud agent or prometheus-operator, the podmonitors.monitoring.coreos.com CRD will be installed already.
Otherwise, you can install the CRD directly from the prometheus-operator project:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
IPFS
The IPFS behavior used by CAS and Ceramic can be customized using the same IPFS spec.
Rust IPFS
Ceramic
Example network config that uses Rust based IPFS (i.e. ceramic-one) with its defaults for Ceramic.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-ceramic-one
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust: {}
Example network config that uses Rust based IPFS (i.e. ceramic-one) with a specific image for Ceramic.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-ceramic-one
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust:
          image: rust-ceramic/ceramic-one:dev
          imagePullPolicy: IfNotPresent
CAS
Example network config that uses Rust based IPFS (i.e. ceramic-one) with its defaults for CAS.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-ceramic-one
spec:
  replicas: 5
  cas:
    ipfs:
      rust: {}
Example network config that uses Rust based IPFS (i.e. ceramic-one) with a specific image for CAS.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-ceramic-one
spec:
  replicas: 5
  cas:
    ipfs:
      rust:
        image: rust-ceramic/ceramic-one:dev
        imagePullPolicy: IfNotPresent
Kubo IPFS
Ceramic
Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for Ceramic.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go: {}
Example network config that uses Go based IPFS (i.e. Kubo) with a specific image for Ceramic.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go:
          image: ceramicnetwork/go-ipfs-daemon:develop
          imagePullPolicy: IfNotPresent
Example network config that uses Go based IPFS (i.e. Kubo) with extra configuration commands for Ceramic.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go:
          image: ceramicnetwork/go-ipfs-daemon:develop
          imagePullPolicy: IfNotPresent
          commands:
            - ipfs config --json Swarm.RelayClient.Enabled false
CAS
Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for CAS.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-vanilla-kubo
spec:
  replicas: 5
  cas:
    ipfs:
      go: {}
Example network config that uses Go based IPFS (i.e. Kubo) with a specific image for CAS.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  cas:
    ipfs:
      go:
        image: ceramicnetwork/go-ipfs-daemon:develop
        imagePullPolicy: IfNotPresent
Example network config that uses Go based IPFS (i.e. Kubo) with extra configuration commands for CAS.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: example-custom-kubo
spec:
  replicas: 5
  cas:
    ipfs:
      go:
        image: ceramicnetwork/go-ipfs-daemon:develop
        imagePullPolicy: IfNotPresent
        commands:
          - ipfs config --json Swarm.RelayClient.Enabled false
Migration from Kubo to Ceramic One
A Kubo blockstore can be migrated to Ceramic One by specifying the migration command in the IPFS configuration.
Example network config that uses Go based IPFS (i.e. Kubo) with its defaults for Ceramic (including a default blockstore path of /data/ipfs) and the Ceramic network set to dev-unstable.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: basic-network
spec:
  replicas: 5
  ceramic:
    - ipfs:
        go: {}
  networkType: dev-unstable
Example network config that uses Ceramic One and specifies what migration command to run before starting up the node.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: basic-network
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust:
          migrationCmd:
            - from-ipfs
            - -i
            - /data/ipfs/blocks
            - -o
            - /data/ipfs/
            - --network
            - dev-unstable
Mixed Networks
It is possible to configure multiple sets of Ceramic nodes that differ from one another, for example a network where half of the nodes run a different version of js-ceramic or IPFS.
Examples
Mixed IPFS
The following config creates a network with half of the nodes running Rust based IPFS and the other half Go.
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 5
  ceramic:
    - ipfs:
        rust: {}
    - ipfs:
        go: {}
Mixed js-ceramic
The following config creates a network with half of the nodes running a dev-0 image of js-ceramic and the other half dev-1.
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 5
  ceramic:
    - image: ceramicnetwork/composedb:dev-0
    - image: ceramicnetwork/composedb:dev-1
Weights
Weights can be used to determine how many replicas of each Ceramic spec are created. The total network replicas are spread across each Ceramic spec according to its relative weight.
The default weight is 1.
The simplest way to get exact replica counts is to have the weights sum to the replica count. Then each Ceramic spec will have a number of replicas equal to its weight. However, it can be tedious to ensure weights always add up to the replica count, so this is not required. The total replicas across all Ceramic specs will always sum to the configured replica count; some rounding is applied to get a good approximation of the relative weights.
Examples
Create 2/3rds of the nodes with dev-0 and 1/3rd with dev-1.
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 3
  ceramic:
    - weight: 2
      image: ceramicnetwork/composedb:dev-0 # 2 replicas
    - image: ceramicnetwork/composedb:dev-1 # 1 replica
Create 3/4ths of the nodes with dev-0 and 1/4th with dev-1.
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 24
  ceramic:
    - weight: 3
      image: ceramicnetwork/composedb:dev-0 # 18 replicas
    - image: ceramicnetwork/composedb:dev-1 # 6 replicas
Create three different versions, each having half the previous. In this case the weights do not divide evenly, so a close approximation is achieved.
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: mixed
spec:
  replicas: 16
  ceramic:
    - weight: 4
      image: ceramicnetwork/composedb:dev-0 # 10 replicas
    - weight: 2
      image: ceramicnetwork/composedb:dev-1 # 4 replicas
    - weight: 1
      image: ceramicnetwork/composedb:dev-2 # 2 replicas
Specifying a Ceramic admin secret
You can choose to specify a private key for the Ceramic nodes to use as their admin secret. This will allow you to set up the corresponding DID with CAS Auth.
Leaving the private key unspecified will cause a new key to be randomly generated. This can be fine for simulation runs against CAS/Ganache running locally within the cluster but not for simulations that hit CAS running behind the AWS API Gateway. Using an unauthorized DID in that case will prevent the Ceramic nodes from starting up.
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  privateKeySecret: "small"
Note that privateKeySecret is the name of another k8s secret in the keramik namespace that has already been populated beforehand with the desired hex-encoded private key. This source secret MUST exist before it can be used to populate the Ceramic admin secret.
kubectl create secret generic small --from-literal=private-key=0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b
The secret can also be created from a file containing the private key.
kubectl create secret generic small --from-file=private-key=./my_secret
Here's an example of the contents of the my_secret file. Please make sure that there are no newlines at the end of the file.
0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b
Alternatively, you can use a kustomization.yml file to create the secret from a file before creating the network, and use the name of the new secret in the network configuration.
---
namespace: keramik
secretGenerator:
  - name: small
    envs:
      - .env.secret
Here's an example of the contents of the .env.secret file.
private-key=0e3b57bb4d269b6707019f75fe82fe06b1180dd762f183e96cab634e38d6e57b
Operator Patterns
This document discusses some of the design patterns of the operator.
Specs, Statuses, and Configs
The operator is responsible for managing many resources and controlling how those resources can be customized. As a result, the operator adopts a specs, statuses, and configs pattern.
- Specs - Defines the desired state.
- Statuses - Reports the current state.
- Configs - Custom configuration to control creating a Spec.
Both specs and statuses are native concepts to Kubernetes. A spec provides the user facing API for defining their desired state. A status reports on the actual state. This code base introduces the concept of a config.
Naturally, operators wrap existing specs and hide some of their details. However, some of those details should be exposed to the user. A config defines how the parts of a spec owned by the operator can be exposed. In turn, the configs themselves have their own specs, i.e. the API for customizing the internal specs of the operator.
For example, the bootstrap job requires a JobSpec to run the job. The bootstrap job is responsible for telling new peers in the network about existing peers. Exposing the JobSpec to the user puts too much onus on the user to create a functional job. Instead we define a BootstrapSpec, a BootstrapConfig, and a function that can create the necessary JobSpec given a BootstrapConfig. The BootstrapSpec is the user API for controlling the bootstrap job. The BootstrapConfig controls which properties of the JobSpec can be customized and provides sane defaults.
Let's see how this plays out in the code. Here is a simplified example of the bootstrap job that allows customizing only the image and bootstrap method:
// BootstrapSpec defines how the network bootstrap process should proceed.
#[derive(Serialize, Deserialize, Debug, PartialEq, Clone, JsonSchema)]
pub struct BootstrapSpec {
    // Note, both image and method are optional as the user
    // may want to specify only one or the other or both.
    pub image: Option<String>,
    pub method: Option<String>,
}

// BootstrapConfig defines which properties of the JobSpec can be customized.
pub struct BootstrapConfig {
    // Note, neither image nor method are optional as we need
    // valid values in order to build the JobSpec.
    pub image: String,
    pub method: String,
}

// Define clear defaults for the config.
impl Default for BootstrapConfig {
    fn default() -> Self {
        Self {
            image: "public.ecr.aws/r5b3e0r5/3box/keramik-runner".to_owned(),
            method: "ring".to_owned(),
        }
    }
}

// Implement a conversion from the spec to the config applying defaults.
impl From<BootstrapSpec> for BootstrapConfig {
    fn from(value: BootstrapSpec) -> Self {
        let default = Self::default();
        Self {
            image: value.image.unwrap_or(default.image),
            method: value.method.unwrap_or(default.method),
        }
    }
}

// Additionally implement the conversion for the case where the entire spec was left undefined.
impl From<Option<BootstrapSpec>> for BootstrapConfig {
    fn from(value: Option<BootstrapSpec>) -> Self {
        match value {
            Some(spec) => spec.into(),
            None => BootstrapConfig::default(),
        }
    }
}

// Define a function that can produce a JobSpec from a config.
pub fn bootstrap_job_spec(config: impl Into<BootstrapConfig>) -> JobSpec {
    let config: BootstrapConfig = config.into();
    // Define the JobSpec using the config, implementation elided.
}
Now for the operator reconcile loop we can simply add the BootstrapSpec to the top level NetworkSpec and construct a JobSpec to apply.
pub struct NetworkSpec {
    pub replicas: i32,
    pub bootstrap: Option<BootstrapSpec>,
    // ...
}

pub async fn reconcile(network: Arc<Network>, cx: Arc<ContextData>) -> Result<Action, Error> {
    // ...
    // Now with a single line we go from user defined spec to complete JobSpec
    let spec: JobSpec = bootstrap_job_spec(network.spec().bootstrap);
    apply_job(cx.clone(), ns, network.clone(), BOOTSTRAP_JOB_NAME, spec).await?;
    // ...
}
With this pattern it becomes easy to add more functionality to the operator by adding a new field to the config and mapping it to the spec. Additionally, by defining the defaults on the config type, there is one clear location where defaults are defined and applied, instead of scattering them throughout the implementation of the spec construction function or elsewhere.
Assembled Nodes
Another pattern the operator leverages is to assemble the set of nodes instead of relying on deterministic behaviors to assume information about nodes. Assembly is more robust as it is explicit about node information.
In practice this means the operator produces a keramik-peers config map for each network. The config map contains a key peers.json, which is the JSON serialization of all ready peers with their p2p address and rpc address. It is expected that other systems consume that config map in order to learn about peers in the network. The runner does exactly this in order to bootstrap the network.
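For example, you can inspect the peer list of a running network directly (substitute your network namespace):
kubectl -n keramik-<network-name> get configmap keramik-peers -o jsonpath='{.data.peers\.json}'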
Migration Tests
The Rust Ceramic migration tests can be executed against a Keramik network by applying the configuration from /k8s/tests. The network and tests run in the keramik-migration-tests namespace, but this can be easily changed.
The URLs of the Ceramic nodes in the network are injected into the test environment so that tests are able to hit the Ceramic API endpoints.
These tests are intended to cover things like Kubo vs. Rust Ceramic API correctness/compatibility, mixed network operation, longevity tests across updates and releases, etc. Eventually, they can be used to run smoke tests, additional e2e tests, etc.
Advanced Bootstrap Configuration
Disable Bootstrap
By default, Keramik will connect all IPFS peers to each other. This can be disabled using specific bootstrap configuration:
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
  name: small
spec:
  replicas: 2
  bootstrap:
    enabled: false