Note
- Web Console: console.8bit.ws
- Network: 8bit.ws
In this article, I’ll go over how to use the LLM (or GPT) capabilities of 8ws. I’m assuming that you already know how to create a project and a function on a Taubyte-based Cloud Computing Network. If not, please refer to Taubyte’s Documentation.
LLAMA Satellite
Our Cloud Computing Network provides LLM capabilities through what we call a Satellite. It does so by exporting llama.cpp capabilities to the Taubyte Virtual Machine, which powers Serverless Functions (or DFunctions, as per Taubyte’s terminology). The source code for the Satellite can be found here.
LLAMA SDK
Satellites export low-level functions that aren’t very intuitive to use directly. Fortunately, it’s possible to address that with a user-friendly SDK. As of today, we offer a Go SDK. The source code can be found here.
Get Ready
Before proceeding, let’s ensure you have a project and a DFunction ready to go. If not, please refer to “Create a Function”.
Let’s Code!
A good practice is to clone your code locally using git or the tau command-line. Make sure you have Go installed, then run:
go get github.com/samyfodil/taubyte-llama-satellite
Our Basic Function
If you followed the steps from Taubyte’s Documentation, your basic function should look something like this:
package lib

import (
    "github.com/taubyte/go-sdk/event"
)

//export ping
func ping(e event.Event) uint32 {
    h, err := e.HTTP()
    if err != nil {
        return 1
    }

    h.Write([]byte("PONG"))
    return 0
}
Let’s modify it so it uses the POST body as the prompt. Note: I’ve changed the function’s name to predict. Ensure this change is reflected in your configuration by setting the entry point to predict and changing the method from GET to POST.
package lib

import (
    "io"

    "github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
    h, err := e.HTTP()
    if err != nil {
        return 1
    }
    defer h.Body().Close()

    prompt, err := io.ReadAll(h.Body())
    if err != nil {
        panic(err)
    }
    _ = prompt // we'll feed this to the LLM in the next step

    return 0
}
Predict
The LLAMA SDK exports two main methods, Predict and Next. Let’s start by creating a prediction:
package lib

import (
    "io"

    "github.com/samyfodil/taubyte-llama-satellite/sdk"
    "github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
    h, err := e.HTTP()
    if err != nil {
        return 1
    }
    defer h.Body().Close()

    prompt, err := io.ReadAll(h.Body())
    if err != nil {
        panic(err)
    }

    p, err := sdk.Predict(
        string(prompt),
    )
    if err != nil {
        panic(err)
    }
    _ = p // we'll pull tokens from p in the next step

    return 0
}
This code submits a prediction request to the satellite. Because predictions are resource-intensive (especially on the GPU), the satellite queues the request and returns a prediction object you can pull tokens from.
Just like when interacting with any LLM, you can customize the request like so:
p, err := sdk.Predict(
    string(prompt),
    sdk.WithTopK(90),
    sdk.WithTopP(0.86),
    sdk.WithBatch(512),
)
You can find all possible options here.
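If you’re unsure what these knobs control, here’s the same call annotated with the usual llama.cpp-style sampling semantics. These descriptions reflect general LLM sampling conventions, not this SDK’s documentation, so treat them as a guide rather than a spec:

p, err := sdk.Predict(
    string(prompt),
    sdk.WithTopK(90),   // sample only from the 90 most likely next tokens
    sdk.WithTopP(0.86), // nucleus sampling: keep tokens covering 86% of probability mass
    sdk.WithBatch(512), // batch size used while processing the prompt
)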
Get Tokens
After submitting a prediction to the satellite, you need to collect tokens. You can do so by calling p.Next(), which blocks until a new token is available or the prediction is completed or canceled. If you’d like to set a deadline instead, you can use NextWithTimeout, as sketched below.
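As a rough sketch, a deadline-aware version of the token loop could look like the following. I’m assuming here that NextWithTimeout takes a time.Duration (which would also mean importing time); check the SDK’s source for the exact signature:

for {
    // Assumed signature: NextWithTimeout(time.Duration) (string, error)
    token, err := p.NextWithTimeout(10 * time.Second)
    if err == io.EOF {
        break // prediction completed
    } else if err != nil {
        return 1 // deadline hit or the satellite reported an error
    }
    h.Write([]byte(token))
    h.Flush()
}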
Now, let’s wrap up our function:
package lib

import (
    "io"

    "github.com/samyfodil/taubyte-llama-satellite/sdk"
    "github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
    h, err := e.HTTP()
    if err != nil {
        return 1
    }
    defer h.Body().Close()

    prompt, err := io.ReadAll(h.Body())
    if err != nil {
        panic(err)
    }

    p, err := sdk.Predict(
        string(prompt),
        sdk.WithTopK(90),
        sdk.WithTopP(0.86),
        sdk.WithBatch(512),
    )
    if err != nil {
        panic(err)
    }

    for {
        token, err := p.Next()
        if err == io.EOF {
            break // prediction completed
        } else if err != nil {
            panic(err)
        }
        h.Write([]byte(token))
        h.Flush() // send this token to the client right away
    }

    return 0
}
The call to h.Flush() sends each token to the client (browser) immediately. If you’d like to recreate the AI typing experience provided by ChatGPT, you can use something like:
await axios({
  method: "post",
  data: prompt,
  url: "<URL>",
  onDownloadProgress: (progressEvent) => {
    // Note: responseText accumulates the full response received so far,
    // not just the newest token.
    const chunk = progressEvent.currentTarget.responseText;
    gotToken(chunk);
  },
})
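If you’d rather test the stream from Go instead of a browser, here’s a minimal client sketch. The URL is a placeholder for your deployed function’s endpoint:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
)

func main() {
    // POST the prompt to the DFunction; tokens stream back as they're generated.
    resp, err := http.Post("https://<your-function-url>", "text/plain",
        strings.NewReader("Once upon a time"))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Read incrementally so tokens print as they arrive, not all at once.
    reader := bufio.NewReader(resp.Body)
    buf := make([]byte, 256)
    for {
        n, err := reader.Read(buf)
        if n > 0 {
            fmt.Print(string(buf[:n]))
        }
        if err != nil {
            break // io.EOF once the stream ends
        }
    }
    fmt.Println()
}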
Conclusion
In this guide, we’ve walked through how to leverage the LLM (or GPT) capabilities provided by 8ws on a Taubyte-based Cloud Computing Network. We’ve explored the concept of a LLAMA Satellite and its role in exporting LLM capabilities to the Taubyte Virtual Machine. Furthermore, we’ve discussed the importance and functionality of the LLAMA SDK, which makes interacting with the Satellite’s low-level functions more intuitive.
We’ve gone through a practical example of how to use these tools in a Taubyte project, specifically demonstrating how to fetch tokens and use the Predict method. We’ve also shown how you can fine-tune your requests to the SDK and manage the tokens returned by the Satellite. By the end of the guide, you should be equipped to create a serverless function on Taubyte that can generate predictions from user-provided prompts, similar to how AI like ChatGPT works.
Harnessing the power of Taubyte and the LLAMA Satellite, you’re now ready to incorporate large language model capabilities into your projects, bringing a new level of interactivity and AI-driven responses to your applications.
If you’d like to see these tools in action, check out chat.keenl.ink, a practical implementation of the principles outlined in this guide. It’s a great demonstration of the interactive possibilities these technologies provide. Happy coding!