Quick Start

This document aims to help new users quickly understand how to deploy inference services in Alauda AI. By deploying a simple "text generation" inference service and experiencing it, you can quickly grasp the main features and usage methods of the platform.

Estimated Reading Time

It is estimated that completing the reading and operations in this document will take approximately 20 minutes.

Notes

This document only demonstrates the basic process. For detailed parameter configurations, please refer to the complete documentation.

Prerequisites

  • You already have a platform administrator account (used to create and manage namespaces).
  • You have prepared the model file to be deployed (you can download it in advance from websites such as Hugging Face or ModelScope).
  • If you need to use GPU inference, please ensure that the GPU plugin is installed. If not, please install the GPU plugin in the platform management plugin center.
  • You understand the basic concepts of Kubernetes and machine learning models.

Step Overview

StepOperationDescriptionNotes
1Create NamespaceCreate a namespace in Alauda AI or the container platformSkip this step if you already have a namespace
2Manage Namespace and Add UserInclude the namespace in Alauda AI management and add users to the namespaceSkip this step if the namespace is already managed and user permissions are configured
3Upload ModelUpload the model file to the model repositorySkip this step if you have already uploaded the model or are using a platform-shared model
4Publish Inference ServicePublish the model as an online inference service
5Invoke Inference ServiceInvoke the inference service via API or the "Experience" feature

Operation Steps

Step 1: Create Namespace

Note:Skip this step if you already have a namespace

Namespaces are the foundation for multi-tenant isolation in Alauda AI, and each project should use an independent namespace. You can create a namespace directly in Alauda AI or import an existing namespace from the container platform.

  1. Enter Alauda AI and switch to Admin View from the top navigation.
  2. Click Namespaces in the left navigation bar.
  3. Click Create Namespace and enter a name, such as "text-classification-demo".
  4. Click Create to complete the namespace creation.

If the namespace already exists in the container platform, click Import Namespace on the Namespaces page and select the namespace to import it into Alauda AI.

Step 2: Manage Namespace and Add User

Note:Skip this step if the namespace is already managed and user permissions are configured

Include the created namespace in Alauda AI management and add users to the namespace:

  1. Enter Alauda AI and switch to Admin View from the top navigation.
  2. Click Namespaces in the left navigation bar.
  3. If the namespace is not managed by Alauda AI, click Import Namespace, select the newly created "text-classification-demo" namespace, and complete the import.
  4. Open the namespace edit page and click Member Management.
  5. Import the user who needs to use this namespace and assign the user one of the following roles:
    • Owner: Can manage the namespace and import other members as editors or viewers. An owner cannot import another owner.
    • Editor: Can use the namespace to create and manage AI resources.
    • Viewer: Can view resources in the namespace.

Step 3: Upload Model

Note:Skip this step if you have already uploaded the model or are using a platform-shared model

Upload the text classification model to the model repository:

  1. Enter Alauda AI, select User View in the top navigation, and select the managed namespace from the previous step.
  2. Click Model Repository in the left navigation bar, click Create Model Repository, and enter the prepared model name, such as "Meta-Llama-3-8B-Instruct".
  3. To complete model uploading, refer to the Create Model Repository.
  4. In the File Management tab, click Update metadata and select the correct "Task Type" and "Framework" according to the attributes of the large model.
    • Task Type: It is an attribute of the model itself and can be obtained by viewing the label on the model download details page. It is divided into "Text Generation", "Image Generation", etc.
    • Framework: It is also an attribute of the model itself and can be obtained by viewing the label on the model download details page. It is divided into "Transformers", "MLflow", etc. Most popular open-source Large Language Models are of the "Transformers" type.

Step 4: Publish Inference Service

Publish the model as an online inference service:

  1. On the model details page, click Publish inference API > Custom publishing.
  2. Configure service parameters:
    • Name: meta-llama-3-8b-service
    • Model: Meta-Llama-3-8B-Instruct
    • Version: Branch-main
    • Inference Runtimes: Needs to be selected based on the cuda version installed in the GPU node. For example,if cuda12.6 or later is installed, select "vllm-cuda12.6-x86".
    • Resource Requests: 2CPU/20Gi Memory
    • Resource Limits: 2CPU/20Gi Memory
    • GPU Acceleration: HAMi NVIDIA
      • gpu number: 1
      • vgpu cores: 50
      • GPU vmemory: 23552
    • Storage: Mount existing PVC/Capacity 10Gi
    • Auto Scaling: Off
    • Number of instances: 1
  3. Click Publish and wait for the service to start.
  4. View the service status on the Inference Services page.

Step 5: Invoke Inference Service

Test the published inference service:

  1. Click Inference Services in the left navigation bar, click the name of the "Published Inference Service", and click Experience on the inference service details page.
  2. Enter the test text, such as "Recommend a few good books".
  3. View the generated text and generation parameters returned by the model.