TensorPool Documentation


TensorPool is the easiest way to deploy and manage GPU clusters, at a fraction of the cost of traditional cloud providers.

Features

Zero Cloud Setup: No GCP, no AWS, no Docker, no cloud accounts required

Instant GPU Clusters: Deploy multi-node GPU clusters with a single command

Flexible Storage: Attach and detach NFS volumes across your clusters

>50% Cheaper Than Traditional Cloud Providers: TensorPool aggregates demand across multiple cloud providers and can therefore offer GPUs at a fraction of the market price. See our pricing page for details

High-Performance Networking: All clusters come with high-speed interconnects for distributed training

Prerequisites

  1. Create an account at tensorpool.dev
  2. Get your API key from the dashboard
  3. Install the CLI:
    pip install tensorpool
  4. Generate SSH keys (if you don't have them):
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
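
The CLI authenticates with your API key. It may prompt you for the key the first time you run a command; if you prefer to configure it up front, one common pattern is an environment variable. The variable name below is an assumption, so check the dashboard or the CLI's own output for the exact name:

# Hypothetical: make the API key available to the CLI (variable name is an assumption)
export TENSORPOOL_KEY=<your_api_key>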

Quick Start

1. Create Your First GPU Cluster

Deploy a single H100 node cluster:

tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name my-training-cluster

For multi-node training, create a 4-node cluster with eight H100s per node (32 GPUs total):

tp cluster create -i ~/.ssh/id_rsa.pub -t 8xH100 -n 4 --name distributed-training

2. List Your Clusters

tp cluster list

3. SSH Into Your Cluster

Once your cluster is ready, you'll receive the connection details. SSH into your nodes and start training!
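
For example, once you have the username and host address from the connection details (both are placeholders below), you can connect with your private key:

# Substitute the user and address from your cluster's connection details
ssh -i ~/.ssh/id_rsa <user>@<node_address>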

4. Clean Up

When you're done, destroy your cluster:

tp cluster destroy <cluster_id>

Core Commands

Cluster Management

  • tp cluster create - Deploy a new GPU cluster
  • tp cluster list - View all your clusters
  • tp cluster info <cluster_id> - Get detailed information about a cluster
  • tp cluster destroy <cluster_id> - Terminate a cluster
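
A typical lifecycle using these commands might look like the following (the cluster ID is a placeholder reported by tp cluster list):

# Create a cluster, check on it, and tear it down when finished
tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name experiment
tp cluster list
tp cluster info <cluster_id>
tp cluster destroy <cluster_id>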

Network File System (NFS)

  • tp nfs create - Create a new NFS volume
  • tp nfs list - View all your NFS volumes
  • tp nfs attach <storage_id> <cluster_ids> - Attach storage to one or more clusters
  • tp nfs detach <storage_id> <cluster_ids> - Detach storage from one or more clusters
  • tp nfs destroy <storage_id> - Delete an NFS volume

Account Management

  • tp me - View your account information and usage

Supported Instance Types

Instance Type | GPUs | GPU Model
1xH100        | 1    | H100
2xH100        | 2    | H100
4xH100        | 4    | H100
8xH100        | 8    | H100

More instance types coming soon!

Command Reference

Creating Clusters

tp cluster create -i <public_key_path> -t <instance_type> [options]

Required Arguments:

  • -i, --public-key: Path to your public SSH key (e.g., ~/.ssh/id_rsa.pub)
  • -t, --instance-type: Instance type (1xH100, 2xH100, 4xH100, 8xH100)

Optional Arguments:

  • --name: Custom cluster name
  • -n, --num-nodes: Number of nodes (multi-node clusters require the 8xH100 instance type)

Examples:

# Single node H100
tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name dev-cluster

# 2-node cluster with 8xH100 each (16 GPUs total)
tp cluster create -i ~/.ssh/id_rsa.pub -t 8xH100 -n 2 --name large-training

Listing Clusters

tp cluster list [--org]

Optional Arguments:

  • --org, --organization: List all clusters in your organization

Destroying Clusters

tp cluster destroy <cluster_id>

Arguments:

  • cluster_id: The ID of the cluster to destroy

NFS Storage Commands

Creating NFS Volumes

tp nfs create -s <size_in_gb> [--name <name>]

Required Arguments:

  • -s, --size: Size of the NFS volume in GB

Optional Arguments:

  • --name: Custom volume name

Examples:

# Create a 500GB volume
tp nfs create -s 500 --name training-data

# Create a 1TB volume with auto-generated name
tp nfs create -s 1000

NFS Storage Example Workflow

# 1. Create a 1TB NFS volume named "shared-datasets"
tp nfs create -s 1000 --name shared-datasets

# 2. Attach the volume to a single cluster
tp nfs attach <storage_id> <cluster_id>

# 3. Attach the volume to multiple clusters
tp nfs attach <storage_id> <cluster_id_1> <cluster_id_2> <cluster_id_3>

# 4. Detach the volume from a single cluster
tp nfs detach <storage_id> <cluster_id>

# 5. Detach the volume from multiple clusters
tp nfs detach <storage_id> <cluster_id_1> <cluster_id_2>

Replace <storage_id> and <cluster_id> with your actual IDs.

Storage Locations (Multi-node Clusters)

Local NVMe Storage

Each cluster node comes with high-performance local NVMe storage mounted at:

/mnt/local

NFS Volume Mount Points

When you attach an NFS volume to your cluster, it will be mounted at:

/mnt/nfs

Convenient Symlinks

For easy access, the storage locations are also symlinked in your home directory:

  • Local storage: ~/local → /mnt/local
  • NFS storage: ~/nfs → /mnt/nfs
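
A common pattern is to keep datasets and checkpoints on the shared NFS volume and stage working copies on the faster local NVMe drive. The directory names below are placeholders:

# Stage a dataset from the shared NFS volume onto fast local NVMe storage
cp -r ~/nfs/<dataset_dir> /mnt/local/<dataset_dir>

# Copy checkpoints back to NFS so they persist after the cluster is destroyed
cp -r /mnt/local/<checkpoint_dir> ~/nfs/<checkpoint_dir>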

Best Practices

  • SSH Key Management: Always use strong SSH keys and keep your private keys secure
  • Cluster Naming: Use descriptive names for your clusters to easily identify them
  • Cost Management: Destroy clusters when not in use to avoid unnecessary charges
  • Data Persistence: Use NFS volumes for important data that needs to persist across cluster lifecycles
  • Multi-Node Training: For distributed training, ensure your training scripts are configured for multi-node setups (see the launcher sketch after this list)
  • Monitoring: Regularly check tp cluster list to monitor your active resources
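
As a sketch of what multi-node configuration can involve, a PyTorch-style launcher needs the node count, the number of processes per node, a per-node rank, and the head node's address. The command below is a generic example you would run on each node, not a TensorPool-specific requirement; the script name, address, and port are placeholders:

# Run on every node of a 4-node 8xH100 cluster, setting --node_rank to 0, 1, 2, 3
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=0 \
  --master_addr=<head_node_ip> --master_port=29500 train.py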

Getting Help

Why TensorPool?

  • Simplicity: Deploy GPU clusters without the complexity of cloud setup, networking, or quota management
  • Flexibility: Scale from single GPUs to massive multi-node clusters instantly
  • Cost Effective: Aggregated GPU capacity from multiple providers means better pricing
  • Performance: High-speed networking and optimized configurations for ML workloads
  • No Lock-in: Standard SSH access means you can use any tools and frameworks you prefer

Ready to scale your ML training? Get started at tensorpool.dev!