TensorPool Documentation


TensorPool is the easiest way to deploy and manage GPU clusters, at a fraction of the cost of traditional cloud providers.

Features

Zero Cloud Setup: No GCP, no AWS, no Docker, no cloud accounts required

Instant GPU Clusters: Deploy multi-node GPU clusters with a single command

Flexible Storage: Attach and detach NFS volumes across your clusters

>50% Cheaper Than Traditional Cloud Providers: TensorPool aggregates demand across multiple cloud providers and can therefore offer GPUs at a fraction of the market price. See our pricing page for details

High-Performance Networking: All clusters come with high-speed interconnects for distributed training

Prerequisites

  1. Create an account at tensorpool.dev
  2. Get your API key from the dashboard
  3. Install the CLI:
    pip install tensorpool
  4. Generate SSH keys (if you don't have them):
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
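
The CLI authenticates with your API key. It may prompt you for the key the first time you run a command; if you prefer to configure it up front, one common pattern is an environment variable. The variable name below is an assumption, so check the dashboard or the CLI's own output for the exact name:

# Hypothetical: make the API key available to the CLI (variable name is an assumption)
export TENSORPOOL_KEY=<your_api_key>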

Quick Start

1. Create Your First GPU Cluster

Deploy a single H100 node cluster:

tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name my-training-cluster

For multi-node training, create a 4-node cluster with eight H100s per node (32 GPUs total):

tp cluster create -i ~/.ssh/id_rsa.pub -t 8xH100 -n 4 --name distributed-training

2. List Your Clusters

tp cluster list

3. SSH Into Your Cluster

Once your cluster is ready, you'll receive the connection details. SSH into your nodes and start training!
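
For example, once you have the username and host address from the connection details (both are placeholders below), you can connect with your private key:

# Substitute the user and address from your cluster's connection details
ssh -i ~/.ssh/id_rsa <user>@<node_address>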

4. Clean Up

When you're done, destroy your cluster:

tp cluster destroy <cluster_id>

Core Commands

Cluster Management

  • tp cluster create - Deploy a new GPU cluster
  • tp cluster list - View all your clusters
  • tp cluster info <cluster_id> - Get detailed information about a cluster
  • tp cluster destroy <cluster_id> - Terminate a cluster
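
A typical lifecycle using these commands might look like the following (the cluster ID is a placeholder reported by tp cluster list):

# Create a cluster, check on it, and tear it down when finished
tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name experiment
tp cluster list
tp cluster info <cluster_id>
tp cluster destroy <cluster_id>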

Network File System (NFS)

  • tp nfs create - Create a new NFS volume
  • tp nfs list - View all your NFS volumes
  • tp nfs attach <storage_id> <cluster_ids> - Attach storage to one or more clusters
  • tp nfs detach <storage_id> <cluster_ids> - Detach storage from one or more clusters
  • tp nfs destroy <storage_id> - Delete an NFS volume

Account Management

  • tp me - View your account information and usage

Supported Instance Types

Instance Type | GPUs | GPU Model
1xH100        | 1    | H100
2xH100        | 2    | H100
4xH100        | 4    | H100
8xH100        | 8    | H100

More instance types coming soon!

Command Reference

Creating Clusters

tp cluster create -i <public_key_path> -t <instance_type> [options]

Required Arguments:

  • -i, --public-key: Path to your public SSH key (e.g., ~/.ssh/id_rsa.pub)
  • -t, --instance-type: Instance type (1xH100, 2xH100, 4xH100, 8xH100)

Optional Arguments:

  • --name: Custom cluster name
  • -n, --num-nodes: Number of nodes (multi-node clusters require the 8xH100 instance type)

Examples:

# Single node H100
tp cluster create -i ~/.ssh/id_rsa.pub -t 1xH100 --name dev-cluster

# 2-node cluster with 8xH100 each (16 GPUs total)
tp cluster create -i ~/.ssh/id_rsa.pub -t 8xH100 -n 2 --name large-training

Listing Clusters

tp cluster list [--org]

Optional Arguments:

  • --org, --organization: List all clusters in your organization

Destroying Clusters

tp cluster destroy <cluster_id>

Arguments:

  • cluster_id: The ID of the cluster to destroy

NFS Storage Commands

Creating NFS Volumes

tp nfs create -s <size_in_gb> [--name <name>]

Required Arguments:

  • -s, --size: Size of the NFS volume in GB

Optional Arguments:

  • --name: Custom volume name

Examples:

# Create a 500GB volume
tp nfs create -s 500 --name training-data

# Create a 1TB volume with auto-generated name
tp nfs create -s 1000

NFS Storage Example Workflow

# 1. Create a 1TB NFS volume named "shared-datasets"
tp nfs create -s 1000 --name shared-datasets

# 2. Attach the volume to a single cluster
tp nfs attach <storage_id> <cluster_id>

# 3. Attach the volume to multiple clusters
tp nfs attach <storage_id> <cluster_id_1> <cluster_id_2> <cluster_id_3>

# 4. Detach the volume from a single cluster
tp nfs detach <storage_id> <cluster_id>

# 5. Detach the volume from multiple clusters
tp nfs detach <storage_id> <cluster_id_1> <cluster_id_2>

Replace <storage_id> and <cluster_id> with your actual IDs.

Storage Locations (Multi-node Clusters)

Local NVMe Storage

Each cluster node comes with high-performance local NVMe storage mounted at:

/mnt/local

NFS Volume Mount Points

When you attach an NFS volume to your cluster, it will be mounted at:

/mnt/nfs

Convenient Symlinks

For easy access, the storage locations are also symlinked in your home directory:

  • Local storage: ~/local → /mnt/local
  • NFS storage: ~/nfs → /mnt/nfs
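
A common pattern is to keep datasets and checkpoints on the shared NFS volume and stage working copies on the faster local NVMe drive. The directory names below are placeholders:

# Stage a dataset from the shared NFS volume onto fast local NVMe storage
cp -r ~/nfs/<dataset_dir> /mnt/local/<dataset_dir>

# Copy checkpoints back to NFS so they persist after the cluster is destroyed
cp -r /mnt/local/<checkpoint_dir> ~/nfs/<checkpoint_dir>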

Best Practices

  • SSH Key Management: Always use strong SSH keys and keep your private keys secure
  • Cluster Naming: Use descriptive names for your clusters to easily identify them
  • Cost Management: Destroy clusters when not in use to avoid unnecessary charges
  • Data Persistence: Use NFS volumes for important data that needs to persist across cluster lifecycles
  • Multi-Node Training: For distributed training, ensure your training scripts are configured for multi-node setups (see the launcher sketch after this list)
  • Monitoring: Regularly check tp cluster list to monitor your active resources
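
As a sketch of what multi-node configuration can involve, a PyTorch-style launcher needs the node count, the number of processes per node, a per-node rank, and the head node's address. The command below is a generic example you would run on each node, not a TensorPool-specific requirement; the script name, address, and port are placeholders:

# Run on every node of a 4-node 8xH100 cluster, setting --node_rank to 0, 1, 2, 3
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=0 \
  --master_addr=<head_node_ip> --master_port=29500 train.py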

Getting Help

Why TensorPool?

  • Simplicity: Deploy GPU clusters without the complexity of cloud setup, networking, or quota management
  • Flexibility: Scale from single GPUs to massive multi-node clusters instantly
  • Cost Effective: Aggregated GPU capacity from multiple providers means better pricing
  • Performance: High-speed networking and optimized configurations for ML workloads
  • No Lock-in: Standard SSH access means you can use any tools and frameworks you prefer

Ready to scale your ML training? Get started at tensorpool.dev!