-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Create fly.io template to launch beta9 in fly.io machines #722
base: main
Are you sure you want to change the base?
Conversation
9d2da82
to
14cb30d
Compare
sdk/src/beta9/cli/machine.py
Outdated
@@ -177,7 +177,8 @@ def create_machine(service: ServiceClient, pool: str): | |||
--tailscale-url "{res.machine.tailscale_url}" \\ | |||
--tailscale-auth "{res.machine.tailscale_auth}" \\ | |||
--pool-name "{res.machine.pool_name}" \\ | |||
--provider-name "{res.machine.provider_name}" | |||
--provider-name "{res.machine.provider_name}" \\ | |||
--gateway-url "https://{service._config.gateway_host}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need to conditionally added since in beam
the grpc host and the http host are on different subdomains
pkg/scheduler/pool_external.go
Outdated
@@ -274,7 +274,8 @@ func (wpc *ExternalWorkerPoolController) createWorkerJob(workerId, machineId str | |||
workerImage := fmt.Sprintf("%s/%s:%s", | |||
wpc.config.Worker.ImageRegistry, | |||
wpc.config.Worker.ImageName, | |||
wpc.config.Worker.ImageTag, | |||
// wpc.config.Worker.ImageTag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/scheduler/pool_external.go
Outdated
@@ -370,6 +371,11 @@ func (wpc *ExternalWorkerPoolController) getWorkerEnvironment(workerId, machineI | |||
podHostname = fmt.Sprintf("machine-%s.%s.%s", machineId, wpc.config.Tailscale.User, wpc.config.Tailscale.HostName) | |||
} | |||
|
|||
grpcPort := 443 | |||
if wpc.config.GatewayService.GRPC.ExternalPort != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just set this in the config to 443 and remove the if clause?
pkg/types/config.go
Outdated
Hostname string `key:"hostname" json:"hostname"` | ||
} | ||
|
||
type RootAgentConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename this?
pkg/types/config.go
Outdated
|
||
type RootAgentConfig struct { | ||
ControlPlaneRedis AgentRedisConfig `key:"controlPlaneRedis" json:"control_plane_redis"` | ||
JuicefsRedis AgentRedisConfig `key:"juicefsRedis" json:"juicefs_redis"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like this config structure. We're basically only using it to override the URLs, feels like there's a more elegant approach here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not a fan either. especially since we don't pass this to the agent.
we do this
remoteConfig.Database.Redis.Addrs[0] = fmt.Sprintf("%s:%d", redisHostname, 6379)
remoteConfig.Storage.JuiceFS.RedisURI = fmt.Sprintf("%s://:%s@%s", scheme, juicefsRedisPassword, juiceFsRedisHostname)
remoteConfig.BlobCache.BlobFs.Sources[idx].JuiceFS.RedisURI = remoteConfig.Storage.JuiceFS.RedisURI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new variable config and removed the juicefs redis, controlplane redis and blobcache urls.
The new veriable is just called
DynamicServiceHosts
that encapsulate all of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically if its true, we use tailscale for the agent configs, if not it will use the host currently specified in config in the places that tailscale would have overrided
pkg/worker/lifecycle.go
Outdated
@@ -364,6 +364,7 @@ func (s *Worker) getContainerEnvironment(request *types.ContainerRequest, option | |||
fmt.Sprintf("CONTAINER_ID=%s", request.ContainerId), | |||
fmt.Sprintf("BETA9_GATEWAY_HOST=%s", os.Getenv("BETA9_GATEWAY_HOST")), | |||
fmt.Sprintf("BETA9_GATEWAY_PORT=%s", os.Getenv("BETA9_GATEWAY_PORT")), | |||
fmt.Sprintf("BETA9_GATEWAY_TLS=%s", os.Getenv("BETA9_GATEWAY_TLS")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just detect if a host has TLS enabled programatically, something like this:
def is_tls_enabled(url: str) -> bool:
"""Check if a given URL supports TLS.
Args:
url (str): The URL to check.
Returns:
bool: True if TLS is enabled, False otherwise.
"""
try:
# Extract hostname and default to port 443 for HTTPS
hostname = url.split("//")[-1].split("/")[0]
port = 443
# Create SSL context and test the connection
context = ssl.create_default_context()
with socket.create_connection((hostname, port), timeout=5) as sock:
with context.wrap_socket(sock, server_hostname=hostname):
return True
except (ssl.SSLError, socket.error):
return False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can remove the env var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense. the code here works
pkg/scheduler/pool_external.go
Outdated
@@ -274,7 +274,8 @@ func (wpc *ExternalWorkerPoolController) createWorkerJob(workerId, machineId str | |||
workerImage := fmt.Sprintf("%s/%s:%s", | |||
wpc.config.Worker.ImageRegistry, | |||
wpc.config.Worker.ImageName, | |||
wpc.config.Worker.ImageTag, | |||
// wpc.config.Worker.ImageTag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sdk/src/beta9/config.py
Outdated
@@ -176,6 +180,7 @@ def prompt_for_config_context( | |||
prompt_gateway_port = functools.partial( | |||
terminal.prompt, text="Gateway Port", default=gateway_port or settings.gateway_port | |||
) | |||
prompt_tls = functools.partial(terminal.prompt, text="TLS", default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this prompted? shouldn't it be using that function below?
is_tls_enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Add templates and setup file to spin up an entire beta9 cluster in fly.io.
This includes templates for:
Currently, there is no built in provider to automatically provision nodes. You would have to use the
beta9 agent
to manually add a node to thebeta9
cluster.