This is a follow-up to my previous post about Testcontainers not freeing up ports in build pipelines.
I use the .NET PostgreSQL module for Testcontainers to make sure my application works end-to-end. My integration tests always ran fine on my local machine, but occasionally on the GitHub runners I would see a `Connection refused` error ("It worked on my computer!").
My guess is that PostgreSQL inside the container had not finished starting up by the time the tests began. The failure looked like this:
```
System.InvalidOperationException: An exception has been raised that is likely due to a transient failure.
 ---> Npgsql.NpgsqlException (0x80004005): Failed to connect to 127.0.0.1:32804
 ---> System.Net.Sockets.SocketException (111): Connection refused
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at Npgsql.TaskTimeoutAndCancellation.ExecuteAsync(Func`2 getTaskFunc, NpgsqlTimeout timeout, CancellationToken cancellationToken)
   at Npgsql.Internal.NpgsqlConnector.ConnectAsync(NpgsqlTimeout timeout, CancellationToken cancellationToken)
   at Npgsql.Internal.NpgsqlConnector.ConnectAsync(NpgsqlTimeout timeout, CancellationToken cancellationToken)
   at Npgsql.Internal.NpgsqlConnector.RawOpen(SslMode sslMode, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken, Boolean isFirstAttempt)
   at Npgsql.Internal.NpgsqlConnector.<Open>g__OpenCore|213_1(NpgsqlConnector conn, SslMode sslMode, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken, Boolean isFirstAttempt)
   at Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.UnpooledDataSource.Get(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnection.<Open>g__OpenAsync|42_0(Boolean async, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
   at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
```
How to dynamically wait until a test finishes
There are a couple of solutions I've used to dynamically adjust how long a test runs when it may behave differently on slower test runners. The goal is to keep the time spent on each test to a minimum. If a fast machine can finish a test in under a second, we don't want to stretch that test to accommodate runners that might need 5 seconds.
Some integration tests don't use Testcontainers at all but still depend on I/O- or CPU-bound work whose response times vary. For these, my solution was to set a time limit on each test and let the test use only as much of that time as it needs to assert its conditions.
```csharp
using System.Diagnostics; // for Stopwatch

[Fact]
[Trait("Category", "Integration")]
public async Task SomeIntegrationTest()
{
    // Arrange
    ...

    // Act
    ...

    /*
     * Assert
     */
    var conditionsMet = false;               // Once this is true, the test will finish
    var timeLimit = TimeSpan.FromMinutes(1); // Hard limit, fails if conditions are not met within the limit
    var stopwatch = Stopwatch.StartNew();

    while (!conditionsMet)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(500), CancellationToken.None);

        // Was the database updated?
        var dbUpdated = await AssertDb();

        // Were events sent?
        var eventsSent = await AssertEvents();

        conditionsMet = dbUpdated && eventsSent;

        // The test did not finish within the limit
        if (!conditionsMet && stopwatch.Elapsed > timeLimit)
        {
            throw new TimeoutException(
                $"Timeout waiting for {nameof(SomeIntegrationTest)}. " +
                $"dbUpdated: {dbUpdated}, eventsSent: {eventsSent}");
        }
    }

    stopwatch.Stop();
}
```
The test sets a `timeLimit` of 1 minute so that all resources have time to spin up if needed, and it stops as soon as `conditionsMet` is `true`. As a result, the time the test spends adjusts to whatever the environment requires. If a test needs longer than 1 minute (or whatever time limit you set), something is probably wrong with the test itself.
Testcontainers built-in readiness checks
For integration tests that use Testcontainers to set up the testing environment and need that environment to be in a certain state, I added wait strategies. These make Testcontainers wait until a container is actually ready before the test continues.
This is similar to a `healthcheck` in a `docker-compose` file:
```yaml
postgres-db:
  image: postgres:16-alpine
  restart: always
  ports:
    - "54321:5432"
  networks:
    - public
    - backend
  healthcheck:
    test: [ "CMD-SHELL", "pg_isready -U postgres -d test-db" ]
    interval: 10s
    retries: 5
    start_period: 30s
    timeout: 10s
```
…and the same behavior can be mimicked with a wait strategy. When building the container below, `UntilCommandIsCompleted` waits for the provided command to return a successful exit code (`0`). `pg_isready` returns `0` once PostgreSQL inside the container has finished starting up and is accepting connections.
```csharp
_dbContainer = new PostgreSqlBuilder()
    .WithName(GetType().Name + Guid.NewGuid())   // Unique name per test class run
    .WithUsername("postgres")
    .WithPassword("password99")
    .WithPortBinding(PostgreSqlPort, true)       // true = map to a random free host port
    // Start-up only completes once pg_isready exits with 0; give up after 1 minute
    .WithWaitStrategy(Wait.ForUnixContainer().UntilCommandIsCompleted(
        ["pg_isready", "-U", "postgres", "-d", "test-db"],
        o => o.WithTimeout(TimeSpan.FromMinutes(1))))
    .Build();
```
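A command-based readiness check like the one above boils down to a retry loop: run the probe, and if its exit code is not `0`, wait and try again until a timeout expires. The sketch below is a rough model for illustration only. It is not Testcontainers' actual implementation. `WaitForExitCodeZeroAsync` is a hypothetical name, and the `runProbe` delegate stands in for executing `pg_isready` inside the container.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Rough model of a command-based readiness check (NOT Testcontainers' own code).
// `runProbe` is a stand-in for running e.g. `pg_isready` in the container and
// returning its exit code. The probe is retried until it exits with 0 or
// `timeout` elapses, in which case a TimeoutException is thrown.
static async Task WaitForExitCodeZeroAsync(
    Func<CancellationToken, Task<long>> runProbe,
    TimeSpan timeout,
    TimeSpan retryInterval)
{
    // The token is cancelled automatically once the timeout elapses
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        while (true)
        {
            long exitCode = await runProbe(cts.Token);
            if (exitCode == 0)
            {
                return; // pg_isready: 0 = server is accepting connections
            }

            // Non-zero (pg_isready: 1 = rejecting connections, 2 = no response), so retry
            await Task.Delay(retryInterval, cts.Token);
        }
    }
    catch (OperationCanceledException) when (cts.IsCancellationRequested)
    {
        throw new TimeoutException($"Readiness probe did not succeed within {timeout}.");
    }
}
```

Because the loop exits on the first successful probe, a fast runner barely waits, while a slow GitHub runner gets up to the full timeout. This is the same dynamic behavior as the polling test earlier, just applied to container start-up.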
Testcontainers also has a dedicated wait strategy that checks whether the container reports itself as healthy (`UntilContainerIsHealthy`). However, it requires a `HEALTHCHECK` instruction to be baked into the image's Dockerfile. I find this inconvenient, because most official images don't specify a health check, so using it would mean extending the image.