Australia’s
foundational AI

Australia has entered the foundational large
language model (LLM) era. We’re shaping what
comes next. Transparently. Ethically. Locally.


Our AI Models

Australia needs an LLM designed
to comply with our laws.

A model that is based on transparency around training data sets and recognises our
unique culture and sense of humour. Most importantly, a model that is trained, housed
and run for inference right here, on Australian soil.

To prove that it can be done, at Sovereign Australia AI we have already completed step
one in building our proof-of-concept research model, known as Ginan.

Our larger goal is the creation of a full new foundational LLM, known as the Australis
Model. We’re building it from the ground up. The Australian way.

Ginan, our research model

Our first model is named Ginan after the smallest star in the Southern Cross constellation. With just 8 billion parameters, it’s a fitting name.

We set out to test the hypothesis that re-weighting an existing language model – through targeted fine-tuning – on curated Australian data could deliver significant performance improvements. And we did it!

To build Ginan quickly and with minimal compute, we used the existing Llama 3.1 model, available under licence from Meta. With about 2 billion tokens of Australian data, we fine-tuned the model so that its outputs are biased toward this new data, while it still retains its global body of knowledge.

To support Australia’s digital
sovereignty, we’ll make this research model available
to all Australians, absolutely free.

The dataset will also be released so technically minded Australians can build on our work for their specific research or use. We’ll publish research papers explaining how we tuned it, too. 100% transparency.

To show our transparency, here’s what we used to shape
the Ginan model:

AustLII (Australasian Legal Information Institute) to provide formal Australian English in its most precise form.

Reddit Australian Communities to provide Australian slang and colloquial expressions. Unique Aussie vernacular.

.au Domain Content from Common Crawl to provide a broad cross-section of Australian web content.

Project Gutenberg Australia to provide historical depth to our language understanding through Australian literary works.
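For technically minded readers, selecting .au content from a crawl can be sketched as a simple hostname filter. This is a minimal illustration only, not our actual pipeline: the function names `is_au_host` and `filter_au` are hypothetical, and a real Common Crawl workflow would also involve reading the crawl's index and archive files, deduplication and quality filtering.

```python
from urllib.parse import urlsplit


def is_au_host(url: str) -> bool:
    """Return True when the URL's hostname ends in the .au country-code domain."""
    # urlsplit(...).hostname is None for strings that carry no network location.
    host = urlsplit(url).hostname or ""
    # Drop a trailing root dot ("example.au.") before checking the suffix.
    return host.rstrip(".").endswith(".au")


def filter_au(urls):
    """Keep only the URLs whose host sits under .au (hypothetical helper)."""
    return [u for u in urls if is_au_host(u)]


if __name__ == "__main__":
    sample = [
        "https://www.example.com.au/news",
        "https://example.com/au/news",
        "https://example.org.au/",
    ]
    for url in filter_au(sample):
        print(url)
```

Matching on the hostname rather than the whole URL avoids false positives such as a path that merely contains "/au/".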

You’ll get access to Ginan soon, once we’ve finished ensuring a minimum level of safety.

Ginan is our research model, and we will make it available to all Australians to test and use soon. But Ginan is only the beginning!

Australis, the model that changes
everything for Australian AI

We want a model everyone can be proud of. Utilising the largest single-purpose AI computer in Australia, we will build the Australis model on the latest, fastest and most efficient technology available globally.

Starting with a vast corpus of ethically sourced Australian data, we will incorporate data from around the world to create a globally aware, Australian-oriented model that defaults to Australian answers over US answers. When you ask about the Navy, it defaults to the Royal Australian Navy, not the US Navy.

The Australis model will also encapsulate our values around diversity, equity and inclusion and be culturally aware of what makes Australians Australian.

The model will be designed and built to comply with Australia’s laws, especially around privacy and data sovereignty. It will also comply with Australia’s guidelines around the creation, adoption and use of AI. We will work hand-in-hand with government and academia at every step.

We take transparency seriously, and so we will always be open and honest about the data used to train the Australis model. We will provide a robust and meaningful opt-out process for those who do not want their data used to further Australia’s digital sovereignty.

Ethical data acquisition

We recognise the years of hard work and creative energy that go into making something new. Whether a book, a song, a news article, or any other creative work, we value that effort.

That’s why we pledge to source all training data for our model ethically. Beyond simply complying with ‘robots.txt’ files, we will add a metadata tag to every piece of data we acquire, recording where it came from and how it was sourced.
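As a rough sketch of what this could look like in practice, the Python below checks a URL against a site's robots.txt rules using the standard library's `urllib.robotparser`, then attaches a provenance record to the acquired text. Everything specific here is an assumption made for illustration: the crawler name `ExampleCrawler`, the `Provenance` fields and the `tag_document` helper are hypothetical and do not describe our production system.

```python
from dataclasses import dataclass, asdict
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler"  # hypothetical crawler name, for illustration only


@dataclass(frozen=True)
class Provenance:
    """Where a piece of data came from and how it was sourced (assumed fields)."""
    source_url: str
    method: str        # e.g. "crawl", "licensed", "purchased"
    retrieved_at: str  # ISO 8601 timestamp supplied by the caller


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


def tag_document(text: str, url: str, method: str, retrieved_at: str) -> dict:
    """Bundle the text with its provenance record so the two travel together."""
    return {"text": text, "provenance": asdict(Provenance(url, method, retrieved_at))}


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    url = "https://example.com.au/news/story"
    if allowed_by_robots(robots, url):
        print(tag_document("Sample text", url, "crawl", "2025-01-01T00:00:00Z"))
```

Keeping the provenance record alongside each document is what would make a later removal request traceable to specific items.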

Further, we have earmarked $10 million to acquire data in ways that guarantee creatives are compensated for their work. This includes working with news services under a paid model and buying books and music where needed.

To be truly sovereign, we need to make sure that the country we are building for is made whole and that creatives can continue their vital work, the very thing that sets humans apart from AI.

And if we inadvertently capture data that the owner does not want included in Australia’s model, we will have a robust contestability process to ensure that it is removed from future versions.

We’ve built the foundations.
Now we’re scaling.

Would you like to work with us? Chat to us to find out more about joining the team that will
create a new foundation for Australian technology.
