


Conlang Database Project

Last edited by Matthew McVeagh 1 month, 1 week ago





A group of us have initiated a project to create and maintain a database of the world's conlangs. This database will aim to collect as many conlangs as possible in one searchable catalogue, with useful information about them.


Conlanging has grown enormously in recent decades. There are many lists of conlangs, for instance on the pages of prominent conlangers or in the few books written about the subject; however, there is no comprehensive catalogue or database that gathers information about as many of them as possible in one place, for the benefit of anyone interested in them. Some resources do aim to feature many conlangs, such as the wikis FrathWiki and Linguifex, CALS (the Conlang Atlas of Linguistic Structures), and ConWorkShop, a one-stop resource for conlangers. However, none of these tries to collect every conlang simply because it is a conlang, and that is what we want the database to do.


It has been claimed that it is now extremely unlikely that we could record all conlangs, or even all those for which there is findable evidence on the internet. There are simply too many, more are being created all the time, and finding them and entering their information would take so long that we could never catch up. This may or may not be true, but even if it is, it is not a reason not to have the database. The database would simply aim to record more and more languages over time: most of the more notable ones would be included quite soon, with others added steadily thereafter. It is also possible that we could add all findable conlangs, given enough volunteers with enough time, organised well enough with a clear division of labour.


Online project locations


Currently this database project has the following locations online:


Discord channel: https://discord.com/channels/748166870787555448/

(invite link: https://discord.gg/J55RyKd)

Facebook group: https://www.facebook.com/groups/2757660151170529

Reddit subreddit: https://www.reddit.com/r/ConlangDatabase/ 

Planning GDoc: https://docs.google.com/document/d/1y5QhCZpn3XcaosbP3wAEtoNGluLHHLzvateNZNAI6Ek/

Data Input Spreadsheet: https://docs.google.com/spreadsheets/d/1uulyGvrhxHxqiY9LnQ8e5hGbENzmEqTwEr0qeyWtsSo/




The project as it stands is shaped by certain principles and intentions:

  • The database is not meant to be private, but publicly viewable and searchable.

  • It should be open to any constructed language, even at a minimal level of development.

  • It is only intended to include public domain information, not copyrighted material such as the details of languages that creators have written.

  • It should be based in one location, but data should be backed up in other places to secure it against future loss of its specific host.

  • It should be searchable both for specific text strings (e.g. to find a conlang by name) and for lists of all conlangs of particular types or combinations of types.

  • The work of adding entries should be done by an organised and overseen team of volunteers working in co-ordination.

  • There are many different possible sources of conlang data, and different volunteers could concentrate on different ones at the same time; they might also have different priorities, which is fine as long as there is a coherent overall policy.

  • Besides volunteer work building the database, there should be a website input form for people to add conlangs, whether their own or ones they know about; this would probably greatly speed up its growth.

  • There are likely to be disagreements over some content, such as classifications of conlangs; these should be resolved as amicably as possible, with a view to both honouring creators' feelings and wishes and maintaining accuracy in the database.

  • If a creator really doesn't want their language included in the database, it should be acceptable to remove it; however, all information featured would be in the public domain, so there shouldn't be any legal or moral issue around including it.


Plan for implementation


The Language Creation Society (https://conlang.org/) have agreed to host the database, on a subdomain such as "database.conlang.org". We need to discuss the following issues with them:

  1. Can we use the same database software the LCS already uses, e.g. for CALS or other things? If not, we will have to consider buying suitable software, although the LCS have said they may be able to cover such costs.

  2. What provision can we make for backing up the data? We want to avoid the data being lost if an individual website or data store disappears.

  3. Who would own the data? It has been suggested that we use a "CC-BY-SA" licence (https://creativecommons.org/licenses/by-sa/2.0/).
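On the backup question in point 2, one low-tech option is to export the whole dataset regularly as plain JSON snapshots that can be copied to hosts other than the main server and re-imported if needed. The following is only a sketch, assuming entries are held as Python dictionaries; the function names and the file naming scheme are hypothetical, not anything agreed with the LCS.

```python
import json
from datetime import date


def backup_filename(day: date) -> str:
    """Hypothetical naming scheme: one dated snapshot file per backup run."""
    return f"conlang-db-backup-{day.isoformat()}.json"


def serialise(entries: list[dict]) -> str:
    """Dump every entry to JSON text.

    ensure_ascii=False keeps endonyms in non-Latin scripts readable in the file;
    sort_keys=True makes snapshots of unchanged data identical, so copies are easy to compare.
    """
    return json.dumps(entries, ensure_ascii=False, indent=2, sort_keys=True)


def restore(text: str) -> list[dict]:
    """Rebuild the entry list from a snapshot, e.g. when moving to a new host."""
    return json.loads(text)
```

For example, the output of serialise(entries) could be saved under backup_filename(date.today()) and copied to several independent locations; restore() reads it back unchanged.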


The website location of the database would have:

  1. an explanation of what it is about, perhaps with some examples

  2. a search form for visitors to look up conlangs

  3. an input form for them to enter details of conlangs

There could also be links to automatically generated pages listing all the database's conlangs in alphabetical order.
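The alphabetical listing pages could be generated automatically from the names in the database. A rough sketch in Python, assuming the list of conlang names is already available; the function name and the "#" page for names starting with a non-letter are illustrative choices, not project decisions:

```python
from collections import defaultdict


def directory_pages(names: list[str]) -> dict[str, list[str]]:
    """Group conlang names by initial letter, one group per alphabetical page.

    Sorting uses str.casefold so that "lojban" and "Lojban" sort together. This is a
    simplification: accented initials (e.g. "É") get their own page rather than
    being filed under the base letter, and no locale-aware collation is attempted.
    Names that do not start with a letter are grouped under "#".
    """
    pages = defaultdict(list)
    for name in sorted(names, key=str.casefold):
        first = name[:1].upper()
        pages[first if first.isalpha() else "#"].append(name)
    # Return the pages themselves in sorted order ("#" sorts before letters).
    return dict(sorted(pages.items()))
```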


Those adding languages to the database should be called 'compilers'; they would all have some level of admin access to the database 'behind the scenes', so that they can add languages more quickly or in more versatile ways (e.g. copying identical information from one entry to another). One or two people will need to be in overall charge, with the capacity to edit, correct, revert or undo what compilers and public form-fillers add, in case of mistakes, bias, vandalism etc. Information from public contributors should be checked before it is added to the publicly visible database, rather than going straight there.
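The review step for public contributions amounts to a simple moderation queue. A minimal sketch, with hypothetical class and method names, assuming compilers' entries are trusted and published directly while public form entries wait for a reviewer:

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    entry: dict              # the conlang data as entered
    submitted_by: str
    from_public_form: bool   # True for the public input form, False for compilers


@dataclass
class ReviewQueue:
    published: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, sub: Submission) -> None:
        # Public contributions wait for review; compiler entries are published
        # directly (an assumption: the page only requires checks on public input).
        if sub.from_public_form:
            self.pending.append(sub)
        else:
            self.published.append(sub.entry)

    def approve(self, index: int) -> None:
        """A reviewer accepts the pending submission at `index` and publishes it."""
        self.published.append(self.pending.pop(index).entry)

    def reject(self, index: int) -> None:
        """A reviewer discards the pending submission at `index` (mistake, vandalism, etc.)."""
        self.pending.pop(index)
```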


There are many sources in which compilers can find conlangs. If any compilers want to focus on particular sources (for instance ones they are already associated with, or ones featuring particular kinds of conlang they are interested in), they should ideally be able to do so. Those organising the project should make and keep lists of all sources used, and assess and measure progress as time goes on. Priorities may need to change, new sources may be discovered, and compilers may come and go, bringing different focuses with them. The organisers will also have to field and respond to queries, suggestions and complaints about the project's decisions, e.g. if a conlanger disagrees with how their language is represented, if compilers disagree on criteria, or if someone can supply missing data.


When people use the database they will be able to search it by specific criteria, including combinations of criteria. They can retrieve either a list of matching conlangs with details, or statistics on how many (or what proportion of) conlangs fulfil the criteria. We will have to determine exactly how the results are displayed. A directory of the database's conlangs would be another way to find them, although its usefulness depends on the user knowing the names of the languages they're interested in, or being prepared to trawl through an alphabetical list.
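Searching by combined criteria and reporting counts or proportions could work roughly as below. This sketch assumes each entry is a Python dictionary keyed by hypothetical snake_case versions of the spreadsheet field names; a real deployment would more likely run equivalent queries in whatever database software ends up hosting the data.

```python
def search(entries: list[dict], **criteria) -> list[dict]:
    """Return entries that match every given criterion (AND semantics).

    A criterion value may be a single value or a set of acceptable values, e.g.
    search(db, purpose_type="Artistic", source_type={"A priori", "Mixed"}).
    """
    def matches(entry: dict, field_name: str, wanted) -> bool:
        value = entry.get(field_name)
        if isinstance(wanted, (set, frozenset)):
            return value in wanted
        return value == wanted

    return [e for e in entries
            if all(matches(e, f, w) for f, w in criteria.items())]


def statistics(entries: list[dict], **criteria) -> tuple[int, float]:
    """Number of matching entries and their proportion of the whole database."""
    hits = len(search(entries, **criteria))
    return hits, (hits / len(entries) if entries else 0.0)
```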


Data fields


A couple of us created a spreadsheet to demonstrate the database idea and as a start to gathering conlang data - see above for the link. It contains the following fields:



Name of the conlang in English or an interlingual version


Endonym

Name of the conlang in itself


Conlanger's name. In the case of languages created by a group, the name of the group.


Page

Clickable link to the most important page or site for the language

Start Year

The year the creator started creating the language


What physical media the language is expressed in, such as speech, writing, signing or other rarer media.

Purpose Type

Classification of the conlang by the purpose for its creation – at the most basic level, e.g. auxiliary, artistic, engineered

Purpose Subtype

Sub-classification by purpose, at the next level – e.g. fantasy, alternate history, logical

Source Type

Classification of the conlang by the origin of its content (especially vocabulary) - whether it's 'a priori' (made up from scratch), 'a posteriori' (derived from previous languages), or a mixture.


Lexico-Semantics

Classification of the conlang according to how its vocabulary is organised in relation to areas of meaning, e.g. taxonomic vs. naturalistic.

Development Level

How developed the language is, from sketchlang at one extreme to native speakers at the other.


These fields are of two types:

  1. Basic identifying information about the language, such as its name, the creator's name and the date of creation. These are generally factual (right or wrong) and should be public domain information. Input for these fields can be free-form, although the Start Year should of course be a number field.

  2. Classification categories for the language, such as purpose or source types, development level and physical modes used. These are more a matter of judgement or classificatory choice, and are not necessarily stated explicitly by the creator or in documentation (though it is better if they are). It might be best if input for these fields is a selection from a limited list; otherwise contributors may enter all sorts of arbitrary values that compilers will have to correct later.
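To illustrate the difference between the two kinds of field, here is a sketch of how an input form might check a submission: the Start Year must be a whole number, and classification fields must match a menu option or be left empty. Only two of the menus are reproduced, and the field names are hypothetical.

```python
# Menus copied from the "Data field settings" section; only two are shown here.
MENUS = {
    "purpose_type": {"Artistic", "Auxiliary", "Engineered", "Personal",
                     "Jokelang", "Mystical", "Secret"},
    "source_type": {"A posteriori", "A priori", "Mixed"},
}


def validate(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    # Identifying field: free-form except that the year must be a number.
    year = entry.get("start_year")
    if year is not None and (isinstance(year, bool) or not isinstance(year, int)):
        problems.append(f"start_year must be a whole number, got {year!r}")
    # Classification fields: empty is allowed, unknown values are flagged.
    for field_name, options in MENUS.items():
        value = entry.get(field_name)
        if value is not None and value not in options:
            problems.append(f"{field_name}: {value!r} is not one of the menu options")
    return problems
```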


The above set of fields was fine for an initial version, but isn't necessarily the best for the database. It would undoubtedly be better to finalise the fields before we start the database proper, as changing them after adding lots of conlangs would require a huge amount of reorganisation and extra work.


One factor is that too many fields might put people off entering data at all. Many fields will probably have to allow being left empty, as the contributor may not know or care about the answer, or there might not be a clear answer. It would be good if compilers could fill these in later, but some may be left empty permanently. The choice of fields, and how many, is a balance between including as much interesting information as possible on the one hand and keeping the amount of input down on the other.
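One way to let most fields stay empty is to make every field except the name optional, and to list which fields are still blank so compilers can find entries to complete. A sketch under those assumptions; requiring only the name is our illustrative choice, and only a subset of the spreadsheet fields is shown.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ConlangEntry:
    """One database row; only the name is required, everything else may stay empty."""
    name: str
    endonym: Optional[str] = None
    creator: Optional[str] = None
    page: Optional[str] = None
    start_year: Optional[int] = None
    purpose_type: Optional[str] = None
    source_type: Optional[str] = None
    development_level: Optional[str] = None


def missing_fields(entry: ConlangEntry) -> list[str]:
    """Names of fields still empty, in declaration order, for compilers to fill in later."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) is None]
```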


There have been the following recommendations from different people as to changes to the fields:


  • Add a general Notes field at the end for creators or compilers to add noteworthy information that isn't covered by the other fields.

  • Remove Lexico-Semantics – it is only really there to identify taxonomic languages, and that information can go in Notes.

  • Collapse Purpose Type and Purpose Subtype into one. The latter is partly determined by the former anyway, so each subtype label can simply be prefixed with its type, e.g. "Artistic – Fantasy".

  • Make sure the Page field can contain several links, or else have a second field for extra links.

  • Add a field for Pseudonyms of creators. This would be useful when someone only knows a pseudonym and not a real name.

  • Add a Group field for conlangs in either a natlang or conworld genealogical grouping. Could also be used for different languages in the same suite.

  • Add a field for what Scripts the language uses.

  • Add a field for Previous names of the conlang.

  • The Endonym field could include an option to upload an image of the language's name in its own conscript.

  • Add a field for Conlang Code Registry (CLCR – https://www.kreativekorp.com/clcr/)

  • Add a "Feature Summary" field for notable features of the language, or else make clear that the Notes field can contain "features of interest". These features are not meant to be the WALS-style typological features recorded in CALS, but more intriguing and unusual qualities of the language.
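If Purpose Type and Purpose Subtype were collapsed into one field as suggested, the combined menu could be generated from the type and subtype grouping rather than maintained by hand, and a combined label could still be split back into its parts for searching. A sketch using the subtype groups from the Purpose Subtype menu further down this page; the assumption that the types without subtypes appear as single labels is ours.

```python
SEP = " – "  # separator used in the page's example label "Artistic – Fantasy"

PURPOSES = {
    "Artistic": ["Fictional", "Conworld", "Geofictional", "Alternate History", "Lostlang", "Other"],
    "Auxiliary": ["Global", "Zonal", "Specialist", "Other"],
    "Engineered": ["Ideal", "Philosophical", "Logical", "Experimental", "Other"],
    "Personal": [], "Jokelang": [], "Mystical": [], "Secret": [],
}


def combined_options(purposes: dict[str, list[str]]) -> list[str]:
    """Flatten type and subtype into single menu labels such as "Artistic – Conworld"."""
    labels = []
    for ptype, subs in purposes.items():
        if subs:
            labels.extend(f"{ptype}{SEP}{sub}" for sub in subs)
        else:
            labels.append(ptype)  # types without subtypes stay as one label
    return labels


def split_label(label: str) -> tuple[str, object]:
    """Recover (type, subtype or None) so that searches by type alone still work."""
    ptype, _, sub = label.partition(SEP)
    return ptype, (sub or None)
```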


Data field settings


The data fields will have the following settings for possible entries:



Text string


Endonym

Text string, possibly with non-standard characters. May include an image of the conlang name in its own conscript.


Text string


Page

Clickable URL link, possibly several (potential multiple lines)

Start Year

Year number


Options from menu:

"Phonic-Scriptic" (presented as spoken and then written)
"Scriptic (+Phonic)" (presented as written, with the implication that it is or could also be spoken)
"Phonic only" (spoken, with no written form other than IPA transcription)

"Pasigraphic" (written language directly representing concepts rather than speech)
"Sign" or "Chiric" (manual sign language)

Purpose Type

Options from menu:

"Artistic" (artlangs, for aesthetic purposes)

"Auxiliary" (auxlangs, to aid communication between speakers of different languages)

"Engineered" (engelangs, to test theories, principles and designs)

"Personal" (created only for the creator's own enjoyment)

"Jokelang" (created purely for amusement)

"Mystical" (inspired mystically or used for ritual purposes)

"Secret" (stealthlangs, for clandestine communication)


Purpose Subtype

Options from menu:

Artistic:

  • "Fictional" (created to feature in a work of fiction)

  • "Conworld" (created as part of an imaginary (constructed) world)

  • "Geofictional" (imagined as being in a fictional part of our world)

  • "Alternate History" (imagined as part of an alternative timeline)

  • "Lostlang" (imagined as an undiscovered part of our timeline)

  • "Other"

Auxiliary:

  • "Global" (intended for use by the whole world)

  • "Zonal" (intended for use in a limited area of the world)

  • "Specialist" (intended for use by a specific section of humanity)

  • "Other"

Engineered:

  • "Ideal" (created to attempt an optimum representation of ideas)

  • "Philosophical" (expressing a philosophical viewpoint)

  • "Logical" (aiming to express meaning without ambiguity)

  • "Experimental" (testing or demonstrating linguistic possibilities)

  • "Other"

"Other Type" (setting for the other purpose types besides the main three)
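If the two purpose fields are kept separate instead, the input form can offer only the subtypes belonging to the chosen type. A sketch of that dependent menu, assuming (as the "Other Type" entry suggests) that the purpose types outside the main three get only "Other Type" as their subtype; the function names are hypothetical.

```python
from typing import Optional

SUBTYPES = {
    "Artistic": ["Fictional", "Conworld", "Geofictional", "Alternate History", "Lostlang", "Other"],
    "Auxiliary": ["Global", "Zonal", "Specialist", "Other"],
    "Engineered": ["Ideal", "Philosophical", "Logical", "Experimental", "Other"],
}
OTHER_TYPE = "Other Type"


def subtype_menu(purpose_type: str) -> list[str]:
    """Subtype options to offer once a Purpose Type has been chosen."""
    return SUBTYPES.get(purpose_type, [OTHER_TYPE])


def check_purpose(purpose_type: str, subtype: Optional[str]) -> bool:
    """An empty subtype is allowed; otherwise it must belong to the chosen type."""
    return subtype is None or subtype in subtype_menu(purpose_type)
```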

Source Type

Options from menu:

"A posteriori" (derived from other languages)

"A priori" (not derived from other languages)

"A priori" could be divided into:

  • "Randomly generated" (e.g. using a word generator script)

  • "Personally generated" (as in the creator choosing everything individually)

"Randomly generated" itself could be divided into:

  • "Randomly assigned"

  • "Personally assigned" (if the creator generates words randomly and then assigns meanings by choice)

"Mixed" (a mixture of derived and not derived from other languages)
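Because "A priori" may be subdivided, an entry recorded as, say, "Randomly assigned" should presumably still turn up in a search for "A priori". A sketch of that lookup, assuming the subdivisions above are adopted and stored as child-to-parent links; the names are hypothetical.

```python
from typing import Optional

# Child option -> parent option, following the proposed Source Type subdivisions.
PARENT = {
    "Randomly generated": "A priori",
    "Personally generated": "A priori",
    "Randomly assigned": "Randomly generated",
    "Personally assigned": "Randomly generated",
}


def ancestors(option: str) -> list[str]:
    """The option itself followed by each broader category above it."""
    chain = [option]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches_source(entry_value: Optional[str], wanted: str) -> bool:
    """True if the recorded Source Type is `wanted` or one of its subdivisions."""
    return entry_value is not None and wanted in ancestors(entry_value)
```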



Lexico-Semantics

Options from menu:



"Other" (maybe there are other types to be named?)

Development Level

Options from menu:


"Beginning" or "Incipient" (minimal progress, either a sketchlang or superficial fictional language)

"Developed" (the language has been built to a usable level)

"Learners" (the language has attracted people to learn it besides its creator)

"Active Community" (users are conversing and developing the language)

"Fluent Users" (some learners have achieved fluency)

"Native Users" (some children have learned the language as a mother tongue)
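These levels read as an ascending scale, which would allow searches such as "at least Learners". A sketch assuming the levels really are ordered this way (a language could arguably have fluent users without an active community, so this is a simplification), with "Beginning" and "Incipient" treated as synonyms; the names are hypothetical.

```python
from enum import IntEnum


class DevelopmentLevel(IntEnum):
    """Menu options in ascending order (assumes the levels form a single scale)."""
    BEGINNING = 1   # also labelled "Incipient"
    DEVELOPED = 2
    LEARNERS = 3
    ACTIVE_COMMUNITY = 4
    FLUENT_USERS = 5
    NATIVE_USERS = 6


# Menu labels as they appear on this page, mapped to levels.
LABELS = {
    "Beginning": DevelopmentLevel.BEGINNING,
    "Incipient": DevelopmentLevel.BEGINNING,
    "Developed": DevelopmentLevel.DEVELOPED,
    "Learners": DevelopmentLevel.LEARNERS,
    "Active Community": DevelopmentLevel.ACTIVE_COMMUNITY,
    "Fluent Users": DevelopmentLevel.FLUENT_USERS,
    "Native Users": DevelopmentLevel.NATIVE_USERS,
}


def at_least(entries: list[dict], minimum: DevelopmentLevel) -> list[str]:
    """Names of conlangs whose recorded level is `minimum` or higher.

    Entries with an empty or unrecognised level count as 0 and are excluded.
    """
    return [e["name"] for e in entries
            if LABELS.get(e.get("development_level"), 0) >= minimum]
```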



Open content (potential multiple lines)


Text string (potential multiple lines)


Text string


Text string (potential multiple lines)

Previous names

Text string (potential multiple lines)


3-letter text

Feature Summary

Text string (multiple lines)


Conlang source list


We will need to record and work through places to find conlangs and their information. This is the beginning of a list of them, though it will probably have to be added to and revised over time:


Wikis and other documentation websites:


Forums and mailing lists:


Lists (Non-personal):


Lists (Personal):




Publicity list


Those of us involved in the project need to publicise it around the many conlang communities - the following is a list of such places:


Mailing lists:




FB groups:


AV media:







