Databases API

The APIs for several linguistic databases can be accessed through lingtypology.datasets.

import lingtypology.datasets

1. General

Lingtypology attempts to provide a unified API for the supported language databases. The classes in this module therefore share some common attributes and methods. In this section I describe them and give examples for Autotyp, Wals and Phoible.

from lingtypology.datasets import Autotyp, Wals, Phoible

1.1. `features_list`

You can get the list of features available in the database using this attribute.

Autotyp().features_list[:10]  # Truncated to the first 10 items to save space

['Agreement',
 'Alienability',
 'Alignment',
 'Alignment_case_splits',
 'Alignment_per_language',
 'Clause_linkage',
 'Clause_word_order',
 'Clusivity',
 'GR_per_language',
 'Gender']

Note: Phoible has no features_list attribute because it has no features. Instead, it has subsets_list, which lists the available subsets of Phoible data.

Phoible().subsets_list

['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']
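Since feature names have to match the names used by the database, it can be convenient to check a request against features_list before building the class. The helper below is a hypothetical sketch and not part of lingtypology. It uses an exact, case-sensitive comparison, which may be stricter than what the library itself accepts. The sample list is copied from the truncated output above so that the snippet runs without network access.

```python
# Hypothetical helper (not part of lingtypology): split requested feature
# names into those found in a features list and those that are missing.
def split_known_features(requested, available):
    known = [name for name in requested if name in available]
    unknown = [name for name in requested if name not in available]
    return known, unknown


# Sample copied from the truncated Autotyp().features_list output above.
# With lingtypology installed, Autotyp().features_list could be used instead.
available = [
    'Agreement', 'Alienability', 'Alignment', 'Alignment_case_splits',
    'Alignment_per_language', 'Clause_linkage', 'Clause_word_order',
    'Clusivity', 'GR_per_language', 'Gender',
]

# 'Tone' is a made-up name used only to show the "unknown" branch.
known, unknown = split_known_features(['Agreement', 'Clusivity', 'Tone'], available)
print(known)    # ['Agreement', 'Clusivity']
print(unknown)  # ['Tone']
```

The same check could be run against Phoible().subsets_list before passing a subset name.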

1.2. `get_df` and `get_json`

These two methods access the database and return the data as a pandas.DataFrame or a dict, respectively. Example of usage:

Autotyp('Agreement', 'Clusivity').get_df().head()

Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

	language	LID	VPolyagreement.Presence.v2	VPolyagreement.Presence.v1	InclExclAsPerson.Presence	InclExclAny.Presence	InclExclType	InclExclAsMinAug.Presence
0	Ambulas	6	False	False	False	False	no i/e	False
1	Abkhazian	7	True	True	False	False	no i/e	False
2	Acehnese	9	True	False	False	True	plain i/e type	False
3	Western Keres	10	True	True	False	False	no i/e	False
4	Hokkaido Ainu	12	True	True	False	True	plain i/e type	False

Note: for Phoible and Autotyp you can use the strip_na parameter (list, default: []) to drop rows that have an empty cell in the given columns. Compare the following.

Without strip_na (empty cells are replaced with '~N/A~'):

Phoible().get_df().head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	contribution_name	language	coordinates	glottocode	macroarea	phonemes	consonants	vowels	tones	source	inventory_page
0	Korean (SPA 1)	Korean	(37.5, 128.0)	kore1280	Eurasia	40	22	18	0	https://archive.org/details/kor_SPA1979_phon	https://phoible.org/languages/kore1280
1	KOREAN (UPSID 423)	Korean	(37.5, 128.0)	kore1280	Eurasia	32	21	11	~N/A~	http://web.phonetik.uni-frankfurt.de/L/L2170.html	https://phoible.org/languages/kore1280
2	Ket (SPA 2)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	32	18	14	0	https://archive.org/details/ket_SPA1979_phon	https://phoible.org/languages/kett1243
3	KET (UPSID 399)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	25	18	7	~N/A~	http://web.phonetik.uni-frankfurt.de/L/L2706.html	https://phoible.org/languages/kett1243
4	Lak (SPA 3)	Lak	(42.1328, 47.0809)	lakk1252	Eurasia	69	60	9	0	https://archive.org/details/lbe_SPA1979_phon	https://phoible.org/languages/lakk1252

With the tones column passed to strip_na:

Phoible().get_df(strip_na=['tones']).head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	contribution_name	language	coordinates	glottocode	macroarea	phonemes	consonants	vowels	source	inventory_page
0	Korean (SPA 1)	Korean	(37.5, 128.0)	kore1280	Eurasia	40	22	18	https://archive.org/details/kor_SPA1979_phon	https://phoible.org/languages/kore1280
2	Ket (SPA 2)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	32	18	14	https://archive.org/details/ket_SPA1979_phon	https://phoible.org/languages/kett1243
4	Lak (SPA 3)	Lak	(42.1328, 47.0809)	lakk1252	Eurasia	69	60	9	https://archive.org/details/lbe_SPA1979_phon	https://phoible.org/languages/lakk1252
6	Kabardian (SPA 4)	Kabardian	(43.5082, 43.3918)	kaba1278	Eurasia	56	49	7	https://archive.org/details/kbd_SPA1979_phon	https://phoible.org/languages/kaba1278
8	Georgian (SPA 5)	Georgian	(41.850396999999994, 43.78613)	nucl1302	Eurasia	35	29	6	https://archive.org/details/kat_SPA1979_phon	https://phoible.org/languages/nucl1302
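The row filtering described in the note can be pictured in plain Python. This is only a sketch of the documented behaviour, not lingtypology's implementation: strip_na_rows is a hypothetical name, and it assumes a row is dropped as soon as any of the listed columns holds the '~N/A~' placeholder. The sample cells are copied from the Phoible output above.

```python
NA = '~N/A~'  # placeholder shown for empty cells in the output above


def strip_na_rows(rows, columns):
    """Keep only the rows that have a value in every listed column."""
    return [row for row in rows if all(row[col] != NA for col in columns)]


# A few cells copied from the Phoible output above.
rows = [
    {'contribution_name': 'Korean (SPA 1)', 'phonemes': 40, 'tones': 0},
    {'contribution_name': 'KOREAN (UPSID 423)', 'phonemes': 32, 'tones': NA},
    {'contribution_name': 'Ket (SPA 2)', 'phonemes': 32, 'tones': 0},
    {'contribution_name': 'KET (UPSID 399)', 'phonemes': 25, 'tones': NA},
]

kept = strip_na_rows(rows, ['tones'])
print([row['contribution_name'] for row in kept])
# ['Korean (SPA 1)', 'Ket (SPA 2)']
```

With an empty list, which is the documented default, no rows are removed.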

Note: by default, get_df and get_json print the citation when they are called. To disable this, set show_citation to False.

p = Phoible()
p.show_citation = False
p.get_df(strip_na=['tones']).head()

	contribution_name	language	coordinates	glottocode	macroarea	phonemes	consonants	vowels	source	inventory_page
0	Korean (SPA 1)	Korean	(37.5, 128.0)	kore1280	Eurasia	40	22	18	https://archive.org/details/kor_SPA1979_phon	https://phoible.org/languages/kore1280
2	Ket (SPA 2)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	32	18	14	https://archive.org/details/ket_SPA1979_phon	https://phoible.org/languages/kett1243
4	Lak (SPA 3)	Lak	(42.1328, 47.0809)	lakk1252	Eurasia	69	60	9	https://archive.org/details/lbe_SPA1979_phon	https://phoible.org/languages/lakk1252
6	Kabardian (SPA 4)	Kabardian	(43.5082, 43.3918)	kaba1278	Eurasia	56	49	7	https://archive.org/details/kbd_SPA1979_phon	https://phoible.org/languages/kaba1278
8	Georgian (SPA 5)	Georgian	(41.850396999999994, 43.78613)	nucl1302	Eurasia	35	29	6	https://archive.org/details/kat_SPA1979_phon	https://phoible.org/languages/nucl1302

1.3. `citation`

You can get the citation for each database using the citation attribute. For example:

from lingtypology.datasets import Autotyp
print(Autotyp().citation)

Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

Note: if you use Wals, a citation is shown for every feature. If you want the general citation for the whole of Wals, use general_citation.

w = Wals('1a', '2a')
print(w.citation)

Citation for feature 1A:
Ian Maddieson. 2013. Consonant Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)

Citation for feature 2A:
Ian Maddieson. 2013. Vowel Quality Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

print(w.general_citation)

Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013.
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info, Accessed on 2019-06-13.)
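When several databases are used in one script, one possible pattern is to switch off the automatic printing and gather the citations once at the end. The sketch below relies only on the two attributes described above (show_citation and citation). FakeDataset is a stand-in written for this example so that it runs offline, silence_and_collect is a hypothetical helper, and the citation strings are shortened placeholders rather than the full texts.

```python
class FakeDataset:
    """Stand-in for a lingtypology.datasets class (assumption for this demo).

    Only the attributes described above are modelled.
    """

    def __init__(self, citation):
        self.citation = citation
        self.show_citation = True


def silence_and_collect(datasets):
    """Turn off automatic printing and return each distinct citation once."""
    citations = []
    for dataset in datasets:
        dataset.show_citation = False
        if dataset.citation not in citations:
            citations.append(dataset.citation)
    return citations


# Shortened placeholder strings, not the full citations.
datasets = [
    FakeDataset('Bickel et al. 2017. The AUTOTYP typological databases.'),
    FakeDataset('Moran & McCloy (eds.) 2019. PHOIBLE 2.0.'),
    FakeDataset('Moran & McCloy (eds.) 2019. PHOIBLE 2.0.'),
]

for text in silence_and_collect(datasets):
    print(text)
```

With real objects such as Autotyp() and Phoible() the loop would be the same, as long as they expose these two attributes as shown in the examples above.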

2. Wals

It is possible to access Wals data (online) using lingtypology.datasets.Wals.

from lingtypology.datasets import Wals

wals_page = Wals('1a', '2a').get_df()
wals_page.head()

Citation for feature 1A:
Ian Maddieson. 2013. Consonant Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)

Citation for feature 2A:
Ian Maddieson. 2013. Vowel Quality Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

	wals_code	language	genus	family	coordinates	_1A_area	_1A	_1A_num	_1A_desc	_2A_area	_2A	_2A_num	_2A_desc
0	kiw	Kiwai (Southern)	Kiwaian	Kiwaian	(-8.0, 143.5)	Phonology	1. Small	1	Small	Phonology	2. Average (5-6)	2	Average (5-6)
1	xoo	!Xóõ	Tu	Tu	(-24.0, 21.5)	Phonology	5. Large	5	Large	Phonology	2. Average (5-6)	2	Average (5-6)
2	ani	//Ani	Khoe-Kwadi	Khoe-Kwadi	(-18.9166666667, 21.9166666667)	Phonology	5. Large	5	Large	Phonology	2. Average (5-6)	2	Average (5-6)
3	abi	Abipón	South Guaicuruan	Guaicuruan	(-29.0, -61.0)	Phonology	2. Moderately small	2	Moderately small	Phonology	2. Average (5-6)	2	Average (5-6)
4	abk	Abkhaz	Northwest Caucasian	Northwest Caucasian	(43.0833333333, 41.0)	Phonology	5. Large	5	Large	Phonology	1. Small (2-4)	1	Small (2-4)
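Each requested feature contributes four columns to the table above. The naming pattern can be sketched as follows. It is inferred from the printed header rather than from the library's source, so it should be treated as an assumption, and wals_columns is a hypothetical helper.

```python
def wals_columns(feature):
    """Column names seen in the table above for one WALS feature code.

    Inferred from the printed header only (assumption).
    """
    code = feature.upper()
    return [f'_{code}_area', f'_{code}', f'_{code}_num', f'_{code}_desc']


print(wals_columns('1a'))
# ['_1A_area', '_1A', '_1A_num', '_1A_desc']

# The leading underscore makes the name a valid Python identifier, which is
# why attribute access such as wals_page._1A is possible.
print('_1A'.isidentifier(), '1A'.isidentifier())
# True False
```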

Map example for feature 1A:

m = lingtypology.LingMap(wals_page.language)
m.add_custom_coordinates(wals_page.coordinates)
m.add_features(
    wals_page._1A,
    colors=lingtypology.gradient(5, 'yellow', 'green')
)
m.legend_title = 'Consonant Inventory'
m.create_map()

3. Autotyp

It is possible to access Autotyp data (online) using lingtypology.datasets.Autotyp.

Unlike in Wals, each table name passed to Autotyp adds several columns:

Autotyp_table = Autotyp('Gender', 'Agreement').get_df(strip_na=['Gender.binned4'])
Autotyp_table.head()

Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

	language	LID	Gender.n	Gender.binned4	Gender.Presence	VPolyagreement.Presence.v2	VPolyagreement.Presence.v1
0	Godoberi	1531	3	3 genders	True	False	False
1	Bininj Kun-Wok	655	4	4 genders	True	True	True
2	Luvale	553	10	more than 4 genders	True	True	False
3	North-Central Dargwa	2949	3	3 genders	True	True	True
4	Gaagudju	82	4	4 genders	True	True	True

Now we can draw a map of the gender data for these languages.

m = lingtypology.LingMap(Autotyp_table.language)
m.add_features(
    Autotyp_table['Gender.binned4'],
    colors=lingtypology.gradient(4, color1='yellow', color2='red')
)
m.legend_title = 'Genders'
m.create_map()

4. AfBo

from lingtypology.datasets import AfBo

adj = AfBo('adjectivizer').get_df()
adj.head()

Seifart, Frank. 2013.
AfBo: A world-wide survey of affix borrowing.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://afbo.info, Accessed on 2019-06-13.)

	language_recipient	language_donor	reliability	adjectivizer
0	Resígaro	Bora	high	0
1	Gurindji Kriol	Gurindji	high	0
2	Copper Island Aleut	Russian	high	0
3	Sakha	Mongolian	high	4
4	Kalderash Romani	Romanian	high	1

m = lingtypology.LingMap(adj.language_recipient)
m.add_features(adj['adjectivizer'], numeric=True)
m.legend_title = 'Adj'
m.create_map()

5. SAILS

from lingtypology.datasets import Sails

To get a pandas.DataFrame of features and descriptions:

Sails().features_descriptions.head()

	Feature	Description
0	ICU17	Is plurality in independent pronouns expressed...
1	ICU16	Is plurality in independent pronouns expressed...
2	ICU15	Is plurality in independent pronouns expressed...
3	ICU14	Is an associative or collective plural disting...
4	ICU13	Are nouns denoting inanimates marked for plural?

To get the descriptions of particular features:

Sails().feature_descriptions('ICU10', 'ICU11')

	Feature	Description
0	ICU10	Is nominal plural marking obligatory?
1	ICU11	Are nouns denoting humans marked for plural?

To get the SAILS data as a dict, use the get_json method. To get the data as a pandas.DataFrame, run:

sails = Sails('ICU3', 'ICU4')
df = sails.get_df()
df.head()

You probably should cite it, but I don't understand how. Please, consult https://sails.clld.org/

	language	coordinates	ICU3	ICU3_desc	ICU4	ICU4_desc
0	Baniva	(5.26123, -67.56326999999999)	1	Yes	0	No
1	Apolista	(-14.83, -68.66)	0	No	?	?
2	Yavitero	(2.800281, -68.08421899999999)	1	Yes	0	No
3	Resígaro	(-2.48139, -71.35778)	0	No	0	No
4	Tol	(14.66859, -87.03719)	0	No	0	No

Map example:

m = lingtypology.LingMap(df.language)
m.add_features(df.ICU3_desc)
m.legend_title = sails.feature_descriptions('ICU3').Description.at[0]
m.start_location = (9, -79)
m.start_zoom = 5
m.legend_position = 'bottomleft'
m.create_map()
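The SAILS table above contains '?' in some cells (ICU4 for Apolista), which appears to mark an unknown value. If such languages should not end up on a map, they can be dropped beforehand. The sketch below shows the idea in plain Python on rows copied from the output above. On the real DataFrame a comparable filter would be df[df.ICU4 != '?'], assuming the values are stored as the string '?' as the printed output suggests.

```python
# Rows copied from the SAILS output above: (language, ICU4, ICU4_desc).
rows = [
    ('Baniva', '0', 'No'),
    ('Apolista', '?', '?'),
    ('Yavitero', '0', 'No'),
    ('Resígaro', '0', 'No'),
    ('Tol', '0', 'No'),
]

# Keep only the languages whose ICU4 value is not '?'.
known = [(language, desc) for language, value, desc in rows if value != '?']
print([language for language, _ in known])
# ['Baniva', 'Yavitero', 'Resígaro', 'Tol']
```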

6. Phoible

from lingtypology.datasets import Phoible

Unlike with the other databases, you do not pass features to Phoible. Instead, you can pass a subset. Take a look:

p = Phoible()
p.get_df().head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	contribution_name	language	coordinates	glottocode	macroarea	phonemes	consonants	vowels	tones	source	inventory_page
0	Korean (SPA 1)	Korean	(37.5, 128.0)	kore1280	Eurasia	40	22	18	0	https://archive.org/details/kor_SPA1979_phon	https://phoible.org/languages/kore1280
1	KOREAN (UPSID 423)	Korean	(37.5, 128.0)	kore1280	Eurasia	32	21	11	~N/A~	http://web.phonetik.uni-frankfurt.de/L/L2170.html	https://phoible.org/languages/kore1280
2	Ket (SPA 2)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	32	18	14	0	https://archive.org/details/ket_SPA1979_phon	https://phoible.org/languages/kett1243
3	KET (UPSID 399)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	25	18	7	~N/A~	http://web.phonetik.uni-frankfurt.de/L/L2706.html	https://phoible.org/languages/kett1243
4	Lak (SPA 3)	Lak	(42.1328, 47.0809)	lakk1252	Eurasia	69	60	9	0	https://archive.org/details/lbe_SPA1979_phon	https://phoible.org/languages/lakk1252

Some languages have several entries because Phoible data consists of several different subsets. You can get the list of available subsets:

p.subsets_list

['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

… and pass them into the class:

p = Phoible(subset='SPA')
df = p.get_df(strip_na=['tones'])
df.head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	contribution_name	language	coordinates	glottocode	macroarea	phonemes	consonants	vowels	source	inventory_page
0	Korean (SPA 1)	Korean	(37.5, 128.0)	kore1280	Eurasia	40	22	18	https://archive.org/details/kor_SPA1979_phon	https://phoible.org/languages/kore1280
1	Ket (SPA 2)	Ket	(63.7551, 87.5466)	kett1243	Eurasia	32	18	14	https://archive.org/details/ket_SPA1979_phon	https://phoible.org/languages/kett1243
2	Lak (SPA 3)	Lak	(42.1328, 47.0809)	lakk1252	Eurasia	69	60	9	https://archive.org/details/lbe_SPA1979_phon	https://phoible.org/languages/lakk1252
3	Kabardian (SPA 4)	Kabardian	(43.5082, 43.3918)	kaba1278	Eurasia	56	49	7	https://archive.org/details/kbd_SPA1979_phon	https://phoible.org/languages/kaba1278
4	Georgian (SPA 5)	Georgian	(41.850396999999994, 43.78613)	nucl1302	Eurasia	35	29	6	https://archive.org/details/kat_SPA1979_phon	https://phoible.org/languages/nucl1302
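To see why a language can occur more than once when no subset is given, the entries of the unfiltered table can be grouped by glottocode. The sketch below uses pairs copied from the first Phoible output in this section. It assumes that the subset label is the first word inside the parentheses of contribution_name, which is an inference from the printed names and not something the library documents here.

```python
import re
from collections import defaultdict

# (contribution_name, glottocode) pairs copied from the Phoible output above.
entries = [
    ('Korean (SPA 1)', 'kore1280'),
    ('KOREAN (UPSID 423)', 'kore1280'),
    ('Ket (SPA 2)', 'kett1243'),
    ('KET (UPSID 399)', 'kett1243'),
    ('Lak (SPA 3)', 'lakk1252'),
]

# Assumption: names end in "(<subset label> <number>)".
subset_pattern = re.compile(r'\((\w+) \d+\)$')

subsets_by_language = defaultdict(list)
for name, glottocode in entries:
    match = subset_pattern.search(name)
    if match:
        subsets_by_language[glottocode].append(match.group(1))

print(dict(subsets_by_language))
# {'kore1280': ['SPA', 'UPSID'], 'kett1243': ['SPA', 'UPSID'], 'lakk1252': ['SPA']}
```

Passing subset='SPA', as in the example above, leaves one entry per language in this sample.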

You can also get non-aggregated data by setting aggregated to False when initializing the class.

Phoible(aggregated=False).get_df().head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	InventoryID	Glottocode	ISO6393	LanguageName	SpecificDialect	GlyphID	Phoneme	Allophones	Marginal	SegmentClass	...	retractedTongueRoot	advancedTongueRoot	periodicGlottalSource	epilaryngealSource	spreadGlottis	constrictedGlottis	raisedLarynxEjective	loweredLarynxImplosive
0	1	kore1280	kor	Korean	~N/A~	0061	a	a	~N/A~	vowel	...	-	-	+	-	-	-	-	-
1	1	kore1280	kor	Korean	~N/A~	0061+02D0	aː	aː	~N/A~	vowel	...	-	-	+	-	-	-	-	-
2	1	kore1280	kor	Korean	~N/A~	00E6	æ	ɛ æ	~N/A~	vowel	...	-	-	+	-	-	-	-	-
3	1	kore1280	kor	Korean	~N/A~	00E6+02D0	æː	æː	~N/A~	vowel	...	-	-	+	-	-	-	-	-
4	1	kore1280	kor	Korean	~N/A~	0065	e	e	~N/A~	vowel	...	-	-	+	-	-	-	-	-

5 rows × 48 columns

Map example:

m = lingtypology.LingMap(df.language)
m.colormap_colors = ('white', 'red')
m.add_features(df.tones, numeric=True)
m.legend_title = 'Tones'
m.legend_position = 'bottomleft'
m.create_map()

Another example (slow due to the large amount of data):

df = Phoible(subset='UPSID', aggregated=False).get_df()
# Keep only the rows (phonemes) marked as ejectives
df = df[df.raisedLarynxEjective == '+']
# Keep one row per language
df = df.drop_duplicates(subset='Glottocode')
df.head()

Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-13.)

	InventoryID	Glottocode	ISO6393	LanguageName	SpecificDialect	GlyphID	Phoneme	Allophones	Marginal	SegmentClass	...	periodicGlottalSource	epilaryngealSource	spreadGlottis	constrictedGlottis	fortis	raisedLarynxEjective	loweredLarynxImplosive	click
7570	198	afad1236	aal	KOTOKO	~N/A~	0063+02BC	cʼ	~N/A~	False	consonant	...	-	-	-	+	-	+	-	-
7802	206	ahte1237	aht	AHTNA	~N/A~	006B+02BC	kʼ	~N/A~	False	consonant	...	-	-	-	+	-	+	-	-
7920	211	qawa1238	alc	QAWASQAR	~N/A~	006B+02BC	kʼ	~N/A~	False	consonant	...	-	-	-	+	-	+	-	-
8131	218	hame1242	amf	HAMER	~N/A~	0071+02BC	qʼ	~N/A~	False	consonant	...	-	-	-	+	-	+	-	-
8157	219	amha1245	amh	AMHARIC	~N/A~	006B+02B7+02BC	kʷʼ	~N/A~	False	consonant	...	-	-	-	+	-	+	-	-

5 rows × 48 columns

m = lingtypology.LingMap(df.Glottocode, glottocode=True)
m.title = 'Languages with Ejectives'
m.tiles = 'Stamen Terrain'
m.radius = 5
m.opacity = 0.5
m.colors = ('blue',)
m.create_map()
