Databases API
==============

APIs for several linguistic databases can be accessed through ``lingtypology.datasets``.

.. code:: ipython3

    import lingtypology.datasets

1. General
-----------

Lingtypology attempts to provide a unified API for the supported language databases. Therefore, the classes in this module share some common attributes and methods. In this section I will describe them and provide examples for Autotyp, Wals and Phoible.

.. code:: ipython3

    from lingtypology.datasets import Autotyp, Wals, Phoible

1.1. ``features_list``
~~~~~~~~~~~~~~~~~~~~~~~

You can get the list of features available in a database using this attribute.

.. code:: ipython3

    Autotyp().features_list[:10]  # cut off so that it does not take up too much space

.. parsed-literal::

    ['Agreement',
     'Alienability',
     'Alignment',
     'Alignment_case_splits',
     'Alignment_per_language',
     'Clause_linkage',
     'Clause_word_order',
     'Clusivity',
     'GR_per_language',
     'Gender']

**Note**: ``Phoible`` has no ``features_list`` attribute because it has no features. Instead, it has ``subsets_list``, which shows the available subsets of the Phoible data.

.. code:: ipython3

    Phoible().subsets_list

.. parsed-literal::

    ['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

1.2. ``get_df`` and ``get_json``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These two methods access the database and return the data as a ``pandas.DataFrame`` (``get_df``) or a ``dict`` (``get_json``). Example of usage:

.. code:: ipython3

    Autotyp('Agreement', 'Clusivity').get_df().head()

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

.. parsed-literal::

       language       LID  VPolyagreement.Presence.v2  VPolyagreement.Presence.v1  InclExclAsPerson.Presence  InclExclAny.Presence  InclExclType    InclExclAsMinAug.Presence
    0  Ambulas        6    False                       False                       False                      False                 no i/e          False
    1  Abkhazian      7    True                        True                        False                      False                 no i/e          False
    2  Acehnese       9    True                        False                       False                      True                  plain i/e type  False
    3  Western Keres  10   True                        True                        False                      False                 no i/e          False
    4  Hokkaido Ainu  12   True                        True                        False                      True                  plain i/e type  False
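
Because ``get_df`` returns an ordinary ``pandas.DataFrame``, the usual pandas tools apply to it. The minimal sketch below uses the five rows printed above, typed in by hand (only three columns kept) so that it runs without network access, and cross-tabulates verbal polyagreement against the presence of an inclusive/exclusive distinction.

.. code:: ipython3

    import pandas as pd

    # The five rows shown above, typed in by hand (only a few columns kept)
    df = pd.DataFrame({
        'language': ['Ambulas', 'Abkhazian', 'Acehnese', 'Western Keres', 'Hokkaido Ainu'],
        'VPolyagreement.Presence.v2': [False, True, True, True, True],
        'InclExclAny.Presence': [False, False, True, False, True],
    })

    # Count languages for each combination of the two boolean features
    table = pd.crosstab(df['VPolyagreement.Presence.v2'], df['InclExclAny.Presence'])
    print(table)

With the full table returned by ``get_df`` the same call works unchanged; only the counts differ.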

**Note**: for ``Phoible`` and ``Autotyp`` you can use the ``strip_na`` parameter (``list``, default: ``[]``) to drop the rows that have an empty cell in the given columns. Compare the following.

Without ``strip_na`` (empty cells are replaced with ``'~N/A~'``):

.. code:: ipython3

    Phoible().get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

       contribution_name   language  coordinates         glottocode  macroarea  phonemes  consonants  vowels  tones  source                                             inventory_page
    0  Korean (SPA 1)      Korean    (37.5, 128.0)       kore1280    Eurasia    40        22          18      0      https://archive.org/details/kor_SPA1979_phon       https://phoible.org/languages/kore1280
    1  KOREAN (UPSID 423)  Korean    (37.5, 128.0)       kore1280    Eurasia    32        21          11      ~N/A~  http://web.phonetik.uni-frankfurt.de/L/L2170.html  https://phoible.org/languages/kore1280
    2  Ket (SPA 2)         Ket       (63.7551, 87.5466)  kett1243    Eurasia    32        18          14      0      https://archive.org/details/ket_SPA1979_phon       https://phoible.org/languages/kett1243
    3  KET (UPSID 399)     Ket       (63.7551, 87.5466)  kett1243    Eurasia    25        18          7       ~N/A~  http://web.phonetik.uni-frankfurt.de/L/L2706.html  https://phoible.org/languages/kett1243
    4  Lak (SPA 3)         Lak       (42.1328, 47.0809)  lakk1252    Eurasia    69        60          9       0      https://archive.org/details/lbe_SPA1979_phon       https://phoible.org/languages/lakk1252

With the ``tones`` column passed to ``strip_na``:

.. code:: ipython3

    Phoible().get_df(strip_na=['tones']).head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

       contribution_name  language   coordinates                     glottocode  macroarea  phonemes  consonants  vowels  source                                        inventory_page
    0  Korean (SPA 1)     Korean     (37.5, 128.0)                   kore1280    Eurasia    40        22          18      https://archive.org/details/kor_SPA1979_phon  https://phoible.org/languages/kore1280
    2  Ket (SPA 2)        Ket        (63.7551, 87.5466)              kett1243    Eurasia    32        18          14      https://archive.org/details/ket_SPA1979_phon  https://phoible.org/languages/kett1243
    4  Lak (SPA 3)        Lak        (42.1328, 47.0809)              lakk1252    Eurasia    69        60          9       https://archive.org/details/lbe_SPA1979_phon  https://phoible.org/languages/lakk1252
    6  Kabardian (SPA 4)  Kabardian  (43.5082, 43.3918)              kaba1278    Eurasia    56        49          7       https://archive.org/details/kbd_SPA1979_phon  https://phoible.org/languages/kaba1278
    8  Georgian (SPA 5)   Georgian   (41.850396999999994, 43.78613)  nucl1302    Eurasia    35        29          6       https://archive.org/details/kat_SPA1979_phon  https://phoible.org/languages/nucl1302
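
Conceptually, ``strip_na`` keeps only the rows whose listed columns are filled in. The sketch below reproduces that filtering with plain pandas on a few rows typed in from the tables above, assuming (as the output suggests) that empty cells are marked with ``'~N/A~'``. It illustrates the idea and is not lingtypology's own implementation; the helper ``strip_empty`` is hypothetical.

.. code:: ipython3

    import pandas as pd

    NA_MARK = '~N/A~'  # placeholder used for empty cells in the tables above

    def strip_empty(df, columns):
        """Hypothetical helper: drop rows that contain NA_MARK in any of ``columns``."""
        mask = (df[columns] == NA_MARK).any(axis=1)
        return df[~mask]

    # Three rows from the unfiltered table above, reduced to two columns
    df = pd.DataFrame({
        'contribution_name': ['Korean (SPA 1)', 'KOREAN (UPSID 423)', 'Ket (SPA 2)'],
        'tones': [0, NA_MARK, 0],
    })

    stripped = strip_empty(df, ['tones'])
    print(stripped)

As in the real output, the surviving rows keep their original index labels (0 and 2 here), which is why the filtered table above jumps from 0 to 2 to 4.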

**Note:** By default, ``get_df`` and ``get_json`` print the citation when they are called. To disable this, set the ``show_citation`` attribute to ``False``.

.. code:: ipython3

    p = Phoible()
    p.show_citation = False
    p.get_df(strip_na=['tones']).head()

.. parsed-literal::

       contribution_name  language   coordinates                     glottocode  macroarea  phonemes  consonants  vowels  source                                        inventory_page
    0  Korean (SPA 1)     Korean     (37.5, 128.0)                   kore1280    Eurasia    40        22          18      https://archive.org/details/kor_SPA1979_phon  https://phoible.org/languages/kore1280
    2  Ket (SPA 2)        Ket        (63.7551, 87.5466)              kett1243    Eurasia    32        18          14      https://archive.org/details/ket_SPA1979_phon  https://phoible.org/languages/kett1243
    4  Lak (SPA 3)        Lak        (42.1328, 47.0809)              lakk1252    Eurasia    69        60          9       https://archive.org/details/lbe_SPA1979_phon  https://phoible.org/languages/lakk1252
    6  Kabardian (SPA 4)  Kabardian  (43.5082, 43.3918)              kaba1278    Eurasia    56        49          7       https://archive.org/details/kbd_SPA1979_phon  https://phoible.org/languages/kaba1278
    8  Georgian (SPA 5)   Georgian   (41.850396999999994, 43.78613)  nucl1302    Eurasia    35        29          6       https://archive.org/details/kat_SPA1979_phon  https://phoible.org/languages/nucl1302

1.3. ``citation``
~~~~~~~~~~~~~~~~~~

You can get the citation for each database using the ``citation`` attribute. E.g.:

.. code:: ipython3

    from lingtypology.datasets import Autotyp
    print(Autotyp().citation)

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

**Note**: if you use ``Wals``, a separate citation is shown for every feature. If you want the general citation for the whole of Wals, use ``general_citation``.

.. code:: ipython3

    w = Wals('1a', '2a')
    print(w.citation)

.. parsed-literal::

    Citation for feature 1A: Ian Maddieson. 2013. Consonant Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)
    Citation for feature 2A: Ian Maddieson. 2013. Vowel Quality Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

.. code:: ipython3

    print(w.general_citation)

.. parsed-literal::

    Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info, Accessed on 2019-06-13.)

2. Wals
--------

It is possible to access Wals data (online) using ``lingtypology.datasets.Wals``.

.. code:: ipython3

    from lingtypology.datasets import Wals

.. code:: ipython3

    wals_page = Wals('1a', '2a').get_df()
    wals_page.head()

.. parsed-literal::

    Citation for feature 1A: Ian Maddieson. 2013. Consonant Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)
    Citation for feature 2A: Ian Maddieson. 2013. Vowel Quality Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

.. parsed-literal::

       wals_code  language          genus                family               coordinates                      _1A_area   _1A                  _1A_num  _1A_desc          _2A_area   _2A               _2A_num  _2A_desc
    0  kiw        Kiwai (Southern)  Kiwaian              Kiwaian              (-8.0, 143.5)                    Phonology  1. Small             1        Small             Phonology  2. Average (5-6)  2        Average (5-6)
    1  xoo        !Xóõ              Tu                   Tu                   (-24.0, 21.5)                    Phonology  5. Large             5        Large             Phonology  2. Average (5-6)  2        Average (5-6)
    2  ani        //Ani             Khoe-Kwadi           Khoe-Kwadi           (-18.9166666667, 21.9166666667)  Phonology  5. Large             5        Large             Phonology  2. Average (5-6)  2        Average (5-6)
    3  abi        Abipón            South Guaicuruan     Guaicuruan           (-29.0, -61.0)                   Phonology  2. Moderately small  2        Moderately small  Phonology  2. Average (5-6)  2        Average (5-6)
    4  abk        Abkhaz            Northwest Caucasian  Northwest Caucasian  (43.0833333333, 41.0)            Phonology  5. Large             5        Large             Phonology  1. Small (2-4)    1        Small (2-4)
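
Each feature yields four columns, and judging from the output above, ``_1A_num`` and ``_1A_desc`` are the two halves of the combined ``_1A`` value. If you only have the combined column, a sketch like the following splits it with ``Series.str.extract``. The regular expression is an assumption based on the ``'<number>. <description>'`` pattern visible in the table.

.. code:: ipython3

    import pandas as pd

    # The _1A values from the five rows above
    values = pd.Series(['1. Small', '5. Large', '5. Large', '2. Moderately small', '5. Large'])

    # Split '<number>. <description>' into two columns
    parts = values.str.extract(r'^(\d+)\.\s*(.*)$')
    parts.columns = ['num', 'desc']
    parts['num'] = parts['num'].astype(int)
    print(parts)

Keeping the numeric part is handy for sorting categories in their WALS order rather than alphabetically.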

Map example for feature **1A**:

.. code:: ipython3

    import lingtypology

    m = lingtypology.LingMap(wals_page.language)
    m.add_custom_coordinates(wals_page.coordinates)
    m.add_features(
        wals_page._1A,
        colors=lingtypology.gradient(5, 'yellow', 'green')
    )
    m.legend_title = 'Consonant Inventory'
    m.create_map()
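
``lingtypology.gradient(5, 'yellow', 'green')`` supplies five colours, one per value of the feature. To make the idea concrete, here is a minimal sketch of a linear gradient between two hex colours. It is not lingtypology's implementation (which accepts colour names and may interpolate differently); the function ``linear_gradient`` is hypothetical.

.. code:: ipython3

    def linear_gradient(n, start, end):
        """Hypothetical helper: n hex colours evenly spaced from start to end."""
        def to_rgb(hex_colour):
            hex_colour = hex_colour.lstrip('#')
            return [int(hex_colour[i:i + 2], 16) for i in (0, 2, 4)]

        a, b = to_rgb(start), to_rgb(end)
        colours = []
        for step in range(n):
            # Position along the gradient, from 0.0 (start) to 1.0 (end)
            t = step / (n - 1) if n > 1 else 0.0
            rgb = [round(x + (y - x) * t) for x, y in zip(a, b)]
            colours.append('#{:02x}{:02x}{:02x}'.format(*rgb))
        return colours

    # Yellow (#ffff00) to CSS green (#008000) in five steps
    print(linear_gradient(5, '#ffff00', '#008000'))

Interpolating each RGB channel independently is the simplest choice; perceptually uniform colour spaces give smoother-looking ramps but need extra code.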

3. Autotyp
-----------

It is possible to access Autotyp data (online) using ``lingtypology.datasets.Autotyp``. Unlike in Wals, where each argument is a single feature, each argument to ``Autotyp`` is a table name, and every table adds several columns to the result (for example, ``Gender`` adds ``Gender.n``, ``Gender.binned4`` and ``Gender.Presence``):

.. code:: ipython3

    Autotyp_table = Autotyp('Gender', 'Agreement').get_df(strip_na=['Gender.binned4'])
    Autotyp_table.head()

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

.. parsed-literal::

       language              LID   Gender.n  Gender.binned4       Gender.Presence  VPolyagreement.Presence.v2  VPolyagreement.Presence.v1
    0  Godoberi              1531  3         3 genders            True             False                       False
    1  Bininj Kun-Wok        655   4         4 genders            True             True                        True
    2  Luvale                553   10        more than 4 genders  True             True                        False
    3  North-Central Dargwa  2949  3         3 genders            True             True                        True
    4  Gaagudju              82    4         4 genders            True             True                        True
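
In this output the ``Gender`` columns share the ``Gender.`` prefix, whereas the ``Agreement`` table contributes ``VPolyagreement.*`` columns, so check the actual column names before relying on a prefix. Where a prefix does hold, ``DataFrame.filter`` can pull out one table's columns. A sketch on three of the rows above, typed in by hand:

.. code:: ipython3

    import pandas as pd

    # A few of the rows printed above, typed in by hand
    autotyp_table = pd.DataFrame({
        'language': ['Godoberi', 'Bininj Kun-Wok', 'Luvale'],
        'LID': [1531, 655, 553],
        'Gender.n': [3, 4, 10],
        'Gender.binned4': ['3 genders', '4 genders', 'more than 4 genders'],
        'Gender.Presence': [True, True, True],
        'VPolyagreement.Presence.v2': [False, True, True],
    })

    # Keep the language column plus every column whose name starts with 'Gender.'
    gender_cols = ['language'] + list(autotyp_table.filter(regex=r'^Gender\.').columns)
    gender = autotyp_table[gender_cols]
    print(gender)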

Now we can draw a map of the gender data for multiple languages.

.. code:: ipython3

    m = lingtypology.LingMap(Autotyp_table.language)
    m.add_features(
        Autotyp_table['Gender.binned4'],
        colors=lingtypology.gradient(4, color1='yellow', color2='red')
    )
    m.legend_title = 'Genders'
    m.create_map()

4. AfBo
--------

.. code:: ipython3

    from lingtypology.datasets import AfBo

.. code:: ipython3

    adj = AfBo('adjectivizer').get_df()
    adj.head()

.. parsed-literal::

    Seifart, Frank. 2013. AfBo: A world-wide survey of affix borrowing. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://afbo.info, Accessed on 2019-06-13.)

.. parsed-literal::

       language_recipient   language_donor  reliability  adjectivizer
    0  Resígaro             Bora            high         0
    1  Gurindji Kriol       Gurindji        high         0
    2  Copper Island Aleut  Russian         high         0
    3  Sakha                Mongolian       high         4
    4  Kalderash Romani     Romanian        high         1
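
The result is a regular DataFrame, so you can, for instance, keep only the recipient languages with a non-zero ``adjectivizer`` value and sort them. What the number counts is defined by AfBo itself; this sketch, on the five rows above typed in by hand, only shows the filtering.

.. code:: ipython3

    import pandas as pd

    # The five rows shown above, typed in by hand
    adj = pd.DataFrame({
        'language_recipient': ['Resígaro', 'Gurindji Kriol', 'Copper Island Aleut',
                               'Sakha', 'Kalderash Romani'],
        'language_donor': ['Bora', 'Gurindji', 'Russian', 'Mongolian', 'Romanian'],
        'reliability': ['high'] * 5,
        'adjectivizer': [0, 0, 0, 4, 1],
    })

    # Keep rows with a non-zero value, largest first
    nonzero = adj[adj['adjectivizer'] > 0].sort_values('adjectivizer', ascending=False)
    print(nonzero[['language_recipient', 'language_donor', 'adjectivizer']])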

.. code:: ipython3

    m = lingtypology.LingMap(adj.language_recipient)
    m.add_features(adj['adjectivizer'], numeric=True)
    m.legend_title = 'Adj'
    m.create_map()

5. SAILS
---------

.. code:: ipython3

    from lingtypology.datasets import Sails

To get a ``pandas.DataFrame`` of features and their descriptions:

.. code:: ipython3

    Sails().features_descriptions.head()

.. parsed-literal::

       Feature  Description
    0  ICU17    Is plurality in independent pronouns expressed...
    1  ICU16    Is plurality in independent pronouns expressed...
    2  ICU15    Is plurality in independent pronouns expressed...
    3  ICU14    Is an associative or collective plural disting...
    4  ICU13    Are nouns denoting inanimates marked for plural?

To get the descriptions of particular features:

.. code:: ipython3

    Sails().feature_descriptions('ICU10', 'ICU11')

.. parsed-literal::

       Feature  Description
    0  ICU10    Is nominal plural marking obligatory?
    1  ICU11    Are nouns denoting humans marked for plural?
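
If you already hold the full ``features_descriptions`` table, a similar lookup can be done with plain pandas. This is only a sketch of the idea, not how ``Sails.feature_descriptions`` is implemented; the helper ``describe`` is hypothetical and the descriptions are copied from the outputs above.

.. code:: ipython3

    import pandas as pd

    # Descriptions copied from the outputs above
    features_descriptions = pd.DataFrame({
        'Feature': ['ICU13', 'ICU11', 'ICU10'],
        'Description': [
            'Are nouns denoting inanimates marked for plural?',
            'Are nouns denoting humans marked for plural?',
            'Is nominal plural marking obligatory?',
        ],
    })

    def describe(table, *features):
        """Hypothetical helper: rows for ``features``, in the order requested."""
        selected = table.set_index('Feature').loc[list(features)]
        return selected.reset_index()

    print(describe(features_descriptions, 'ICU10', 'ICU11'))

Indexing by ``Feature`` with ``.loc`` returns the rows in the order the features were requested and raises ``KeyError`` for an unknown code.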

To get the SAILS data as a ``dict``, use the ``get_json`` method. To get the data as a ``pandas.DataFrame``, run:

.. code:: ipython3

    sails = Sails('ICU3', 'ICU4')
    df = sails.get_df()
    df.head()

.. parsed-literal::

    You probably should cite it, but I don't understand how. Please, consult https://sails.clld.org/

.. parsed-literal::

       language  coordinates                     ICU3  ICU3_desc  ICU4  ICU4_desc
    0  Baniva    (5.26123, -67.56326999999999)   1     Yes        0     No
    1  Apolista  (-14.83, -68.66)                0     No         ?     ?
    2  Yavitero  (2.800281, -68.08421899999999)  1     Yes        0     No
    3  Resígaro  (-2.48139, -71.35778)           0     No         0     No
    4  Tol       (14.66859, -87.03719)           0     No         0     No
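
Some cells contain ``'?'`` (see Apolista above), which presumably means that no value is recorded. Assuming that reading, you may want to turn them into proper missing values before counting answers; a sketch on the rows above, typed in by hand:

.. code:: ipython3

    import pandas as pd

    # The five rows shown above, reduced to the description columns
    df = pd.DataFrame({
        'language': ['Baniva', 'Apolista', 'Yavitero', 'Resígaro', 'Tol'],
        'ICU3_desc': ['Yes', 'No', 'Yes', 'No', 'No'],
        'ICU4_desc': ['No', '?', 'No', 'No', 'No'],
    })

    # Assumption: '?' means "unknown", so treat it as a missing value
    clean = df.replace('?', pd.NA)

    # Count answers per feature; value_counts skips missing values by default
    counts = {col: clean[col].value_counts().to_dict() for col in ['ICU3_desc', 'ICU4_desc']}
    print(counts)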

Map example:

.. code:: ipython3

    m = lingtypology.LingMap(df.language)
    m.add_features(df.ICU3_desc)
    # Use the description of ICU3 as the legend title
    m.legend_title = sails.feature_descriptions('ICU3').Description.at[0]
    m.start_location = (9, -79)
    m.start_zoom = 5
    m.legend_position = 'bottomleft'
    m.create_map()

6. Phoible
-----------

.. code:: ipython3

    from lingtypology.datasets import Phoible

Unlike the other databases, Phoible does not take features as arguments; instead, you pass a subset. Take a look:

.. code:: ipython3

    p = Phoible()
    p.get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

       contribution_name   language  coordinates         glottocode  macroarea  phonemes  consonants  vowels  tones  source                                             inventory_page
    0  Korean (SPA 1)      Korean    (37.5, 128.0)       kore1280    Eurasia    40        22          18      0      https://archive.org/details/kor_SPA1979_phon       https://phoible.org/languages/kore1280
    1  KOREAN (UPSID 423)  Korean    (37.5, 128.0)       kore1280    Eurasia    32        21          11      ~N/A~  http://web.phonetik.uni-frankfurt.de/L/L2170.html  https://phoible.org/languages/kore1280
    2  Ket (SPA 2)         Ket       (63.7551, 87.5466)  kett1243    Eurasia    32        18          14      0      https://archive.org/details/ket_SPA1979_phon       https://phoible.org/languages/kett1243
    3  KET (UPSID 399)     Ket       (63.7551, 87.5466)  kett1243    Eurasia    25        18          7       ~N/A~  http://web.phonetik.uni-frankfurt.de/L/L2706.html  https://phoible.org/languages/kett1243
    4  Lak (SPA 3)         Lak       (42.1328, 47.0809)  lakk1252    Eurasia    69        60          9       0      https://archive.org/details/lbe_SPA1979_phon       https://phoible.org/languages/lakk1252

Some languages have several entries because the Phoible data consists of several different subsets (compare ``Korean (SPA 1)`` and ``KOREAN (UPSID 423)`` above). You can get the list of available subsets:

.. code:: ipython3

    p.subsets_list

.. parsed-literal::

    ['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

…and pass one of them into the class:

.. code:: ipython3

    p = Phoible(subset='SPA')
    df = p.get_df(strip_na=['tones'])
    df.head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

       contribution_name  language   coordinates                     glottocode  macroarea  phonemes  consonants  vowels  source                                        inventory_page
    0  Korean (SPA 1)     Korean     (37.5, 128.0)                   kore1280    Eurasia    40        22          18      https://archive.org/details/kor_SPA1979_phon  https://phoible.org/languages/kore1280
    1  Ket (SPA 2)        Ket        (63.7551, 87.5466)              kett1243    Eurasia    32        18          14      https://archive.org/details/ket_SPA1979_phon  https://phoible.org/languages/kett1243
    2  Lak (SPA 3)        Lak        (42.1328, 47.0809)              lakk1252    Eurasia    69        60          9       https://archive.org/details/lbe_SPA1979_phon  https://phoible.org/languages/lakk1252
    3  Kabardian (SPA 4)  Kabardian  (43.5082, 43.3918)              kaba1278    Eurasia    56        49          7       https://archive.org/details/kbd_SPA1979_phon  https://phoible.org/languages/kaba1278
    4  Georgian (SPA 5)   Georgian   (41.850396999999994, 43.78613)  nucl1302    Eurasia    35        29          6       https://archive.org/details/kat_SPA1979_phon  https://phoible.org/languages/nucl1302
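
In the unfiltered table the source subset is visible in ``contribution_name`` (``Korean (SPA 1)`` vs ``KOREAN (UPSID 423)``). Assuming that naming pattern holds, a sketch like the following tags each row with its subset, which makes the duplicated languages easy to spot. The rows are the first five of the unfiltered table above, typed in by hand.

.. code:: ipython3

    import pandas as pd

    # First rows of the unfiltered Phoible table shown earlier
    df = pd.DataFrame({
        'contribution_name': ['Korean (SPA 1)', 'KOREAN (UPSID 423)', 'Ket (SPA 2)',
                              'KET (UPSID 399)', 'Lak (SPA 3)'],
        'language': ['Korean', 'Korean', 'Ket', 'Ket', 'Lak'],
    })

    # Pull the subset tag out of the trailing '(<SUBSET> <number>)' part
    df['subset'] = df['contribution_name'].str.extract(r'\((\w+) \d+\)$', expand=False)

    # One entry per language, listing the subsets it appears in
    by_language = df.groupby('language')['subset'].apply(list)
    print(by_language)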

You can also get non-aggregated data (one row per phoneme) by setting ``aggregated`` to ``False`` when initializing the class.

.. code:: ipython3

    Phoible(aggregated=False).get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

       InventoryID  Glottocode  ISO6393  LanguageName  SpecificDialect  GlyphID    Phoneme  Allophones  Marginal  SegmentClass  ...  retractedTongueRoot  advancedTongueRoot  periodicGlottalSource  epilaryngealSource  spreadGlottis  constrictedGlottis  raisedLarynxEjective  loweredLarynxImplosive
    0  1            kore1280    kor      Korean        ~N/A~            0061       a        a           ~N/A~     vowel         ...  -                    -                   +                      -                   -              -                   -                     -
    1  1            kore1280    kor      Korean        ~N/A~            0061+02D0  aː       aː          ~N/A~     vowel         ...  -                    -                   +                      -                   -              -                   -                     -
    2  1            kore1280    kor      Korean        ~N/A~            00E6       æ        ɛ æ         ~N/A~     vowel         ...  -                    -                   +                      -                   -              -                   -                     -
    3  1            kore1280    kor      Korean        ~N/A~            00E6+02D0  æː       æː          ~N/A~     vowel         ...  -                    -                   +                      -                   -              -                   -                     -
    4  1            kore1280    kor      Korean        ~N/A~            0065       e        e           ~N/A~     vowel         ...  -                    -                   +                      -                   -              -                   -                     -

    5 rows × 48 columns
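
In the non-aggregated table each row is one phoneme of one inventory, so counting rows per ``InventoryID`` and ``SegmentClass`` gives per-inventory segment counts. These are only roughly comparable to the ``consonants`` and ``vowels`` columns of the aggregated table, since Phoible's own aggregation may treat some segments (marginal ones, for instance) differently. A sketch on the five rows above (so the count is just 5):

.. code:: ipython3

    import pandas as pd

    # The five rows shown above (inventory 1, Korean), reduced to three columns
    df = pd.DataFrame({
        'InventoryID': [1, 1, 1, 1, 1],
        'Phoneme': ['a', 'aː', 'æ', 'æː', 'e'],
        'SegmentClass': ['vowel'] * 5,
    })

    # Number of segments of each class in each inventory (0 where a class is absent)
    counts = df.groupby(['InventoryID', 'SegmentClass']).size().unstack(fill_value=0)
    print(counts)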

Map example:

.. code:: ipython3

    # df is still the SPA subset loaded above (rows with empty tones stripped)
    m = lingtypology.LingMap(df.language)
    m.colormap_colors = ('white', 'red')
    m.add_features(df.tones, numeric=True)
    m.legend_title = 'Tones'
    m.legend_position = 'bottomleft'
    m.create_map()

Another example (slow because of the large amount of data):

.. code:: ipython3

    df = Phoible(subset='UPSID', aggregated=False).get_df()
    # Keep only the segments that are ejectives
    df = df[df.raisedLarynxEjective == '+']
    # Keep one row per language
    df = df.drop_duplicates(subset='Glottocode')
    df.head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. parsed-literal::

          InventoryID  Glottocode  ISO6393  LanguageName  SpecificDialect  GlyphID         Phoneme  Allophones  Marginal  SegmentClass  ...  periodicGlottalSource  epilaryngealSource  spreadGlottis  constrictedGlottis  fortis  raisedLarynxEjective  loweredLarynxImplosive  click
    7570  198          afad1236    aal      KOTOKO        ~N/A~            0063+02BC       cʼ       ~N/A~       False     consonant     ...  -                      -                   -              +                   -       +                     -                       -
    7802  206          ahte1237    aht      AHTNA         ~N/A~            006B+02BC       kʼ       ~N/A~       False     consonant     ...  -                      -                   -              +                   -       +                     -                       -
    7920  211          qawa1238    alc      QAWASQAR      ~N/A~            006B+02BC       kʼ       ~N/A~       False     consonant     ...  -                      -                   -              +                   -       +                     -                       -
    8131  218          hame1242    amf      HAMER         ~N/A~            0071+02BC       qʼ       ~N/A~       False     consonant     ...  -                      -                   -              +                   -       +                     -                       -
    8157  219          amha1245    amh      AMHARIC       ~N/A~            006B+02B7+02BC  kʷʼ      ~N/A~       False     consonant     ...  -                      -                   -              +                   -       +                     -                       -

    5 rows × 48 columns
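
The same filter-then-deduplicate pattern works for any binary feature column. A sketch wrapping it in a hypothetical helper ``languages_with``, tried on a few rows copied from the two non-aggregated tables above (reduced to three columns):

.. code:: ipython3

    import pandas as pd

    def languages_with(df, feature, value='+'):
        """Hypothetical helper: one row per Glottocode having a segment with feature == value."""
        hits = df[df[feature] == value]
        return hits.drop_duplicates(subset='Glottocode')

    # Rows copied from the non-aggregated tables above
    segments = pd.DataFrame({
        'Glottocode': ['kore1280', 'kore1280', 'afad1236', 'ahte1237', 'qawa1238'],
        'Phoneme': ['a', 'e', 'cʼ', 'kʼ', 'kʼ'],
        'raisedLarynxEjective': ['-', '-', '+', '+', '+'],
    })

    ejective_langs = languages_with(segments, 'raisedLarynxEjective')
    print(ejective_langs['Glottocode'].tolist())

Passing another column name (e.g. ``'loweredLarynxImplosive'``) would select languages with implosives instead, assuming that column uses the same ``'+'``/``'-'`` coding.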

.. code:: ipython3

    m = lingtypology.LingMap(df.Glottocode, glottocode=True)
    m.title = 'Languages with Ejectives'
    m.tiles = 'Stamen Terrain'
    m.radius = 5
    m.opacity = 0.5
    m.colors = ('blue',)
    m.create_map()

`Go back up <#up>`__