Databases API
==============

APIs for several linguistic databases can be accessed through ``lingtypology.datasets``.

.. code:: ipython3

    import lingtypology.datasets

1. General
-----------

Lingtypology attempts to provide a unified API for the supported language databases. Therefore, classes in this module share some common attributes and methods. This section describes them and provides examples for Autotyp, Wals and Phoible.

.. code:: ipython3

    from lingtypology.datasets import Autotyp, Wals, Phoible

1.1. ``features_list``
~~~~~~~~~~~~~~~~~~~~~~~

You can get the list of features available in a database using this attribute.

.. code:: ipython3

    Autotyp().features_list[:10]  # Truncated to save space

.. parsed-literal::

    ['Agreement',
     'Alienability',
     'Alignment',
     'Alignment_case_splits',
     'Alignment_per_language',
     'Clause_linkage',
     'Clause_word_order',
     'Clusivity',
     'GR_per_language',
     'Gender']

**Note**: ``Phoible`` has no ``features_list`` attribute because it has no features. However, it has a ``subsets_list`` attribute that lists the available subsets of Phoible data.

.. code:: ipython3

    Phoible().subsets_list

.. parsed-literal::

    ['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

1.2. ``get_df`` and ``get_json``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These two methods access the database and return the data as a ``pandas.DataFrame`` or a ``dict``, respectively. Example of usage:

.. code:: ipython3

    Autotyp('Agreement', 'Clusivity').get_df().head()

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

.. raw:: html
language LID VPolyagreement.Presence.v2 VPolyagreement.Presence.v1 InclExclAsPerson.Presence InclExclAny.Presence InclExclType InclExclAsMinAug.Presence
0 Ambulas 6 False False False False no i/e False
1 Abkhazian 7 True True False False no i/e False
2 Acehnese 9 True False False True plain i/e type False
3 Western Keres 10 True True False False no i/e False
4 Hokkaido Ainu 12 True True False True plain i/e type False
| **Note**: for ``Phoible`` and ``Autotyp`` you can use the ``strip_na`` parameter (``list``, default: ``[]``) to drop rows that have an empty cell in the given columns. Compare the following.
| Without ``strip_na`` (empty cells are replaced with ``'~N/A~'``):

.. code:: ipython3

    Phoible().get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 KOREAN (UPSID 423) Korean (37.5, 128.0) kore1280 Eurasia 32 21 11 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2170.html https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
3 KET (UPSID 399) Ket (63.7551, 87.5466) kett1243 Eurasia 25 18 7 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2706.html https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
With the ``tones`` column passed to ``strip_na``:

.. code:: ipython3

    Phoible().get_df(strip_na=['tones']).head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
6 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
8 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302
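The effect of ``strip_na`` can be illustrated without network access. The sketch below is not lingtypology's implementation: it is a plain-Python stand-in that assumes rows are dictionaries, that empty cells hold the ``'~N/A~'`` placeholder shown above, and that a row is dropped when any of the listed columns is empty. The helper name ``strip_na_rows`` is hypothetical.

```python
NA = '~N/A~'  # placeholder that the tables above show for empty cells


def strip_na_rows(rows, columns):
    """Keep only rows with no NA placeholder in the given columns.

    Hypothetical helper for illustration; not part of lingtypology.
    """
    return [row for row in rows if all(row[col] != NA for col in columns)]


# A few rows copied from the Phoible table above (only some columns).
rows = [
    {'contribution_name': 'Korean (SPA 1)', 'language': 'Korean', 'tones': '0'},
    {'contribution_name': 'KOREAN (UPSID 423)', 'language': 'Korean', 'tones': NA},
    {'contribution_name': 'Ket (SPA 2)', 'language': 'Ket', 'tones': '0'},
    {'contribution_name': 'KET (UPSID 399)', 'language': 'Ket', 'tones': NA},
]

kept = strip_na_rows(rows, ['tones'])
kept_names = [row['contribution_name'] for row in kept]
print(kept_names)
```

With an empty column list (the documented default), the sketch keeps every row, which mirrors the first table above.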
**Note:** By default, calling ``get_df`` or ``get_json`` prints the citation. To disable this, set the ``show_citation`` attribute to ``False``.

.. code:: ipython3

    p = Phoible()
    p.show_citation = False
    p.get_df(strip_na=['tones']).head()

.. raw:: html
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
6 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
8 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302
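If the citation should be hidden only for a few calls, the attribute can be switched off and restored afterwards. The following is a minimal sketch, assuming only that the dataset object exposes a writable ``show_citation`` attribute as described above. The context manager ``citation_hidden`` and the ``FakeDataset`` stand-in are hypothetical and not part of lingtypology.

```python
from contextlib import contextmanager


@contextmanager
def citation_hidden(dataset):
    """Temporarily set ``show_citation`` to False, then restore the old value."""
    previous = dataset.show_citation
    dataset.show_citation = False
    try:
        yield dataset
    finally:
        dataset.show_citation = previous


class FakeDataset:
    """Stand-in exposing the same ``show_citation`` attribute (assumption)."""
    show_citation = True


fake = FakeDataset()
with citation_hidden(fake) as quiet:
    inside = quiet.show_citation  # value while the block runs
after = fake.show_citation        # value once the block has exited
print(inside, after)
```

With a real dataset object, ``with citation_hidden(Phoible()) as p: p.get_df()`` would be the intended usage under the same assumption.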
1.3. ``citation``
~~~~~~~~~~~~~~~~~~

You can get the citation for each database using the ``citation`` attribute. For example:

.. code:: ipython3

    from lingtypology.datasets import Autotyp
    print(Autotyp().citation)

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

**Note**: for ``Wals``, ``citation`` gives a separate citation for every feature. If you want the general citation for the whole of Wals, use ``general_citation``.

.. code:: ipython3

    w = Wals('1a', '2a')
    print(w.citation)

.. parsed-literal::

    Citation for feature 1A:
    Ian Maddieson. 2013. Consonant Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)

    Citation for feature 2A:
    Ian Maddieson. 2013. Vowel Quality Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

.. code:: ipython3

    print(w.general_citation)

.. parsed-literal::

    Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info, Accessed on 2019-06-13.)

2. Wals
--------

Wals data can be accessed (online) using ``lingtypology.datasets.Wals``.

.. code:: ipython3

    from lingtypology.datasets import Wals

.. code:: ipython3

    wals_page = Wals('1a', '2a').get_df()
    wals_page.head()

.. parsed-literal::

    Citation for feature 1A:
    Ian Maddieson. 2013. Consonant Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/1, Accessed on 2019-06-13.)

    Citation for feature 2A:
    Ian Maddieson. 2013. Vowel Quality Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/2, Accessed on 2019-06-13.)

.. raw:: html
wals_code language genus family coordinates _1A_area _1A _1A_num _1A_desc _2A_area _2A _2A_num _2A_desc
0 kiw Kiwai (Southern) Kiwaian Kiwaian (-8.0, 143.5) Phonology 1. Small 1 Small Phonology 2. Average (5-6) 2 Average (5-6)
1 xoo !Xóõ Tu Tu (-24.0, 21.5) Phonology 5. Large 5 Large Phonology 2. Average (5-6) 2 Average (5-6)
2 ani //Ani Khoe-Kwadi Khoe-Kwadi (-18.9166666667, 21.9166666667) Phonology 5. Large 5 Large Phonology 2. Average (5-6) 2 Average (5-6)
3 abi Abipón South Guaicuruan Guaicuruan (-29.0, -61.0) Phonology 2. Moderately small 2 Moderately small Phonology 2. Average (5-6) 2 Average (5-6)
4 abk Abkhaz Northwest Caucasian Northwest Caucasian (43.0833333333, 41.0) Phonology 5. Large 5 Large Phonology 1. Small (2-4) 1 Small (2-4)
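In the table above, the ``_1A`` and ``_2A`` columns hold strings such as ``'2. Moderately small'``, while the ``_num`` and ``_desc`` columns hold the two parts separately. The sketch below only illustrates that relationship with values copied from the table. How lingtypology actually builds these columns is not stated here, and ``split_wals_value`` is a hypothetical helper.

```python
def split_wals_value(value):
    """Split a value such as '2. Average (5-6)' into (2, 'Average (5-6)')."""
    number, _, description = value.partition('. ')
    return int(number), description


# Values copied from the _1A and _2A columns of the table above.
examples = ['1. Small', '2. Moderately small', '2. Average (5-6)', '5. Large']
parsed = [split_wals_value(value) for value in examples]
print(parsed)
```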
Map example for feature **1A**:

.. code:: ipython3

    m = lingtypology.LingMap(wals_page.language)
    m.add_custom_coordinates(wals_page.coordinates)
    m.add_features(
        wals_page._1A,
        colors=lingtypology.gradient(5, 'yellow', 'green')
    )
    m.legend_title = 'Consonant Inventory'
    m.create_map()
3. Autotyp
-----------

Autotyp data can be accessed (online) using ``lingtypology.datasets.Autotyp``. Unlike in Wals, each table name passed to ``Autotyp`` adds several columns:

.. code:: ipython3

    Autotyp_table = Autotyp('Gender', 'Agreement').get_df(strip_na=['Gender.binned4'])
    Autotyp_table.head()

.. parsed-literal::

    Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga & John B. Lowe. 2017. The AUTOTYP typological databases. Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

.. raw:: html
language LID Gender.n Gender.binned4 Gender.Presence VPolyagreement.Presence.v2 VPolyagreement.Presence.v1
0 Godoberi 1531 3 3 genders True False False
1 Bininj Kun-Wok 655 4 4 genders True True True
2 Luvale 553 10 more than 4 genders True True False
3 North-Central Dargwa 2949 3 3 genders True True True
4 Gaagudju 82 4 4 genders True True True
Now we can draw a map of the gender data for these languages.

.. code:: ipython3

    m = lingtypology.LingMap(Autotyp_table.language)
    m.add_features(
        Autotyp_table['Gender.binned4'],
        colors=lingtypology.gradient(4, color1='yellow', color2='red')
    )
    m.legend_title = 'Genders'
    m.create_map()
4. AfBo
--------

.. code:: ipython3

    from lingtypology.datasets import AfBo

.. code:: ipython3

    adj = AfBo('adjectivizer').get_df()
    adj.head()

.. parsed-literal::

    Seifart, Frank. 2013. AfBo: A world-wide survey of affix borrowing. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://afbo.info, Accessed on 2019-06-13.)

.. raw:: html
language_recipient language_donor reliability adjectivizer
0 Resígaro Bora high 0
1 Gurindji Kriol Gurindji high 0
2 Copper Island Aleut Russian high 0
3 Sakha Mongolian high 4
4 Kalderash Romani Romanian high 1
.. code:: ipython3

    m = lingtypology.LingMap(adj.language_recipient)
    m.add_features(adj['adjectivizer'], numeric=True)
    m.legend_title = 'Adj'
    m.create_map()
5. SAILS
---------

.. code:: ipython3

    from lingtypology.datasets import Sails

To get a ``pandas.DataFrame`` of features and descriptions:

.. code:: ipython3

    Sails().features_descriptions.head()

.. raw:: html
Feature Description
0 ICU17 Is plurality in independent pronouns expressed...
1 ICU16 Is plurality in independent pronouns expressed...
2 ICU15 Is plurality in independent pronouns expressed...
3 ICU14 Is an associative or collective plural disting...
4 ICU13 Are nouns denoting inanimates marked for plural?
To get the descriptions of particular features:

.. code:: ipython3

    Sails().feature_descriptions('ICU10', 'ICU11')

.. raw:: html
Feature Description
0 ICU10 Is nominal plural marking obligatory?
1 ICU11 Are nouns denoting humans marked for plural?
To get the SAILS data as a ``dict``, use the ``get_json`` method. To get the data as a ``pandas.DataFrame``, run:

.. code:: ipython3

    sails = Sails('ICU3', 'ICU4')
    df = sails.get_df()
    df.head()

.. parsed-literal::

    You probably should cite it, but I don't understand how. Please, consult https://sails.clld.org/

.. raw:: html
language coordinates ICU3 ICU3_desc ICU4 ICU4_desc
0 Baniva (5.26123, -67.56326999999999) 1 Yes 0 No
1 Apolista (-14.83, -68.66) 0 No ? ?
2 Yavitero (2.800281, -68.08421899999999) 1 Yes 0 No
3 Resígaro (-2.48139, -71.35778) 0 No 0 No
4 Tol (14.66859, -87.03719) 0 No 0 No
Map example:

.. code:: ipython3

    m = lingtypology.LingMap(df.language)
    m.add_features(df.ICU3_desc)
    m.legend_title = sails.feature_descriptions('ICU3').Description.at[0]
    m.start_location = (9, -79)
    m.start_zoom = 5
    m.legend_position = 'bottomleft'
    m.create_map()
6. Phoible
-----------

.. code:: ipython3

    from lingtypology.datasets import Phoible

Unlike the other databases, ``Phoible`` does not take features; it takes a subset instead. Calling it without arguments looks like this:

.. code:: ipython3

    p = Phoible()
    p.get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 KOREAN (UPSID 423) Korean (37.5, 128.0) kore1280 Eurasia 32 21 11 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2170.html https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
3 KET (UPSID 399) Ket (63.7551, 87.5466) kett1243 Eurasia 25 18 7 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2706.html https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
Some languages have several entries because Phoible data consists of several different subsets. You can get the list of available subsets:

.. code:: ipython3

    p.subsets_list

.. parsed-literal::

    ['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

… and pass one of them to the class:

.. code:: ipython3

    p = Phoible(subset='SPA')
    df = p.get_df(strip_na=['tones'])
    df.head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
2 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
3 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
4 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302
You can also get non-aggregated data by setting ``aggregated`` to ``False`` when initializing the class.

.. code:: ipython3

    Phoible(aggregated=False).get_df().head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
InventoryID Glottocode ISO6393 LanguageName SpecificDialect GlyphID Phoneme Allophones Marginal SegmentClass ... retractedTongueRoot advancedTongueRoot periodicGlottalSource epilaryngealSource spreadGlottis constrictedGlottis fortis raisedLarynxEjective loweredLarynxImplosive click
0 1 kore1280 kor Korean ~N/A~ 0061 a a ~N/A~ vowel ... - - + - - - 0 - - 0
1 1 kore1280 kor Korean ~N/A~ 0061+02D0 ~N/A~ vowel ... - - + - - - 0 - - 0
2 1 kore1280 kor Korean ~N/A~ 00E6 æ ɛ æ ~N/A~ vowel ... - - + - - - 0 - - 0
3 1 kore1280 kor Korean ~N/A~ 00E6+02D0 æː æː ~N/A~ vowel ... - - + - - - 0 - - 0
4 1 kore1280 kor Korean ~N/A~ 0065 e e ~N/A~ vowel ... - - + - - - 0 - - 0

5 rows × 48 columns

Map example:

.. code:: ipython3

    m = lingtypology.LingMap(df.language)
    m.colormap_colors = ('white', 'red')
    m.add_features(df.tones, numeric=True)
    m.legend_title = 'Tones'
    m.legend_position = 'bottomleft'
    m.create_map()
Another example (slow because of the large amount of data):

.. code:: ipython3

    df = Phoible(subset='UPSID', aggregated=False).get_df()
    # Get all languages with ejectives
    df = df[df.raisedLarynxEjective == '+']
    # Remove duplicates
    df = df.drop_duplicates(subset='Glottocode')
    df.head()

.. parsed-literal::

    Moran, Steven & McCloy, Daniel (eds.) 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2019-06-13.)

.. raw:: html
InventoryID Glottocode ISO6393 LanguageName SpecificDialect GlyphID Phoneme Allophones Marginal SegmentClass ... retractedTongueRoot advancedTongueRoot periodicGlottalSource epilaryngealSource spreadGlottis constrictedGlottis fortis raisedLarynxEjective loweredLarynxImplosive click
7570 198 afad1236 aal KOTOKO ~N/A~ 0063+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
7802 206 ahte1237 aht AHTNA ~N/A~ 006B+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
7920 211 qawa1238 alc QAWASQAR ~N/A~ 006B+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
8131 218 hame1242 amf HAMER ~N/A~ 0071+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
8157 219 amha1245 amh AMHARIC ~N/A~ 006B+02B7+02BC kʷʼ ~N/A~ False consonant ... 0 0 - - - + - + - -

5 rows × 48 columns
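The two steps in the example above (keep the rows for ejective phonemes, then keep one row per language) can be traced on a small scale without pandas. The rows below are toy data, not real Phoible entries, and ``first_per_key`` is a hypothetical helper. It keeps the first row per key, which is also the default behaviour of ``pandas.DataFrame.drop_duplicates``.

```python
def first_per_key(rows, key):
    """Keep the first row for each distinct value of ``key``, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Toy rows, NOT real Phoible data: one row per phoneme, as in the
# non-aggregated table layout.
rows = [
    {'Glottocode': 'toya0001', 'Phoneme': 'p', 'raisedLarynxEjective': '-'},
    {'Glottocode': 'toya0001', 'Phoneme': "k'", 'raisedLarynxEjective': '+'},
    {'Glottocode': 'toya0001', 'Phoneme': "t'", 'raisedLarynxEjective': '+'},
    {'Glottocode': 'toyb0002', 'Phoneme': 'a', 'raisedLarynxEjective': '-'},
    {'Glottocode': 'toyc0003', 'Phoneme': "q'", 'raisedLarynxEjective': '+'},
]

# Step 1: keep only rows whose phoneme is marked as ejective.
ejectives = [row for row in rows if row['raisedLarynxEjective'] == '+']
# Step 2: keep one row per language (Glottocode).
languages = first_per_key(ejectives, 'Glottocode')
codes = [row['Glottocode'] for row in languages]
print(codes)
```

Because only one row per Glottocode survives, the resulting table is suitable for plotting one point per language but no longer lists every ejective phoneme.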

.. code:: ipython3

    m = lingtypology.LingMap(df.Glottocode, glottocode=True)
    m.title = 'Languages with Ejectives'
    m.tiles = 'Stamen Terrain'
    m.radius = 5
    m.opacity = 0.5
    m.colors = ('blue',)
    m.create_map()