How to group Wikipedia categories in Python?

For each concept in my dataset, I have stored the corresponding Wikipedia categories. For example, consider the following 5 concepts and their Wikipedia categories.

  • hypertriglyceridemia: ['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
  • enzyme inhibitor: ['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
  • bypass surgery: ['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
  • perth: ['Category:1829 establishments in Australia', 'Category:Australian capital cities', 'Category:Metropolitan areas of Australia', 'Category:Perth, Western Australia', 'Category:Populated places established in 1829']
  • climate: ['Category:Climate', 'Category:Climatology', 'Category:Meteorological concepts']

As you can see, the first three concepts belong to the medical domain, whereas the remaining two are not medical terms.

More precisely, I want to divide my concepts into medical and non-medical. However, it is very difficult to do this using the categories alone. For example, even though the two concepts enzyme inhibitor and bypass surgery are both in the medical domain, their categories are very different from each other.

Therefore, I would like to know if there is a way to obtain the parent category of the categories (for example, the categories of enzyme inhibitor and bypass surgery both belong to a medical parent category).
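To make the idea concrete, here is a rough sketch of the kind of "walk up to the parents" step I have in mind. It is only an illustration under assumptions: `ancestor_categories`, `get_parents`, and `max_depth` are names I made up, and the parent map is invented toy data rather than real Wikipedia data. In practice `get_parents` would wrap a library call that returns the categories of a category.

```python
from collections import deque


def ancestor_categories(start, get_parents, max_depth=3):
    """Collect categories reachable from `start` by following parent links.

    `get_parents` is any callable that maps a category title to a list of
    parent category titles (hypothetical here). `max_depth` bounds the walk,
    since a category graph can contain cycles and very broad top-level nodes.
    """
    seen = set(start)
    queue = deque((title, 0) for title in start)
    ancestors = set()
    while queue:
        title, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in get_parents(title):
            if parent not in seen:
                seen.add(parent)
                ancestors.add(parent)
                queue.append((parent, depth + 1))
    return ancestors


# Toy parent map, invented for illustration only (NOT real Wikipedia data).
TOY_PARENTS = {
    'Category:Enzyme inhibitors': ['Category:Toy pharmacology'],
    'Category:Surgical procedures and techniques': ['Category:Toy surgery'],
    'Category:Toy pharmacology': ['Category:Toy medicine'],
    'Category:Toy surgery': ['Category:Toy medicine'],
    'Category:Toy medicine': ['Category:Toy health'],
    'Category:Toy health': ['Category:Toy medicine'],  # deliberate cycle
}


def toy_get_parents(title):
    """Look up parents in the toy map; unknown titles have no parents."""
    return TOY_PARENTS.get(title, [])


inhibitor = ancestor_categories(['Category:Enzyme inhibitors'], toy_get_parents)
surgery = ancestor_categories(
    ['Category:Surgical procedures and techniques'], toy_get_parents
)
# Ancestors shared by both concepts are candidates for a common parent domain.
print(sorted(inhibitor & surgery))
```

The depth limit is a guess on my side; if it is too large, almost everything ends up sharing some very general ancestor.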

I am currently using pymediawiki and pywikibot. However, I am not restricted to those two libraries and am happy to have solutions using other libraries as well.

EDIT

As suggested by @IlmariKaronen, I am also using the categories of the categories, and the results I got are as follows (the small font next to each category shows the categories of that category).

However, I still could not find a way to use these category details to decide whether a given term is medical or non-medical.

Moreover, as pointed out by @IlmariKaronen, WikiProject details could be useful. However, the Medicine WikiProject does not seem to cover all the medical terms, so other WikiProjects would need to be checked as well.

EDIT: My current code for extracting categories from Wikipedia concepts is as follows. This can be done using either pymediawiki or pywikibot.

  1. Using the library pymediawiki

      from mediawiki import MediaWiki

      wikipedia = MediaWiki()
      p = wikipedia.page('enzyme inhibitor')
      print(p.categories)

  2. Using the library pywikibot

      import pywikibot as pw

      site = pw.Site('en', 'wikipedia')

      # keep only the visible (non-hidden) categories of the page
      print([cat.title() for cat in pw.Page(site, 'support-vector machine').categories() if 'hidden' not in cat.categoryinfo])

The categories of categories can also be obtained in the same way, as shown in the answer by @IlmariKaronen.
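For reference, this is roughly how I understand that second step. It is a sketch, not the exact code from the answer: `visible_categories`, `categories_of_categories`, and the `FakePage` stand-in are names I introduced, and the functions only assume a page-like object with `title()`, `categories()`, and a dict-like `categoryinfo`, which is what the pywikibot snippet above relies on. `FakePage` exists only so the example runs offline.

```python
def visible_categories(page):
    """Return the non-hidden categories of a page-like object."""
    return [cat for cat in page.categories() if 'hidden' not in cat.categoryinfo]


def categories_of_categories(page):
    """Map each visible category title of `page` to its own visible category titles."""
    return {
        cat.title(): [parent.title() for parent in visible_categories(cat)]
        for cat in visible_categories(page)
    }


class FakePage:
    """Offline stand-in for a pywikibot page (hypothetical, illustration only)."""

    def __init__(self, title, parents=(), hidden=False):
        self._title = title
        self._parents = list(parents)
        # Mimics the check used above: hidden categories carry a 'hidden' key.
        self.categoryinfo = {'hidden': ''} if hidden else {}

    def title(self):
        return self._title

    def categories(self):
        return iter(self._parents)


# Toy data, invented for illustration only (NOT real Wikipedia data).
top = FakePage('Category:Toy top')
tracking = FakePage('Category:Toy tracking', hidden=True)
mid = FakePage('Category:Toy mid', parents=[top, tracking])
article = FakePage('toy article', parents=[mid, tracking])

print(categories_of_categories(article))

# With a live site (needs network access, not run here), the call would be:
#   import pywikibot as pw
#   site = pw.Site('en', 'wikipedia')
#   categories_of_categories(pw.Page(site, 'enzyme inhibitor'))
```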

If you are looking for a longer list of concepts for testing, I have included more examples below.

['juvenile chronic arthritis', 'climate', 'alexidine', 'mouthrinse', 'sialosis', 'australia', 'artificial neural network', 'ricinoleic acid', 'bromosulfophthalein', 'myelosclerosis', 'hydrochloride salt', 'cycasin', 'aldosterone antagonist', 'fungal growth', 'describe', 'liver resection', 'coffee table', 'natural language processing', 'infratemporal fossa', 'social withdrawal', 'information retrieval', 'monday', 'menthol', 'overturn', 'prevailing', 'spline function', 'acinic cell carcinoma', 'furth', 'hepatic protein', 'blistering', 'prefixation', 'january', 'cardiopulmonary receptor', 'extracorporeal membrane oxygenation', 'clinodactyly', 'melancholic', 'chlorpromazine hydrochloride', 'level of evidence', 'washington state', 'cat', 'newyork', 'year elevan', 'trituration', 'gold alloy', 'hexoprenaline', 'second molar', 'novice', 'oxygen radical', 'subscription', 'ordinate', 'approximal', 'spongiosis', 'ribothymidine', 'body of evidence', 'vpb', 'porins', 'musculocutaneous']

For a very long list please check the link below. https://docs.google.com/document/d/1BYllMyDlw-Rb4uMh89VjLml2Bl9Y7oUlopM-Z4F6pN0/edit?usp=sharing

NOTE: I am not expecting the solution to work 100% of the time (if the proposed algorithm is able to detect many of the medical concepts, that is enough for me).
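To be clear about what "many" means for me, I would judge an approach by the share of hand-labelled medical concepts it manages to flag. A small sketch of that check follows; `medical_recall` is a name I made up, and the `predicted` set is a hypothetical classifier output, not a real result.

```python
def medical_recall(predicted_medical, labelled_medical):
    """Fraction of hand-labelled medical concepts that were also flagged as medical."""
    labelled = set(labelled_medical)
    if not labelled:
        raise ValueError('need at least one labelled medical concept')
    return len(labelled & set(predicted_medical)) / len(labelled)


# The three concepts from the example above that I consider medical.
labelled = {'hypertriglyceridemia', 'enzyme inhibitor', 'bypass surgery'}
# Hypothetical output of some classifier (made up for illustration).
predicted = {'enzyme inhibitor', 'bypass surgery', 'climate'}

print(medical_recall(predicted, labelled))
```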

I am happy to provide more details if needed.
