How to group wikipedia categories in python?

For each concept in my dataset, I have stored the corresponding Wikipedia categories. For example, consider the following five concepts and their Wikipedia categories.

  • hypertriglyceridemia: ['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
  • enzyme inhibitor: ['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
  • bypass surgery: ['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
  • perth: ['Category:1829 establishments in Australia', 'Category:Australian capital cities', 'Category:Metropolitan areas of Australia', 'Category:Perth, Western Australia', 'Category:Populated places established in 1829']
  • climate: ['Category:Climate', 'Category:Climatology', 'Category:Meteorological concepts']

As you can see, the first three concepts belong to the medical domain, whereas the remaining two are not medical terms.

More precisely, I want to divide my concepts into medical and non-medical. However, it is very difficult to divide the concepts using the categories alone. For example, even though the two concepts enzyme inhibitor and bypass surgery are both in the medical domain, their categories are very different from each other.

Therefore, I would like to know if there is a way to obtain the parent category of the categories (for example, the categories of enzyme inhibitor and bypass surgery both belong to a medical parent category).

I am currently using pymediawiki and pywikibot. However, I am not restricted to those two libraries and am happy to have solutions using other libraries as well.
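To make the idea concrete, here is a minimal sketch of walking up the category graph a few levels. The helper names (`ancestor_categories`, `pywikibot_parents`) and the `max_depth` cut-off are my own assumptions, and the toy graph is made up purely so the example runs offline; it is not the real Wikipedia hierarchy.

```python
from collections import deque


def ancestor_categories(start, get_parents, max_depth=3):
    """Breadth-first walk up the category graph.

    start       -- iterable of category titles to start from
    get_parents -- callable mapping a category title to its parent titles
    max_depth   -- how many levels to climb (assumed cut-off, tune as needed)

    Returns a dict {ancestor title: number of levels above the start}.
    """
    seen = {cat: 0 for cat in start}          # also guards against cycles
    queue = deque((cat, 0) for cat in start)
    while queue:
        cat, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for parent in get_parents(cat):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    # Drop the starting categories themselves
    return {cat: d for cat, d in seen.items() if d > 0}


def pywikibot_parents(title):
    """Parent categories of one category via pywikibot (needs network access)."""
    import pywikibot  # imported lazily so the toy example below runs offline
    site = pywikibot.Site('en', 'wikipedia')
    return [c.title() for c in pywikibot.Category(site, title).categories()]


# Hypothetical toy graph for illustration only, including a cycle (D -> A).
TOY_PARENTS = {
    'Category:A': ['Category:B', 'Category:C'],
    'Category:B': ['Category:D'],
    'Category:C': ['Category:D'],
    'Category:D': ['Category:A'],
}

print(ancestor_categories(['Category:A'], lambda c: TOY_PARENTS.get(c, [])))
```

Against the live site, something like `ancestor_categories(['Category:Enzyme inhibitors'], pywikibot_parents, max_depth=2)` would be the equivalent call. The depth limit matters because Wikipedia's category graph contains cycles and becomes very broad after a few levels.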


As suggested by @IlmariKaronen, I am also using the categories of categories, and the results I got are as follows (the small font next to each category shows the categories of that category).

However, I still could not find a way to use these category details to decide whether a given term is medical or non-medical.

Moreover, as pointed out by @IlmariKaronen, using WikiProject details could be promising. However, the Medicine WikiProject does not seem to cover all medical terms, so we would need to check other WikiProjects as well.
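The simplest thing I can think of is a keyword score over the category names, sketched below. The keyword list, the 0.5 threshold and the function names are assumptions I picked by hand for this example, so they would certainly need tuning on a larger list of concepts.

```python
# Hypothetical, hand-picked keyword stems; an assumption for this sketch only.
MEDICAL_KEYWORDS = ('medic', 'surg', 'disorder', 'disease',
                    'enzyme', 'metabolism', 'clinical', 'anatomy')


def medical_score(categories, keywords=MEDICAL_KEYWORDS):
    """Fraction of categories whose name contains at least one keyword."""
    if not categories:
        return 0.0
    hits = sum(any(k in cat.lower() for k in keywords) for cat in categories)
    return hits / len(categories)


def is_medical(categories, threshold=0.5):
    """Label a concept medical if enough of its categories look medical."""
    return medical_score(categories) >= threshold


# The five concepts from the question and their categories
concepts = {
    'hypertriglyceridemia': ['Category:Lipid metabolism disorders',
                             'Category:Medical conditions related to obesity'],
    'enzyme inhibitor': ['Category:Enzyme inhibitors',
                         'Category:Medicinal chemistry',
                         'Category:Metabolism'],
    'bypass surgery': ['Category:Surgery stubs',
                       'Category:Surgical procedures and techniques'],
    'perth': ['Category:1829 establishments in Australia',
              'Category:Australian capital cities',
              'Category:Metropolitan areas of Australia',
              'Category:Perth, Western Australia',
              'Category:Populated places established in 1829'],
    'climate': ['Category:Climate', 'Category:Climatology',
                'Category:Meteorological concepts'],
}

for name, cats in concepts.items():
    print(name, is_medical(cats))
```

On these five examples the first three come out as medical and the last two as non-medical, but substring matching like this is brittle, and the same scoring could presumably be applied to the categories of categories as well.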
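A rough sketch of the WikiProject idea, assuming that WikiProject banners show up on an article's talk page as templates titled `Template:WikiProject <name>`. The function names and the `MEDICAL_PROJECTS` set are my own hand-picked assumptions, not a complete list of medicine-related projects.

```python
PREFIX = 'Template:WikiProject '

# Hand-picked set of medicine-related WikiProjects; an assumption, not exhaustive.
MEDICAL_PROJECTS = {'Medicine', 'Pharmacology', 'Anatomy'}


def wikiprojects_from_templates(template_titles):
    """Extract WikiProject names from a list of talk-page template titles."""
    names = set()
    for title in template_titles:
        if title.startswith(PREFIX):
            name = title[len(PREFIX):]
            if '/' not in name:  # skip subpages of banner templates
                names.add(name)
    return sorted(names)


def has_medical_wikiproject(template_titles, medical_projects=MEDICAL_PROJECTS):
    """True if any tagged WikiProject is in the chosen medical set."""
    return bool(set(wikiprojects_from_templates(template_titles)) & set(medical_projects))


def talk_page_templates(title):
    """Template titles used on an article's talk page (needs network access)."""
    import pywikibot  # imported lazily so the example below runs offline
    site = pywikibot.Site('en', 'wikipedia')
    talk = pywikibot.Page(site, title).toggleTalkPage()
    return [t.title() for t in talk.templates()]


# Illustrative input only; not fetched from a real talk page.
sample = ['Template:WikiProject Medicine', 'Template:Talk header',
          'Template:WikiProject Pharmacology']
print(wikiprojects_from_templates(sample))
print(has_medical_wikiproject(sample))
```

With network access, `has_medical_wikiproject(talk_page_templates('Enzyme inhibitor'))` would be the live equivalent. Generic wrapper templates that share the prefix would also match, which is harmless here because only names in the chosen set count.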

EDIT: My current code for extracting the categories of a Wikipedia concept is shown below. This can be done with either pymediawiki or pywikibot.

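  1. Using the library pymediawiki

import mediawiki as pw

# Create a client (defaults to English Wikipedia) and look up the page
wikipedia = pw.MediaWiki()
p = wikipedia.page('enzyme inhibitor')
print(p.categories)

  2. Using the library pywikibot

import pywikibot as pw

site = pw.Site('en', 'wikipedia')

# Keep only the visible (non-hidden) categories of the page
print([cat.title() for cat in pw.Page(site, 'support-vector machine').categories() if 'hidden' not in cat.categoryinfo])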

The categories of categories can be obtained in the same way, as shown in the answer by @IlmariKaronen.

If you are looking for a longer list of concepts for testing, I have included more examples below.

['juvenile chronic arthritis', 'climate', 'alexidine', 'mouthrinse', 'sialosis', 'australia', 'artificial neural network', 'ricinoleic acid', 'bromosulfophthalein', 'myelosclerosis', 'hydrochloride salt', 'cycasin', 'aldosterone antagonist', 'fungal growth', 'describe', 'liver resection', 'coffee table', 'natural language processing', 'infratemporal fossa', 'social withdrawal', 'information retrieval', 'monday', 'menthol', 'overturn', 'prevailing', 'spline function', 'acinic cell carcinoma', 'furth', 'hepatic protein', 'blistering', 'prefixation', 'january', 'cardiopulmonary receptor', 'extracorporeal membrane oxygenation', 'clinodactyly', 'melancholic', 'chlorpromazine hydrochloride', 'level of evidence', 'washington state', 'cat', 'newyork', 'year elevan', 'trituration', 'gold alloy', 'hexoprenaline', 'second molar', 'novice', 'oxygen radical', 'subscription', 'ordinate', 'approximal', 'spongiosis', 'ribothymidine', 'body of evidence', 'vpb', 'porins', 'musculocutaneous']

For a very long list please check the link below.

NOTE: I am not expecting the solution to work 100% of the time. It is enough for me if the proposed algorithm detects many of the medical concepts.

I am happy to provide more details if needed.

