MeanMicio

~ Medicine. Open Science. Animal Rights.

Parallel and distributed computing in GNU Health

Tuesday, 27 May 2025

Posted by Luis Falcon in GNU Health, HMIS, tryton

Tags

distributed computing, ehealth, federation, free software, gnu, GNU Health, GNU Health Federation, GNUHealth, parallel computing, performance, technology, thalamus, tryton, wordpress

Health in general, and health informatics in particular, sit near the top of the list when it comes to managing large volumes of data. In this post I’d like to draw attention to how we can create scalable models in GNU Health using parallel and distributed computing methods.

In the old days, and even today, large areas of hospitals are dedicated exclusively to storing patient medical records: thousands of charts that add up to millions of pages.

A medical record officer pulls out a patient chart (source: https://catalog.archives.gov/id/6374585)

The advent of Hospital Information Systems (HIS) and Electronic Medical Records (EMR) is transforming those paper-based records into bits and bytes. The GNU Health Hospital and Health Information System is one example.

GNU Health has many areas that involve loading, processing, searching and transforming large sets of data. Here are some examples that we use in GNU Health daily:

  • Demographics: Individual identification means, gender, addresses, occupations, domiciliary units, insurances, health professionals, institutions
  • Medical records: Patient evaluations, hospitalizations, laboratory and medical imaging orders, prescriptions, medication
  • Coding standards: Datasets that involve coding standards for interventions, procedures (ICPM, ICHI..), pathology, health conditions (ICD10, ICD11..)
  • Genomics: Very large datasets involving DNA sequencing, natural variants, genes, …
  • Epidemiology: Statistics are key in early warning systems, outbreak detection, health promotion and disease prevention programs. Those reports can involve massive amounts of data to be processed.

I would like to stress the importance of a good parallel or distributed computing model for maximum scalability and performance. One of the main problems is that we tend to emulate our linear lives in computing. The society we live in turns our daily activities into a set of sequential, chronological (dull) tasks (wake up -> bathroom -> breakfast -> work -> […] -> dinner -> sleep) put into a loop.

Designing and Building Parallel Programs by Ian Foster. A great book I bought in 1995 for my Parallel and Distributed Systems class in computer science. The concepts are still very much alive and the book remains on my bookshelf.

Think parallel. Instead of that linear loop, I’d like to think in terms of how our body systems work internally. From the macroscopic organs to the minute hormones and neurons, everything works simultaneously in beautiful synchrony to maintain homeostasis, the internal equilibrium that keeps us alive and well. It would be impossible to make a linear, sequential loop to process the events happening in a single second of our lives. Parallel processing makes the miracle possible. All the “workers”, “processes” and their signaling (“IPC”, interprocess communication in computer science terms) make it happen.

A real life example: without a good design, the project will not scale. Maybe at the beginning, with a few records, our system will perform OK. With time, our database will grow: if initially we had one hundred patients, all of a sudden we may have reached one million. Each person and patient in that population of one million has their own medical record, demographic history, lab tests... you get the idea. Analytic reporting, exporting or importing data will not scale if we don’t have a good design.

The following is a real life example from the migration of our community server to the latest version of GNU Health HIS. The task was checking and syncing the values stored in the datasets residing on the filesystem (for instance, updating to the latest version of the UniProt human genes natural variants) with those in the database. In total, we had nearly 150,000 records to sync. GNU Health HIS uses Tryton, a great Free/Libre framework on top of Python and PostgreSQL. What might seem a trivial task is not. When we increase the verbosity, we see that syncing each record involves many steps, such as logging in, checking user permissions on the model, checking the status of the record, verifying that it was not changed after the last update, etc. With 100 records, we can afford linear processing. With a set of 150K, we must look for a parallel computing solution.

I have experienced similar situations when migrating medical records from another system to GNU Health. The initial batch upload might contain thousands or millions of records. A good parallel model design will turn days into hours, hours into minutes and minutes into seconds.
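To make the idea concrete, here is a minimal sketch of spreading a per-record sync over a pool of worker processes with the Python multiprocessing package. It is not the actual GNU Health migration code: `sync_record`, `sync_all` and the sample keys are hypothetical names made up for illustration, and the comparison inside `sync_record` is only a stand-in for the real work (logging in, checking permissions, comparing and updating the record).

```python
from multiprocessing import Pool


def sync_record(pair):
    """Stand-in for the real per-record sync (hypothetical).

    Receives (key, (file_value, db_value)) and reports whether the
    database value differs from the one on the filesystem.
    """
    key, (file_value, db_value) = pair
    return key, file_value != db_value


def sync_all(records, processes=8, chunksize=100):
    """Check all records in parallel and return the keys that need updating."""
    with Pool(processes=processes) as pool:
        # chunksize hands each worker a batch of records at a time,
        # which reduces the communication overhead per record
        results = pool.map(sync_record, list(records.items()), chunksize=chunksize)
    return [key for key, changed in results if changed]


if __name__ == "__main__":
    # Made-up sample data: every tenth record is out of date in the "database"
    records = {f"VAR_{i:06d}": (i, i if i % 10 else -1) for i in range(1000)}
    outdated = sync_all(records, processes=4)
    print(len(outdated), "records need updating")
```

In a real run each worker would need its own connection to the server, since database sessions generally cannot be shared across processes.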

Processing time for syncing a set of 500 records, comparing the values from the filesystem with the current database record and updating them. We compared a sequential loop (first bar), eight processes (second bar) and finally 16 processes. The best result was achieved using eight processes (90 seconds). The sequential loop had the worst performance (318 seconds), followed by 16 parallel processes (97 seconds). I used the Proteus library for Tryton 7.0 and the Python 3.13 multiprocessing package. The test was done on my small laptop running Void Linux, PostgreSQL 16 and Linux kernel 6.12. Hardware: Intel i7 ThinkPad (8-core) with 12GB of RAM.
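The timings above can be turned into the two usual figures of merit for a parallel program: speedup (sequential time divided by parallel time) and efficiency (speedup divided by the number of processes). The short sketch below uses only the numbers reported in the caption; the function names are mine.

```python
def speedup(t_sequential, t_parallel):
    """How many times faster the parallel run is than the sequential one."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processes):
    """Speedup per process: 1.0 would mean perfect linear scaling."""
    return speedup(t_sequential, t_parallel) / processes


# Timings in seconds, as reported in the caption above
T_SEQ = 318
timings = {8: 90, 16: 97}

for procs, t_par in timings.items():
    s = speedup(T_SEQ, t_par)
    e = efficiency(T_SEQ, t_par, procs)
    print(f"{procs} processes: speedup {s:.2f}, efficiency {e:.2f}")
```

With eight processes the sync runs about 3.5 times faster than the sequential loop. Doubling to 16 processes brings no gain and roughly halves the efficiency, from about 0.44 to about 0.20.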

The GNU Health Federation: Distributed computing for large health networks.

The GNU Health Federation is another example of how to create scalable systems in health. In this case, instead of using multiple processes within a single computer, we set up multiple “workers”, which we call nodes, across a province, country or region. A node can be an individual using the MyGNUHealth personal health record, a laboratory or a hospital. Each of them works independently and they communicate via the network. Data aggregation and reporting happen at the GNUHealth Health Information System server, a special, document-oriented PostgreSQL database.

Diagram of Thalamus, the GNU Health Federation message server and the different nodes that make a distributed health network

Summary: Make a big problem small. Think parallel.

In the end, whether you use multiple processes in the same computer or make different nodes in the health network, the concept is pretty much the same. Make a big problem small. The PCAM design methodology is a great start. PCAM stands for Partition, Communicate, Agglomerate and Map. Decompose the initial problem into smaller domain (data) and functional (computational) units, design the way they talk to each other, combine (agglomerate) the tasks and finally map those tasks to processors.
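As a rough illustration of the four PCAM steps, the sketch below counts health conditions per ICD-10 chapter letter. Everything here is an assumption made for the example: the function names, the tiny list of codes, the batch size and the round-robin mapping. The "workers" are simulated one after another in a single process so that the structure stays visible; in practice each worker would be a process or a node.

```python
from collections import Counter


def partition(records):
    """Partition: one fine-grained task per record."""
    return list(enumerate(records))


def agglomerate(tasks, batch_size):
    """Agglomerate: group small tasks into batches to reduce overhead."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


def map_to_workers(batches, n_workers):
    """Map: assign batches to workers in round-robin order."""
    assignment = {worker: [] for worker in range(n_workers)}
    for i, batch in enumerate(batches):
        assignment[i % n_workers].append(batch)
    return assignment


def work(batch):
    """Computation done by a worker: count codes by their first letter."""
    return Counter(code[0] for _, code in batch)


def communicate(partials):
    """Communicate: merge the partial results sent back by the workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_pcam(records, batch_size, n_workers):
    """Run the four steps; workers are simulated sequentially here."""
    batches = agglomerate(partition(records), batch_size)
    assignment = map_to_workers(batches, n_workers)
    partials = [work(b) for worker in assignment.values() for b in worker]
    return communicate(partials)


codes = ["A00", "A01", "B20", "J45", "J45", "E11"]
print(run_pcam(codes, batch_size=2, n_workers=2))
```

The batch size is the knob that trades communication against parallelism: very small batches mean more messages between workers, very large ones leave some workers idle.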

It is also important to know your resources so you can size and design the solution to the problem. For instance, in the sync data example, we can see that spawning too many processes degrades performance. At that point we have saturated our resources and the system spends more time waiting for I/O or making the processes communicate with each other. You may then use processes, threads or even distributed computing, which are different implementation methods to fit the context and your resources.
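One simple way to avoid oversubscribing the machine is to cap the number of processes at the number of cores reported by the operating system. The helper below is a hypothetical starting point, not a rule: `pick_processes` is a name invented for this sketch, and the right number for an I/O-heavy workload may well differ, so it is worth measuring a few values as in the benchmark above.

```python
import os


def pick_processes(n_tasks, cpus=None):
    """Suggest a pool size: no more processes than cores or tasks, at least 1."""
    # os.cpu_count() can return None, so fall back to a single process
    cpus = cpus or os.cpu_count() or 1
    return max(1, min(n_tasks, cpus))


print(pick_processes(150_000))  # depends on the machine running it
```

On the 8-core laptop used for the test, this would suggest eight processes for the 150K-record sync, which matches the best result in the chart.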

Conclusion: As a final thought, I’d like to put the emphasis not on computing power, but on the power of open science and solidarity as a community. Computers can definitely help us achieve our goals, but the most efficient parallel / distributed model resides in the human factor. Today we are living in an unjust world ruled by very few yet very powerful people and corporations. Concentration of power and computational resources will only benefit a few, creating more inequality and a steeper social gradient. Humanity is reaching a new low and we cannot normalize the killing of thousands of innocent children that is happening in front of our very eyes. We cannot permit our governments to prioritize the macabre business of war over the human rights flag. The scientific community must rise up and organize for peace, social justice and equity in our society.

Open science, cooperation, solidarity and empathy are the key to success for any problem, no matter how big it may be.

Happy hacking
