Table of Contents
- Upgrade and Fix tvb-gdist C++ Library
- 1. Contact Details
- 2. Project details
- 3. Candidate details
- 4. References
Upgrade and Fix tvb-gdist C++ Library
1. Contact Details
- Full name: Ayan Banerjee
- Email:
  - Primary: ayanbn7@gmail.com
  - Secondary: ayanbanerjee7777@gmail.com
- Location: Durgapur, India
- Blog: https://ayanb.me/blog
- Hangouts ID: ayanbn7@gmail.com
2. Project details
2.1. Project synopsis/summary: What is the project about?
The project is about upgrading and fixing the tvb_geodesic library, which is used to calculate geodesic distances on cortical surfaces.
2.2. Why is it important?
Geodesic distance is central to the mathematical simulation of the brain: simulations on cortical surfaces use geodesic distance instead of Euclidean distance. It is therefore important to calculate this distance correctly.
2.3. Project in Detail
When running simulations on cortical surfaces, we need to calculate geodesic distance rather than Euclidean distance because of the shape of the cortical surface. The Virtual Brain uses geodesic_library for this calculation. The library implements the algorithm from the original paper in C++. The original source code can be found in the Google Code Archive: https://code.google.com/archive/p/geodesic. The tvb_geodesic repository implements a Cython wrapper on top of the C++ code, which is then released to PyPI (tvb-gdist) and conda-forge (Tvb Gdist).
However, the code is now outdated and users have reported various issues. In this project, we aim to update the code and fix those issues.
2.4. How will you handle the project? Detailed description of your planned approach
2.4.1. Implementation
The project can be split into various parts and they are discussed below:
-
Making the Geodesic Library compatible with C++ 17
Some of the possible improvements:
- Remove the use of the deprecated auto_ptr and replace it with unique_ptr: auto_ptr was deprecated in C++ 11 in favor of unique_ptr and was removed from the standard in C++ 17. There are a number of instances where auto_ptr is used in the code. We need to replace them with unique_ptr and test the code.
- Include cstring where memcpy is used: cstring needs to be included when using memcpy. This is to be done for the file geodesic_algorithm_exact.h.
-
Fix warnings while running setup.py install:
There are various warnings that come up while running python setup.py install, for example:
FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: tvb-geodesic/gdist.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
warning: gdist.pyx:277:47: local variable 'distance' referenced before assignment
warning: gdist.pyx:279:16: local variable 'distance' referenced before assignment
warning: gdist.pyx:279:51: local variable 'distance' referenced before assignment
warning: gdist.pyx:282:28: local variable 'distance' referenced before assignment
Fixing these warnings should be fairly straightforward.
-
Python 2 reached EOL (end-of-life) in January 2020. We should update this package to be Python 3 compatible to make it future-proof.
-
Installation problem caused by numpy
The warnings that appear while installing numpy are due to the use of deprecated APIs. This issue will be fixed after upgrading the geodesic_library to C++ 17.
-
In order to fix these issues, we will first try the inputs on the C++ geodesic_library itself and, if the output is as expected, move on to fixing the Cython wrapper. We will write at least one unit test that fails with the current code and should pass when this step is complete.
-
This is an old task, based on cython dependency status at the moment of reporting. We should first analyze if this is still valid.
Also, we can create simpler bindings: a small C++ library with a C API for using the geodesic functions. After that, we can create object files (with the .o extension) and shared object files (with the .so extension) and then use them from Python with the ctypes module instead of Cython. The resulting shared library can be built per platform into wheels and then published on PyPI and conda-forge.
-
We can use OpenMP with Cython to speed up the program and then test it to make sure everything works as expected.
Here we will need at least one performance test to prove that the code actually runs faster in parallel mode. We should be able to increase the number of threads and see how it scales.
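The scaling check described above could be driven by a small timing harness. The following is only a sketch under assumptions: the names best_time, scaling_report and run_with_threads are hypothetical, and how the thread count reaches the OpenMP build (for example through the OMP_NUM_THREADS environment variable) is left to the callable that is passed in.

```python
import time


def best_time(func, repeats=3):
    """Return the smallest wall-clock time (in seconds) over several runs of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def scaling_report(run_with_threads, thread_counts=(1, 2, 4), repeats=3):
    """Time run_with_threads(n) for each n and report speedup over the first count.

    run_with_threads is a hypothetical callable that runs the geodesic
    computation with n threads; it is an assumption, not an existing API.
    """
    times = {}
    for n in thread_counts:
        times[n] = best_time(lambda n=n: run_with_threads(n), repeats)
    baseline = times[thread_counts[0]]
    report = {}
    for n, seconds in times.items():
        # Guard against a zero reading from a coarse timer.
        speedup = baseline / seconds if seconds > 0 else float("inf")
        report[n] = {"seconds": seconds, "speedup": speedup}
    return report
```

A performance test could then assert that the reported speedup for the larger thread counts stays above some agreed threshold on a sufficiently large mesh.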
-
Release to PyPI and conda-forge
The TVB team already has a Jenkins build used for the automatic building of PyPI packages. The work in this project will stay compatible with those builds, and the recipes will be updated to keep them current.
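The ctypes-based binding idea mentioned in the list above could look roughly like the sketch below. It assumes a C API that does not exist yet: the function name gdist_compute, its prototype, and the helper names are all hypothetical and only illustrate how a shared library would be loaded and called from Python.

```python
import ctypes


def load_gdist(path):
    """Load a shared library and declare the signature of one C function.

    Hypothetical C prototype (an assumption, not an existing API):
        int gdist_compute(const double *vertices, size_t n_vertices,
                          const int *triangles, size_t n_triangles,
                          int source, double *out_distances);
    """
    lib = ctypes.CDLL(path)  # raises OSError if the library cannot be loaded
    lib.gdist_compute.argtypes = [
        ctypes.POINTER(ctypes.c_double), ctypes.c_size_t,
        ctypes.POINTER(ctypes.c_int), ctypes.c_size_t,
        ctypes.c_int, ctypes.POINTER(ctypes.c_double),
    ]
    lib.gdist_compute.restype = ctypes.c_int
    return lib


def to_c_array(ctype, values):
    """Copy a Python sequence into a newly allocated ctypes array."""
    values = list(values)
    return (ctype * len(values))(*values)


def compute_distances(lib, vertices, triangles, source):
    """Call the hypothetical C function with flat coordinate and index lists."""
    n_vertices = len(vertices) // 3    # x, y, z per vertex
    n_triangles = len(triangles) // 3  # three vertex indices per triangle
    c_vertices = to_c_array(ctypes.c_double, vertices)
    c_triangles = to_c_array(ctypes.c_int, triangles)
    out = (ctypes.c_double * n_vertices)()  # output buffer filled by the C side
    status = lib.gdist_compute(c_vertices, n_vertices,
                               c_triangles, n_triangles, source, out)
    if status != 0:
        raise RuntimeError("gdist_compute failed with status %d" % status)
    return list(out)
```

Keeping the C API to plain pointers and sizes means no compiler is needed on the Python side, at the cost of copying the input data into ctypes arrays.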
2.4.2. Testing
- The C++ library will have some basic test cases to make sure that any future API changes produce correct outputs.
- All the functions will have their own unit tests.
- We can use pytest to test the Python code.
- Tests should run automatically on the Jenkins builds, on both Windows and Unix.
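As a rough illustration of what such pytest tests might check, the sketch below uses a flat unit square, where the geodesic distance between two vertices equals their Euclidean distance. The compute_distance adapter and the euclidean_stand_in function are hypothetical placeholders; in a real suite the adapter would call into the Python wrapper (presumably the package's compute_gdist function, which is not imported here).

```python
import math

# Unit square in the z = 0 plane, split into two triangles along the 0-2 diagonal.
VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
TRIANGLES = [(0, 1, 2), (0, 2, 3)]


def euclidean_stand_in(vertices, triangles, source, target):
    """Placeholder for the real wrapper call; only valid for flat, convex meshes."""
    return math.dist(vertices[source], vertices[target])


# Hypothetical adapter: swap in a call to the real wrapper here.
compute_distance = euclidean_stand_in


def test_distance_to_self_is_zero():
    assert compute_distance(VERTICES, TRIANGLES, 0, 0) == 0.0


def test_flat_mesh_matches_euclidean():
    # The path from vertex 1 to vertex 3 crosses the shared diagonal edge
    # in a straight line, so its length is sqrt(2).
    assert math.isclose(compute_distance(VERTICES, TRIANGLES, 1, 3), math.sqrt(2.0))
    assert math.isclose(compute_distance(VERTICES, TRIANGLES, 0, 1), 1.0)


def test_distance_is_symmetric():
    forward = compute_distance(VERTICES, TRIANGLES, 0, 2)
    backward = compute_distance(VERTICES, TRIANGLES, 2, 0)
    assert math.isclose(forward, backward)
```

Because the functions follow the test_ naming convention, pytest would collect them automatically when the file is placed in the test directory.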
2.5. Project implementation and timeline:
2.5.1. Minimal set of deliverables
- Fix the 6 issues
- Make geodesic_library C++ 17 compatible
- Update the package on conda-forge and PyPI
2.5.2. Additional ‘if time allows’ deliverables
- Blog explaining the work done
- Auto-publishing whenever a new tag is added to the GitHub repo
2.5.3. Detailed timeline
Phase | Date | Tasks |
Community Bonding: May 4 - Jun 1 | May 4 - Jun 1 | |
Phase 1: Jun 1 - Jul 3 | Week 1: Jun 1 - Jun 7 | |
 | Week 2: Jun 8 - Jun 14 | |
 | Week 3: Jun 15 - Jun 21 | |
 | Week 4: Jun 22 - Jun 28 | |
 | Jun 29 - Jul 3 | Phase 1 Evaluation |
Phase 2: Jul 6 - Jul 31 | Week 5 - 6: Jul 6 - Jul 19 | |
 | Week 7: Jul 20 - Jul 26 | |
 | Jul 27 - Jul 31 | Phase 2 Evaluation |
Phase 3: Aug 3 - Aug 31 | Week 8: Aug 3 - Aug 9 | |
 | Week 9: Aug 10 - Aug 16 | |
 | Week 10: Aug 17 - Aug 23 | |
 | Aug 24 - Aug 31 | Final Evaluation |
Post - GSoC | - | Maintain the package, fix bugs and implement feature requests |
2.5.4. Plan for communication with mentors: How will you and the mentors keep in contact?
I will keep in touch with my mentors via Slack. I will also have a video chat with them every week via Google Hangouts, and I will submit a weekly report in a shared Google Doc/GitHub Gist.
3. Candidate details
3.1. Motivation
For a while, I have wanted to see how C/C++ code can improve the performance of Python code. This project will give me the opportunity to do that.
Last summer, I worked on bioinformatics and my work is currently used by thousands of bioinformatics users worldwide. The virtual brain is also used by tens of thousands of users worldwide. Impacting thousands of virtual brain users’ work is an opportunity I would not want to miss!
3.2. Match
I have been coding in C++ and Python for the last 3 years. I have also taken courses in data structures, and that knowledge will be useful when fixing and updating the C++ geodesic library. Last year I was a Google Summer of Code student under the UCSC Xena organization, where I worked on a bioinformatics project. Before that, I contributed to the organization coala for around 1.5 years (I have added more details on my open source experience below). I hope that these open-source experiences will help me complete the current project successfully.
I am the maintainer of 6 projects on PyPI and am also familiar with conda-forge.
I have also made some contributions to INCF projects:
- the-virtual-brain/tvb-geodesic#25
- the-virtual-brain/tvb-geodesic#26
- the-virtual-brain/tvb-geodesic#27
- the-virtual-brain/tvb-geodesic#28
- pyxnat/pyxnat#127
3.3. Is this the only project that you will apply for?
I am also applying for this project.
3.4. Working time and commitments - will you be working full time on this?
My end semester examination ends in the first week of May. I have no other commitments this summer. Thus, I will be able to work full time on the project.
I will be able to devote approximately 40 hours every week during the summer.
3.5. Do you have any other plans for the summer (school work, another job, planned vacation)?
No, I have no other plans for the summer.
3.6. Past experience
I was introduced to open source by my university club, and since then I have been mesmerized by the impact open source software has had on our day-to-day life.
My first contribution to open source was to the organization coala, a linting tool that supports most programming languages. I started contributing there in September 2018 and, in the process, was promoted to the team of main developers at coala. I have submitted and reviewed various PRs and opened and reviewed many issues. Contributing to coala was a great learning experience: it taught me Git, code coverage, and open source etiquette.
I was also a Google Code-in 2018 mentor for coala, where I guided many pre-university students in getting started with open source development. I also maintained 2 repositories for coala during that period. I mainly handled documentation (sphinx), moban, coAST, and artwork tasks.
I have also contributed to the moremoban organization, on the moban and yehua projects. “moban” is a CLI for static text generation and “yehua” is used to generate a Python package quickly. I made two plugins (https://github.com/moremoban/moban-velocity and https://github.com/moremoban/moban-haml) for moremoban and released them to PyPI. I am a collaborator on 4 repositories in the organization.
My biggest contribution in open source comes in the form of being a Google Summer of Code developer at UCSC Xena. UCSC Xena builds the Xena browser, which allows users to explore functional genomic data sets for correlations between genomic and/or phenomic data. I worked on building an ETL pipeline for the browser which takes data from GDC and creates datasets that are digestible to the Xena browser. The report can be found here: https://github.com/ucscXena/ucsc-xena-client/wiki/Update-GDC-Data-Ingestion-Pipeline-and-Run.
In the summer of 2018, I was also an intern at IIT Bombay, where I worked on integrating a plagiarism detection tool with yaksh, a web app for taking courses and coding tests. In my free time, I also love to make tiny useful projects. All of them can be found on my GitHub profile.
3.7. CV
My resume can be found here: https://ayanb.me/profile.
4. References
- Original paper implementing geodesic distance: THE DISCRETE GEODESIC PROBLEM
- Wrapping C++ in Cython: https://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html