About Me

I am a Data Scientist, Research Associate, and Software Developer. I love implementing interesting ideas and believe that only by implementing them can I learn how good they are. My background is in Information Retrieval (Computer Science). During my Ph.D., I studied location information, especially social media users’ trail patterns. I love Python, and most of the code I wrote for my Ph.D. projects is in Python. Sometimes I need the power of Hadoop/Spark to scale my analysis up to several hundred cores.

Thoughts on data protection: Hashing!

TL;DR: If you need to share a (key, value) dataset without sharing the whole key set, hash all the keys and give out only the (hashed key, value) dataset.
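
To make the idea concrete, here is a minimal sketch in Python. The post does not say which hash function to use, so this assumes HMAC-SHA256 with a secret held by the data owner (a keyed hash, chosen here because an unkeyed hash of guessable keys can be reversed by hashing candidate keys). The names `hash_key`, `hash_keys`, and the sample records are hypothetical.

```python
import hashlib
import hmac


def hash_key(key, secret):
    """Return the hex HMAC-SHA256 digest of one key (assumed scheme)."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_keys(records, secret):
    """Build the shareable (hashed key, value) dataset from a dict."""
    return {hash_key(key, secret): value for key, value in records.items()}


# Hypothetical sample data: the e-mail addresses are the keys we do not
# want to reveal as a complete set.
records = {"alice@example.com": 42, "bob@example.com": 7}
shared = hash_keys(records, secret=b"keep-this-private")

# Someone who already knows a key (and the secret) can still look it up.
lookup = shared[hash_key("alice@example.com", b"keep-this-private")]
```

Only `shared` leaves your hands. A recipient who already holds a key can find its value, but cannot list the original keys from the dataset alone.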

CSV-Loader

My work requires me to import a variety of tables from CSV files into a DBMS. However, importing CSV is not always an easy or easily documented task, as every DBMS has its own way of doing the job. There are different tools, GUI or CUI, that can help with the process, but no generally available tool exists for the task. I needed a tool for importing CSV tables, and I took this opportunity to make a general one.
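
As an illustration of the basic task (not of CSV-Loader's own interface, which is not shown here), the following sketch loads a CSV table into SQLite using only the Python standard library. The function name `load_csv`, the table name, and the sample data are assumptions, and every column is stored as untyped text.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert the remaining rows."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Hypothetical sample table, read from memory instead of a file on disk.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("city,visits\nLondon,3\nParis,5\n")
load_csv(conn, "trips", sample)
rows = conn.execute("SELECT city, visits FROM trips ORDER BY city").fetchall()
```

A script like this is easy to keep next to the data, which documents the import step. Other DBMSs would need their own connection and type handling.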

BakMan: A file-oriented backup management tool.

If you have ever had tons of backup files that quickly fill up all your disk space, this tool might be useful to you. BakMan is a BAcKup MANagement tool for automatically tracing unneeded backup files and generating management scripts. The core idea is to filter backup files by a set of customizable rules. The rule file defines which files in the pile should be removed or stashed somewhere else.
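
The rule-based filtering idea can be sketched roughly as follows. This is not BakMan's actual rule syntax or code: the rule format (a glob pattern paired with an action, first match wins), the function name `plan`, and the stash directory are all assumptions made for illustration.

```python
import fnmatch
import shlex

# Hypothetical rules: (glob pattern, action). The first matching rule wins.
RULES = [
    ("*.tmp.bak", "remove"),
    ("db-2012-*.bak", "stash"),
]


def plan(filenames, rules, stash_dir="/mnt/archive"):
    """Return shell commands for files matched by a rule; others are kept."""
    commands = []
    for name in filenames:
        for pattern, action in rules:
            if fnmatch.fnmatchcase(name, pattern):
                if action == "remove":
                    commands.append(f"rm -- {shlex.quote(name)}")
                elif action == "stash":
                    commands.append(
                        f"mv -- {shlex.quote(name)} {shlex.quote(stash_dir)}"
                    )
                break  # stop at the first matching rule
    return commands


files = ["cache.tmp.bak", "db-2012-01.bak", "db-2013-01.bak"]
script = plan(files, RULES)
```

The result is a list of commands that can be reviewed before anything is deleted or moved. Files that match no rule, such as `db-2013-01.bak` here, are left alone.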

Hello World From Me

I am fascinated by many things, like programming, physics, mathematics, and music. These all have intrinsic patterns that render the beauty of a masterpiece. I really enjoy finding those patterns whenever possible.