I have been privileged to serve as a teacher in the Master Data Science for Economics and Business program at Strasbourg since 2020. In this capacity, I passionately contribute to the academic journey of students, guiding them through the intricacies of data science and its applications in economics and business domains. Currently, I teach two core courses:
- Advanced Programming: This course delves into the advanced aspects of programming, equipping students with the skills to tackle complex coding challenges and implement innovative solutions.
- SQL and NoSQL: Focused on databases, this course explores the structured query language (SQL) and non-relational databases (NoSQL), providing students with a solid foundation in database management and manipulation.
Please refer to the syllabus provided below for detailed information on each course.
SQL/NoSQL
The primary objective of this course is to equip students with the knowledge and skills required to store and process non-relational data. The right storage system depends on factors such as data size and the problem at hand, so students learn to select an appropriate format for their non-relational data. Python is the primary programming language throughout the course. The syllabus begins with basic semi-structured formats such as JSON and XML, along with Python dictionaries. Students then study leading NoSQL databases such as MongoDB and Neo4j. The course concludes with a brief overview of other database alternatives.
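As a minimal sketch of the formats covered at the start of the syllabus, the snippet below expresses one record as a Python dictionary, as JSON text, and as an XML element, using only the standard library. The record and its field names are invented for illustration and are not taken from the course material.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
student = {"name": "Ada", "courses": ["SQL and NoSQL", "Advanced Programming"]}

# Dictionary -> JSON text and back again.
as_json = json.dumps(student)
round_trip = json.loads(as_json)

# The same record as an XML element tree.
root = ET.Element("student", attrib={"name": student["name"]})
for title in student["courses"]:
    ET.SubElement(root, "course").text = title
as_xml = ET.tostring(root, encoding="unicode")

# Reading the XML back into a dictionary.
parsed = ET.fromstring(as_xml)
from_xml = {
    "name": parsed.get("name"),
    "courses": [course.text for course in parsed.findall("course")],
}

print(as_json)
print(as_xml)
```

JSON maps directly onto dictionaries and lists, whereas XML requires a choice between attributes and child elements; here the name is stored as an attribute and each course as a child element, which is only one of several reasonable layouts.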
The course adopts a non-exhaustive approach, acknowledging the vastness of programming knowledge. Students will encounter novel challenges, prompting them to explore solutions independently. Resources like Stack Overflow, Stack Exchange, Quora, YouTube, and GitHub are emphasized as valuable tools for problem-solving. In instances where independent research proves insufficient, students are encouraged to reach out for guidance. However, the expectation is that students take responsibility for their coding tasks rather than requesting direct code solutions. The course encourages a collaborative learning environment, fostering both student-to-instructor and peer-to-peer knowledge exchange.
Grading and Evaluation
Final Exam
Content: Theoretical questions and a case study with a Python code component for interpretation and commentary (details to be determined).
Dossier
- Individual work: completion of the "todo" tasks within each chapter. These tasks count as a bonus or penalty toward the overall grade and reward participation and engagement.
- Group work: at the end of Chapters III and IV, students choose and submit specific homework assignments. This collaborative work contributes significantly to the final grade.
Recommended Tool: Jupyter Notebook.
Resources
Students are encouraged to explore additional resources for enhancing their understanding:
- W3Schools Python MongoDB Tutorial
- YouTube Tutorial on Python MongoDB
Advanced Programming
The aim of this course is to give students a solid set of programming tools for tackling real-world problems, especially in the context of big data. Answering complex questions may require cross-referencing multiple web sources, which calls for skills in fetching information (APIs, web scraping), cleaning diverse data formats (XML, JSON, PDF), loading data into databases (SQL, NoSQL), and using available resources (CPU/GPU, cloud/notebook) for analysis (descriptive statistics, algorithms). Python is the primary programming language throughout the course. Starting with a review of last year's programming concepts, including lists, dictionaries, functions, NumPy, pandas, Beautiful Soup, and regular expressions, the course quickly moves on to scraping libraries and API usage, so that students can start their projects early (see Grading for details). Advanced topics such as co-occurrence analysis, object-oriented programming (OOP), parallelization, decorators, versioning, and an introduction to GPU programming and machine learning libraries are covered in the later stages of the course. The objective is to provide insight into the entire process described above.
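To give a flavor of two of the advanced topics listed above, here is a minimal sketch that combines a decorator with a co-occurrence count, using only the standard library. The names `timed` and `cooccurrences` and the toy corpus are hypothetical and chosen for illustration; they are not part of the course material.

```python
import time
from collections import Counter
from functools import wraps
from itertools import combinations


def timed(func):
    """Decorator that records how long the last call to func took."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_duration = time.perf_counter() - start
        return result
    wrapper.last_duration = None
    return wrapper


@timed
def cooccurrences(documents):
    """Count how often each unordered pair of words shares a document."""
    counts = Counter()
    for words in documents:
        # sorted(set(...)) makes each pair unique and order-independent.
        counts.update(combinations(sorted(set(words)), 2))
    return counts


# Hypothetical toy corpus, invented for illustration only.
docs = [
    ["python", "pandas", "numpy"],
    ["python", "pandas"],
    ["numpy", "python"],
]
pairs = cooccurrences(docs)
print(pairs.most_common(2))
print(f"took {cooccurrences.last_duration:.6f} s")
```

The decorator leaves the counting logic untouched, so timing can be added or removed without editing `cooccurrences` itself.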
The course is not intended to be exhaustive, acknowledging the vastness of programming knowledge. Students will encounter challenges beyond the scope of the course, prompting them to engage in independent problem-solving. Platforms like Stack Overflow, Stack Exchange, Quora, YouTube, and GitHub are recommended as valuable resources for self-guided research. In cases where independent research falls short, students are welcome to seek guidance. However, the expectation is that students take ownership of their coding tasks, avoiding direct requests for code solutions. The course promotes a collaborative learning environment, encouraging both student-to-instructor and peer-to-peer knowledge exchange.
Grading and Evaluation
Final Exam
Content: Theoretical questions and a case study with a Python code component for interpretation and commentary (details to be determined).
Project = Oral + Dossier
You will complete a project on a topic of your choice. The project should be uploaded to your GitHub account using the following structure:
- README.md (explain your project, how to run it, issues faced, key results, etc.)
- scripts (folder containing all functions and tools used for the analysis)
- Results.ipynb (Jupyter notebook running the main analysis and explaining project results)
While this structure is not mandatory and may vary based on your project, a clean and organized approach is encouraged (structure evaluation is part of the assessment).
The entire GitHub repository will be considered as the Dossier.
For the oral, you can use the notebook or create a PowerPoint presentation. The oral will involve a 10-15 minute presentation followed by around 5 minutes of questions.
At the end of each chapter, there will be optional todos. The emphasis is on the projects, but the todos offer additional practice if desired.
Resources
- Ramalho, L., 2015. Fluent Python: Clear, concise, and effective programming. O'Reilly Media.
- Mitchell, R., 2018. Web scraping with Python: Collecting more data from the modern web. O'Reilly Media.
- Géron, A., 2019. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media.
- [PythonProgramming.net](https://pythonprogramming.net/introduction-intermediate-python-tutorial/)
- [Real Python](https://realpython.com/)