DEGREE: BSc Computer Science and Digitisation
Module: Big Data Analytics using AI
Assignment Title:
Assignment Type: Report
Word Limit: 3000 words (+/- 300)
Weighting: 100%
Issue Date: 4/9/2025
Submission Date: 30/9/2025
Feedback Date: 21/10/2025
Plagiarism:
When submitting work for assessment, students should be aware of the InterActive/Canvas guidance and regulations concerning plagiarism. All submissions should be your own, original work. Please note that you must not submit the same assignment for two different modules within your course.
You must submit an electronic copy of your work. Your submission will be electronically checked.
Harvard Referencing:
The Harvard Referencing System must be used. Wikipedia, UKEssays.com and similar websites must not be used or referenced in your work.
Student signature: ______________________ Date: _______________
Smart City Urban Mobility Analysis using Hadoop and Predictive AI
Introduction
The goal of this assignment is to provide you with hands-on experience in designing and implementing a Big Data analytics solution that incorporates a predictive AI component. You will address a hypothetical smart city challenge by using the Hadoop ecosystem to process large-scale data and derive actionable insights for urban planning. This assignment requires you to design a solution using Hadoop, HDFS, YARN, and MapReduce to analyse transportation data. The final step involves using the processed data to train a simple predictive model, thereby connecting Big Data processing with AI applications. This will help you understand the end-to-end pipeline from raw data to business intelligence in a modern context.
Learning Outcomes:
LO1. Demonstrate an understanding of the basic concepts of Big Data, its importance and the need for it in a business context.
LO2. Explain the various components of Hadoop and HDFS along with their role in the Big Data ecosystem.
LO3. Summarize the learning on Big Data analytics using YARN, HDFS and MapReduce.
Assessment Criteria: Weighting 100%
3000 words
Task Description:
You are a Data Engineer tasked with designing and implementing a proof-of-concept Big Data analytics solution for a city’s transport authority.
Scenario:
The city council wants to analyse urban mobility patterns using data from road sensors, taxi trips, or public transit records. The objective is to identify congestion hotspots, understand their causes, and predict future traffic patterns to enable proactive traffic management and better infrastructure planning.
Phase 1: Conceptual Design & Architecture (LO 1, LO 2) – 20% of Total Grade
1. Business Context and Problem Statement (5%)
• Describe the smart city scenario, focusing on the challenges of urban mobility.
• Define a clear problem statement.
• Explain how solving this problem provides tangible value to the city.
2. Hadoop Ecosystem and Architecture (15%)
• Explain why a Big Data approach is necessary for this scenario.
• Identify the roles of HDFS, YARN, and MapReduce in your proposed solution.
• Justify your choice of these components for the defined problem.
• Create a clear architectural diagram illustrating how data flows from source to HDFS, is processed by MapReduce managed by YARN, and is then used for analysis.
Phase 2: Implementation & Analysis (LO 3) – 50% of Total Grade
1. Data Acquisition & Preparation (5%)
• Select a suitable public dataset representing urban mobility.
• Describe the dataset’s structure, size, and key attributes relevant to your problem statement.
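For illustration only, the structure and key attributes of a dataset could be summarised with a short profiling script such as the sketch below. The column names (pickup_datetime, pickup_zone, trip_distance) and the three sample rows are hypothetical placeholders, not taken from any particular public dataset; a real file would be opened with open() instead of the in-memory sample.

```python
import csv
import io


def profile_csv(handle):
    """Return column names, row count and per-column missing-value counts."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Empty or absent fields are counted as missing values.
            if not (row.get(name) or "").strip():
                missing[name] += 1
    return {"columns": columns, "rows": rows, "missing": missing}


# Hypothetical three-row extract standing in for a real mobility file.
sample = io.StringIO(
    "pickup_datetime,pickup_zone,trip_distance\n"
    "2025-01-06 08:15:00,12,3.4\n"
    "2025-01-06 08:20:00,,1.1\n"
    "2025-01-06 09:05:00,7,\n"
)
summary = profile_csv(sample)
print(summary)
```

A summary of this kind can feed directly into the dataset description table in the report.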
2. Hadoop Environment and Data Ingestion (10%)
• Set up a local single-node Hadoop cluster.
• Document the key steps of your setup process.
• Load your chosen dataset into HDFS.
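As a hedged sketch of the ingestion step, the commands used to load a file into HDFS can be scripted and recorded for the report. The file name trips.csv and the HDFS directory /user/student/mobility/raw are assumptions chosen for illustration; the script only runs the commands when the hdfs client is found on the PATH.

```python
import shutil
import subprocess


def ingestion_commands(local_file, hdfs_dir):
    """Build the HDFS shell commands for creating a directory and loading a file."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],           # create target directory
        ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir],  # upload, overwriting if present
        ["hdfs", "dfs", "-ls", hdfs_dir],                     # list to confirm the upload
    ]


def ingest(local_file, hdfs_dir):
    """Run the commands if an hdfs client is installed; return False otherwise."""
    if shutil.which("hdfs") is None:
        return False
    for command in ingestion_commands(local_file, hdfs_dir):
        subprocess.run(command, check=True)
    return True


# Hypothetical file name and HDFS path, used only to show the command shape.
commands = ingestion_commands("trips.csv", "/user/student/mobility/raw")
for command in commands:
    print(" ".join(command))
```

Printing the commands before running them gives a ready-made record of the ingestion steps for the documentation.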
3. Data Processing with MapReduce (20%)
• Write a MapReduce program in Java or Python to process the data.
• Perform data cleaning and feature engineering.
• Explain the logic of your Mapper and Reducer classes.
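One possible shape for the Mapper and Reducer logic is sketched below in Python. It assumes a hypothetical four-column CSV layout (pickup_datetime, pickup_zone, distance_km, duration_s) and simulates the map, sort and reduce stages locally; it is not a complete Hadoop job, and in a real submission the two functions would be wired to standard input and output for use with Hadoop Streaming or rewritten as Java classes.

```python
from datetime import datetime
from itertools import groupby


def map_line(line):
    """Mapper: clean one record and emit (zone_hour, speed_kmh), or None if invalid."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    timestamp, zone, distance_km, duration_s = parts
    try:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        distance = float(distance_km)
        duration = float(duration_s)
    except ValueError:
        return None  # header row or malformed field
    if not zone or distance <= 0 or duration <= 0:
        return None  # data cleaning: drop impossible trips
    # Feature engineering: average trip speed, keyed by zone and hour of day.
    return f"{zone}_{hour:02d}", distance * 3600.0 / duration


def reduce_group(key, speeds):
    """Reducer: trip count and mean speed for one zone-hour key."""
    speeds = list(speeds)
    return key, len(speeds), sum(speeds) / len(speeds)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on an in-memory list of lines."""
    mapped = sorted(pair for pair in map(map_line, lines) if pair is not None)
    return [
        reduce_group(key, (speed for _, speed in group))
        for key, group in groupby(mapped, key=lambda pair: pair[0])
    ]


# Hypothetical records, including a header and one invalid trip.
records = [
    "pickup_datetime,pickup_zone,distance_km,duration_s",
    "2025-01-06 08:15:00,12,3.0,600",
    "2025-01-06 08:40:00,12,2.0,1200",
    "2025-01-06 09:05:00,7,-1.0,300",
    "2025-01-06 09:10:00,7,5.0,900",
]
results = run_local(records)
for key, trips, mean_speed in results:
    print(key, trips, mean_speed)
```

Low mean speed combined with a high trip count for a zone-hour key is one simple indicator of a congestion hotspot; other features may suit your chosen dataset better.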
4. Predictive Analysis and Visualization (15%)
• Export the processed data from HDFS.
• Use the processed data to train a simple predictive model.
• Analyse and interpret the output.
• Create meaningful visualizations.
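As a minimal illustration of a "simple predictive model", the sketch below fits an ordinary least-squares line relating hourly trip count to mean speed. The four data points are invented for demonstration and are not results; any model of comparable simplicity, built with a library of your choice, would equally satisfy the task.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted mean speed for a given hourly trip count."""
    return intercept + slope * x


# Hypothetical aggregates: trips per hour and mean speed (km/h) in one zone.
trip_counts = [100, 200, 300, 400]
mean_speeds = [40.0, 35.0, 30.0, 25.0]

slope, intercept = fit_line(trip_counts, mean_speeds)
forecast = predict(slope, intercept, 500)
print(slope, intercept, forecast)
```

The same pairs of values can then be plotted as a scatter chart with the fitted line overlaid, using whichever plotting tool you prefer.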
Phase 3: Reflection and Documentation (LO 1, LO 2, LO 3) – 30% of Total Grade
1. Critical Reflection (10%)
• Reflect on the key challenges encountered.
• Discuss performance and scalability.
2. Final Report Documentation (20%)
• Compile a detailed report of no more than 3000 words.
• Ensure proper structure and academic language.
• Include diagrams, code snippets, commands, and visualizations.
• Include a bibliography using Harvard referencing style.
Submission Guidelines:
Document Format:
Submit your assignment as a single document following the BSBI assignment template provided in Canvas.
Writing Quality:
Ensure clear and concise writing with proper grammar and spelling.
Visuals:
Include diagrams, tables, and graphs where appropriate.
Task Coverage:
Address each part thoroughly.
Implementation Details:
Provide relevant examples including code snippets and commands.
Referencing Style:
Use Harvard referencing style.
Discussion:
Discuss findings, insights, and implications. Reflect on challenges.
Submission:
Submit your assignment electronically via Canvas.
GUIDANCE ON ASSESSMENT
All materials must be properly referenced under Harvard conventions. The length required is 3000 words with tasks equally weighted. The writing style should be formal academic/report writing style with in-text referencing to support your comments and observations. Originality, quality of argument and good structure are required. The report should demonstrate sound understanding and ability to apply knowledge and theory.
Grading Criteria:
Knowledge of contexts, concepts, technologies and processes.
Understanding through application of knowledge.
Application of technical and professional skills.