Event 4
Data Science On AWS
Date: Thursday, October 16, 2025
Time: 9:30 AM – 11:45 AM
Location: Hall Academic – FPT University
Speakers: Van Hoang Kha, Bach Doan Vuong
Role: Attendee
Event Overview
Workshop exploring how to build modern Data Science systems on AWS, from data processing to ML model deployment.
Key Topics
Introduction & AWS Data Science Pipeline (9:30 – 10:05 AM)
Demo 1: Data Processing with AWS Glue (10:05 – 10:35 AM)
IMDb Dataset Processing:
- Data discovery with Glue Crawler
- Data cleaning and transformation
- Text preprocessing and feature extraction
- Format conversion (CSV to Parquet)
- PySpark for distributed processing
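The text preprocessing and feature-extraction steps can be sketched in plain Python. In the workshop this work ran as PySpark inside a Glue job. The snippet below only illustrates the per-review logic, under two assumptions: raw reviews are strings that may contain HTML tags such as `<br />`, and `STOP_WORDS` is a small hypothetical list. The function names (`clean_review`, `tokenize`, `bag_of_words`) are illustrative and are not the speakers' code.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "it", "this", "of", "to", "was"}

_HTML_TAG = re.compile(r"<[^>]+>")
_NON_WORD = re.compile(r"[^a-z']+")  # keep letters and apostrophes only


def clean_review(text: str) -> str:
    """Strip HTML tags (e.g. <br /> line breaks), lowercase, drop punctuation."""
    no_html = _HTML_TAG.sub(" ", text)
    return _NON_WORD.sub(" ", no_html.lower()).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into tokens, trim stray quotes, remove stop words."""
    tokens = (tok.strip("'") for tok in clean_review(text).split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocab] for c in counts]
    return vocab, vectors
```

For example, `bag_of_words(["This movie was GREAT!<br /><br />Great acting.", "Boring, boring plot."])` yields the vocabulary `acting, boring, great, movie, plot` with count vectors `[1, 0, 2, 1, 0]` and `[0, 2, 0, 0, 1]`. In a Glue job, logic like `clean_review` could be wrapped as a Spark UDF and the result written with `DataFrame.write.parquet(...)` to perform the CSV-to-Parquet conversion; that wiring is omitted here.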
Key Benefits:
- Serverless, auto-scaling
- Pay only for job execution time
- Built-in data catalog
- Integration with AWS services
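"Pay only for job execution time" can be made concrete with a back-of-the-envelope estimate. This sketch assumes Glue Spark jobs are billed per DPU-hour, metered per second with a 1-minute minimum, at 0.44 USD per DPU-hour. That rate is a commonly listed us-east-1 price and is treated here as an assumption; check the current AWS Glue pricing page before relying on it.

```python
import math

# Assumed list price for Glue Spark jobs (USD per DPU-hour); verify before use.
RATE_PER_DPU_HOUR = 0.44
# Assumed minimum billed duration per job run, in seconds.
MIN_BILLED_SECONDS = 60


def glue_job_cost(dpus: int, runtime_seconds: float,
                  rate: float = RATE_PER_DPU_HOUR) -> float:
    """Estimate one job run's cost: per-second billing with a minimum duration."""
    billed_seconds = max(math.ceil(runtime_seconds), MIN_BILLED_SECONDS)
    return dpus * (billed_seconds / 3600) * rate
```

Under these assumptions, a 10-DPU job that runs for 15 minutes costs about 1.10 USD, and a 20-second run is billed as the 60-second minimum. Nothing is charged while no job is running.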
Demo 2: Sentiment Analysis with SageMaker (10:35 – 11:00 AM)
ML Workflow:
- Data preparation and EDA
- Model selection and training
- Hyperparameter tuning
- Model deployment as endpoint
- Real-time inference testing
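For the real-time inference testing step, a deployed endpoint can be called through the `sagemaker-runtime` client's `invoke_endpoint` operation. The following minimal sketch also times each call. It assumes the model container accepts and returns JSON. The `{"inputs": ...}` request shape is an assumption and depends on the container actually deployed. The client is passed in as a parameter, so the function can be exercised without AWS access.

```python
import json
import time


def predict_sentiment(runtime_client, endpoint_name: str, review: str) -> tuple[dict, float]:
    """Send one review to a SageMaker real-time endpoint.

    Returns (parsed JSON prediction, round-trip latency in milliseconds).
    The request/response JSON shapes are assumptions about the deployed model.
    """
    payload = json.dumps({"inputs": review})  # assumed request format
    start = time.perf_counter()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=payload,
    )
    body = response["Body"].read()  # streaming body returned by the API
    latency_ms = (time.perf_counter() - start) * 1000
    return json.loads(body), latency_ms
```

Against a real endpoint, the call would look like `predict_sentiment(boto3.client("sagemaker-runtime"), "imdb-sentiment-endpoint", "A wonderful film.")`, where `imdb-sentiment-endpoint` is a hypothetical name. The measured latency includes the network round trip from the caller, not just model inference time.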
Results:
- 90%+ accuracy on test set
- Low-latency predictions
- Auto-scaling for production
- Cost-effective deployment
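Auto-scaling for a SageMaker endpoint is configured through Application Auto Scaling. The endpoint variant's instance count is registered as a scalable target, and then a target-tracking policy is attached to it. This is a minimal sketch that assumes the default production variant name `AllTraffic`, which the SageMaker Python SDK uses. The capacity limits, target value, cooldowns, and policy name are illustrative assumptions.

```python
def configure_endpoint_autoscaling(autoscaling_client, endpoint_name: str,
                                   variant_name: str = "AllTraffic",
                                   min_instances: int = 1, max_instances: int = 4,
                                   target_invocations: float = 100.0) -> str:
    """Register an endpoint variant for scaling and attach a target-tracking policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    # Allow the variant's instance count to move between min and max.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    # Scale to keep invocations per instance (per minute) near the target.
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",  # hypothetical name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # seconds; scale in conservatively
            "ScaleOutCooldown": 60,  # seconds; scale out quickly
        },
    )
    return resource_id
```

With credentials configured, this could be called as `configure_endpoint_autoscaling(boto3.client("application-autoscaling"), "imdb-sentiment-endpoint")`, again with a hypothetical endpoint name. The short scale-out and longer scale-in cooldowns favour latency over cost during traffic spikes.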
Cloud vs On-Premise Discussion (11:00 – 11:35 AM)
Cost Comparison:
- Small projects: cloud roughly 60-70% cheaper than on-premise (speakers' estimate)
- No upfront hardware investment
- Pay-as-you-go reduces waste
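The cost argument reduces to simple arithmetic. On-premise hardware is paid for whether it is used or not, while cloud compute is billed per hour of use. The sketch below uses entirely hypothetical numbers: a 12,000 USD server amortized over 24 months plus 100 USD/month operating cost, compared with a 2.00 USD/hour cloud instance. It illustrates the break-even calculation and does not reproduce the speakers' 60-70% figure.

```python
def monthly_on_prem_cost(hardware_cost: float, lifetime_months: int,
                         monthly_operating_cost: float) -> float:
    """Spread the upfront hardware purchase evenly over its useful life."""
    return hardware_cost / lifetime_months + monthly_operating_cost


def monthly_cloud_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go: only the hours actually used are billed."""
    return hourly_rate * hours_used


def break_even_hours(hardware_cost: float, lifetime_months: int,
                     monthly_operating_cost: float, hourly_rate: float) -> float:
    """Monthly usage above which owning the hardware becomes cheaper than renting."""
    fixed = monthly_on_prem_cost(hardware_cost, lifetime_months, monthly_operating_cost)
    return fixed / hourly_rate
```

With these inputs, the server costs 600 USD/month regardless of usage, 80 hours of cloud time costs 160 USD, and owning only becomes cheaper above 300 hours of use per month. Small, intermittent projects therefore tend to favour the cloud.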
Performance:
- Access to latest hardware
- Distributed computing capabilities
- Global infrastructure
Flexibility:
- Scale up/down in minutes
- Little upfront capacity planning needed
- Easy experimentation
Post-Workshop Project (11:35 – 11:45 AM)
- Build complete Data Science pipeline
- Suggested projects: product review analyzer, stock predictor, churn predictor
- 4-week implementation timeline
Key Takeaways
AWS Data Science Services:
- S3 provides scalable, durable storage for data lakes
- Glue enables serverless ETL and data cataloging
- SageMaker offers end-to-end ML capabilities
- Managed services reduce operational complexity
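A common way to lay out an S3 data lake is to use Hive-style `key=value` prefixes such as `year=2025/month=10/`. Glue crawlers can map these prefixes to partition columns, and queries can then skip irrelevant data. The following is a minimal sketch of a key builder. The `curated` prefix and `imdb_reviews` dataset name are hypothetical.

```python
from datetime import date


def partitioned_key(prefix: str, dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key with Hive-style year=/month=/day= partition prefixes."""
    return (f"{prefix.strip('/')}/{dataset}/"
            f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/{filename}")
```

For example, `partitioned_key("curated/", "imdb_reviews", date(2025, 10, 16), "part-0000.parquet")` returns `curated/imdb_reviews/year=2025/month=10/day=16/part-0000.parquet`.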
Practical Skills:
- Data cleaning and transformation with Glue
- Model training and deployment with SageMaker
- Cost optimization strategies
- Cloud vs on-premise decision framework
Cloud Benefits:
- Faster time-to-market
- Lower costs for most use cases
- Superior scalability and flexibility
- Access to latest technologies
Next Steps:
- Create AWS Free Tier account
- Complete post-workshop project
- Experiment with Glue and SageMaker
- Pursue an AWS Machine Learning certification
Rating: ⭐⭐⭐⭐⭐ (5/5)
Excellent hands-on workshop demonstrating how cloud computing transforms Data Science workflows with practical AWS services.