Event 4

Data Science on AWS

Event Information

Date: Thursday, October 16, 2025
Time: 9:30 AM – 11:45 AM
Location: Hall Academic – FPT University
Speakers: Van Hoang Kha, Bach Doan Vuong
Role: Attendee


Event Overview

A workshop on building modern Data Science systems on AWS, covering the full path from data processing to ML model deployment.

Key Topics

Introduction & AWS Data Science Pipeline (9:30 – 10:05 AM)

  • Why Use the Cloud for Data Science:

    • Optimize performance and reduce costs
    • Scale flexibly with demand
    • Access to the latest hardware (GPUs and AWS-designed ML accelerators such as Trainium and Inferentia)
    • Pay-as-you-go pricing model
  • AWS Data Science Ecosystem:

    • Amazon S3: Scalable data lake storage
    • AWS Glue: Serverless ETL service
    • Amazon SageMaker: End-to-end ML platform
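To make the "S3 as a data lake" idea concrete, here is a minimal sketch, assuming data is laid out in Hive-style `key=value` partition folders (a convention Glue crawlers can pick up as partition columns). The bucket `my-ds-bucket`, the `imdb/raw` prefix and both helper functions are hypothetical; the code only builds and parses key strings and makes no AWS calls.

```python
from datetime import date

# Hypothetical bucket and prefix, used only for illustration.
BUCKET = "my-ds-bucket"
PREFIX = "imdb/raw"


def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Build an S3 object key using Hive-style key=value partition folders."""
    return f"{prefix}/year={day.year}/month={day.month:02d}/day={day.day:02d}/{filename}"


def parse_partitions(key: str) -> dict:
    """Recover the key=value partition pairs from an object key."""
    folders = key.split("/")[:-1]  # drop the file name
    return dict(part.split("=", 1) for part in folders if "=" in part)


key = partitioned_key(PREFIX, date(2025, 10, 16), "reviews.csv")
print(f"s3://{BUCKET}/{key}")
# s3://my-ds-bucket/imdb/raw/year=2025/month=10/day=16/reviews.csv
print(parse_partitions(key))
# {'year': '2025', 'month': '10', 'day': '16'}
```

Partitioning by date like this lets query engines skip folders that a filter rules out, which is one reason the layout matters once the lake grows.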

Demo 1: Data Processing with AWS Glue (10:05 – 10:35 AM)

  • IMDb Dataset Processing:

    • Schema discovery with a Glue Crawler
    • Data cleaning and transformation
    • Text preprocessing and feature extraction
    • Format conversion (CSV to Parquet)
    • PySpark for distributed processing
  • Key Benefits:

    • Serverless, auto-scaling
    • Pay only for job execution time (billed in DPU-hours)
    • Built-in data catalog
    • Integration with other AWS services
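The text-preprocessing step can be illustrated with the per-review logic alone. This is a minimal plain-Python sketch, not the demo's actual Glue job: in a PySpark job the same function could be applied per row (for example as a UDF) or rewritten with Spark's built-in string functions. The tiny stopword list is an assumption for illustration; IMDb reviews do contain `<br />` tags, which is why HTML stripping comes first.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (assumption); a real job would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "i"}

HTML_TAG = re.compile(r"<[^>]+>")
NON_LETTER = re.compile(r"[^a-z']+")


def clean_review(text: str) -> list[str]:
    """Strip HTML tags, lowercase, keep letters/apostrophes, drop stopwords."""
    text = HTML_TAG.sub(" ", text).lower()
    tokens = NON_LETTER.sub(" ", text).split()
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(tokens: list[str]) -> dict[str, int]:
    """Bag-of-words features: how often each remaining token occurs."""
    return dict(Counter(tokens))


raw = "This movie was GREAT!<br /><br />Great acting, and a great plot."
tokens = clean_review(raw)
print(tokens)                    # ['movie', 'great', 'great', 'acting', 'great', 'plot']
print(term_frequencies(tokens))  # {'movie': 1, 'great': 3, 'acting': 1, 'plot': 1}
```

Once features like these are computed, writing the result as Parquet rather than CSV keeps it columnar and compressed for the training step.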

Demo 2: Sentiment Analysis with SageMaker (10:35 – 11:00 AM)

  • ML Workflow:

    • Data preparation and exploratory data analysis (EDA)
    • Model selection and training
    • Hyperparameter tuning
    • Model deployment as a real-time SageMaker endpoint
    • Real-time inference testing
  • Results:

    • 90%+ accuracy on the test set
    • Low-latency predictions
    • Auto-scaling for production
    • Cost-effective deployment
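The model trained in the demo is not named in these notes, so the sketch below swaps in a from-scratch multinomial Naive Bayes classifier to show the train/evaluate loop behind a sentiment model. It is an illustration, not what ran on SageMaker. The four-review dataset is hypothetical and far too small to say anything about the 90%+ figure reported in the session; `alpha` (the smoothing term) stands in for the kind of hyperparameter a tuning job would search over.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes; alpha is the additive (Laplace) smoothing term."""
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "log_priors": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
        "word_counts": word_counts,
        "totals": {c: sum(wc.values()) for c, wc in word_counts.items()},
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior score for one document."""
    v, a = len(model["vocab"]), model["alpha"]
    best_label, best_score = None, -math.inf
    for c, prior in model["log_priors"].items():
        score = prior
        for t in tokens:
            if t in model["vocab"]:  # words unseen in training are skipped
                count = model["word_counts"][c][t]
                score += math.log((count + a) / (model["totals"][c] + a * v))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Tiny hypothetical dataset of pre-tokenized reviews (not the IMDb data).
train_docs = [
    ["great", "acting", "loved"],
    ["wonderful", "great", "plot"],
    ["boring", "awful", "plot"],
    ["terrible", "boring", "acting"],
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = [["loved", "wonderful"], ["awful", "terrible"]]
test_labels = ["pos", "neg"]

model = train_nb(train_docs, train_labels, alpha=1.0)
print(accuracy(model, test_docs, test_labels))  # 1.0
print(predict_nb(model, ["boring", "plot"]))    # neg
```

Working in log space avoids numeric underflow when many per-word probabilities are multiplied, which matters for long reviews.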

Cloud vs On-Premise Discussion (11:00 – 11:35 AM)

  • Cost Comparison:

    • Small projects: cloud cited as roughly 60-70% cheaper (speakers' estimate; depends on utilization)
    • No upfront hardware investment
    • Pay-as-you-go reduces waste
  • Performance:

    • Access to the latest hardware
    • Distributed computing capabilities
    • Global infrastructure
  • Flexibility:

    • Scale up/down in minutes
    • Far less up-front capacity planning
    • Easy experimentation
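The "small projects are cheaper in the cloud" argument comes down to utilization: owned hardware costs the same every month, while pay-as-you-go cost scales with hours used. This minimal sketch computes the break-even point under a simple linear model. Every price in it is a hypothetical placeholder, not real AWS or hardware pricing, and it ignores discounts, storage, and data transfer.

```python
from dataclasses import dataclass


@dataclass
class OnPrem:
    hardware_cost: float   # up-front purchase price (hypothetical)
    lifetime_months: int   # amortization period
    monthly_ops: float     # power, cooling, admin (hypothetical)

    def monthly_cost(self) -> float:
        # Fixed cost regardless of how many hours the machine is actually used.
        return self.hardware_cost / self.lifetime_months + self.monthly_ops


@dataclass
class Cloud:
    hourly_rate: float     # on-demand price per instance-hour (hypothetical)

    def monthly_cost(self, hours_used: float) -> float:
        # Pay-as-you-go: cost grows linearly with usage.
        return self.hourly_rate * hours_used


def break_even_hours(on_prem: OnPrem, cloud: Cloud) -> float:
    """Monthly usage above which owning the hardware becomes cheaper."""
    return on_prem.monthly_cost() / cloud.hourly_rate


server = OnPrem(hardware_cost=12_000, lifetime_months=36, monthly_ops=150)
gpu = Cloud(hourly_rate=1.50)
for hours in (80, 320, 720):
    print(hours, round(gpu.monthly_cost(hours), 2), round(server.monthly_cost(), 2))
print(round(break_even_hours(server, gpu), 1))  # 322.2
```

With these made-up numbers, a workload running about 80 hours a month pays 120.00 in the cloud against 483.33 for the owned server, while a machine busy around the clock (720 hours) favors owning. Plugging in real quotes is what turns this into a decision framework.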

Post-Workshop Project (11:35 – 11:45 AM)

  • Build complete Data Science pipeline
  • Suggested projects: Product review analyzer, stock predictor, churn prediction
  • 4-week implementation timeline

Key Takeaways

AWS Data Science Services:

  • S3 provides scalable, durable storage for data lakes
  • Glue enables serverless ETL and data cataloging
  • SageMaker offers end-to-end ML capabilities
  • Managed services reduce operational complexity

Practical Skills:

  • Data cleaning and transformation with Glue
  • Model training and deployment with SageMaker
  • Cost optimization strategies
  • Cloud vs on-premise decision framework

Cloud Benefits:

  • Faster time-to-market
  • Lower costs for many use cases, especially small or variable workloads
  • Superior scalability and flexibility
  • Access to the latest technologies

Next Steps:

  • Create an AWS Free Tier account
  • Complete post-workshop project
  • Experiment with Glue and SageMaker
  • Pursue an AWS ML certification

Rating: ⭐⭐⭐⭐⭐ (5/5)

An excellent hands-on workshop that showed, through practical AWS services, how cloud computing transforms Data Science workflows.