Shicong Ying | Big Data Engineer
Work Experience: 2 years | Phone: 13005195292
Date of Birth: May 1997 | Email: sc.Ying@outlook.com
Current Address: Shanghai (Open to Relocation Worldwide) | Education: Hebei University of Technology, Master’s Degree in Electronic Information
LinkedIn Profile | GitHub Profile |
Built real-time data pipelines using Kafka, Flink, and StarRocks for image search models, improving model iteration efficiency and enhancing search accuracy and recommendation satisfaction, leading to a 3% increase in user NPS score.
Unified data pipelines for commercial ads and natural scenarios, standardized core metrics (e.g., PVR, ASN), and developed AB testing datasets and reporting models, improving analysis efficiency by 25% and increasing conversion rates by 1.5%.
Reconstructed the algorithm data warehouse using the OneData methodology (unified semantic layer and precise data models), collaborated with colleagues to implement a search and recommendation analysis framework, boosting BI analysis efficiency by 30% and reducing maintenance time by 10%.
Developed a Spark event log analysis tool in Python to identify data skew and resource bottlenecks, improving offline task optimization efficiency by 50%.
Participated in the migration of 500+ data processing tasks to the Galaxy big data development platform, optimized task logic, and improved task execution efficiency by 20%.
Ensured data accuracy post-migration by using SQL and Python scripts to validate and fix inconsistencies (a validation sketch follows this list).
Supported the construction of search and recommendation algorithm data warehouse models and addressed complex business data requirements.
Developed core business metrics for dynamic posts and user behavior analysis, supporting community business growth strategies.
Designed and optimized event tracking for modules like "Hot Topics" and "Community Space," improving tracking processes.
Implemented cold data archiving strategies, reducing storage costs by 15%, and established data quality monitoring mechanisms to ensure data consistency and accuracy.
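A minimal sketch of the kind of post-migration consistency check referenced above, assuming a PySpark environment; the table names, the checked column, and the partition dates are illustrative placeholders rather than the actual production script:

```python
# Hypothetical post-migration check: compare row counts and a column checksum
# per partition between the legacy table and its migrated counterpart.
# All table/column names and dates below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("migration_consistency_check").getOrCreate()

def partition_summary(table: str, dt: str):
    """Return (row_count, checksum) for a single dt partition of `table`."""
    row = spark.sql(f"""
        SELECT COUNT(*)                                  AS cnt,
               SUM(CRC32(CAST(order_amount AS STRING)))  AS checksum
        FROM   {table}
        WHERE  dt = '{dt}'
    """).collect()[0]
    return row["cnt"], row["checksum"]

for dt in ["2023-06-01", "2023-06-02"]:  # partitions under validation
    legacy = partition_summary("legacy_db.dwd_order_detail", dt)
    migrated = partition_summary("galaxy_db.dwd_order_detail", dt)
    status = "OK" if legacy == migrated else "MISMATCH - investigate and backfill"
    print(f"{dt}: legacy={legacy} migrated={migrated} -> {status}")
```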
Background:
Traditional evaluation methods for image search strategies were cumbersome and unable to replicate real user experiences, leading to low evaluation accuracy and slow iteration. Additionally, the lack of an NPS feedback mechanism hindered user experience analysis and strategy optimization.
My Contributions:
Led the development of a snapshot and batch sampling platform for image search algorithms, saving 31 person-days per month on strategy evaluation and enabling fine-grained optimization for real-shot scenarios and top categories (e.g., apparel, accessories, bags, cosmetics).
Built the NPS feedback mechanism data pipeline from scratch, including event tracking design, data collection, data warehouse model development, and NPS dashboard creation, significantly enhancing user experience analysis capabilities.
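A minimal sketch of the real-time leg of such a pipeline, assuming the Kafka/Flink/StarRocks stack mentioned earlier and using Flink SQL via PyFlink; topic names, schemas, and connector option values are illustrative assumptions, not the production configuration:

```python
# Illustrative PyFlink job: tracked NPS feedback events from Kafka -> cleansed
# detail table in StarRocks read by the NPS dashboard. All names/hosts are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: NPS feedback events collected by the tracking SDK, landed in Kafka as JSON
t_env.execute_sql("""
    CREATE TABLE nps_feedback (
        user_id  STRING,
        scene    STRING,
        score    INT,
        event_ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'nps_feedback_events',
        'properties.bootstrap.servers' = 'kafka:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Sink: StarRocks detail table; the dashboard layer aggregates it into the NPS score
t_env.execute_sql("""
    CREATE TABLE dwd_nps_feedback (
        user_id  STRING,
        scene    STRING,
        score    INT,
        event_ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'starrocks',
        'jdbc-url' = 'jdbc:mysql://starrocks-fe:9030',
        'load-url' = 'starrocks-fe:8030',
        'database-name' = 'dwd',
        'table-name' = 'dwd_nps_feedback',
        'username' = 'etl',
        'password' = '***'
    )
""")

# Light cleansing: keep only valid 0-10 scores before loading
t_env.execute_sql("""
    INSERT INTO dwd_nps_feedback
    SELECT user_id, scene, score, event_ts
    FROM   nps_feedback
    WHERE  score BETWEEN 0 AND 10
""").wait()
```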
Background:
The "Poizon Push" new product commercialization project required comprehensive data infrastructure, including product placement effect measurement, L2 threshold calculation, and internal operational reporting.
My Contributions:
Designed and developed 4 DWS summary tables and 15 ADS application reports based on a layered data warehouse architecture, covering product placement effects, commercialization success thresholds, and merchant data analysis, providing robust data support for commercialization operations (see the layering sketch below).
Standardized core metrics (e.g., PVR, ASN), resolving ambiguity issues and ensuring data consistency.
Optimized data pipelines and enhanced monitoring, achieving 97% SLA compliance and data availability for core reports and ensuring stable, reliable data services.
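A minimal sketch of the DWD -> DWS -> ADS layering behind these tables, written as Spark SQL from PySpark; every table and column name here is an illustrative placeholder, not the actual Poizon Push schema:

```python
# Illustrative layering only: placeholder tables/columns, not the production schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("push_layering_sketch").getOrCreate()
bizdate = "2023-06-01"  # normally injected by the scheduler

# DWS: daily per-product placement summary rolled up from DWD exposure detail
spark.sql(f"""
    INSERT OVERWRITE TABLE dws_push_placement_1d PARTITION (dt = '{bizdate}')
    SELECT product_id,
           COUNT(*)                                      AS pv,
           COUNT(DISTINCT user_id)                       AS uv,
           SUM(CASE WHEN is_click = 1 THEN 1 ELSE 0 END) AS clicks
    FROM   dwd_push_exposure_detail
    WHERE  dt = '{bizdate}'
    GROUP  BY product_id
""")

# ADS: operational report joining the DWS summary with the product dimension
spark.sql(f"""
    INSERT OVERWRITE TABLE ads_push_placement_report PARTITION (dt = '{bizdate}')
    SELECT d.category,
           SUM(s.pv)                 AS pv,
           SUM(s.uv)                 AS reach_uv,
           SUM(s.clicks) / SUM(s.pv) AS ctr
    FROM   dws_push_placement_1d s
    JOIN   dim_product d ON s.product_id = d.product_id
    WHERE  s.dt = '{bizdate}'
    GROUP  BY d.category
""")
```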
Background:
Daily data warehouse operations suffered frequent Spark task errors and performance bottlenecks, and troubleshooting relied on inefficient manual inspection of the Spark UI, significantly delaying task optimization and development progress.
My Contributions:
Designed and implemented an automated log collection and parsing feature for efficient processing of Spark event logs.
Developed a task performance analysis module to identify the most time-consuming stages and provide targeted optimization suggestions.
Created an error localization module using rule-based analysis to pinpoint error causes at the stage and task level, improving task troubleshooting and optimization efficiency.
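A minimal sketch of the parsing approach, assuming the standard JSON-lines Spark event log format; the stage-duration ranking and the skew heuristic (max task time far above the mean) are illustrative assumptions, not the tool's exact rules:

```python
# Hypothetical sketch: parse a Spark event log, rank stages by duration, and
# flag potential data skew from task-time dispersion. Thresholds are illustrative.
import json
from collections import defaultdict

def analyze_event_log(path: str, top_n: int = 5, skew_ratio: float = 5.0):
    stage_duration = {}             # stage_id -> wall-clock duration (ms)
    task_times = defaultdict(list)  # stage_id -> per-task run times (ms)

    with open(path) as f:
        for line in f:
            event = json.loads(line)
            kind = event.get("Event")
            if kind == "SparkListenerStageCompleted":
                info = event["Stage Info"]
                if info.get("Submission Time") and info.get("Completion Time"):
                    stage_duration[info["Stage ID"]] = (
                        info["Completion Time"] - info["Submission Time"]
                    )
            elif kind == "SparkListenerTaskEnd":
                task = event["Task Info"]
                task_times[event["Stage ID"]].append(
                    task["Finish Time"] - task["Launch Time"]
                )

    # Report the most time-consuming stages and flag likely skew
    for stage_id, dur in sorted(stage_duration.items(), key=lambda x: -x[1])[:top_n]:
        times = task_times.get(stage_id, [])
        skewed = bool(times) and max(times) > skew_ratio * (sum(times) / len(times))
        print(f"stage {stage_id}: {dur / 1000:.1f}s"
              + ("  <- possible data skew" if skewed else ""))

if __name__ == "__main__":
    analyze_event_log("/tmp/spark-events/application_1234_0001")  # placeholder path
```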
Data Warehousing & Modeling: Proficient in dimensional modeling (star/snowflake schemas); designed and implemented billion-row data warehouse models; skilled in high-performance SQL query design and optimization.
Big Data Technologies: Expertise in Hadoop, Spark, Flink, and other distributed processing frameworks; experienced in PB-level data warehouse performance tuning.
Data Pipeline & Governance: Skilled in end-to-end data pipeline management (data ingestion, cleansing, transformation, storage, and monitoring); experienced in metadata management and data quality monitoring.
Programming & Tools: Proficient in Java, Python, and SQL for big data and data warehouse development; skilled in MySQL and PostgreSQL for high-performance query design; familiar with CI/CD processes, Git version control, and Docker deployment.
Soft Skills: Experienced in Agile development; strong documentation and communication skills; adept at cross-functional collaboration with business, algorithm, and data science teams.