Logging and Monitoring Operations Engineer
Chicago, IL 60602, US
09/27/2018
Required Skills
PowerShell, interpersonal skills
Company
Infinity Consulting Solutions, Inc
Experience
4 to 6 years
Job Description
ICS is partnered with a Fortune 500 financial services organization in Chicago seeking a Logging and Monitoring Operations Engineer.
This Engineer will be responsible for leading an offshore team on enterprise logging and monitoring efforts.
They will improve operational stability, reduce the risk of experimentation, and increase overall function of technology assets by providing robust, comprehensive logging, monitoring, and notification solutions.
Oversee the technical implementation of the Enterprise Logging and Monitoring effort outlined below
Translate strategic monitoring goals into tactical plans, then execute them within a team
Build, maintain, and operate tools comprising the Enterprise Monitoring and Logging system to provide visibility into the operation of technology assets
Meet the monitoring, logging, and outage notification needs of system, application, and business owners
Work with teams to onboard specific applications with useful metrics and measures
Integrate logically-connected metrics to provide higher-level awareness of overall health
Assist in discovering root-level problems that are not clearly visible from their initial impacts
Mature overall organizational awareness and increase incident response capability
Enable more efficient usage of development and operations staff time
Emphasize collaboration and automation
Become a force-multiplier within the organization by unlocking and sharing new capabilities
Relentlessly pursue process improvement
Primary Accountabilities/Responsibilities:
Work with leadership and colleagues to define and modify Logging & Monitoring strategy
Translate strategy into an actionable, tactical plan to accomplish high-level goals
Mentor and guide team members and colleagues in Logging & Monitoring tactics and operations
Make high-level decisions and perform low-level technical configurations to build and maintain a global monitoring system
Operate, maintain, and expand monitoring tools
Follow and execute change management procedures
Develop and leverage new technologies to improve IT situational awareness
Work with development teams, system owners, application owners, and business stakeholders to identify and monitor important infrastructure and business systems
Create methods to detect errors and outages
Create notifications to appropriate groups when issues occur
Provide expertise, tools, and assistance to operations, development, and support teams for monitoring IT systems, infrastructure, applications, tools, processes and tasks
Collaborate with support teams and business partners to ensure the business is operating and to detect problems as quickly as possible when anything goes wrong
Collaborate with DevOps to automate monitoring capability as part of building any new project or system
Collaborate with IT Infrastructure to develop a comprehensive window into the global operation
Actively seek out improvements and solutions in IT operational awareness
Future expansion of this role may include opportunities for team leadership and management
Job Requirements:
Bachelor's degree in Computer Science, or related field experience, preferred
Must be authorized to work in the US for any employer
4-6 years of experience in Enterprise IT efforts to build, maintain, and deploy large-scale infrastructure or development projects, including:
2-4 years of direct experience working with monitoring, logging, or telemetry software such as Splunk, Zabbix, Nagios, SolarWinds, SCOM, Pingdom, Graylog, Logentries, Metricbeat, Elastic/ELK, Grafana, NXLog, EventTracker, Prometheus, Datadog, PagerDuty, AlertOps, Opsgenie, or others
Experience working both independently and in a team-oriented, collaborative environment is essential
Demonstrated ability to adapt to shifting priorities, demands and timelines through analytical and problem-solving capabilities
Ability to remain flexible during times of change and react to project adjustments and alterations promptly, efficiently and positively
Strong written and oral communication skills
Strong interpersonal skills
Must be able to learn, understand and apply new technologies
Strong customer orientation
Excellent analytical and problem-solving capability
Ability to effectively prioritize and execute tasks in a high-pressure environment is crucial
Ability to influence colleagues and communicate effectively across all levels of the organization
Ability to manage multiple projects and work effectively under time constraints as necessary
Excellent verbal, written and relationship skills used to interact with a global group of technical and non-technical people
Attention to detail is a must
Ideal candidate will have the following additional experience:
Championing and driving an organization's logging and monitoring strategy
Implementing large-scale monitoring projects
Utilizing configuration as code or other strategies to bake monitoring into infrastructure at the earliest stages of implementation
Automating management, configuration, or other tasks for consistency and reliability
Scripting using PowerShell or Bash
Software development, particularly in .NET or .NET Core
Using git for version control of software or scripts
Experience on or with a NOC-style 24-hour monitoring and response team
Creation of runbooks for controlled responses to incidents, errors, or problems
Disaster Recovery planning
Ability to speak to the applicability and potential value of the following concepts:
DevOps, Continuous Integration, Continuous Delivery, Configuration as Code, Cattle not Pets, Customer Value Stream, Iterative Improvement
Operations Manager
Information Technology
No Preference
Full-Time Job
Other
1
Candidate Requirements
Bachelor's degree
Recruiter Details
Doug Klares
1350 Broadway, Suite 2205
New York, NY 10018
US